Should we compute confidence intervals for partially identified parameters?
This is a hot take for a niche audience, but let’s do it.
Thesis:
Confidence (or credible) intervals are too conservative for most partially identified models.
What is Identification?
I’ll start by defining what I mean by identification. Statistics has the blessing and curse of being used in many fields which have developed different names for the same things… and the same names for different things. So, here’s what I mean by it.
We identify parameters by using modeling assumptions to map distributions to parameters. Identification is not about a dataset. The sample size never shows up. It is a mathematical relationship between a distribution and parameters.
For example, in a standard regression model, we have:
y = x'b + e, cov(x, e) = 0
Using this model, we can write:
cov(x,y) = cov(x,x)b + cov(x,e) = cov(x,x)b
If cov(x,x) is invertible, then: cov(x,x)^(-1) cov(x,y) = b. So, we have a mapping between the distribution of (x,y) and the parameter b.
In this case, we say the distribution is point identified because if we know the distribution of (x,y), we know the parameter b.
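As a sanity check, here's a minimal simulation of that mapping in Python with numpy (scalar x for simplicity; the data-generating process and variable names are mine, invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, b_true = 100_000, 2.0

# Simulate the model y = x*b + e with cov(x, e) = 0 (e drawn independently of x).
x = rng.normal(size=n)
e = rng.normal(size=n)
y = x * b_true + e

# Point identification: b = cov(x, x)^(-1) cov(x, y), which is just a ratio here.
b_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(b_hat)  # close to 2.0
```

Nothing about the sample size enters the identification argument itself; the simulation only shows that once we can estimate the covariances, the mapping hands us b.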
Partial identification is when the mapping from the distribution is not to a unique point but instead a set of points. For example:
y = x'b + e, cov(z, e) ≥ 0
In this case, the identified set would be B = { b : cov(z,y) ≥ cov(z,x)’b }.
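To make the set concrete, here's a hedged numpy sketch. With scalar x and z and cov(z, x) > 0, the identified set is the half-line b ≤ cov(z, y) / cov(z, x). The data-generating process below is invented for illustration, not taken from the post:

```python
import numpy as np

rng = np.random.default_rng(1)
n, b_true = 100_000, 2.0

# Simulate y = x*b + e where e is positively correlated with z,
# so the maintained assumption cov(z, e) >= 0 holds.
z = rng.normal(size=n)
x = z + rng.normal(size=n)        # cov(z, x) = 1 > 0
e = 0.5 * z + rng.normal(size=n)  # cov(z, e) = 0.5 >= 0
y = x * b_true + e

def scov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]

# Identified set B = { b : cov(z, y) >= cov(z, x) * b }.
# With cov(z, x) > 0, this is the half-line b <= cov(z, y) / cov(z, x).
upper = scov(z, y) / scov(z, x)

def in_identified_set(b):
    return scov(z, y) >= scov(z, x) * b

print(upper)                      # an upper bound weakly above b_true
print(in_identified_set(b_true))  # True: the truth lies inside the set
```

Note the set contains the true b but is strictly larger than {b_true}: the inequality assumption only pins the parameter down to one side of a hyperplane.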
As in the point-identified case, when we try to take the mapping to the data, we don’t know the true distribution of (x,y,z). We have to estimate it from data.
Our estimates will be subject to statistical error, but how should we deal with it?
Statistical error in partially identified models in practice
In theory, we should take statistical error into account — even for partially-identified models.
But in practice… it's not so clear. Partial identification strategies usually sound something like this:
The parameter can’t be smaller than LB because even in this crazy case, the parameter would be greater than LB, and we know the world is less crazy than that.
I've yet to see a partial identification strategy where the lower/upper bound comes from a scenario we believe could actually be true; the bound is always based on a very conservative, extreme scenario. I'm sure there's a counterexample out there, of course. There's one of everything.
For example, in my little paper on partially identifying productivity, I formed weak bounds on the production function by assuming (1) the production function is increasing and (2) the relationship between productivity and factor choice is weakly positive (in a certain sense).
But, of course, I don’t believe the production function can be flat (the extreme version of (1)) or that productivity is independent of factor choice (the extreme version of (2)). I just don’t want to make any tighter restrictions on the correlation between productivity and factor choice or about the slope of the production function.
What does slack between our assumptions and our beliefs mean for confidence intervals?
This slack is exactly what turns the naive estimate of the identified set into a valid confidence interval. Consider the one-sided lower confidence interval.
From our partial identification strategy, we have the following:
LB: the population value of the lower bound implied by our empirical assumptions.
LB*: the population value of the lower bound given what we actually believe.
LB(n): the estimate of LB computed from a sample of size n.
Suppose LB << LB* in the following sense: sqrt(n)[LB - LB*] -> -infinity.
Then, the point estimate of the bound itself, LB(n), is a valid confidence interval.
Why? Let’s go back to basics: what is a valid confidence interval? A random sequence c(n) such that:
min { pr[c(n) ≤ theta] : theta ≥ LB* } = pr[c(n) ≤ LB*] -> something ≥ alpha
The worst-case probability of covering the true parameter, i.e., of not making a type 1 error under the null, must be at least alpha in the limit. (Here alpha is the coverage level, e.g., 0.95.)
Now, plug in the point estimate LB(n) as a candidate for c(n).
pr[LB(n) ≤ LB*] = pr[ sqrt(n)[LB(n) - LB] ≤ sqrt(n)[LB* - LB] ]
Suppose sqrt(n)[LB(n) - LB] converges to a normal distribution (the usual case). Then we're comparing a normal distribution to something diverging to +infinity. So:
pr[LB(n) ≤ LB*] = pr[ sqrt(n)[LB(n) - LB] ≤ sqrt(n)[LB* - LB] ] -> 1.
And, of course, no matter what significance level we’re using, 1 ≥ alpha. So, choosing c(n) = LB(n) is a valid confidence interval.
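A quick Monte Carlo makes the argument tangible. Assume, purely for illustration, that LB(n) is a sample mean with true mean LB = 0 and unit variance, while the bound we actually believe is LB* = 0.5 (the numbers are mine, not from the post):

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_sims = 400, 2000
LB, LB_star = 0.0, 0.5  # conservative population bound vs. what we believe

# LB(n): sample mean of data with mean LB and unit variance,
# so sqrt(n) * [LB(n) - LB] is approximately standard normal.
draws = rng.normal(loc=LB, scale=1.0, size=(n_sims, n))
LB_n = draws.mean(axis=1)

# Here sqrt(n) * [LB* - LB] = 10: the believed bound sits ten standard
# errors above the conservative one, so the point estimate covers LB*
# in essentially every simulation.
coverage = (LB_n <= LB_star).mean()
print(coverage)  # essentially 1.0, far above any conventional level
```

As the derivation says, that limiting coverage of 1 exceeds any alpha we might choose, so the raw point estimate of the bound already works as a confidence interval.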
We should only form confidence intervals for partially identified parameters if we think the bounds can be attained. If we’re choosing a conservative identified set, then we don’t need to take into account the statistical error.
Thanks for reading!
Zach

