The Residual
Understanding the “measure of our ignorance.”
The equations we estimate in data science, statistics, economics, etc., tend to look something like this:
Outcome = F(Observables) + Unobservable
We have a set of observed covariates or features that relate in some way to the outcome, but there’s another part of the data-generating process that we do not observe.
We call the unobserved part of the equation a “residual.” The name suggests it measures the limits of how well the covariates can predict outcomes. This intuition doesn’t apply when the equation we’re estimating is supposed to represent something — when the equation is not a merely statistical relationship.
We can think of outcomes as a function of the choices people make and the states they face, i.e., Y(S, C), where S is the complete vector of states and C is the complete vector of choices. If we observed S and C, we could perfectly explain how Y varies both in a predictive sense and in a causal sense because observing (Y, S, C) and (Y’, S’, C’) would be enough to tell you how moving from (S,C) to (S’,C’) changed Y.
[Causal inference is fundamentally a missing data problem: we don’t observe all potential outcomes.]
The problem we face is that our observed covariates are a subset of (S,C), and our residual is an index of the other state and choice variables that we do not observe.
If we think about residuals this way, from first principles, it becomes much easier to reason about what the residual is and which identification assumptions make sense. For example: does it make sense that F and the unobservable states and choices are additively separable?
When we think about residuals as states and choices, a lot of the “standard” assumptions start to sound a little… less satisfying?
Why are the observed states and choices treated so asymmetrically from the unobserved states and choices? Why does the fact that we can measure these variables and not those variables change how they enter the data-generating process? Etc.
Examples Of The Different Kinds Of Residuals
Prediction Error
The name “residual” makes the most sense in the regression problem, where we want to estimate E[Y|X=x] for some covariates X, i.e.
Y = E[Y|X=x] + V, where E[V|X=x] = 0.
V is the residual in the sense that it is the part of Y left unexplained by X. Var[V] = 0 if Y = Y(X), so the residual exists, and matters, precisely because Y is not a function of X.
So, our first definition of a residual is the part of the dependent variable we can’t predict.
In this model, the relationship of the residual to real-world states and choices is less important because our primary goal is to estimate the statistical object E[Y|X=x].
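To make this concrete, here’s a minimal simulation in Python (the data-generating process is an arbitrary one I made up for illustration). It fits a regression and checks that the residual behaves like pure prediction error: mean zero and uncorrelated with X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data-generating process (all numbers are made up):
# Y depends linearly on X plus noise V, the part X cannot explain.
n = 10_000
X = rng.normal(size=n)
V = rng.normal(size=n)
Y = 2.0 + 3.0 * X + V

# Estimate E[Y|X] by OLS and compute the fitted residuals.
slope, intercept = np.polyfit(X, Y, 1)
V_hat = Y - (intercept + slope * X)

# The residual is the part of Y we can't predict from X:
# it is (approximately) mean zero and uncorrelated with X.
print(np.mean(V_hat))               # ~0
print(np.corrcoef(X, V_hat)[0, 1])  # ~0
```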
Effect Heterogeneity
The residual can also be unobserved heterogeneity in treatment effects. If we run an experiment, we might identify the average treatment effect like so:
Y(z) = E[Y(z)] + U(z)
Y = ZY(1) + (1-Z)Y(0)
Y = E[Y(0)] + (E[Y(1)] - E[Y(0)])Z + U(0) + Z(U(1) - U(0))
Because the U(z) are mean zero by construction and, in an experiment, independent of treatment assignment, E[U(0) + Z(U(1) - U(0))|Z] = 0.
So, when we write our regression specification as Y = a + bZ + V, the residual V is U(0) + Z(U(1) - U(0)).
In this particular case, the residual is both the prediction error from regressing Y on Z and an object with a structural interpretation.
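A quick simulation can verify this decomposition; the potential-outcome distributions below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical potential outcomes with heterogeneous effects:
# E[Y(0)] = 1, E[Y(1)] = 3, so the true ATE is 2.
Y0 = 1.0 + rng.normal(size=n)            # Y(0) = E[Y(0)] + U(0)
Y1 = 3.0 + 2.0 * rng.normal(size=n)      # Y(1) = E[Y(1)] + U(1)

# Random assignment: Z is independent of (U(0), U(1)).
Z = rng.binomial(1, 0.5, size=n)
Y = Z * Y1 + (1 - Z) * Y0

# OLS of Y on Z recovers a ~ E[Y(0)] and b ~ ATE.
b, a = np.polyfit(Z, Y, 1)
print(a, b)                              # ~1.0, ~2.0

# The regression residual equals U(0) + Z*(U(1) - U(0)),
# up to sampling error in the estimates of a and b.
V = Y - (a + b * Z)
U = (Y0 - 1.0) + Z * ((Y1 - 3.0) - (Y0 - 1.0))
print(np.max(np.abs(V - U)))             # small
```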
Structural Residuals
Now, suppose we’re estimating an equation from some theoretical model. Say a production function from economics:
log Q = log F(Z) + log A,
where Q is output, Z is a vector of inputs, and A is total factor productivity.
We have data on inputs (Z) and output (Q), and we want to estimate this equation to recover the production function and productivity.
Productivity is a residual. It is the part of the production process that is unobserved, but its meaning comes from the behavioral model, not a statistical criterion.
The definition of productivity and how to measure it is part of a grand old literature spanning more than 80 years, from Marschak and Andrews (1944) to Griliches (1994) to some great, recent papers and one okay essay. The one thing the literature agrees on is that productivity is not simply the error term from regressing output on inputs.
To see why, notice that when Q = F(Z)A, greater productivity increases the marginal product of inputs. Therefore, the choice of inputs will be related to productivity, and E[log A|Z] ≠ E[log A].
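Here’s a sketch of that transmission bias, assuming, purely for illustration, a Cobb-Douglas technology and a reduced-form rule where higher-productivity firms choose more input:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical Cobb-Douglas technology: log Q = 0.6 log Z + log A.
log_A = rng.normal(size=n)               # productivity, unobserved

# Input choice responds to productivity, so E[log A | Z] != E[log A].
log_Z = 0.8 * log_A + rng.normal(size=n)
log_Q = 0.6 * log_Z + log_A

# OLS treats log A as a mean-independent residual, so the estimated
# elasticity absorbs part of A: classic transmission bias.
b, a = np.polyfit(log_Z, log_Q, 1)
print(b)                                 # well above the true 0.6
```

The bias doesn’t come from noisy measurement; it comes from treating a state variable that drives choices as if it were mean-independent of those choices.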
Why A Careful Model Of The Residual Matters
How we identify structural functions or draw causal inferences depends crucially on how we define the residual: which states and choices are unobserved? For example:
In demand estimation, it’s common to treat each product as a bundle of product characteristics (to reduce dimensionality) and write something like this: Q = D(P, X, U), where P is price, X is a vector of observed product characteristics, and U is unobserved. We can treat U as an index of unobserved product characteristics. A common identification assumption is that U is independent of X. If we don’t think about the residual as unobserved choices and states, that assumption sounds pretty reasonable: “shocks” sound like the sort of thing that is at least roughly uncorrelated with product design decisions. But that’s not what U is in our model. U is an unobserved product characteristic. Are the product characteristics in X independent of each other? No. So why would we expect them to be independent of unobserved product characteristics?
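Here’s a toy version of that concern. The “quality budget” mechanism below is a hypothetical I’m using for illustration, not an estimated model:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Hypothetical design process: one "quality budget" drives both the
# observed characteristic X and the unobserved characteristic U,
# just as it would correlate the observed characteristics with each other.
quality = rng.normal(size=n)
X = 0.7 * quality + rng.normal(size=n)   # observed characteristic
U = 0.7 * quality + rng.normal(size=n)   # unobserved characteristic

# Toy linear demand (price omitted for brevity): Q = 1 + 0.5 X + U.
Q = 1.0 + 0.5 * X + U

# Assuming U is independent of X biases the estimated taste for X.
b, a = np.polyfit(X, Q, 1)
print(b)                                 # well above the true 0.5
```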
When we write Q = F(Z)A for the production function, how should we think about “A”? One model treats A as a parameter: heterogeneity in how the production function translates Z into outputs. But what is that heterogeneity? It doesn’t really make sense that some firms or countries can, by their very nature, produce more output. There must be a reason, and whatever that reason is, it is an unobserved factor of production. So, instead, let’s think about the residual as an index of unobserved inputs: Z is the vector of observed inputs, Q is the observed output, and what’s left over are the inputs we don’t observe. These different models of the residual produce different identification strategies.
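And here’s the unobserved-inputs model in miniature, with a hypothetical second input M that we never observe. Under these assumptions, the recovered “productivity” residual is literally an index of the missing input:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

# Hypothetical two-input technology: log Q = 0.6 log Z + 0.3 log M,
# where M (say, managerial effort) is an input we never observe.
log_M = rng.normal(size=n)
log_Z = rng.normal(size=n)               # independent of M, for clarity

log_Q = 0.6 * log_Z + 0.3 * log_M

# Regress log Q on log Z and recover the "productivity" residual.
b, a = np.polyfit(log_Z, log_Q, 1)
log_A_hat = log_Q - (a + b * log_Z)

# The residual is (almost exactly) the missing input's contribution.
print(np.corrcoef(log_A_hat, log_M)[0, 1])   # ~1.0
```

If the unobserved input were instead chosen jointly with Z, we would be back in the correlated-residual world of the previous example, and the identification strategy would have to change accordingly.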
Thanks for reading!
Zach
Connect at: https://linkedin.com/in/zlflynn
Check out my Udemy course on Causal Inference: https://www.udemy.com/course/identifying-causal-effects-for-data-scientists/?couponCode=A105FEABA0A750B7BB41
If you want my help with any Experimentation, Analytics, etc. problem, click here.