Making inference on optimization problems
In data science, economics, and basically any field with a “model” — except maybe fashion? — you have the following very common problem:
1. Fit a model M to some data X.
2. Use the estimated M in some optimization problem, i.e. max_{a} F(M(a, X), a) s.t. a in A.
It’s common practice to look at standard errors, etc., for the parameters of model M, but those aren’t really what matter, right? We care about the error in either the value of the objective function or the arg max itself.
This short blog post shows how to do this when everything is nice and smooth.
Let’s do it with a simple example. Suppose you have a demand model: Q = G(P, theta). From some data, you estimate the demand curve and you want to choose price P to maximize revenue:
max_{P} G(P, theta) P
How does error in the estimated demand parameters theta, which is usually something you can just read off from whatever statistical package you use to estimate the model, translate into error in: (1) revenue and (2) optimal price?
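To make the pipeline concrete, here’s a toy version of the first step in Python. The linear demand form Q = theta0 - theta1 * P, the simulated data, and all the numbers are my assumptions for illustration, not anything from a real dataset. We fit by OLS and read off theta_hat and its covariance matrix, exactly the objects a statistical package would hand you:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy assumption: linear demand Q = theta0 - theta1 * P + noise.
theta_true = np.array([100.0, 2.0])
n = 500
P = rng.uniform(5.0, 40.0, size=n)
Q = theta_true[0] - theta_true[1] * P + rng.normal(0.0, 5.0, size=n)

# OLS of Q on [1, -P], so the coefficients are (theta0, theta1) directly.
X = np.column_stack([np.ones(n), -P])
theta_hat, *_ = np.linalg.lstsq(X, Q, rcond=None)

# Usual OLS covariance estimate: sigma^2 * (X'X)^{-1}.
resid = Q - X @ theta_hat
sigma2 = resid @ resid / (n - 2)
cov_theta = sigma2 * np.linalg.inv(X.T @ X)

print(theta_hat)   # should be close to [100, 2]
print(cov_theta)
```

Everything below only needs theta_hat and cov_theta; any consistent estimator with an estimated covariance matrix would slot in the same way.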
There is a really slick, simple theorem that can help us out here: the envelope theorem. It states (under some conditions — be careful, especially if you have a constrained optimization problem!) that:
V(theta) = max_{P} G(P, theta) P
DV(theta) = D_{theta} G(P*, theta) P*
i.e. the derivative of the value with respect to the demand parameters is just the derivative of the objective with respect to theta, evaluated at the optimal choice P* with P held fixed.
So, how do we use this? Well, if we already know the covariance matrix of our estimator of theta (which is usually just spit out by the statistical package we’re using), then we can just apply the Delta method to get the statistical uncertainty around revenue implied by the uncertainty in the demand parameters:
Var[V(theta_hat)] = DV(theta_hat)’ Var[theta_hat] DV(theta_hat)
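Sticking with the toy linear demand Q = theta0 - theta1 * P (my assumption for illustration), revenue is (theta0 - theta1*P) * P, the optimal price is P* = theta0 / (2 theta1), and the envelope gradient is DV = (P*, -P*^2). The delta method is then one line of linear algebra; the theta_hat and covariance numbers here are made-up stand-ins for regression output:

```python
import numpy as np

# Made-up stand-ins for what a regression would report.
theta_hat = np.array([100.0, 2.0])
cov_theta = np.array([[0.5, 0.01],
                      [0.01, 0.001]])

theta0, theta1 = theta_hat
p_star = theta0 / (2.0 * theta1)   # arg max of (theta0 - theta1*P) * P

# Envelope theorem: differentiate revenue in theta, holding P fixed at P*.
# D_theta [(theta0 - theta1*P) * P] at P = P* is (P*, -P*^2).
dV = np.array([p_star, -p_star**2])

# Delta method: Var[V] = DV' Var[theta_hat] DV.
var_V = dV @ cov_theta @ dV
se_V = np.sqrt(var_V)
print(p_star, se_V)
```

A nice sanity check is that dV matches a finite-difference derivative of the closed-form value V(theta) = theta0^2 / (4 theta1), since the envelope theorem says the indirect effect through P* drops out.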
Okay, so now we’ve got a sense for the basic strategy: define our object of interest as a smooth function of theta and then apply the Delta method. So, how do we find P(theta) so we can characterize the uncertainty in the price choice?
The first order conditions of the problem are:
D_{P}G(P, theta) P + G(P, theta) = 0
Because you were all paying attention in calculus, you, of course, remember the implicit function theorem. Partition the variables into [theta, P]; we’re looking for the function P(theta) such that [theta, P(theta)] satisfies the first order condition above. (I’m writing it this way instead of solving a single equation directly because the same strategy works if you’re optimizing over multiple choice variables.)
The notation gets a little dense, but basically:
DP(theta) = -[D_{P} FOC(theta, P(theta))]^{-1} D_{theta} FOC(theta, P(theta))
So in practice, you evaluate this derivative at theta = theta_hat and P(theta_hat) = P* to get your derivative and then do the same thing:
Var[P(theta_hat)] = DP(theta_hat)’ Var[theta_hat] DP(theta_hat)
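Continuing the toy linear demand example (again, the functional form and the numbers are my assumptions), the first order condition is FOC(P, theta) = theta0 - 2*theta1*P = 0, so the implicit function theorem derivative and the delta method variance for the optimal price come out in a few lines:

```python
import numpy as np

# Made-up stand-ins for regression output, as before.
theta_hat = np.array([100.0, 2.0])
cov_theta = np.array([[0.5, 0.01],
                      [0.01, 0.001]])

theta0, theta1 = theta_hat
p_star = theta0 / (2.0 * theta1)

# Implicit function theorem: DP = -[D_P FOC]^{-1} D_theta FOC at (theta_hat, P*).
# Here FOC(P, theta) = theta0 - 2*theta1*P.
d_foc_dP = -2.0 * theta1                      # scalar here; a matrix with many choices
d_foc_dtheta = np.array([1.0, -2.0 * p_star])
dP = -d_foc_dtheta / d_foc_dP                 # equals (1/(2*theta1), -theta0/(2*theta1**2))

# Delta method: Var[P(theta_hat)] = DP' Var[theta_hat] DP.
var_p = dP @ cov_theta @ dP
se_p = np.sqrt(var_p)
print(p_star, se_p)
```

With a vector of choices, d_foc_dP becomes the Jacobian of the stacked first order conditions and the division becomes a matrix solve, but the recipe is identical.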
So now you have measures of uncertainty for both the optimized objective function and the optimal choice parameters (as long as everything is smooth and differentiable). Enjoy!
Caveats
The key feature you need to use these theorems (at least as written here) is that a finite-dimensional vector of parameters determines the optimal choice, P(theta). That means no infinite-dimensional object, like a nonparametric function, can appear in theta.
In constrained optimization problems, you’ll need to include the Lagrange multipliers in these expressions which can make things a bit more involved.
Zach
Connect at: https://www.linkedin.com/in/zlflynn/
If you want my help with any Experimentation, Analytics, etc. problem, click here.

