High Velocity Metrics!
How to choose high-velocity/data-driven/other-buzzword metrics in a simple, automatable way.
(High Velocity Metrics sounds like a sales conference slogan.)
This post is about a simple, data-driven method for choosing metrics. There are theoretically “better” ways to select metrics, but I’m starting to like this approach because it is straightforward to compute, the results pass the eye test, it captures the main tradeoffs, and it is simple to integrate with other tools. You can make little dashboards to rank metrics, so it’s self-service. Sometimes, that’s more important than minimizing loss.
Roughly, there are two ways to analyze a problem with data:
1. Model the way the world works explicitly and completely. Then, map the data to the parameters of the model.
2. Model the assignment process: design an experiment so that assignment is random, or uncover some quirk in how assignment works, in order to identify the effect of some decision on some outcome.
Explicit models require stronger, less credible assumptions than restrictions on treatment assignment, but they have one advantage: a model of behavior defines Good and Bad explicitly. We don’t have to think too hard about deciding whether a policy/product change improves or hurts outcomes. Just run it through the model.
For example, if we have a demand model D(P, X) for a particular product and a supply model (say, a monopolist provides the good), then we can evaluate directly within the model how changes in X affect profits (if we are evaluating policies from the perspective of the monopolist). We know (read: assume) exactly how decisions filter into outcomes, and we know what our monopolist is trying to optimize because it’s part of the model.
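To make that concrete, here is a minimal sketch of the model-based approach, assuming a made-up linear demand curve and constant marginal cost; every parameter and function here is hypothetical, just to show that “Good” falls out of the model:

```python
# Hypothetical linear demand D(P, X) = a - b*P + c*X with constant
# marginal cost mc. All numbers are made up for illustration.
a, b, c, mc = 100.0, 2.0, 5.0, 10.0

def demand(price, x):
    return a - b * price + c * x

def profit(price, x):
    return (price - mc) * demand(price, x)

def optimal_profit(x):
    # Monopolist's first-order condition for linear demand:
    # maximize (P - mc) * (a - b*P + c*x)  =>  P* = (a + c*x + b*mc) / (2*b)
    p_star = (a + c * x + b * mc) / (2 * b)
    return profit(p_star, x)

# The model defines Good: does the change in X raise profits?
print(optimal_profit(x=1.0) - optimal_profit(x=0.0))
```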
In an assignment approach, we don’t need to say as much about how people make decisions. But now we need to decide what Good is.
Experimentation is an assignment approach. It focuses on how we assign people to treatments, not on modeling how people make decisions.
Usually, it’s easy enough to come up with the metric we want to optimize. Revenue, profits, etc. We know at a macro level which direction to take.
But it matters how fast we get there. Our velocity.
Velocity has two components: speed and direction. We want to pursue outcomes that move the ultimate business objective — revenue or profit — but we also want to pick metrics that converge before the heat death of the universe, ruling out revenue and profit.
Why?
Because everything in the world affects revenue and profit. Random fluctuations in demand, interest rates, the evening news, butterflies flapping their wings in Tokyo. Discriminating between the effect of all these other shocks and whatever we changed in the experiment takes a long time.
So, we want to balance a metric’s sensitivity—how quickly we can detect an effect—with its correlation with the end goal.
But how can we collapse these two dimensions into one to evaluate one metric choice against another?
A fun idea is to take “velocity” literally:
Velocity = Speed × Direction
Let’s call the ultimate goal metric Y and consider alternative metrics X that might have higher velocity.
I propose these two measures:
Speed = Median[ |Tstat[X]| / |Tstat[Y]| ]
Direction = 2 · Pr[ TE[X] · TE[Y] > 0 ] - 1
(Where Tstat[X] is the t-statistic for the treatment effect with metric X, and TE[X] is the estimated treatment effect on metric X.)
The Direction parameter is scaled to be between -1 and 1 because I want to fully commit to the bit that this is a measure of velocity and direction has a sign. Plus, it’s intuitive to say that if the probability the treatment effects have the same sign is less than 50%, then the “direction” is negative.
The idea is to use historical experiments “comparable” to the experiment you’re running to estimate these parameters. In an ideal world, they were run on the same population on which you’re running the upcoming experiment.
Speed
The idea behind the ratio of absolute t-statistics is that metrics that are easier to move relative to their noise will have larger t-stats for a fixed sample size, requiring smaller sample sizes to detect an effect, i.e., they are faster. I’ve been using the median while playing around with this because a few outliers tend to have an outsized impact on the mean.
The nice thing about this measure is that it can be readily computed from whichever table you store experiment results in.
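For instance, here is a minimal sketch with pandas, assuming a hypothetical results table with one row per (experiment, metric) holding the estimated treatment effect and its standard error; the schema and column names are assumptions, not a real warehouse layout:

```python
import pandas as pd

# Hypothetical results table: one row per (experiment_id, metric) with
# the estimated treatment effect and its standard error.
experiments = pd.DataFrame({
    "experiment_id": [1, 1, 2, 2, 3, 3],
    "metric":        ["X", "Y"] * 3,
    "effect":        [0.40, 0.20, -0.10, -0.05, 0.30, 0.02],
    "std_error":     [0.10, 0.15, 0.08, 0.12, 0.09, 0.14],
})

experiments["t_stat"] = experiments["effect"] / experiments["std_error"]

# One column per metric's t-statistic, one row per experiment.
t = experiments.pivot(index="experiment_id", columns="metric", values="t_stat")

# Speed = Median over experiments of |Tstat[X]| / |Tstat[Y]|.
speed = (t["X"].abs() / t["Y"].abs()).median()
print(speed)
```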
Direction
What we want to capture with our direction measure is whether we would make the same decision using X and Y. Whether the two metrics point in the same direction.
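Mechanically, it is just as easy to estimate as Speed. Continuing the sketch above (same hypothetical table), the estimator is the share of historical experiments where the two treatment effects share a sign:

```python
# Pivot the estimated treatment effects, one row per experiment.
te = experiments.pivot(index="experiment_id", columns="metric", values="effect")

# Direction = 2 * Pr[TE[X] * TE[Y] > 0] - 1, estimated by the share of
# experiments where the estimated effects have the same sign.
same_sign = (te["X"] * te["Y"] > 0).mean()
direction = 2 * same_sign - 1
print(direction)
```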
If we took the problem seriously (dear God, no), we’d recognize that estimating Direction is a problem.
For one thing, the most natural asymptotics are in the number of experiments, with the sample size for each experiment remaining fixed. That means the measurement error in the estimated TE[X] · TE[Y] never really becomes small, biasing estimates of Pr[TE[X] · TE[Y] > 0].
I will clean this up at some point, but for now, whatever. It’s biased. Hopefully, it doesn’t mess up comparisons across X. When necessary, it should be possible to handle the bias with something like what the authors of this paper did for covariance: https://arxiv.org/abs/2402.17637.
Filtering
This criterion doesn’t have the full context of your problem: it might be impossible for a certain metric to move given the particular thing your experiment does, even if that metric has historically been easy to move and highly correlated with the goal metric’s treatment effect. So, to use this criterion, we need to first filter to ex-ante “reasonable” metrics that make sense for the experiment. The criterion is then a decent way to make a call quickly and scalably from among the reasonable metrics, without requiring too much DS time.
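Putting the pieces together, here is a hypothetical last step, assuming you have already estimated Speed and Direction per candidate metric as above and hand-picked the ex-ante reasonable set; the metric names and numbers are invented:

```python
import pandas as pd

# Assumed summary: one row per candidate metric, with Speed and Direction
# estimated from comparable historical experiments. Numbers are invented.
summary = pd.DataFrame({
    "metric":    ["clicks", "add_to_cart", "7d_retention"],
    "speed":     [3.1, 2.4, 1.2],
    "direction": [0.2, 0.7, 0.9],
})

# The ex-ante filter is judgment, not statistics: keep only metrics that
# could plausibly move in this particular experiment.
reasonable = {"add_to_cart", "7d_retention"}
candidates = summary[summary["metric"].isin(reasonable)].copy()

# Commit to the bit: Velocity = Speed × Direction, rank descending.
candidates["velocity"] = candidates["speed"] * candidates["direction"]
print(candidates.sort_values("velocity", ascending=False))
```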
Conclusion
I’ve used/developed other data-driven metric selection procedures in the past. These approaches have the advantage of optimizing a well-specified loss function. They have the distinct disadvantage of requiring an actual analysis for every experiment. It’s hard to make the equivalent of a sample size calculator for metric selection.
The neat thing about this approach is that you can spin up a little dashboard so stakeholders can provide inputs directly — saving time and scaling data-driven metric selection.
[Okay, it’s not that simple if you haven’t already estimated the treatment effect for the proposed metric in the past experiment — but in principle!]
Thanks for reading!
Zach
Connect at: https://www.linkedin.com/in/zlflynn/
Take my Udemy course!: Identifying Causal Effects for Data Scientists
If you want my help with any Experimentation, Analytics, etc. problem, click here.

