The case for making engineering performance visible is legitimate. Hidden underperformance is a real organizational problem, and managers who avoid measuring their teams are often just avoiding difficult conversations. That much is fair.
The problem is the tool, not the intent.
GitHub activity metrics – PR counts, commit frequency, lines of code – are proxies. They measure inputs, not outcomes. And once any metric becomes a target, it stops being a reliable signal. This is Goodhart’s Law, first articulated in economics but applicable almost anywhere humans are being measured. Engineers are rational people. Tell them their PR count matters and you’ll get more PRs – smaller, more frequent, shallower in scope. Tell them reviews are tracked and you’ll get more reviews that say “LGTM.” You haven’t improved performance. You’ve changed what people optimize for.
The engineers I’ve seen deliver the most value are often the least visible in activity metrics. The architect who spent three weeks on a data model that prevented six months of rework. The senior engineer who unblocked four teammates rather than closing their own tickets. The person who wrote the runbook that prevented the 3am page. None of that registers on a dashboard.
There’s a better framework. The research behind Accelerate – by Nicole Forsgren, Jez Humble, and Gene Kim – identified four metrics that consistently correlate with organizational performance: deployment frequency, lead time for changes, change failure rate, and time to restore service. These are outcome metrics. They tell you whether your team is delivering value, not just whether they’re busy. They’re also harder to game, because they measure the end result of many people’s work, not individual activity.
The argument that high performers embrace visibility and only underperformers resist it is a false binary. High performers resist bad metrics because they’ve seen what happens when you optimize for the wrong things. That’s not fear – it’s experience.
The pragmatic approach
You don’t need a perfect measurement system to start. Deployment frequency is easy to instrument and tells you a lot about your team’s health. Lead time for changes – from commit to production – is another early signal worth tracking.
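Instrumenting deployment frequency can be as simple as logging a timestamp per deploy and bucketing by week. A sketch, assuming you can pull deploy timestamps from your pipeline (the function name is mine, not a standard API):

```python
from collections import Counter
from datetime import datetime

def deploys_per_week(deploy_times: list[datetime]) -> dict[str, int]:
    """Bucket deploy timestamps by ISO week. One number per week is
    enough to see whether the team's cadence is trending up or down."""
    counts = Counter(
        f"{t.isocalendar().year}-W{t.isocalendar().week:02d}" for t in deploy_times
    )
    return dict(sorted(counts.items()))
```

A weekly trend line drawn from this is crude, but it answers the question that matters – is the team shipping more or less often than last quarter – without tracking anyone's individual activity.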
For individual contribution, qualitative signals matter more than volume. How someone writes a PR description tells you how they think. How they engage in code review tells you how they communicate. These are harder to put on a dashboard, but they’re far better indicators of genuine contribution.
Start with outcomes at the team level. Layer in individual signals carefully, and treat them as data points in a conversation – not a verdict.
The goal is a team that ships valuable software reliably. Measure that.