High infrastructure costs are worth paying attention to. Teams that never look at their cloud bills tend to accumulate waste quietly – redundant logging pipelines, forgotten container registries, storage that nobody owns. Getting engineers to care about the cost of what they build is a legitimate goal, and one that’s harder than it sounds.
But the diagnosis matters as much as the observation.
The claim that elevated infrastructure costs primarily signal expertise gaps is too simple. Sometimes high costs reflect poor decisions. Sometimes they reflect genuine scale, regulatory requirements, or deliberate tradeoffs made during rapid growth. Sometimes they’re a combination. The work is figuring out which you’re dealing with before reaching for a solution.
Arbitrary cost thresholds – any specific dollar figure applied as a universal rule – tell you almost nothing on their own. A $1M infrastructure bill means something very different for a ten-person startup than for a two-hundred-person company running regulated workloads. Benchmarks without context are decoration.
The “simplify everything to the minimum” instinct also deserves scrutiny. Complexity that was introduced carelessly should be removed. But complexity that exists because the problem is genuinely complex is a different matter. Martin Fowler’s framing of technical debt is useful here – there’s a difference between debt you took on deliberately with a plan to repay it, and debt you accumulated through carelessness. Treating them the same way leads to bad decisions. Simplifying a system that isn’t actually too complex creates its own costs: lost capability, rework, and the organizational knowledge you burn to get there.
Over-engineering is real. So is premature simplification.
The pragmatic approach
Start with visibility. You can’t make good decisions about costs you can’t see. Tagging cloud resources by team or service, and making cost data available to engineers – not just finance – is the foundational step. The FinOps practice of shifting cost accountability closer to the teams generating it consistently produces better outcomes than top-down mandates.
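The aggregation step can be sketched in a few lines. This is a minimal illustration, not a real billing integration: the record shape and the `team` tag key are assumptions – cloud billing exports vary, and your organization may standardize on a different tag. The point is that untagged spend should be surfaced as a category of its own, not silently dropped.

```python
from collections import defaultdict

# Hypothetical cost records, as might be exported from a cloud billing API.
# The "team" tag key is an assumption; use whatever tag your org standardizes on.
records = [
    {"service": "object-storage", "cost": 120.0, "tags": {"team": "data"}},
    {"service": "compute", "cost": 340.0, "tags": {"team": "platform"}},
    {"service": "logging", "cost": 80.0, "tags": {}},  # untagged: nobody owns it
]

def cost_by_team(records, tag_key="team"):
    """Aggregate spend per owning team; untagged spend is surfaced, not hidden."""
    totals = defaultdict(float)
    for record in records:
        owner = record["tags"].get(tag_key, "UNTAGGED")
        totals[owner] += record["cost"]
    return dict(totals)

print(cost_by_team(records))
```

A report like this, published where engineers can see it, tends to do more than any mandate: the `UNTAGGED` line item is usually the first thing teams volunteer to clean up.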
Once you have visibility, prioritize. Log and metric retention is frequently over-provisioned and responds well to targeted review. Storage lifecycle policies are often missing entirely. These are high-return, low-risk places to start.
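A lifecycle policy is conceptually just an age-based rule table. The sketch below shows the decision logic in isolation; the tier names and thresholds are assumptions to be tuned per workload, and in practice you would express this as your cloud provider's native lifecycle configuration rather than application code.

```python
from datetime import datetime, timedelta, timezone

# Illustrative rules, most aggressive first. Thresholds are assumptions --
# tune them against actual access patterns and any retention requirements.
RULES = [
    (timedelta(days=365), "delete"),
    (timedelta(days=90), "archive"),            # e.g. cold/archival storage class
    (timedelta(days=30), "infrequent-access"),
]

def lifecycle_action(last_modified, now=None):
    """Return the storage action for an object based on its age."""
    now = now or datetime.now(timezone.utc)
    age = now - last_modified
    for threshold, action in RULES:
        if age >= threshold:
            return action
    return "standard"

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
print(lifecycle_action(datetime(2024, 1, 1, tzinfo=timezone.utc), now))  # old object
```

Even this toy version makes the review concrete: if nobody can say which rule a dataset should fall under, that is usually a sign nobody owns it.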
Architectural simplification is worth pursuing when the complexity is accidental – when nobody can explain why a component exists, or when a system has accreted layers over time without intention. But it should follow understanding, not precede it. Rewrite the thing you understand poorly and you’ll often reproduce the same problems in a cleaner wrapper.
The real question isn’t whether your infrastructure costs are too high in absolute terms. It’s whether the value you’re getting justifies the spend – and whether you have enough visibility to make that call.