Situation before runner strategy
- Long queue times and flaky jobs blocked teams that had outgrown the default runner pools.
- Security needed clear boundaries for artefacts and secrets on specialised hardware.
- FinOps wanted visibility into runner utilisation, not another opaque bill line.
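
A minimal sketch of the kind of per-pool figures that make utilisation and queue time visible. It assumes job records with queue, start, and finish timestamps can be exported from the CI system. The `Job` fields, the pool names, and `pool_report` are hypothetical names used for illustration, not part of any specific CI product.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Job:
    """One finished CI job; timestamps are seconds since the window start (assumed export format)."""
    pool: str
    queued_at: float
    started_at: float
    finished_at: float


def pool_report(jobs, runners_per_pool, window_seconds):
    """Return utilisation and queue-wait figures per runner pool.

    utilisation = total busy seconds / (runner count * window length)
    """
    report = {}
    for pool, runner_count in runners_per_pool.items():
        pool_jobs = [j for j in jobs if j.pool == pool]
        busy = sum(j.finished_at - j.started_at for j in pool_jobs)
        waits = [j.started_at - j.queued_at for j in pool_jobs]
        capacity = runner_count * window_seconds
        report[pool] = {
            "utilisation": busy / capacity if capacity else 0.0,
            "median_wait_s": median(waits) if waits else 0.0,
            "max_wait_s": max(waits) if waits else 0.0,
        }
    return report


if __name__ == "__main__":
    # Hypothetical one-hour window with a small GPU pool and a larger default pool.
    jobs = [
        Job("gpu", 0, 60, 1860),
        Job("gpu", 0, 120, 1920),
        Job("gpu", 100, 400, 1300),
        Job("default", 0, 10, 370),
    ]
    for pool, stats in pool_report(jobs, {"gpu": 2, "default": 4}, 3600).items():
        print(pool, stats)
```

Reporting utilisation next to queue wait shows both sides of the trade-off: a pool can look cheap because it is busy while teams still wait in the queue, or look fast because it is mostly idle.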
Runner bottlenecks
Core points
- Stakeholders needed a single credible story before budgets and timelines locked in.
- Legacy habits and tooling debt competed with the outcomes that marketing had already promised externally.
- Scope stayed honest by naming what would move in phase one versus what waited on data.
Risk and isolation needs
Core points
- Regulated or high-trust contexts punish silent assumptions about access, retention, and blast radius.
- Integration seams between teams multiplied rework when contracts were not written down.
- Non-prod behaviour that did not mirror production invited surprises during the first real traffic.
Fleet design and operations
Core points
- Automation and observability had to land together so operators could trust rollback and forward fix.
- Owners were named for pipelines, environments, and data handoffs instead of a shared inbox.
- Change management sat next to engineering so habits survived the first month after go-live.
Skunk tip
- Rehearse one failure mode weekly until the runbook is boring, not heroic.
Outcomes
Core points
- Velocity improved once releases got smaller and evidence travelled with the merge request.
- Cost and risk curves improved when unused paths were retired instead of left on life support.
- The durable lesson is that disciplined ownership beats another headline feature that nobody adopts.
If your rollback is a myth, your deploy frequency is vanity.



