Series: DevOps in the AI Era · Part 6 of 6

Automation vs. Human Intuition: Finding the Balance in DevOps

Automation does what you program it to do. Human intuition does what no program has ever been taught. Here's the framework for knowing which one to deploy — and when.

Naveed Ahmed
Lead DevOps Engineer @ DigitalOcean
April 29, 2026 · 8 min read · Automation · Intuition · Balance

It was 11:47pm on a Thursday when our auto-remediation system did exactly what it was programmed to do — and nearly took down our entire production environment in the process.

A latency spike triggered an automatic scale-up. The scale-up increased traffic to a downstream service. The downstream service had a bug introduced that afternoon that caused it to fail under increased load. That failure triggered more latency, which triggered more scaling, which triggered more failures. A perfectly designed automation loop had created a perfectly designed cascading disaster.

I caught it because I was awake, happened to glance at Grafana, and noticed the pattern didn't look like a normal traffic spike. I killed the automation, rolled back the downstream service, and we recovered in 11 minutes. Without human intuition recognising that something felt wrong, we would have been down for hours.

The Central Tension in Modern DevOps

We are caught between two imperatives that are both correct and in constant tension:

Imperative 1: Automate everything. Manual processes don't scale, are error-prone, and create single points of human failure. Every task a human does repeatedly should eventually be automated.

Imperative 2: Never automate judgment. Complex systems behave in unexpected ways. Novel failure modes require pattern recognition, contextual awareness, and the kind of "this doesn't feel right" instinct that only experience provides.

The tension between these two imperatives is where most DevOps disasters happen. Either we under-automate and burn out our teams, or we over-automate and find ourselves passengers in a system we can no longer steer.

What Automation Actually Does Well

Automation excels at:

What Human Intuition Does That Automation Cannot

Human intuition — built from years of watching systems behave — does things that no automation framework has ever replicated:

The most dangerous automation is the kind that's almost always right. When automation is right 99% of the time, engineers stop monitoring it critically. The 1% failure then goes undetected until it's catastrophic. High confidence is where complacency lives.
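A quick back-of-the-envelope calculation shows why "99% right" is less comforting than it sounds. The sketch below assumes each automated action fails independently with the same probability, which real systems rarely satisfy, and the action counts are illustrative numbers rather than figures from any real environment.

```python
def p_at_least_one_failure(success_rate: float, actions: int) -> float:
    """Probability that at least one of `actions` independent runs fails.

    Assumes every run succeeds with the same probability and that runs
    are independent of each other (a simplification).
    """
    return 1.0 - success_rate ** actions


def expected_failures(success_rate: float, actions: int) -> float:
    """Expected number of failed runs under the same assumptions."""
    return (1.0 - success_rate) * actions


if __name__ == "__main__":
    # Illustrative volumes: 100 and 500 automated actions.
    for n in (100, 500):
        p = p_at_least_one_failure(0.99, n)
        e = expected_failures(0.99, n)
        print(f"{n} actions: P(at least one failure) = {p:.3f}, expected failures = {e:.1f}")
```

Under these assumptions, a 99% success rate over 100 actions still leaves roughly a 63% chance of at least one failure. The failures are rare per action, but not rare per week.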

A Decision Framework: What to Automate and What to Keep Human

| Situation Type | Automate? | Human Role |
| --- | --- | --- |
| Repetitive, well-defined, low-stakes | Yes, fully | Set the rules, review occasionally |
| Repetitive, well-defined, high-stakes | Partially, with human approval | Review and approve before execution |
| Novel situation, unclear cause | No, human leads | Investigate, decide, act |
| High-traffic / high-risk period | Reduce automation scope | Increased monitoring vigilance |
| Cascading failures / unknown blast radius | Override automation | Human takes full control |
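One way to make the framework above executable is a small policy function that an orchestrator consults before acting. This is a minimal sketch: the `Situation` fields, the `Mode` names, and especially the order in which the rules are checked are my assumptions, since the framework itself does not say which row wins when several apply.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    FULL_AUTO = "automate fully"
    APPROVAL = "automate with human approval"
    REDUCED = "reduce automation scope"
    HUMAN_LEADS = "human leads"
    OVERRIDE = "override automation"


@dataclass(frozen=True)
class Situation:
    well_defined: bool               # repetitive, with a known procedure
    high_stakes: bool
    high_risk_period: bool = False   # e.g. peak traffic
    cascading: bool = False          # cascading failure or unknown blast radius


def choose_mode(s: Situation) -> Mode:
    # Assumed precedence: the most restrictive applicable row wins.
    if s.cascading:
        return Mode.OVERRIDE
    if not s.well_defined:
        return Mode.HUMAN_LEADS
    if s.high_risk_period:
        return Mode.REDUCED
    if s.high_stakes:
        return Mode.APPROVAL
    return Mode.FULL_AUTO


if __name__ == "__main__":
    print(choose_mode(Situation(well_defined=True, high_stakes=False)).value)
    print(choose_mode(Situation(well_defined=True, high_stakes=True)).value)
    print(choose_mode(Situation(well_defined=False, high_stakes=True)).value)
```

Checking the restrictive cases first means a routine task that happens during a cascading failure is still handed to a human, which matches the spirit of the last row.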

Building Systems That Know Their Own Limits

The best-designed automation I've built shares one characteristic: it knows when to stop and ask for help. Rather than trying to handle every scenario, it handles the scenarios it knows, and escalates the rest to humans with enough context for the human to make a good decision quickly.

This is also the design philosophy I apply to AI agents in infrastructure. The agent should be able to say: "I've identified three possible causes for this alert. I can automatically resolve cause #1 (disk space) and #2 (stale cache). Cause #3 (possible data corruption) requires human review. Here's what I know so far." That's a system I trust. One that just keeps acting without bounds is one I'm nervous about.
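The agent behaviour described above can be sketched as a triage step that splits candidate causes into ones it may fix on its own and ones it must hand to a human. The cause labels and remediation names here (`purge_old_logs`, `flush_cache`) are hypothetical placeholders, not part of any real tool.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list: causes the agent may resolve without approval,
# mapped to the name of the remediation it would run.
AUTO_FIXABLE = {
    "disk_space": "purge_old_logs",
    "stale_cache": "flush_cache",
}


@dataclass
class TriageResult:
    auto_actions: list = field(default_factory=list)
    escalations: list = field(default_factory=list)

    def summary(self) -> str:
        auto = ", ".join(self.auto_actions) or "nothing"
        human = ", ".join(self.escalations) or "nothing"
        return f"Auto-resolving: {auto}. Needs human review: {human}."


def triage(causes: list) -> TriageResult:
    """Split candidate causes into auto-fixable and escalated."""
    result = TriageResult()
    for cause in causes:
        action = AUTO_FIXABLE.get(cause)
        if action is not None:
            result.auto_actions.append(action)
        else:
            # Anything not explicitly allow-listed goes to a human.
            result.escalations.append(cause)
    return result


if __name__ == "__main__":
    print(triage(["disk_space", "stale_cache", "data_corruption"]).summary())
```

The important design choice is the default: a cause the agent has never seen is escalated, not guessed at.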

Design Principles for Trustworthy Automation

  1. Scope limits: Every automation has a maximum blast radius. Define it explicitly. Don't let automation touch more than X instances, or more than Y% of the cluster, without human approval.
  2. Confidence thresholds: Automation should act when it is confident and escalate when it is not. Build in explicit "I'm not sure" paths.
  3. Audit trails: Every automated action should be logged with its triggering condition and the full context at the time of action. You need this for post-mortems.
  4. Kill switches that actually work: Test your ability to override automation under realistic conditions. A kill switch you can't activate quickly during an incident is not a kill switch.
  5. Graduated rollout: New automation should prove itself in low-stakes environments before being trusted in production. Give it a probationary period.

The goal isn't maximum automation. The goal is optimal outcomes. Sometimes maximum automation delivers optimal outcomes. Sometimes human judgment does. The wisdom is knowing which is which, in this system, right now, given what just happened.
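Several of the design principles above (scope limits, confidence thresholds, audit trails, and a kill switch) can live in one small gate that every automated action passes through. This is a sketch under stated assumptions: the `Guard` class, its default thresholds, and the decision labels are all illustrative, and a real kill switch would normally be an external flag that can be flipped without deploying code.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class Guard:
    max_instances: int = 5               # assumed absolute blast-radius cap
    max_cluster_fraction: float = 0.10   # assumed relative cap (10% of cluster)
    min_confidence: float = 0.9          # below this, escalate instead of acting
    kill_switch: bool = False            # when True, nothing executes
    audit_log: list = field(default_factory=list)

    def decide(self, action, targets, cluster_size, confidence, trigger):
        """Return 'execute', 'needs_approval', 'escalate', or 'blocked'."""
        too_wide = (
            len(targets) > self.max_instances
            or len(targets) > self.max_cluster_fraction * cluster_size
        )
        if self.kill_switch:
            decision = "blocked"
        elif too_wide:
            decision = "needs_approval"
        elif confidence < self.min_confidence:
            decision = "escalate"
        else:
            decision = "execute"
        # Record every decision, including the ones that did not execute.
        self.audit_log.append(json.dumps({
            "ts": time.time(),
            "action": action,
            "targets": list(targets),
            "cluster_size": cluster_size,
            "confidence": confidence,
            "trigger": trigger,
            "decision": decision,
        }))
        return decision


if __name__ == "__main__":
    guard = Guard()
    print(guard.decide("restart", ["web-1", "web-2"], 100, 0.95, "latency_spike"))
    guard.kill_switch = True
    print(guard.decide("restart", ["web-1"], 100, 0.99, "latency_spike"))
```

The kill switch is checked first on purpose, so flipping it stops everything regardless of how confident the automation is.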

Closing the Loop: What This Means for Your Career

As AI-powered automation becomes more capable, the engineers who will be most valued are not the ones who build the most automation — they're the ones who design systems where automation and human judgment work together effectively. That requires deep technical skill, but it also requires something less common: the wisdom to know the limits of both.

Your years of watching systems fail in unexpected ways, of developing the gut feeling that something is wrong before you can prove it, of knowing when to trust the playbook and when to throw it out — that is not made obsolete by better automation. It becomes more valuable. Because better automation needs better oversight.

The balance between automation and intuition is not a technical problem with a technical solution. It's a design problem that requires human wisdom to navigate. And that's the kind of problem that will always need us.

— Naveed Ahmed, Lead DevOps Engineer @ DigitalOcean

