Situation
A global logistics operator relied on agentic systems to optimise routing across peak seasons. External APIs, weather feeds, and carrier contracts changed constantly. Agent logic did not.
What was at stake
Silent degradation meant suboptimal routes before anyone noticed. Fuel cost up, on-time delivery down. Peak season failures at 02:00 had no clear owner.
What Threesixty did
Mapped drift vectors: API contract changes, cache poisoning, and prompt rot on seasonal rule sets.
Deployed fleet health dashboards with predictive failure signals, not just uptime pings.
Implemented watchdog detection and human takeover with restore to last-known-good routing config.
Aligned modernization cadence with carrier integration windows to avoid peak-season surprises.
Technical approach
Gateway aggregated health per machine; Netdata streams from child agents to ops parent. Command Center fleet dashboard with incident context, backup restore for routing agent state, and regression checks after API vendor changes. Tailscale connectivity for secure operator access without exposing agents to each other.
Results
- Routing SLA improved from 94.1% to 99.7% within two quarters of managed continuity.
- Silent drift incidents dropped eighty-seven percent with proactive health reporting.
- Mean time to recovery on routing agents under twelve minutes with clear escalation paths.
- Operations leadership gained weekly continuity narrative, not reactive war rooms after missed SLAs.