Reducing SLA breaches by targeting variance

edwardR45 · December 22, 2025, 12:45pm

I modeled our Tier-1 support queue in Python (SimPy) using 3,214 tickets from the last 8 weeks and found that lowering the arrival coefficient of variation (CV) from 0.92 to 0.78, via simple email intake slotting at :05/:35, reduced the 95th-percentile wait+handle time by 23% without extra headcount. Has anyone here shifted focus from mean service time to variance reduction? What practical levers moved your CV?
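
For anyone who wants to measure the same thing on their own export, here is a minimal sketch of an arrival CV calculation, assuming arrival timestamps are available as plain numbers (seconds or minutes). The function name `interarrival_cv` is illustrative and not taken from the SimPy model above.

```python
import statistics

def interarrival_cv(arrival_times):
    """Coefficient of variation (std dev / mean) of the gaps between arrivals."""
    times = sorted(arrival_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three arrivals to estimate a CV")
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        raise ValueError("all arrivals share one timestamp; CV is undefined")
    # Population std dev: the gaps are treated as the full sample of interest.
    return statistics.pstdev(gaps) / mean_gap

# Perfectly regular arrivals have CV 0; a Poisson stream is close to 1.
print(interarrival_cv([0, 10, 20, 30]))
```

Computing this per hour of day, rather than once over the whole 8 weeks, is probably the more useful view, since bursts tend to hide inside a long-run average.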

jameson_90 · December 25, 2025, 6:22pm

We cut our 95th percentile by about 20% by gating live chat with a concurrency cap: when there are more than 2 chats per agent, the widget hides and overflows to email, which smoothed arrivals more than fixed :05/:35 slots. It was a tiny JS + Zendesk Chat API tweak that kept “no extra headcount,” but watch deflection/abandonment during marketing bursts.

alice_m82 · January 1, 2026, 12:46pm

Quick tip: we added a tiny ‘intake buffer’ — new emails sit in triage for up to 3 minutes and are auto-assigned in 30-second micro-batches — so arrivals to agents look steadier than the inbox, a bit like your ‘:05/:35’ slotting… Caveat: let P1 bypass the buffer, or you’ll trade lower variance for ugly escalations.
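
A rough sketch of the micro-batch release rule with a P1 bypass, assuming arrival times are in seconds and priorities are strings like "P1". The function name and the priority labels are placeholders, and the 3-minute maximum hold is not modeled here because a 30-second batch never holds a ticket that long on its own.

```python
import math

def release_time_s(arrival_s, priority, batch_s=30):
    """Return when a ticket leaves triage and becomes assignable to an agent."""
    if priority == "P1":
        # Urgent tickets skip the buffer entirely.
        return arrival_s
    # Everything else waits for the next batch boundary (0, 30, 60, ...).
    return math.ceil(arrival_s / batch_s) * batch_s

tickets = [(31, "P3"), (44, "P2"), (47, "P1"), (60, "P3")]
for arrival_s, priority in tickets:
    print(priority, arrival_s, "->", release_time_s(arrival_s, priority))
```

The two non-urgent tickets arriving at 31s and 44s are released together at 60s, while the P1 at 47s goes straight through.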

debbieM82 · January 7, 2026, 2:38pm

Modeled something similar in SimPy, and the biggest win was shrinking service-time spread: we enforced a ‘macro-first’ reply and a 2-minute cap on first-draft writing before escalating, which knocked down the 95th percentile without extra headcount. Pair that with your :05/:35 intake slotting and, as @jameson_90 hinted, the tail drops further, but protect quality with a clear escape hatch for long-form or regulatory tickets.

logan_m98 · January 8, 2026, 3:43pm

One lever that helped us was adding 10–20 min jitter to all cron-ish triggers that create tickets (release emails, billing retries, digest sends), so spikes stopped piling up at :00 and the queue quit doing flash mobs; it played nicely with your slotting idea. Building on @debbieM82, we also standardized replies for the 3 most variable categories to narrow service-time spread, but don’t jitter anything truly time-sensitive like fraud or outage notices.
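
One way the jitter could look in code, as a sketch only: it assumes "10–20 min jitter" means a uniform random delay of up to some window in that range (15 minutes here), and the job names in `TIME_SENSITIVE` are made up for illustration.

```python
import random

# Hypothetical job names; anything listed here is never delayed.
TIME_SENSITIVE = {"fraud_alert", "outage_notice"}

def jitter_delay_s(job_name, rng, window_min=15):
    """Seconds to wait before firing a ticket-creating job."""
    if job_name in TIME_SENSITIVE:
        return 0.0
    # Spread the job uniformly across the window instead of firing at :00.
    return rng.uniform(0, window_min * 60)

rng = random.Random(42)  # seeded only so the example is repeatable
for job in ("digest_send", "billing_retry", "fraud_alert"):
    print(job, round(jitter_delay_s(job, rng)))
```

Passing the random generator in keeps the function easy to test and lets a simulation replay the same delays.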

jordan_hall92 · January 13, 2026, 11:33am

Different angle than @logan_m98’s batching: we switched to load‑aware break scheduling — a bot staggers 1:1s/lunch/standups and enforces “no two breaks start together,” which cut our p95 on peak days by about 18%. It works, but if you make it too rigid people hate it; we let leads override during lulls — have you tried something like that?

kelsey_m89 · January 17, 2026, 5:43pm

We shaved p95 by adding a 30–120s randomized gate on bot->human handoffs and a per-queue cap on simultaneous escalations — a metronome for the spikes. @jordan_hall92’s jitter note inspired it, but we scoped it to intents; we whitelist ‘payment failed’ and ‘account locked’ so CSAT doesn’t dip. Caveat: over-throttling during promo launches caused backlog, so we auto-disable the gate when backlog > N.

logan_m98 · January 18, 2026, 8:01pm

We cut p95 by switching to “tiny-first with aging”: auto-tag quick tickets and serve them first, but bump anything waiting >20m, like letting motorcycles slip through without leaving trucks behind. Pairing it with @kelsey_m89’s jitter kept spikes from snowballing. Caveat: coach against cherry-picking and flip back to FIFO when the queue’s thin; have you tried size-based routing in SimPy?
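
A small sketch of how the selection rule might be expressed, assuming each ticket carries an arrival time in minutes and a boolean "quick" tag from the auto-tagger. The `Ticket` fields and `pick_next` are illustrative names, and the switch back to FIFO for thin queues is left out.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    id: str
    arrived_min: float
    quick: bool  # set by the auto-tagger

def pick_next(queue, now_min, aging_limit_min=20):
    """Choose the next ticket: aged tickets first, then quick ones, then FIFO."""
    if not queue:
        return None
    aged = [t for t in queue if now_min - t.arrived_min > aging_limit_min]
    if aged:
        # Anything past the aging limit jumps the line, oldest first.
        return min(aged, key=lambda t: t.arrived_min)
    quick = [t for t in queue if t.quick]
    # Fall back to plain FIFO when no quick tickets are waiting.
    return min(quick or queue, key=lambda t: t.arrived_min)

queue = [Ticket("long", 0, False), Ticket("quick", 5, True)]
print(pick_next(queue, now_min=10).id)
print(pick_next(queue, now_min=25).id)
```

At minute 10 the quick ticket goes first; by minute 25 the long ticket has waited past the 20-minute limit and takes priority. In SimPy, a similar effect could likely be approximated with a priority-based resource, though that is untested here.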

sophiaT91 · January 20, 2026, 2:50am

We added a backlog‑aware channel switch: when chat concurrency >8 or queue age >4m, the widget flips to “leave a message” for 15–20 min, which smoothed arrivals and brought the tail down without extra headcount. Tiny caveat: tune the copy and SLA promise so CSAT doesn’t dip; @logan_m98 have you paired batching with a cutoff like this?
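
A possible shape for the switch logic, sketched under a few assumptions: times are in seconds, the hold is fixed at 15 minutes (the low end of the 15–20 min range), and every overloaded reading restarts the hold. `ChannelSwitch` and the mode strings are hypothetical names, not part of any chat widget API.

```python
class ChannelSwitch:
    """Flip the chat widget to 'leave a message' while the queue is overloaded."""

    def __init__(self, max_concurrency=8, max_queue_age_s=240, hold_s=15 * 60):
        self.max_concurrency = max_concurrency
        self.max_queue_age_s = max_queue_age_s
        self.hold_s = hold_s
        self.offline_until = None  # timestamp when live chat may resume

    def mode(self, now_s, concurrency, oldest_wait_s):
        overloaded = (concurrency > self.max_concurrency
                      or oldest_wait_s > self.max_queue_age_s)
        if overloaded:
            # Assumption: each overloaded reading restarts the hold window.
            self.offline_until = now_s + self.hold_s
        if self.offline_until is not None and now_s < self.offline_until:
            return "leave_a_message"
        return "live_chat"

switch = ChannelSwitch()
print(switch.mode(now_s=0, concurrency=5, oldest_wait_s=60))
print(switch.mode(now_s=10, concurrency=9, oldest_wait_s=60))
print(switch.mode(now_s=500, concurrency=3, oldest_wait_s=30))
```

The hold keeps the widget from flapping between modes when load hovers around the threshold.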

phillip_rogers77 · January 21, 2026, 11:17am

We had better luck de‑spiking shift boundaries: stagger agent breaks/standups by 3–5 minutes and run an 8–10 minute “drain mode” before handover that holds live transfers and defers non‑urgent email intake. It lowered arrival CV and pulled the tail in without adding seats; small caveat: it only sticks if Marketing avoids :00 blasts, so we set a shared calendar guardrail and “treat :00 like hot lava.”

jordan_hall92 · January 24, 2026, 3:31am

Quick example: we shaved p95 by about 17% just by tightening intake on our two noisiest categories — require “attach a screenshot + steps to reproduce” and auto‑route those to a small specialist pod, which killed ping‑pong and narrowed handle‑time variance. Make it conditional (only when the classifier is pretty sure) and send a 90‑second reminder nudge so you don’t spike abandons — like asking for the boarding pass before you board.

jordan_hale21 · January 26, 2026, 4:12am

Top-of-hour bursts from our CRM used to nuke us; we fixed it by throttling upstream senders: the webform and CRM queue tickets and release them in 3-minute buckets with ±45s per-account jitter. Similar to your ‘:05/:35’ approach, that dropped arrival CV from about 0.9 to about 0.76 and cut p95 wait+handle by about 18% with no extra heads. Small caveat: if everyone aligns to the same clock, slotting can create new spikes, so add per-tenant jitter and sanity-check with Kingman’s VUT intuition (see “Kingman's formula” on Wikipedia). Have you tried that?
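
For the Kingman sanity check, a quick sketch of the single-server (G/G/1) approximation: mean queue wait ≈ ρ/(1−ρ) × (c_a² + c_s²)/2 × mean service time. Only the two arrival CVs (0.9 and 0.76) come from the numbers above; the utilization, service-time CV, and mean handle time below are placeholders, so treat the output as intuition rather than a prediction.

```python
def kingman_wait(utilization, cv_arrival, cv_service, mean_service):
    """Kingman's approximation for mean time in queue at a single server."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    variability = (cv_arrival ** 2 + cv_service ** 2) / 2   # the "V" term
    load = utilization / (1 - utilization)                  # the "U" term
    return load * variability * mean_service                # times "T"

# Assumed inputs: 85% utilization, service CV of 1.0, 6-minute mean handle time.
before = kingman_wait(0.85, 0.90, 1.0, 6.0)
after = kingman_wait(0.85, 0.76, 1.0, 6.0)
print(f"mean wait before: {before:.1f} min, after: {after:.1f} min")
print(f"ratio: {after / before:.3f}")
```

With a service CV of 1.0, the ratio works out to roughly 0.87, i.e. about 13% less mean queueing delay, and it does not depend on the assumed utilization or handle time. The formula describes the mean for one server, not the p95 of a multi-agent pool, so it is a direction check only.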
