Designing AI Features That Survive Real Users
Demos pass because there's one user, one prompt, and no outages. Real users break all three. This is the operational layer that turns an AI demo into something you can stay on call for.
The gap between "AI feature in a demo" and "AI feature that survives production" is mostly the boring stuff. None of it is in the tutorials. All of it shows up in the postmortem.
The five things I now wire up before the feature ever sees a user:
Prompts in version control, not in code. Store prompts as YAML or markdown files keyed by a name and a version. Load them at startup. The reason: the day someone tweaks a prompt and ships a quality regression, you want a diff and a revert, not "wait, what did it say last week?"
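A minimal sketch of the startup load, assuming a prompts/ directory of markdown files named "name.version" (the layout and the summarize.v3 example are mine, not a prescribed format):

// Load every prompt file once at startup, keyed by "name.version".
var prompts = new Dictionary<string, string>();
foreach (var path in Directory.GetFiles("prompts", "*.md"))
    prompts[Path.GetFileNameWithoutExtension(path)] = File.ReadAllText(path);

// Later, pick the exact version the caller asked for, e.g. prompts/summarize.v3.md:
var summarizePrompt = prompts["summarize.v3"];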
A model fallback chain. Primary model (Claude or GPT-4-class). Secondary model (cheaper, same provider). Tertiary (different provider entirely). When the primary 429s, fall through. The fallback responses might be slightly worse, but "slightly worse" beats "outage."
// Walk the chain in priority order; each failure mode falls through to the next model.
foreach (var model in new[] { "gpt-4o", "gpt-4o-mini", "claude-haiku" })
{
    try { return await llm.Call(model, prompt); }
    catch (RateLimitException) { continue; }     // this model is being throttled (429)
    catch (ProviderDownException) { continue; }  // the whole provider is having a bad day
}
throw new AllFallbacksExhaustedException();
A per-tenant cost budget. One Slack-integration tenant spamming your summary endpoint can burn a week's API budget by Tuesday. Track tokens per tenant per day in Redis. Cut them off at the budget, not at the AWS bill alert.
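A minimal sketch of the check, assuming StackExchange.Redis; the key scheme and the one-million-token daily budget are placeholders, not recommendations:

using StackExchange.Redis;

// Bump the tenant's daily counter, then refuse the call once it crosses the budget.
async Task<bool> TryConsumeTokensAsync(IDatabase redis, string tenantId, long tokens)
{
    var key = $"tokens:{tenantId}:{DateTime.UtcNow:yyyyMMdd}";
    var usedToday = await redis.StringIncrementAsync(key, tokens);
    await redis.KeyExpireAsync(key, TimeSpan.FromDays(2)); // old counters age out on their own
    return usedToday <= 1_000_000; // cut off here, not at the AWS bill alert
}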
A kill switch for runaway loops. Agentic flows can loop. Sometimes they loop until the timeout, sometimes until the cost limit. Hard-code a max-step counter and a max-cost-per-request limit. Both, not one or the other.
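A sketch of what enforcing both looks like; the caps, the agent.NextStepAsync call, and the exception types are illustrative, not a real API:

// Hard caps on an agentic loop: a step counter AND a per-request cost limit.
const int MaxSteps = 20;
const decimal MaxCostUsd = 0.50m;

var steps = 0;
var costUsd = 0m;
while (true)
{
    if (++steps > MaxSteps) throw new AgentStepLimitException();
    var step = await agent.NextStepAsync();  // hypothetical agent step API
    costUsd += step.CostUsd;
    if (costUsd > MaxCostUsd) throw new AgentCostLimitException();
    if (step.IsFinal) return step.Output;
}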
An audit trail. Every AI-generated output that goes to a user gets logged with: input, output, model, prompt version, latency, cost. You will, eventually, get a "did the AI say this?" support ticket. The trail is the only honest answer.
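A sketch of the record I'd write per response; the field list mirrors the item above, the type and call site are illustrative (Serilog-style structured logging assumed):

// One audit record per AI output that reaches a user.
public record AiAuditRecord(
    string RequestId, string TenantId, string Input, string Output,
    string Model, string PromptVersion, TimeSpan Latency, decimal CostUsd,
    DateTimeOffset LoggedAt);

// At the call site: Log.Information("ai_response {@Audit}", auditRecord);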
None of this is fun to build. All of it is what separates a demo from a system you can be on call for without dreading the pager. The order I'd build them in: audit trail first (it's also your debugger), then prompt versioning, then the fallback chain. Budgets and kill switches before you go beyond a closed beta.
The thing I see senior engineers underestimate: the operational layer is more important for AI features than for normal ones. Normal endpoints fail loudly. AI endpoints fail quietly and confidently. The instrumentation is the only thing that tells you the difference.