
Stop tuning prompts. Loop them instead.

A single LLM call either nails the task or it doesn't, and you find out in production. Looping — generate, check, feed the failure back, repeat — turns that coin flip into something that converges. Here's the pattern, the numbers, and how I run the same loop inside Claude Code.

A single prompt either nails the task or it doesn't, and you usually find out in production. The fix that stuck for me was to stop betting on one call and loop instead: generate, check, feed the failure back, repeat. That turns the coin flip into something that converges. Here's the pattern, the stop conditions that keep it from running forever, and the numbers from wiring it into a JSON-extraction job that was failing 1 in 6 calls.

Prompt looping: generate, check, and feed the failure back until the output passes or the loop hits its cap

Prompt tuning is whack-a-mole

The old reflex, when an output is wrong, is to fix the prompt. You add an example, the failure rate drops, and then a new input breaks it again. Prompt tuning is whack-a-mole because you're trying to make one forward pass perfect across inputs you haven't seen yet. The cheaper move is to accept that the first pass is sometimes wrong and add a loop that catches it.

Start with a check, not a prompt

The loop only works if you can verify the output without a human in the loop. That check is the whole design: JSON.parse plus a schema validation, a compiler, a test run, a regex, a 200 from the API you're generating a call for. No check, no loop — you're back to eyeballing outputs and hoping.
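To make "JSON.parse plus a schema validation" concrete, here is a minimal sketch of such a check for a hypothetical task-extraction output. The `Task` shape (`title`, `dueDate`), the `validateTask` name, and the `Check` result type are illustrative assumptions; a real project might use a schema library instead of hand-rolled checks. The point is that every failure names the field, what was expected, and what actually arrived.

```typescript
// Hypothetical output shape for illustration: swap in your own fields,
// or a schema library, for real use.
type Task = { title: string; dueDate: string };
type Check<T> = { ok: true; value: T } | { ok: false; error: string };

// YYYY-MM-DD, the calendar-date form of ISO 8601 (no time part).
const ISO_DATE = /^\d{4}-\d{2}-\d{2}$/;

// JSON.parse plus hand-rolled schema checks. Each failure names the field,
// what was expected, and what actually arrived, so it can be fed back verbatim.
function validateTask(out: string): Check<Task> {
  let data: unknown;
  try {
    data = JSON.parse(out);
  } catch (e) {
    // Surface the parser's own message: it usually points at the bad position.
    return { ok: false, error: `Output is not valid JSON (${(e as Error).message})` };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { ok: false, error: "Expected a JSON object at the top level" };
  }
  const { title, dueDate } = data as Record<string, unknown>;
  if (typeof title !== "string" || title.length === 0) {
    return { ok: false, error: `Expected title to be a non-empty string, got ${JSON.stringify(title)}` };
  }
  if (typeof dueDate !== "string" || !ISO_DATE.test(dueDate)) {
    return { ok: false, error: `Expected dueDate to be ISO 8601 (YYYY-MM-DD), got ${JSON.stringify(dueDate)}` };
  }
  return { ok: true, value: { title, dueDate } };
}
```

A model reply of `{"title":"Ship it","dueDate":"next Tuesday"}` comes back as `Expected dueDate to be ISO 8601 (YYYY-MM-DD), got "next Tuesday"`, which is exactly the kind of string worth feeding back.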

Feed the failure back as context

On a bad output, don't just retry — retry with the error. Same conversation, plus what went wrong:

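type Message = { role: "user" | "assistant"; content: string };
type Check = { ok: true; value: unknown } | { ok: false; error: string };

// Supplied by you: the prompt builder, the model client, and the check.
declare function buildPrompt(input: string): string;
declare function model(messages: Message[]): Promise<string>;
declare function validate(out: string): Check;

async function loop(input: string, maxTries = 4): Promise<unknown> {
  const messages: Message[] = [{ role: "user", content: buildPrompt(input) }];
  for (let attempt = 1; attempt <= maxTries; attempt++) {
    const out = await model(messages);
    const result = validate(out);          // your check
    if (result.ok) return result.value;    // passed: stop looping
    // Failed: keep the bad output in the transcript, then append the error
    // so the next attempt sees exactly what to change.
    messages.push({ role: "assistant", content: out });
    messages.push({
      role: "user",
      content: `That failed: ${result.error}. Fix it and return only the corrected output.`,
    });
  }
  throw new Error("loop exhausted");
}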

The error string is the trick. "Expected dueDate to be ISO 8601, got next Tuesday" tells the model exactly what to change — far more signal than a blind retry at higher temperature. A vague "that was wrong" gets you a vague second guess.

Cap it three ways

Loops with no way out are how you get a surprise $400 bill, so bound every one of them three ways: a max attempt count, a token or cost budget summed across attempts, and a wall-clock timeout. Whichever trips first wins. Four attempts is my default; past that the model is usually stuck, and a fifth call just repeats the third. Break early, too: if the validator returns the same error two rounds running, bail instead of burning the rest. That one check has saved me more tokens than the attempt cap ever did.
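Here is one way to fold all three caps and the repeated-error bail into a single function. It is a minimal sketch, not a drop-in client: `Model` is a hypothetical interface that returns the reply text plus the tokens the call consumed, `boundedLoop`, `withTimeout`, and `LoopError` are illustrative names, and the token and time defaults are placeholders (only the four-attempt cap comes from the text above).

```typescript
type Message = { role: "user" | "assistant"; content: string };
type Check<T> = { ok: true; value: T } | { ok: false; error: string };
// Hypothetical client interface: the reply text plus the tokens the call used.
type Model = (messages: Message[]) => Promise<{ text: string; tokens: number }>;
type Limits = { maxTries: number; maxTokens: number; timeoutMs: number };

class LoopError extends Error {}

// Stop waiting on `p` after `ms`. This does not cancel the underlying request.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new LoopError("timed out")), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function boundedLoop<T>(
  prompt: string,
  model: Model,
  validate: (out: string) => Check<T>,
  limits: Limits = { maxTries: 4, maxTokens: 20_000, timeoutMs: 60_000 },
): Promise<T> {
  const deadline = Date.now() + limits.timeoutMs; // wall-clock cap for the whole loop
  const messages: Message[] = [{ role: "user", content: prompt }];
  let tokensUsed = 0; // cost cap, summed across attempts
  let lastError: string | undefined;

  for (let attempt = 1; attempt <= limits.maxTries; attempt++) { // attempt cap
    const remainingMs = deadline - Date.now();
    if (remainingMs <= 0) throw new LoopError("timed out");
    const { text, tokens } = await withTimeout(model(messages), remainingMs);
    tokensUsed += tokens;

    const result = validate(text);
    if (result.ok) return result.value;

    // Same error two rounds running: the model is stuck, so bail early.
    if (result.error === lastError) throw new LoopError(`stuck on: ${result.error}`);
    // Checked after each call, so the budget can overshoot by at most one call.
    if (tokensUsed >= limits.maxTokens) throw new LoopError(`token budget spent (${tokensUsed})`);

    lastError = result.error;
    messages.push({ role: "assistant", content: text });
    messages.push({
      role: "user",
      content: `That failed: ${result.error}. Fix it and return only the corrected output.`,
    });
  }
  throw new LoopError(`loop exhausted after ${limits.maxTries} tries: ${lastError}`);
}
```

The timeout covers the whole loop rather than each call, so a slow first attempt eats into the time left for retries. `withTimeout` only stops waiting; a real client should also cancel the in-flight request, for example by passing an `AbortSignal` to its HTTP call.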

The same loop, in Claude Code

You don't always need to write the harness. Most days I run this exact loop inside Claude Code, where the validator is just the test suite and the feedback is whatever the run printed:

/loop run the tests; if any fail, read the error, fix the
code, and run them again. stop when they're green or after 4 tries.

Claude writes a fix, runs the check, reads the failure, and feeds it back to itself (generate, check, repeat) until the suite passes or it hits the cap. The guard carries over too: "after 4 tries" is the attempt cap, so a stubborn failure can't loop forever and drain the budget.
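When you do write the harness yourself, a test suite can serve as the check in the same way: the exit code decides pass or fail, and whatever the run printed becomes the error fed back. A minimal sketch using Node's `child_process.spawnSync`; the `runTests` name, the 4,000-character tail, and the two-minute timeout are illustrative assumptions, not anything Claude Code exposes.

```typescript
import { spawnSync } from "node:child_process";

type Check<T> = { ok: true; value: T } | { ok: false; error: string };

// Assumption: keep only the tail of a failing run. Test runners usually print
// the failure summary last, and a full log can eat the token budget by itself.
const MAX_FEEDBACK_CHARS = 4_000;

// Run the suite; exit code 0 means green, anything else is a failure whose
// printed output becomes the error string for the next attempt.
function runTests(cmd: string, args: string[], timeoutMs = 120_000): Check<string> {
  const run = spawnSync(cmd, args, { encoding: "utf8", timeout: timeoutMs });
  if (run.error) {
    // The command never ran to completion: not found, or killed by the timeout.
    return { ok: false, error: `could not run ${cmd}: ${run.error.message}` };
  }
  const output = `${run.stdout}${run.stderr}`;
  if (run.status === 0) return { ok: true, value: output };
  return { ok: false, error: output.slice(-MAX_FEEDBACK_CHARS) };
}
```

In a project with a `test` script, `runTests("npm", ["test"])` would return the runner's own failure text, which is usually specific enough to act on without any rewording.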

The numbers

On that extraction job, one-shot calls returned schema-valid JSON 83% of the time — 1 in 6 failed and silently corrupted downstream rows. With a four-attempt loop, 99.2% came back valid. 94% of those passed on attempt 1 or 2, so the loop cost was paid almost entirely by the hard inputs, and mean tokens per call rose 11%, not 4×, because most never looped. The remaining 0.8% now throw a clean, logged error instead of poisoning the database.

When to reach for it

Loop when you can check the output cheaply and the failure is recoverable with more context: extraction, code generation, format conformance. Don't loop when there's no automatic verifier, or when a wrong answer is unrecoverable — a loop just launders a guess into a confident one.

Two things matter more than the model you pick. A loop is only as good as the error message you feed back, so spend your effort there, not on the opening prompt. And always bound it — attempts, cost, and time. A loop without a ceiling is an outage waiting for the right input.
