
Spec-Driven Development with OpenSpec: Killing the 'It Works on My Machine' of AI Coding

AI coding tools generate code that compiles, looks right, and quietly disagrees with what you actually wanted. Specs in version control turn vibes into contracts.

Two engineers gave Cursor the same prompt on the same repo. Got two different implementations. Both compiled, both passed tests, both shipped, and now production behaves differently depending on which code path you hit. The bug isn't in the model. The bug is that nobody wrote down what "correct" meant.

This is the gap OpenSpec was made to close.

What OpenSpec is actually for

OpenSpec is a convention (and a small toolchain) for writing executable specifications for features: what the input looks like, what the output must look like, which invariants must hold. You commit them next to your code. Your AI assistant reads them before generating; your test runner reads them after. Same source of truth for both ends.

Useful analogy: TypeScript types fixed "what does this function return." Specs fix "what is this feature supposed to do." Both shrink the gap between intent and implementation.
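
To make the "assistant reads the spec first" half concrete, here is a hedged sketch in Python. The prompt wiring is illustrative, not how any particular assistant is actually configured; the file path matches the example in the next section.

# prompt_with_spec.py -- sketch of feeding a spec to the assistant.
# The prompt format is an assumption for illustration, not an OpenSpec API.
from pathlib import Path

def build_prompt(task: str, spec_path: str) -> str:
    # The spec travels with the prompt, so the model sees the contract
    # before it generates a single line.
    spec = Path(spec_path).read_text()
    return (
        "Implement the following task. The spec below is the contract; "
        "the generated code must satisfy every invariant.\n\n"
        f"--- SPEC ---\n{spec}\n--- TASK ---\n{task}"
    )

prompt = build_prompt(
    "Implement draft_reply",
    "specs/features/draft_reply.openspec.yaml",
)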

A minimal OpenSpec file

# specs/features/draft_reply.openspec.yaml
feature: draft_reply
description: >
  Draft a polite reply to an inbound customer email based on the thread history.

input:
  thread_id: string
  tone: enum[formal, friendly, neutral]

output:
  draft: string         # plain text, 80-300 words
  cited_thread_ids: list[string]   # which messages were referenced

invariants:
  - draft never contains placeholders like "[CUSTOMER NAME]"
  - draft length between 80 and 300 words
  - every cited_thread_ids entry exists in the input thread

acceptance_examples:
  - input: { thread_id: "t_001", tone: "friendly" }
    expect:
      - output.draft is a non-empty string
      - "Thanks for reaching out" appears in first sentence

Why it works

The spec is the single source of truth. When you ask the AI to implement draft_reply, it reads the YAML, generates code, and you can mechanically check whether the output conforms, both at review time and in CI. The invariants catch the kind of subtle drift (placeholder text leaking into emails) that humans miss in PRs and that nobody thinks to write unit tests for. Especially the "[CUSTOMER NAME]" leak. That one nearly got me once.
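
Here is a minimal sketch of what that mechanical check could look like. The output shape matches the spec's output section; the function name and regex are illustrative, not OpenSpec's actual tooling.

# check_invariants.py -- sketch of the three invariants from the spec above.
# Assumes the draft_reply output is a dict with "draft" and "cited_thread_ids".
import re

PLACEHOLDER = re.compile(r"\[[A-Z][A-Z ]*\]")  # matches e.g. "[CUSTOMER NAME]"

def check_invariants(output: dict, thread_message_ids: set[str]) -> list[str]:
    """Return human-readable violations; an empty list means the output conforms."""
    violations = []
    draft = output.get("draft", "")

    # Invariant 1: no placeholder text leaks into the email.
    if PLACEHOLDER.search(draft):
        violations.append("draft contains a placeholder like [CUSTOMER NAME]")

    # Invariant 2: draft length between 80 and 300 words.
    word_count = len(draft.split())
    if not 80 <= word_count <= 300:
        violations.append(f"draft is {word_count} words, expected 80-300")

    # Invariant 3: every cited id exists in the input thread.
    for cited in output.get("cited_thread_ids", []):
        if cited not in thread_message_ids:
            violations.append(f"cited id {cited!r} not in input thread")

    return violations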

When to use spec-driven development

Anything that crosses a team boundary, anything user-facing, anything where "looks right" hides nuance. Most production features, really. Especially valuable on teams using AI coding assistants heavily, because specs are how you keep generated code aligned without re-reviewing every diff line by line.

When not to

Throwaway scripts, one-off data crunches, design spikes where the goal is to find the requirement, not encode it. Forcing specs on exploration burns goodwill and slows the discovery loop. Use the right tool for the moment.

How a spec gates a generated PR

flowchart LR
    Dev[Developer prompt] --> AI[AI assistant]
    Spec[(OpenSpec file)] --> AI
    AI --> PR[Generated PR]
    PR --> CI["CI: spec linter +<br/>acceptance examples"]
    CI -- pass --> Review[Human review]
    CI -- fail --> AI
    Review --> Merge[Merge]
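
The "acceptance examples" box in CI can be as simple as a test that loads the spec and replays each example. A sketch, assuming PyYAML and a local draft_reply implementation; real OpenSpec tooling may ship its own runner.

# test_acceptance_examples.py -- pytest-style sketch of the CI gate.
# The import and spec path are assumptions for illustration.
import yaml
from my_app import draft_reply  # hypothetical implementation under test

with open("specs/features/draft_reply.openspec.yaml") as f:
    SPEC = yaml.safe_load(f)

def test_acceptance_examples():
    for example in SPEC["acceptance_examples"]:
        output = draft_reply(**example["input"])
        draft = output["draft"]
        # The expectations are strings in the spec; this sketch hand-codes
        # the two used above rather than build a full expectation interpreter.
        assert isinstance(draft, str) and draft, "draft must be a non-empty string"
        first_sentence = draft.split(".")[0]
        assert "Thanks for reaching out" in first_sentence

When this test fails, the CI log is the feedback: that is the diagram's fail edge looping back to the assistant.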

Conclusion

Pick the next feature in your backlog and spend 20 minutes writing its OpenSpec before you prompt anything. The friction of writing it down is exactly the friction of deciding what you actually want. And that decision is the part the AI cannot make for you, no matter how good the model gets.