An Introduction to Spec-Driven Development
Author: Sean Brandt
Part 1 of a series on specification-first practices for AI-augmented software delivery.
AI coding agents produce code at a rate no human team can match. Early evidence suggests that this velocity, without upstream discipline, creates more problems than it solves.
Faros AI's AI Productivity Paradox Report (June 2025), drawing on telemetry from over 10,000 developers across 1,255 teams, found that while individual task completion rose 21%, PR review time climbed 91% and bugs per developer increased 9% — with no significant correlation between AI adoption and improved company-level delivery metrics. A randomized controlled trial by METR (July 2025) tested 16 experienced open-source developers across 246 real tasks on repositories they had maintained for years. Developers took 19% longer with AI assistance — while believing they were 20% faster. The sample is small and the authors frame it as a snapshot of early-2025 capabilities, but the perception gap is striking. And CodeRabbit's State of AI vs Human Code Generation Report (December 2025), analyzing 470 open-source GitHub pull requests, found that AI-generated code produces approximately 1.7 times more issues per PR than human-authored code, with higher severity across logic, security, and performance categories.
None of these studies proves harm. They do suggest, consistently, that code generation is no longer the primary bottleneck in software delivery. The bottleneck has moved upstream — to specification, review, and verification. Producing code is cheap. Knowing what code to produce, and verifying that it does the right thing, is not.
Spec-Driven Development (SDD) is the emerging discipline that addresses this shift. Rather than treating AI agents as search engines that return code in response to prompts, SDD treats specifications as the primary engineering artifact and code as generated output. GitHub (Spec Kit), Amazon (Kiro), and a growing open-source community have each built tooling around this approach — not because it is novel in concept, but because the economics of AI-augmented development demand it.
This article introduces the practice, surveys the current tooling landscape, and examines both the promise and the failure modes. The next installment examines what happens when file-based SDD meets enterprise-scale systems — and whether graph-based specification infrastructure is the answer.
Why Specifications Matter More Now
In traditional software development, specifications were often documentation: written after the fact, maintained grudgingly, and read by almost nobody. The code was the source of truth, and everyone knew it.
AI agents changed the equation. A human developer carries implicit context about what "reasonable" means for a financial transaction, what edge cases matter in a claims processing workflow, or why a particular architectural choice was made three years ago. An AI agent has none of this. It generates locally correct code that is globally wrong — syntactically valid, functionally broken, architecturally incoherent.
Better prompts help. Better specifications help more. SDD focuses on the latter: treating specifications as the authoritative source of truth — machine-readable contracts that define what the system must do, what constraints it must satisfy, and what correctness looks like. Code becomes a downstream derivative. The primary value of senior engineering shifts from code production to defining correctness.
Unlike waterfall or Big Design Up Front, SDD specifications are living artifacts that evolve with the system. They are version-controlled, reviewed, and maintained with the same discipline as production code — because they are the production artifact. The code is generated output.
What Makes This SDD and Not Just Planning Documents
Most engineering teams already have markdown files in their repositories — READMEs, design docs, ADRs, onboarding guides. Having documentation is not the same thing as practicing spec-driven development. The distinction matters, because without it, SDD becomes a label teams apply to whatever they were already doing.
Four characteristics separate SDD from conventional planning artifacts:
- The specification is the input to implementation, not a parallel artifact. The agent reads and executes against the spec. It is not documentation sitting next to the code — it is the source from which the code is derived. If the agent cannot consume the spec directly as context for its work, it is not functioning as an SDD specification.
- The specification is authoritative. When the spec and the code disagree, the spec wins. Code gets regenerated or corrected to match the specification, not the other way around. This is the inversion that gives SDD its name: the specification drives development, rather than documenting it after the fact.
- There is a structured workflow with review gates. Implementation does not go directly from idea to code in a single prompt. There is a progression — some variant of specify, plan, decompose, implement — with human review at the transitions. Each phase produces artifacts that the next phase consumes. This structure is what prevents the agent from generating plausible-looking output that misses the point.
- Specifications are maintained as the system evolves. Changes go through the spec first, not directly to the code. When a requirement changes, the specification is updated and the implementation follows. This is the discipline that prevents spec drift — and it is also the discipline that is hardest to sustain in practice.
Without these four characteristics, a team has planning documents. With them, it has spec-driven development. The tooling landscape that follows is organized around how much support each tool provides for enforcing these characteristics.
Roots in Existing Practice
These four characteristics define what SDD is. They also explain why it did not emerge in a vacuum — the underlying idea has deep roots. Domain-Driven Design argued that a shared ubiquitous language between business and engineering is the foundation of good systems. Behavior-Driven Development formalized that language into executable specifications. Test-Driven Development established that writing the contract before the implementation produces better results. SDD extends these disciplines into an environment where the implementer is an AI agent rather than a human — and where the cost of a misunderstood requirement is amplified by the speed at which the agent can generate the wrong thing.
The relationship to existing practices is complementary, not competitive. BDD's "Given/When/Then" acceptance criteria are a natural specification format for SDD. DDD's bounded contexts and aggregate invariants become constitutional constraints. TDD's red-green-refactor cycle operates within the implementation phase, verifying that generated code meets the spec. What SDD adds is the recognition that with AI agents, the specification itself — not the test suite, not the code — is the primary artifact that defines system correctness.
Getting Started: The Markdown Spec
The simplest possible version of SDD is a markdown file in your repository. No tooling, no framework, no infrastructure. Just a document that describes what you are building before you ask an agent to build it.
Jesse Vincent's Superpowers framework demonstrates this approach at the individual developer level. Superpowers is a set of composable "skills" for coding agents that enforce a mandatory workflow: brainstorm the design, get human approval, write an implementation plan, then execute through subagents with test-driven development. The specifications are markdown files. The enforcement is prompt engineering. The insight is that agents perform dramatically better when given structured context rather than ad-hoc instructions. Over 85,000 GitHub stars suggest structured agent workflows resonate with developers, though stars indicate interest rather than production adoption.
For a solo developer or a small team, this is the right starting point. Write a spec before you write code. It does not need to be formal. It needs to be precise enough that an agent (or an enthusiastic junior engineer with no project context) could implement it without ambiguity.
A minimal spec-first workflow:
- Describe what you want to build in a markdown file — user stories, acceptance criteria, constraints, edge cases.
- Have the agent review the spec and ask clarifying questions before writing any code.
- Break the spec into discrete tasks — small enough to implement, review, and verify independently.
- Implement task by task, validating each against the spec before moving on.
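A minimal spec in this style might look like the following. The feature, numbers, and criteria here are invented for illustration — the point is the shape: user story, testable acceptance criteria, constraints, edge cases.

```markdown
# Spec: Password Reset

## User story
As a registered user, I can request a password reset link by email
so that I can regain access to my account.

## Acceptance criteria
- Given a registered email, when a reset is requested, then a single-use
  link valid for 30 minutes is sent to that address.
- Given an unregistered email, when a reset is requested, then the API
  responds identically to the registered case (no account enumeration).

## Constraints
- Rate limit: at most 3 reset requests per email per hour.
- Reset tokens are stored hashed, never in plaintext.

## Edge cases
- Expired token used: show a clear error and offer to resend.
- Multiple outstanding requests: only the most recent token is valid.
```

Note what makes this consumable by an agent: every criterion is checkable, and the edge cases are enumerated rather than implied.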
This is the foundation. Everything that follows is about scaling this practice to larger teams, longer-lived systems, and more demanding environments.
The Tooling Landscape: Three Tiers of Structure
As teams grow and projects become more complex, unstructured markdown specs start to show cracks. Specs reference each other by filename but have no stable identity. Dependencies between specifications are implicit. Cross-referencing and querying specs require manual effort or custom tooling. The specification-to-implementation alignment problem — the gap between what a spec says and what the code actually does — becomes harder to manage.
The SDD tooling ecosystem has grown rapidly. Rather than enumerate every option, it is more useful to understand the three tiers of structure they represent and the philosophical trade-off each embodies: how much ceremony is worth the governance it provides.
Tier 1: Lightweight Execution Frameworks
Tools like GSD (by TÂCHES) and Superpowers prioritize minimal ceremony and strong context management. GSD's core technical insight is that context rot — the quality degradation as an agent's context window fills — is the primary failure mode in AI-assisted development. Its response is to spawn fresh subagent instances per task, each with a clean 200,000-token context, so task fifty has the same quality baseline as task one. The philosophy is explicit: no sprint ceremonies, no story points, no enterprise roleplay. Tools at this tier typically install with a single command and integrate directly with CLI-based agent workflows.
These tools are optimized for individual developers and small teams who want structured spec-first workflows without organizational overhead. The trade-off is that they provide execution discipline without broader governance — there is no "constitution" concept, no cross-team coordination, no layered authority over what specifications can contain.
Tier 2: Team-Scale Specification Frameworks
GitHub Spec Kit (71,000+ stars, 20+ supported AI agents) and OpenSpec add structured workflows and organizational governance on top of the spec-first pattern.
Spec Kit implements a four-phase workflow — specify, plan, tasks, implement — and introduces the "constitution" concept: project-level constraints that govern all specification authoring and agent execution. The constitution captures organizational standards, architectural decisions, and domain patterns in a machine-readable format. This is the first step toward the layered governance that enterprise environments eventually require.
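As a rough illustration of the idea — this is not Spec Kit's actual file format, and the rules shown are invented — a constitution might encode constraints like:

```markdown
# Project Constitution (illustrative)

## Architecture
- All service-to-service calls go through the API gateway; no direct
  database access across service boundaries.

## Standards
- Every public endpoint has an OpenAPI definition before implementation.
- Monetary amounts are integers in minor units; floating point is forbidden.

## Process
- Any spec touching payment flows requires review by the payments team.
```

Because every specification and every agent run is governed by the same document, a constraint like "floating point is forbidden for money" is stated once rather than re-prompted per task.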
OpenSpec takes a brownfield-first approach, built on the premise that most real work happens on existing codebases rather than greenfield projects. It maintains a single evolving "source of truth" specification with delta markers (added, modified, removed) for proposed changes — what the project describes as "version control for intent."
BMAD goes further, organizing the full development lifecycle around 21 specialized AI agents with distinct roles — analyst, architect, product manager, developer — each governed by its own persona file and explicit role boundaries. It is the most governance-heavy option, well-suited to large greenfield initiatives where formal planning is expected.
These frameworks are genuinely competing approaches with overlapping sweet spots. Choosing between them depends on whether a team values specification breadth and portability (Spec Kit), brownfield change management (OpenSpec), or comprehensive lifecycle governance (BMAD). Some vendors, like Amazon's Kiro, are embedding this level of structure directly into the IDE rather than layering it on top — a bet that SDD adoption increases when the workflow is integrated into the tool developers already use.
The trade-off at this tier is real. As Birgitta Böckeler observes in an article on Martin Fowler's site exploring several of these tools, most excel when requirements are clear upfront but can feel like a sledgehammer for a five-line bug fix. The tooling is still finding its appropriate granularity.
Tier 3: Specification Infrastructure
This tier is the least developed. Most specification frameworks treat specs as files — markdown, YAML, or other structured documents. This works well until specifications need to be queried programmatically across a large system, until dependencies between specs form graphs complex enough that manual tracking breaks down, or until multiple agents need to claim and execute work concurrently with conflict-free coordination. Teams can build tooling to parse and cross-reference markdown files — and many do — but at a certain scale, the effort to maintain that custom tooling becomes significant.
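The custom tooling in question is often modest. As a minimal sketch of the kind of cross-referencing teams build — assuming the convention, invented here, that specs link to each other with relative markdown links — a script can build the dependency graph and flag dangling references:

```python
import re
from pathlib import Path

# Markdown links of the form [text](target.md). That specs reference
# each other by relative file path is an assumed convention, not a
# standard any framework mandates.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+\.md)\)")

def spec_links(spec_dir):
    """Map each spec file to the spec files it references."""
    graph = {}
    for path in Path(spec_dir).rglob("*.md"):
        targets = LINK_RE.findall(path.read_text(encoding="utf-8"))
        graph[path] = [(path.parent / t).resolve() for t in targets]
    return graph

def dangling_references(spec_dir):
    """Return (source, target) pairs where the referenced spec is missing."""
    return [
        (src, tgt)
        for src, targets in spec_links(spec_dir).items()
        for tgt in targets
        if not tgt.exists()
    ]
```

This works at the scale of dozens of specs. The limits the paragraph above describes appear when the graph needs transitive queries ("what depends on this invariant?"), stable identities that survive file renames, and concurrent writes — which is where flat files stop being enough.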
The problems that emerge at enterprise scale — stable addressability across specs, graph-queryable dependencies, multi-agent coordination, layered organizational governance — point toward treating specifications as structured, queryable data rather than documents. We explore these problems and one possible approach in the next post in this series.
When SDD Is the Wrong Approach
No methodology is universally correct, and SDD has real failure modes that deserve honest examination.
- Over-specification. Spending two days writing a perfect specification for a feature that gets canceled next sprint is waste. SDD is most valuable for work that will survive long enough to benefit from the upfront investment — core domain logic, regulated workflows, or cross-team interfaces. For throwaway prototypes or exploratory spikes, a conversation with an agent is faster and more appropriate than a formal spec.
- Spec drift. The classic problem with specification-first approaches is that the spec says one thing and the code does another, and eventually no one trusts the spec. SDD does not magically solve this. It reduces the risk by making specifications the input to code generation rather than a separate document maintained in parallel, but drift is still possible — especially when changes bypass the spec, whether manual edits to generated code or direct implementation of new requirements. Teams adopting SDD need a discipline of spec-first modification, and tooling that detects divergence.
- Specification review as bottleneck. If review time is already up 91% (per the Faros data), adding a specification review phase before code generation increases total review burden. The bet is that catching problems at the spec level — before code exists — is cheaper than catching them in code review. This is plausible but not yet proven at scale. Teams should measure whether SDD reduces total review effort or merely shifts where the effort is incurred.
- The granularity problem. Most SDD frameworks are optimized for medium-to-large features. A five-line bug fix, a CSS tweak, a dependency version bump — these do not need a multi-phase specification workflow, and forcing them through one creates friction that slows teams down. Effective SDD adoption requires clear heuristics for when a change is too small to justify the ceremony.
- Training gap. Specification authorship is a skill. It is closer to requirements engineering and domain modeling than to coding. If senior engineers become spec authors, they need training and practice in that discipline — and the training materials, mentorship models, and career development paths do not yet exist in most organizations.
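The drift detection mentioned above can start simple. One possible mechanism — the `Spec-Hash` header is a convention invented here for illustration, not a feature of any framework named in this article — is to stamp generated code with a content hash of the spec it was generated from, so divergence is detectable mechanically:

```python
import hashlib
import re
from pathlib import Path

# Illustrative convention: generated files carry a header comment
# recording the SHA-256 of the spec they were generated from,
# e.g. "# Spec-Hash: a1b2c3..."
HASH_RE = re.compile(r"Spec-Hash:\s*([0-9a-f]{64})")

def spec_hash(spec_path):
    """Content hash of the specification file."""
    return hashlib.sha256(Path(spec_path).read_bytes()).hexdigest()

def stamp(code_text, spec_path):
    """Prepend a spec-hash header to freshly generated code."""
    return f"# Spec-Hash: {spec_hash(spec_path)}\n{code_text}"

def has_drifted(code_text, spec_path):
    """True if the spec changed since generation, or the code was
    never stamped at all."""
    m = HASH_RE.search(code_text)
    return m is None or m.group(1) != spec_hash(spec_path)
```

A CI check built on this catches one direction of drift — spec edited, code not regenerated. The other direction, manual edits to generated code, requires hashing the code as well; either way, the value is that drift becomes a failing check rather than a slow erosion of trust.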
The Organizational Shift
Tooling is the easier half of the SDD transition. The harder half is organizational.
SDD shifts the specifier-to-implementer ratio. In traditional development, most effort goes to implementation rather than specification. SDD inverts this. The people who define correctness — domain experts, senior architects, staff engineers — become the critical path. Their time moves from code review and implementation oversight to specification authorship. This is a deliberate bet on the scarcest resource in most engineering organizations, and it carries real career implications. Junior engineers build skills through verification, test engineering, and working with existing systems — but if implementation is increasingly automated, how do they develop the taste and judgment that specification authorship requires? Senior engineers need to develop new skills in requirements formalization and domain modeling, disciplines that have historically been undervalued in engineering culture. The career paths, mentorship models, and training programs for this shift do not yet exist in most organizations, and building them is as important as adopting the tooling.
For regulated industries — financial services, insurance, healthcare — SDD offers a structural advantage. Specifications that govern agent-generated code provide an audit trail at the level of business intent, mapped to regulatory requirements in a way that source code's implementation details often obscure. Positioning this investment as a compliance capability rather than a feature-delivery optimization changes both the budget conversation and the executive sponsorship model.
What We Do Not Yet Know
Beyond the organizational questions, several technical unknowns remain.
- What does effective review of agent-generated code look like? AI-generated code fails differently than human-authored code — more often structurally correct but semantically wrong, locally coherent but globally inconsistent. The review discipline for these failure modes does not yet exist as a mature practice.
- Does SDD reduce total cost, or just shift where cost is incurred? If specification authorship and review replace code review as the primary bottleneck, the net efficiency gain depends on whether problems are cheaper to catch at the spec level than at the code level. This is plausible but unproven at scale.
- How much specification overhead is appropriate for a given change size? Most frameworks optimize for medium-to-large features and struggle with the small changes that constitute the majority of daily work. The tooling needs better granularity detection.
- How fast will the underlying agent capabilities evolve, and how will that change what specifications need to contain? The constitutional foundation — organizational constraints, regulatory requirements, domain patterns — is likely durable. The tactical specification format will change as agents become more capable.
These are early days. The evidence base is thin, the tooling is immature, and the teams adopting SDD today are placing a bet — an informed one, but a bet nonetheless. What the evidence does suggest is that teams investing in specification discipline now, even starting with a markdown file in a repository, will be better positioned to absorb the next wave of agent capabilities than those who are not.
The first step is the same regardless of scale: write a spec before you write code.
Sean Brandt is a Technical Fellow at GEICO specializing in large-scale backend systems for financial services and insurance. This is the first post in a series on spec-driven development. The next installment examines what happens when file-based SDD meets enterprise-scale systems — and whether graph-based specification infrastructure is an answer.