Talk Beginner MIT First Talk

Specs That Don't Drift: Verifiable Spec-Driven Development with AI

Approved
Shantanu Agarwal
Session Description

"Coding is largely solved": we have all heard it. What that claim quietly skips is the part where someone has to define what to build. In practice, every serious AI-driven development workflow depends on specifications. The industry's bet is that natural-language specs are good enough to drive autonomous implementation. For a one-shot script or a toy demo, they are.

They stop being good enough the moment a system has more than a handful of features. Prose drifts. A requirement gets added in one document while the API design, the test plan, and the delivery roadmap are never updated to reflect it. The LLM has no way to know this. It faithfully implements a spec that is quietly contradicting itself, and the gap surfaces in production or in a broken integration test three sprints later. The root cause is not the LLM: it is that natural language gives tooling nothing to verify.

This talk argues for a different foundation. Instead of prose, you represent each dimension of a software system as a structured JSON artifact. A functional requirement is not a paragraph in a markdown file; it is a typed record with an ID, acceptance criteria, and explicit links to the capabilities it serves. An API endpoint is not a sentence in a readme; it is a record that traces back to the exact requirement it fulfills. A milestone plan traces to the roadmap. Test fixtures trace to functional requirements. Every claim is a data point that a tool can check.
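
As a rough illustration of what such typed records might look like, here is a minimal Python sketch. The field names, the ID formats (FR-001, API-001, CAP-AUTH), and the example requirement are all assumptions made for illustration; they are not taken from devspec_toolkit.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FunctionalRequirement:
    """Hypothetical requirement record: an ID, acceptance criteria, capability links."""
    id: str
    title: str
    acceptance_criteria: tuple[str, ...]
    capability_ids: tuple[str, ...]


@dataclass(frozen=True)
class ApiEndpoint:
    """Hypothetical endpoint record that traces back to exactly one requirement."""
    id: str
    method: str
    path: str
    requirement_id: str


requirement = FunctionalRequirement(
    id="FR-001",
    title="User can reset their password",
    acceptance_criteria=("Reset link expires after 30 minutes",),
    capability_ids=("CAP-AUTH",),
)

endpoint = ApiEndpoint(
    id="API-001",
    method="POST",
    path="/password-reset",
    requirement_id="FR-001",  # explicit link a tool can follow
)

# Both records serialize to plain JSON, so they can live in version control
# next to the code and be read by any validator.
requirement_json = json.dumps(asdict(requirement), indent=2)
endpoint_json = json.dumps(asdict(endpoint), indent=2)
print(requirement_json)
print(endpoint_json)
```

The point of the sketch is that the link from endpoint to requirement is a field, not a sentence, so a tool can follow it without interpreting prose.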

The payoff is that consistency becomes checkable. A validator can cross-check two artifacts and report: this requirement has no corresponding API coverage; this test fixture references an endpoint that does not exist; this milestone covers features that were never formally required. These are not linter-style warnings about formatting. They are logical contradictions, caught before a single line of application code is written.
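
The three findings named above can be sketched as a small cross-artifact check. This is a hedged illustration, not the validator from devspec_toolkit: the function name `cross_check`, the record shapes, and the assumption that milestones link to requirement IDs are all hypothetical.

```python
def cross_check(requirements, endpoints, fixtures, milestones):
    """Return a list of contradictions found by comparing artifacts to each other."""
    findings = []
    requirement_ids = {r["id"] for r in requirements}
    endpoint_ids = {e["id"] for e in endpoints}
    covered_requirements = {e["requirement_id"] for e in endpoints}

    # Every requirement should be served by at least one endpoint.
    for requirement in requirements:
        if requirement["id"] not in covered_requirements:
            findings.append(f"{requirement['id']}: requirement has no API coverage")

    # Every fixture must point at an endpoint that is actually defined.
    for fixture in fixtures:
        if fixture["endpoint_id"] not in endpoint_ids:
            findings.append(
                f"{fixture['id']}: fixture references undefined endpoint "
                f"{fixture['endpoint_id']}"
            )

    # Every milestone may only cover work that was formally required.
    for milestone in milestones:
        for requirement_id in milestone["requirement_ids"]:
            if requirement_id not in requirement_ids:
                findings.append(
                    f"{milestone['id']}: milestone covers {requirement_id}, "
                    "which was never formally required"
                )
    return findings


# Illustrative data with one deliberate contradiction of each kind.
requirements = [{"id": "FR-001"}, {"id": "FR-002"}]
endpoints = [{"id": "API-001", "requirement_id": "FR-001"}]
fixtures = [{"id": "FX-001", "endpoint_id": "API-009"}]
milestones = [{"id": "MS-001", "requirement_ids": ["FR-001", "FR-003"]}]

findings = cross_check(requirements, endpoints, fixtures, milestones)
for finding in findings:
    print(finding)
```

None of these findings is visible from any single artifact; each one only appears when two artifacts are compared.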

The talk walks through the reasoning behind this design: why structured artifacts are worth the upfront cost, what the traceability chain from requirements to APIs to tests to milestones looks like, and where the approach pays off the most. The session closes with an honest assessment of the tradeoffs (the waterfall discipline this requires is deliberate and is genuinely the hardest part) and a pointer to devspec_toolkit, an open-source MIT-licensed reference implementation of the full pipeline.

The hard part of AI-driven software development is not getting an LLM to write code. It is giving the LLM a spec it cannot misinterpret. This talk is about how to build that spec.

Key Takeaways

Why prose specs drift and why it matters.

Natural-language specs cannot be machine-verified. As a system grows across multiple documents, requirements, APIs, tests, and roadmaps silently fall out of agreement. The LLM never knows; it implements whatever it reads. The failure is structural, not a model quality issue.

How structured JSON turns a spec into a verifiable artifact.

When each dimension of a system (requirements, interfaces, test fixtures, milestones) is a typed JSON record with explicit cross-references, a validator can check whether the full set of artifacts is logically consistent: not just individually well-formed, but mutually coherent.

Spec-time contradiction detection catches a whole class of bugs that code review misses.

A requirement with no API coverage, a test fixture referencing an undefined endpoint, a milestone scoped to features that were never formally required: these are caught by cross-checking artifacts, not by reading any single document. Finding them before implementation starts means they are fixed in the spec rather than in shipped code.

A closed canonical vocabulary eliminates hallucinated IDs.

Every ID in every artifact is registered. Any cross-artifact reference to an ID that was never defined fails validation immediately. This constraint, applied at spec-time, is what makes autonomous LLM implementation tractable: the model cannot invent references that the rest of the spec does not recognize.
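
One way to picture a closed vocabulary is a registry that every reference must resolve against. The sketch below is an assumption about how such a constraint could be enforced; `IdRegistry`, `UnknownIdError`, and the ID values are hypothetical names, not part of devspec_toolkit.

```python
class UnknownIdError(ValueError):
    """Raised when a reference points at an ID that was never registered."""


class IdRegistry:
    """Closed vocabulary: an ID exists only if it was explicitly registered."""

    def __init__(self):
        self._kinds = {}

    def register(self, artifact_id, kind):
        # Registering the same ID twice is treated as an error in this sketch.
        if artifact_id in self._kinds:
            raise ValueError(f"duplicate ID: {artifact_id}")
        self._kinds[artifact_id] = kind

    def resolve(self, artifact_id):
        # Fail immediately instead of silently accepting an invented ID.
        try:
            return self._kinds[artifact_id]
        except KeyError:
            raise UnknownIdError(
                f"reference to unregistered ID: {artifact_id}"
            ) from None


registry = IdRegistry()
registry.register("FR-001", "requirement")
registry.register("API-001", "endpoint")

print(registry.resolve("FR-001"))

try:
    registry.resolve("FR-999")  # an ID no artifact ever defined
except UnknownIdError as error:
    print(error)
```

If a model emits a reference such as FR-999, validation fails at spec-time, before any implementation depends on it.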

Session Categories

Introducing a FOSS project or a new version of a popular project
Technology architecture
Engineering practice - productivity, debugging
Talk License: MIT

Reviews

https://github.com/github/spec-kit Why not talk about this?

Reviewer #1 Not Sure

I don’t see a clear FOSS angle here, and I am unsure how this will be relevant to a FOSS event. Please go through the talk proposal guidelines here: https://forum.fossunited.org/t/talk-proposal-guidelines-for-a-foss-conference-meetup/1923

Reviewer #2 Rejected