Intent Specification Layer Guide
1. Purpose
AI-assisted coding has a recurring failure mode: the model implements what was said, then guesses everything that was not said.
The guesses often hide in places that are not obvious from a happy-path demo:
- identity and billing authority;
- canonical IDs and terms;
- bad input and stale state;
- retry, rollback, and partial failure;
- whether generated artifacts or human specs are the source of truth.
The Intent Specification Layer exists to reduce those guesses and expose the ones that remain. It is not a documentation archive. It is the implementation-governing layer of a repository.
docs/ explains.
spec/ governs.
If code and spec disagree, update one of them before doing anything else. Do not leave them divergent.
Its essential value is not “more requirements.” It is a smaller, reviewable model of the user experience and system promises.
That model works in two directions:
- forward: intent -> plan -> implementation -> verification;
- reverse: spec review -> missing journey/state/failure found -> spec and code correction -> verification.
A good spec lets a reviewer understand the intended user journey before reading all the code. If the spec review reveals that a step, state, error, or next action is missing, that is a product defect until the spec and code are brought back into alignment.
2. Authority, Scope, And Evidence
The layer has one source-of-truth job:
Record the intended product and system promises, including behavior that has been decided but not implemented yet.
Do not remove an accepted future behavior from spec/ just because the current
code does not implement it. That would turn the spec into a snapshot of the
code, which is the opposite of its purpose.
Keep these categories separate:
| Category | Belongs in | Meaning |
|---|---|---|
| Authoritative intent | L0/L1/L2/L3 specs | What must be true for the product or system |
| Release scope | spec frontmatter or release planning notes | Which accepted promises are targeted by a release |
| Implementation evidence | tests, guardrails, code traces, manual evidence records | What has actually been verified |
| Review findings | spec/reviews/ ledger | Candidate gaps, status, and resolution trail |
| Generated artifacts | generated/ or project-local build output | Trace slots and derived reports, not authority |
Implementation status words such as missing, partial, ready, or
implemented should not appear as normative L1/L2/L3 behavior. Put them in a
review ledger or evidence matrix instead.
A finding becomes a release blocker only when all of these are true:
- the requirement is authoritative, not merely a proposal or sample;
- the requirement is in the target release scope;
- implementation evidence is missing or partial;
- the behavior is required for the release’s core user, operator, or system journey.
This prevents two common errors:
- treating every future spec requirement as a current release blocker;
- treating every code mismatch as a code defect before checking whether the spec itself has authority.
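The four blocker conditions read as a single conjunction. The following is a minimal sketch of that rule, not part of any tool named in this guide. The `Finding` type, its field names, and `REQ-BILL-010` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one review finding; field names are illustrative."""
    req_id: str
    authoritative: bool     # accepted intent, not merely a proposal or sample
    in_release_scope: bool  # targeted by the release under review
    evidence: str           # "missing", "partial", or "implemented"
    core_journey: bool      # required for the release's core user/operator/system journey


def is_release_blocker(finding: Finding) -> bool:
    """Return True only when all four conditions hold at once."""
    return (
        finding.authoritative
        and finding.in_release_scope
        and finding.evidence in {"missing", "partial"}
        and finding.core_journey
    )


# An accepted requirement that is outside the target release is not a blocker.
future = Finding("REQ-BILL-010", True, False, "missing", True)
current = Finding("REQ-AUTH-004", True, True, "partial", True)
print(is_release_blocker(future))   # False
print(is_release_blocker(current))  # True
```

Keeping the four inputs as separate fields makes it visible which condition kept a finding from blocking a release.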
3. Values Provided
The layer provides six practical values:
- Guess reduction. It turns unstated assumptions into explicit requirements, terms, authorities, and contracts.
- Experience inspectability. It lets humans and AI review the user journey from a compact artifact before reading the whole implementation.
- Defect discovery. It reveals missing states, missing recovery paths, and implementation behavior that has no product justification.
- Change continuity. It preserves why behavior exists, not only what code currently does.
- Test trace generation. It turns each EARS statement into a generated test stub so verification work has a concrete slot.
- Tool independence. It gives Spec Kit, OpenSpec, Kiro, BMAD, Augment Intent, plan mode, and ordinary code review a shared source layer.
4. Source Layer Vs Tools
Spec-driven development tools are execution tools. The spec layer is the source layer they consume.
| Tool or method | Core purpose | Relationship to this layer |
|---|---|---|
| Spec Kit | spec -> plan -> tasks -> code | Can consume spec/ as source |
| OpenSpec | change deltas and archival | Inspires spec/changes/ |
| Kiro | IDE specs, design, tasks | Can read/write feature specs |
| BMAD | role-based planning depth | Useful for large planning, anchored back to spec/ |
| Augment Intent | living spec and drift reduction | Same goal, productized workflow |
| Plan Mode | implementation sequence | Runs after relevant spec is loaded |
Do not create a new root folder just because a tool has a preferred convention.
If a tool needs generated working files, keep them clearly separate or sync the
accepted result back into spec/.
5. Layer Model
| Layer | Purpose | Required when |
|---|---|---|
| L0 Constitution | Product-wide values, authorities, forbidden shortcuts | Always |
| L1 Domain Truth | Entities, states, vocabulary, invariants, ownership | Shared concepts or 2+ modules |
| L2 Behavior Spec | EARS requirements for system response | Every behavior change |
| L3 Interface Contract | Ordering, payloads, idempotency, rollback, partial failure | Multi-step or cross-resource mutation |
Fixed rules:
- L0 is always ambient context.
- L2 is required for every behavior-changing implementation.
- L1 is required when terms, states, or entities are shared across modules.
- L3 is required when partial failure, retry, rollback, deletion, payment, entitlement, external service calls, or idempotency matter.
6. Layer Selection
Use this decision order:
Always:
    read L0
If behavior changes:
    write or update L2
If two or more modules share nouns, states, IDs, ownership, or authority:
    write or update L1
If the implementation mutates multiple resources, calls external services,
requires retry, supports deletion, charges money, grants entitlement, or needs
rollback/compensation:
    write or update L3
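The decision order above can be sketched as a small function. This is an illustration under assumptions: the `Change` type and its three flags are hypothetical, and each flag stands for a human judgment the guide describes in prose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical description of a planned change; field names are illustrative."""
    changes_behavior: bool = False
    # Two or more modules share nouns, states, IDs, ownership, or authority.
    shares_concepts_across_modules: bool = False
    # Multi-resource mutation, external calls, retry, deletion, payment,
    # entitlement, or rollback/compensation.
    needs_contract: bool = False


def required_layers(change: Change) -> list[str]:
    """Return the layers to read or update, in layer order."""
    layers = ["L0"]  # L0 is always read
    if change.shares_concepts_across_modules:
        layers.append("L1")
    if change.changes_behavior:
        layers.append("L2")
    if change.needs_contract:
        layers.append("L3")
    return layers


print(required_layers(Change()))                       # ['L0']
print(required_layers(Change(changes_behavior=True)))  # ['L0', 'L2']
print(required_layers(Change(True, True, True)))       # ['L0', 'L1', 'L2', 'L3']
```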
Minimum viable spec by work type:
| Work type | Minimum layer set |
|---|---|
| Copy-only or docs-only change | L0 check; no feature spec unless behavior changes |
| Single isolated UI behavior | L0 + L2 |
| Shared model, state, permission, or workflow | L0 + L1 + L2 |
| Auth, billing, deletion, uploads, external API, async workflow | L0 + L1 + L2 + L3 |
| Bug caused by missing intent | Update the missing layer before or with the fix |
7. Structure Decision
There are two valid repository structures. Choose by domain shape.
Global Structure
Use when the product has one strongly unified domain model.
spec/
  README.md
  00_constitution.md
  01_domain.md
  02_behavior/
  03_contracts/
  changes/
  schemas/
Feature Structure
Use when authority sources and change cadence differ by domain slice.
spec/
  README.md
  00_constitution.md
  features/
    account-access/spec.md
    billing/spec.md
    project-workflow/spec.md
  changes/
  templates/
  schemas/
  experiments/
8. L2 Behavior: EARS
Layer 2 uses EARS because it keeps natural language but forces behavior shape.
| Pattern | Use |
|---|---|
| [Ubiquitous] | Always true behavior or invariant |
| [Event-driven] | Response to an event |
| [State-driven] | Behavior while a state holds |
| [Unwanted] | Error, race, stale state, bad input, failure, abuse |
| [Optional] | Feature flag, entitlement, optional condition |
Writing rules:
- One line means one requirement.
- Every requirement gets a stable ID: REQ-<AREA>-<NUMBER>.
- Every feature spec needs at least one [Unwanted] requirement.
- Avoid vague capability phrasing such as “the user can”.
- Prefer “When/While/If …, the system shall …” so the line is testable.
- Split separate behaviors into separate lines.
Example:
- [REQ-AUTH-004][Unwanted] If backend account sync fails after central OAuth
succeeds, the system shall show a recoverable login error and shall not create
a partial local session.
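Several of the writing rules are mechanical enough to check with a script. The sketch below is one possible linter, not an existing tool. It assumes each requirement has been joined onto a single line in the shape used by the example above, and `REQ-AUTH-005` is a hypothetical requirement written badly on purpose.

```python
import re

# Assumed line shape, taken from the example above:
#   - [REQ-<AREA>-<NUMBER>][<Pattern>] <statement>
REQ_LINE = re.compile(
    r"^- \[(?P<id>REQ-[A-Z]+-\d+)\]\[(?P<pattern>[A-Za-z-]+)\]\s+(?P<text>.+)$"
)
PATTERNS = {"Ubiquitous", "Event-driven", "State-driven", "Unwanted", "Optional"}


def lint_requirement(line: str) -> list[str]:
    """Return the problems found in one requirement line (empty if none)."""
    match = REQ_LINE.match(line.strip())
    if match is None:
        return ["line does not match '- [REQ-AREA-N][Pattern] text'"]
    problems = []
    if match["pattern"] not in PATTERNS:
        problems.append(f"unknown EARS pattern: {match['pattern']}")
    text = match["text"].lower()
    if "the user can" in text:
        problems.append("vague capability phrasing: 'the user can'")
    if "the system shall" not in text:
        problems.append("missing 'the system shall'")
    return problems


def has_unwanted(lines: list[str]) -> bool:
    """Feature-level rule: at least one [Unwanted] requirement must exist."""
    return any("[Unwanted]" in line for line in lines)


good = (
    "- [REQ-AUTH-004][Unwanted] If backend account sync fails after central OAuth "
    "succeeds, the system shall show a recoverable login error and shall not create "
    "a partial local session."
)
bad = "- [REQ-AUTH-005][Event-driven] The user can log in with OAuth."

print(lint_requirement(good))  # []
print(lint_requirement(bad))
print(has_unwanted([good, bad]))  # True
```

A check like this only confirms sentence shape. It cannot tell whether the right situations were considered.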
8A. Spec Authoring Quality
EARS gives the sentence shape, but it does not by itself guarantee that the right situations were considered. Run the spec authoring quality checks before implementation.
Three checks matter most:
- Feature archetype packs. Async work, source ingestion, external AI,
approval, payment, auth, deletion, and external integration have predictable
failure surfaces. Select the matching packs and write the missing
[State-driven], [Unwanted], or L3 contract entries.
- Valid input failure rule. If a user provides valid input and automation fails, preserve the input and provide a recoverable draft, still-processing state, retry path, or actionable error. Do not collapse valid input into an empty manual-only fallback.
- Latency / processing contract. Customer-visible work that can outlive the generic API timeout must choose synchronous, endpoint-specific long request, polling, background job, or streaming behavior.
If every REQ has exactly one EARS statement, pause and check for under-decomposition. Important customer-facing capabilities usually need more than one statement because state, recovery, and unwanted paths are distinct behaviors under the same capability.
9. L3 Interface Contracts
EARS says what must happen. L3 says how modules preserve the promise when the operation crosses a boundary.
Add L3 when implementation needs any of these:
- ordered calls;
- durable mutation of multiple resources;
- external service calls;
- retry;
- idempotency;
- rollback or compensation;
- partial failure reporting;
- deletion and audit behavior.
At minimum, an L3 contract should state:
- caller and callee;
- request and response;
- auth and authority source;
- call ordering;
- idempotency key or duplicate-call rule;
- partial failure behavior;
- rollback or compensation behavior;
- audit or observability signal when relevant.
Implementation hints are allowed. If exact rollback requires tracking
reserved_items, write that. This is not over-specification; it is the contract
that prevents the AI from guessing.
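The minimum contents listed above can be treated as a checklist. The sketch below models one L3 contract as a record and reports which minimum items are still empty. The `InterfaceContract` type, its field names, and the checkout/inventory example values are hypothetical. The audit signal is left unenforced because the guide requires it only when relevant.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class InterfaceContract:
    """Hypothetical L3 record; one field per minimum item listed above."""
    caller_and_callee: Optional[str] = None
    request_and_response: Optional[str] = None
    auth_and_authority_source: Optional[str] = None
    call_ordering: Optional[str] = None
    idempotency_rule: Optional[str] = None
    partial_failure_behavior: Optional[str] = None
    rollback_or_compensation: Optional[str] = None
    audit_signal: Optional[str] = None  # "when relevant", so not enforced below


def missing_items(contract: InterfaceContract) -> list[str]:
    """Return the names of required items that are still empty."""
    not_enforced = {"audit_signal"}
    return [
        f.name
        for f in fields(contract)
        if f.name not in not_enforced and not getattr(contract, f.name)
    ]


draft = InterfaceContract(
    caller_and_callee="checkout -> inventory",
    call_ordering="reserve, then charge",
    rollback_or_compensation="release every entry in reserved_items",
)
print(missing_items(draft))
# ['request_and_response', 'auth_and_authority_source',
#  'idempotency_rule', 'partial_failure_behavior']
```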
10. Lifecycle
| Status | Meaning |
|---|---|
| draft | Being explored; do not implement as authority yet |
| review | Ready for human or agent review before code |
| adopted | Accepted product or system promise; implementation may still be pending |
| active | Governs the target implementation for the current product line or release |
| stable | Mature and rarely changed, still authoritative |
| deprecated | Superseded but not removed yet |
| archived | Historical only; no longer governs implementation |
Changes that are not ready to merge into active specs live under
spec/changes/<YYYY-MM-DD-short-name>/. After implementation and verification,
merge accepted behavior into the relevant feature spec.
Lifecycle status is authority status, not implementation status. A spec can be
adopted while code is still missing. A spec can be active while individual
requirements are only partially verified. Use review ledgers and evidence
matrices to record that implementation state.
11. Traceability And Drift
Spec drift is the main failure mode.
Control rule:
A behavior-changing code change must either update the relevant spec or state why the spec remains unchanged.
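One way to enforce the control rule is a pre-merge check. The following is a minimal sketch under assumptions: the function name, its parameters, and the example paths are hypothetical, and whether a change is behavior-changing is supplied by the author or reviewer rather than detected.

```python
from typing import Optional, Sequence


def drift_check(
    changed_paths: Sequence[str],
    behavior_changing: bool,
    unchanged_reason: Optional[str] = None,
) -> bool:
    """Return True when the change satisfies the control rule."""
    if not behavior_changing:
        return True
    touches_spec = any(path.startswith("spec/") for path in changed_paths)
    has_reason = bool(unchanged_reason and unchanged_reason.strip())
    return touches_spec or has_reason


# Behavior change with a spec update: passes.
print(drift_check(["src/login.py", "spec/features/account-access/spec.md"], True))  # True
# Behavior change with neither a spec update nor a stated reason: fails.
print(drift_check(["src/login.py"], True))  # False
# Behavior change where the author states why the spec remains unchanged: passes.
print(drift_check(["src/login.py"], True, "REQ-AUTH-004 already covers this path"))  # True
```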
Recommended traceability:
- Put stable IDs on EARS requirements.
- Reference requirement IDs from tests, pull requests, or comments when useful.
- Use @Spec(REQ-...) sparingly on complex code paths where future agents need a direct breadcrumb.
- Add a test, guardrail, smoke check, or manual verification note for each new behavior requirement.
Generated artifacts such as OpenAPI JSON or generated frontend types are generated contracts, not the source of product intent.
12. REQ-ID To Test Bridge
Every L2 requirement should be machine-extractable and convertible into a test stub. Multi-statement requirements should be tracked at statement level. This is the minimum bridge from intent to verification.
The bridge is:
REQ-ID in spec -> statement ID -> generated manifest -> generated test stub ->
mapped evidence -> non-generated trace -> executed verification evidence
Generated skipped tests are not proof that the system works. They are visible
work slots. The final proof is a real implementation test, guardrail, smoke
check, or manual verification note that references the same REQ-ID or statement
ID and has run or been recorded. A non-generated @Spec(...) trace is useful
for navigation, but it is not enough by itself to mark a statement verified.
Rules:
- Every EARS requirement gets a stable REQ-... ID.
- Every REQ-ID or statement ID appears in the Verification Map. For multi-statement requirements, use statement IDs rather than only the parent REQ-ID.
- npm run req:test:generate updates generated requirement artifacts.
- npm run check:reqs fails when generated artifacts are missing or stale.
- generated/verification-report.md separates generated-only slots from non-generated @Spec(...) references.
- Unknown @Spec(...) references fail the generated artifact check.
- Review should ask whether each generated stub has a mapped evidence path, a non-generated trace where practical, and executed evidence.
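The core of the bridge is a set comparison between the IDs declared in the spec and the IDs referenced from non-generated code. The sketch below illustrates that comparison only. It is not the implementation behind `npm run req:test:generate` or `npm run check:reqs`, it ignores statement-level IDs, and the function name, report keys, and sample IDs are hypothetical.

```python
import re

REQ_ID = re.compile(r"\bREQ-[A-Z]+-\d+\b")
SPEC_REF = re.compile(r"@Spec\((REQ-[A-Z]+-\d+)\)")


def trace_report(spec_text: str, source_text: str) -> dict[str, list[str]]:
    """Compare REQ-IDs declared in a spec with @Spec(...) references in code."""
    declared = set(REQ_ID.findall(spec_text))
    referenced = set(SPEC_REF.findall(source_text))
    return {
        # Declared but never referenced: only a generated slot exists.
        "generated_only": sorted(declared - referenced),
        # Declared and referenced: a trace exists, which is not yet proof.
        "traced": sorted(declared & referenced),
        # Referenced but never declared: should fail the check.
        "unknown_refs": sorted(referenced - declared),
    }


spec = "- [REQ-AUTH-004][Unwanted] ...\n- [REQ-AUTH-005][Event-driven] ..."
source = "# @Spec(REQ-AUTH-004)\n# @Spec(REQ-AUTH-999)"
report = trace_report(spec, source)
print(report["generated_only"])  # ['REQ-AUTH-005']
print(report["traced"])          # ['REQ-AUTH-004']
print(report["unknown_refs"])    # ['REQ-AUTH-999']
```

An ID in the traced list still needs executed or recorded verification before the statement counts as verified.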
13. Spec Review As Defect Discovery
The spec is also a review tool.
Reviewers should be able to read a feature spec and answer:
- what journey the user is supposed to experience;
- what state the system starts in and ends in;
- what the user sees after each important action;
- what can fail;
- what the user can do after failure;
- which authority owns identity, payment, entitlement, deletion, or other shared state;
- what must not happen.
If those answers are missing, write the missing intent into L1, L2, or L3. Then fix the code to satisfy it. If the answers are present in the spec but missing from the implementation, fix the code and tests. If the code contains behavior that the spec does not justify, either add the missing spec or remove the behavior.
This reverse loop is as important as pre-implementation planning:
spec review -> missing UX/system promise found -> update spec -> update code -> verify
Use it when a flow feels wrong, when a bug reveals an unstated assumption, or when an AI-generated implementation technically works but does not guide the user toward the next step.
Spec-only review can identify candidate gaps, but it is not implementation
evidence. Mark those findings implementation_status=unverified until code,
tests, screenshots, runtime behavior, or manual evidence has been checked.
Then classify the implementation as missing, partial, implemented, or
not_applicable.
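The rule that a spec-only finding stays unverified can be made explicit in whatever records the ledger. The sketch below is illustrative: the enum values come from the statuses named above, while the `classify` function and the example evidence string are hypothetical.

```python
from enum import Enum
from typing import Sequence


class ImplementationStatus(Enum):
    UNVERIFIED = "unverified"
    MISSING = "missing"
    PARTIAL = "partial"
    IMPLEMENTED = "implemented"
    NOT_APPLICABLE = "not_applicable"


def classify(
    proposed: ImplementationStatus, evidence: Sequence[str]
) -> ImplementationStatus:
    """Keep a finding at UNVERIFIED until at least one piece of evidence is recorded."""
    if not evidence:
        return ImplementationStatus.UNVERIFIED
    return proposed


# A reviewer suspects the behavior is missing but has only read the spec.
spec_only = classify(ImplementationStatus.MISSING, evidence=[])
# The same suspicion after the code has been checked.
checked = classify(
    ImplementationStatus.MISSING, evidence=["code trace: no retry handler found"]
)
print(spec_only.value)  # unverified
print(checked.value)    # missing
```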
14. Agent Workflow
Agents should start from AGENTS.md when it exists. That file is the compact
operating contract; this guide is the deeper reference.
1. Read L0.
2. Locate the relevant feature spec.
3. Update or create L1 terms when shared vocabulary changes.
4. Write L2 EARS requirements for behavior changes.
5. Add L3 only when cross-module or partial-failure semantics matter.
6. Use plan mode or another tool to create the implementation plan.
7. Implement.
8. Generate or refresh requirement statement test stubs.
9. Replace or complement generated stubs with real verification.
10. Verify each relevant requirement.
11. Update spec first when a bug reveals missing intent.
Do not start with implementation planning when the governing behavior is still
implicit. Plan mode answers “what steps should I take”; spec/ answers “what
truth must those steps preserve.”
15. Anti-Patterns
- A PRD that describes value but omits bad states.
- A behavior change with no EARS requirement.
- A REQ-ID that does not generate a test stub.
- A generated skipped test treated as completed validation.
- A spec that cannot reconstruct the user journey.
- A spec-only finding labeled as “not implemented” before code evidence is checked.
- An accepted future behavior removed from spec/ because the current code has not caught up yet.
- Implementation readiness labels such as ready, missing, or partial written into normative L1/L2/L3 behavior instead of a review ledger or evidence matrix.
- A proposal-only requirement treated as a current release blocker without checking target release scope.
- A common-sense edge case promoted into binding spec without an authority basis from L0, L1, product decision, platform rule, or explicit review.
- An error state with no user next action.
- A shared ID with multiple aliases across modules.
- A multi-resource mutation with no rollback or idempotency rule.
- A design snapshot treated as implementation authority without migration into spec/.
- A tool-generated plan that becomes the source of truth while spec/ stays stale.
- A passing happy-path smoke test used as proof that the contract is complete.
- A code path that exists only because the AI invented a plausible behavior.
16. Final Operating Principles
- docs/ explains; spec/ governs.
- L0 always exists.
- Behavior changes require L2.
- Shared terms require L1.
- Partial failure, rollback, retry, deletion, payment, entitlement, or idempotency requires L3.
- Use global structure for a unified domain.
- Use feature structure for authority-split domains.
- Tools consume the source layer; they do not define it.
- Code and spec must not drift silently.
- The spec must be reviewable as a model of the intended user journey.
- Accepted future behavior belongs in spec/; implementation status belongs in evidence or review artifacts.
- REQ-IDs must generate test stubs and map to verification, but generated stubs must not be counted as proof of behavior.
- Non-generated @Spec(...) traces are evidence links, not final proof, until their associated verification has run or been recorded.
- Spec-only gap findings start as implementation-unverified until code evidence is reviewed.
- The goal is not more documents. The goal is fewer AI guesses and faster discovery of missing intent.