Agent Operating Protocol
Use this protocol when an AI agent applies the Intent Specification Layer to a real repository.
The goal is to keep the agent from forgetting the method’s central distinction:
spec = authoritative intent
evidence = implementation proof
ledger = review and readiness trail
generated = trace scaffolding
First route the request with Agent Mode Router. Then use this protocol for the selected mode.
0. Spec Standard Reflex
Before comparing code to a requirement, treat accepted L1/L2/L3 specs as the product standard, not a snapshot of current implementation.
Do not downgrade accepted specs to match incomplete code. If reviewed implementation evidence does not satisfy an accepted requirement, keep the spec and classify the mismatch as an evidence gap:
| Evidence result | Classification |
|---|---|
| No implementation found | missing_implementation |
| Some branches implemented, others absent | partial_implementation |
| Behavior appears in code but no executed proof exists | missing_test |
| Code contradicts authoritative in-scope spec | wrong_code |
| Spec lacks authority or conflicts with stronger authority | wrong_spec |
| Correct behavior is unknowable | decision_gap |
Only update the spec downward when the standard itself is wrong, unauthoritative, out of scope, stale, or contradicted by stronger L0/L1/platform authority.
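The rule above can be sketched as a small guard. This is a minimal illustration, not part of the method: the `Gap` enum and the `may_edit_spec_downward` helper are hypothetical names, and only the classification values come from the table.

```python
from enum import Enum


class Gap(Enum):
    """Classification values from the table above."""
    MISSING_IMPLEMENTATION = "missing_implementation"
    PARTIAL_IMPLEMENTATION = "partial_implementation"
    MISSING_TEST = "missing_test"
    WRONG_CODE = "wrong_code"
    WRONG_SPEC = "wrong_spec"
    DECISION_GAP = "decision_gap"


def may_edit_spec_downward(gap: Gap) -> bool:
    """Hypothetical guard: only a wrong_spec finding justifies lowering the standard.

    Every other classification keeps the accepted spec and records an evidence gap.
    """
    return gap is Gap.WRONG_SPEC
```

An agent could call the guard before proposing any spec edit that weakens an accepted requirement, and stop when it returns `False`.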
1. Start With The Task Mode
Classify the request before reading implementation code.
| Mode | Trigger | First move |
|---|---|---|
| Spec authoring | New behavior or changed product promise | Write or update L1/L2/L3. |
| Implementation | Build behavior from accepted spec | Load governing REQ IDs and write the verification obligation before code. |
| Reverse review | Flow feels wrong or code already exists | Review spec as user journey first. |
| Evidence mapping | Tests or CI requested | Map REQ IDs to real tests, guardrails, runtime, or manual evidence. |
| Release audit | “Ready?”, “blocker?”, “launch?” | Check authority, release scope, implementation evidence, and core journey impact. |
| Method update | New ILS version, upstream rule, or template applied to an existing repo | Install governance, inventory specs, run propagation audit, and report residual gaps. |
If the mode is unclear, default to reverse review. It exposes missing intent without prematurely accusing the implementation.
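The fallback rule can be sketched as a lookup with a default. The mode keys (`spec_authoring`, `reverse_review`, and so on) and the `select_mode` function are assumptions made for illustration; only the first moves are copied from the table.

```python
from typing import Optional

# First move per mode, copied from the table above. The snake_case keys are assumed names.
FIRST_MOVES = {
    "spec_authoring": "Write or update L1/L2/L3.",
    "implementation": "Load governing REQ IDs and write the verification obligation before code.",
    "reverse_review": "Review spec as user journey first.",
    "evidence_mapping": "Map REQ IDs to real tests, guardrails, runtime, or manual evidence.",
    "release_audit": "Check authority, release scope, implementation evidence, and core journey impact.",
    "method_update": "Install governance, inventory specs, run propagation audit, and report residual gaps.",
}


def select_mode(classified: Optional[str]) -> str:
    """Return the classified mode, or reverse_review when the mode is unclear or unknown."""
    if classified in FIRST_MOVES:
        return classified
    return "reverse_review"
```

For example, `FIRST_MOVES[select_mode(None)]` yields the reverse review first move.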
2. Apply Layer Selection
Use the minimum sufficient layer set:
| Condition | Required layer |
|---|---|
| Product-wide value, authority, or forbidden shortcut | L0 |
| Shared term, state, ID, ownership, or invariant | L1 |
| Any behavior change | L2 |
| External service, async boundary, deletion, money, entitlement, retry, rollback, idempotency, or partial failure | L3 |
Do not skip L3 for “small” flows that cross these boundaries. Small flows can still create irreversible or inconsistent states.
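As a rough sketch, layer selection can be expressed as a function of the conditions in the table. The function name, its parameters, and the boundary keywords are hypothetical; the mapping itself follows the table.

```python
from typing import Iterable, List

# Boundaries that require L3, taken from the table above. The keyword spelling is assumed.
L3_BOUNDARIES = frozenset({
    "external_service", "async_boundary", "deletion", "money", "entitlement",
    "retry", "rollback", "idempotency", "partial_failure",
})


def required_layers(
    *,
    product_wide: bool = False,    # product-wide value, authority, or forbidden shortcut
    shared_concept: bool = False,  # shared term, state, ID, ownership, or invariant
    behavior_change: bool = False,
    boundaries: Iterable[str] = (),
) -> List[str]:
    """Return the minimum sufficient layer set for a change."""
    layers = []
    if product_wide:
        layers.append("L0")
    if shared_concept:
        layers.append("L1")
    if behavior_change:
        layers.append("L2")
    # Flow size is deliberately not an input: a "small" flow that crosses a boundary still needs L3.
    if L3_BOUNDARIES & set(boundaries):
        layers.append("L3")
    return layers
```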
3. Preserve Authority/Evidence Boundary
Before saying a behavior is missing, ask:
- Is the spec requirement authoritative or only a draft, proposal, sample, or stale import?
- Is the requirement in the reviewed release scope?
- Has code, test, runtime, design, or manual evidence been checked?
- Does the behavior affect the core user, operator, or system journey?
Only then classify:
| Classification | Use when |
|---|---|
| Spec gap | Product authority or code has reasonable behavior that spec does not explain, or spec imported a promise with no authority. |
| Code gap | Authoritative spec promise is not satisfied by reviewed implementation evidence. |
| Both gap | Neither spec nor code handles a real journey, state, failure, or recovery path. |
| Edge-case gap | A likely edge case was found but needs authority before becoming binding. |
| Decision gap | Correct behavior is not knowable from current authority. |
For implementation audits, prefer the sharper taxonomy from
Spec As Product Standard:
missing_implementation, partial_implementation, missing_edge_case,
missing_test, wrong_spec, wrong_code, or decision_gap.
4. Edge-Case Discovery Loop
For every action, run this prompt set:
duplicate submit?
stale state?
wrong actor or ownership?
external success + local failure?
local success + downstream failure?
pending too long?
generic timeout exceeded by normal processing?
valid input failure after extraction, analysis, generation, or automation?
empty manual-only fallback after valid input?
cancel, reject, retry, expire?
user-visible next action?
Then decide the authority basis:
| Authority basis | Action |
|---|---|
| L0 value or L1 invariant | Promote to L2/L3 unless explicitly rejected. |
| Security, privacy, money, deletion, ownership, or data integrity | Promote to L2/L3 and require verification. |
| Product decision | Promote to L2/L3 and map evidence. |
| Common UX expectation | Record candidate; accept or reject explicitly. |
| Sample or previous project import | Re-authorize before treating as binding. |
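One possible reading of the authority table, sketched as a triage function. The basis keys, the return values, and the tri-state `explicitly_accepted` flag are assumptions; the table does not say how a rejection is recorded, so treat this as illustration only.

```python
from typing import Optional

# Bases that are promoted to L2/L3 by default, per the table above (keys are assumed names).
PROMOTE_BY_DEFAULT = {"l0_value_or_l1_invariant", "security_or_integrity", "product_decision"}
# Bases that stay non-binding until someone decides explicitly.
NEEDS_EXPLICIT_DECISION = {"common_ux_expectation", "sample_or_previous_import"}


def triage_edge_case(basis: str, explicitly_accepted: Optional[bool] = None) -> str:
    """Return 'promote', 'candidate', or 'rejected' for a discovered edge case.

    explicitly_accepted: True = accepted or re-authorized, False = rejected, None = no decision yet.
    """
    if basis in PROMOTE_BY_DEFAULT:
        # Only the L0/L1 row mentions an explicit rejection path.
        if basis == "l0_value_or_l1_invariant" and explicitly_accepted is False:
            return "rejected"
        return "promote"
    if basis in NEEDS_EXPLICIT_DECISION:
        if explicitly_accepted is True:
            return "promote"
        if explicitly_accepted is False:
            return "rejected"
        return "candidate"
    raise ValueError(f"unknown authority basis: {basis!r}")
```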
4A. Feature Archetype Packs
Before L2 authoring or implementation, choose the relevant feature archetype packs from Spec authoring quality. The packs are there to make predictable failure surfaces hard to forget:
- async customer operation;
- source or file ingestion;
- external AI or automation;
- approval or decision;
- payment, entitlement, or billing;
- auth or account;
- deletion or privacy;
- external integration.
For each selected pack, make sure the spec contains the corresponding state,
[Unwanted], or L3 contract.
The two checks most likely to prevent false completion are:
- valid input failure: accepted user input is preserved when automation fails, with draft, still-processing, retry, or actionable error recovery;
- latency contract: work that can outlive the generic timeout declares synchronous, long request, polling, background job, or streaming behavior.
4B. Method Update Propagation
A method update is not complete when governance files are updated. Governance install is only the first step.
Use Method Update Propagation for this mode:
upstream rule/version
-> governance install
-> authoritative spec inventory
-> feature-spec propagation audit
-> accepted L1/L2/L3 edits
-> generated artifacts and checks
-> residual gap ledger
The agent must not report complete unless every in-scope authoritative spec
was inventoried, reviewed under the new rule, updated or explicitly excluded,
and the verification bridge was regenerated.
If feature specs remain unreviewed, report partial and list
pending_spec_review entries.
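The completion rule can be sketched as follows. `SpecReview` and `propagation_status` are hypothetical names, and `reviewed` here stands for "reviewed under the new rule and updated where needed"; how a project records that is not specified by this protocol.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class SpecReview:
    """One in-scope authoritative spec from the inventory (hypothetical record shape)."""
    spec_id: str
    reviewed: bool = False  # reviewed under the new rule and updated where needed
    excluded: bool = False  # explicitly excluded, with a recorded reason


def propagation_status(specs: Iterable[SpecReview], bridge_regenerated: bool) -> Tuple[str, List[str]]:
    """Return ('complete' | 'partial', pending_spec_review entries)."""
    pending = [s.spec_id for s in specs if not (s.reviewed or s.excluded)]
    complete = not pending and bridge_regenerated
    return ("complete" if complete else "partial"), pending
```

With one spec still unreviewed, the function returns `partial` plus that spec's ID, which maps directly onto the pending_spec_review list.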
5. Release Blocker Test
Do not call a finding a blocker unless all are true:
authoritative requirement
+ current target release
+ missing or partial implementation evidence
+ required for core journey
= release blocker
If any condition is unknown, the release impact is unknown, not blocker.
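A minimal sketch of the blocker test, using `None` for an unknown condition. The function and parameter names are assumptions; the `blocker`, `non_blocker`, and `unknown` values match the release impact options in the report template.

```python
from typing import Optional


def release_impact(
    authoritative: Optional[bool],
    in_target_release: Optional[bool],
    evidence_missing_or_partial: Optional[bool],
    core_journey: Optional[bool],
) -> str:
    """Return 'blocker', 'non_blocker', or 'unknown'. None means the condition is unknown."""
    conditions = (authoritative, in_target_release, evidence_missing_or_partial, core_journey)
    # Followed literally: any unknown condition makes the release impact unknown.
    if any(c is None for c in conditions):
        return "unknown"
    return "blocker" if all(conditions) else "non_blocker"
```

For example, `release_impact(True, True, True, None)` returns `"unknown"`: the core journey impact must be established before the finding can be called a blocker.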
6. Verification Discipline
Generated stubs are required but insufficient.
REQ-ID -> statement ID -> generated stub -> mapped evidence -> non-generated trace -> executed evidence
Real evidence can be:
- unit, integration, API, or UI test;
- guardrail or static check;
- smoke test;
- screenshot or runtime review;
- named manual UX evidence when automation is impractical.
Use statement IDs such as REQ-AUTH-004:S1 for multi-statement requirements.
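A small parser can keep statement IDs consistent across tests and reports. The pattern below is inferred from the single example REQ-AUTH-004:S1 and is an assumption; adjust it to the ID grammar the repository actually uses.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape: REQ-<AREA>-<number>, optionally followed by :S<statement number>.
_STATEMENT_ID = re.compile(r"^(REQ-[A-Z0-9]+-\d+)(?::S(\d+))?$")


class StatementRef(NamedTuple):
    req_id: str
    statement: Optional[int]  # None when the ID names the whole requirement


def parse_statement_id(text: str) -> StatementRef:
    """Split an ID such as 'REQ-AUTH-004:S1' into requirement ID and statement number."""
    match = _STATEMENT_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised statement ID: {text!r}")
    statement = int(match.group(2)) if match.group(2) else None
    return StatementRef(match.group(1), statement)
```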
Do not report a statement as verified only because:
- it appears in the Verification Map;
- generated/requirements.test.* contains a skipped placeholder;
- a non-generated @Spec(...) comment exists.
Those are useful intermediate states. The final state requires an executed test or guardrail, a passing smoke check, or a named manual UX/runtime record with reviewer, date, scope, and artifact.
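The manual record requirement can be sketched as a record type that refuses to count as evidence until every named field is filled. `ManualEvidence` and its field names are hypothetical; the required fields (reviewer, date, scope, artifact) come from the paragraph above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ManualEvidence:
    """A named manual UX/runtime record for one statement (hypothetical shape)."""
    statement_id: str
    reviewer: str
    date: str      # for example an ISO 8601 date string
    scope: str
    artifact: str  # path or link to the screenshot, recording, or notes

    def is_complete(self) -> bool:
        """True only when every field is non-empty after trimming whitespace."""
        return all(getattr(self, f.name).strip() for f in fields(self))
```

A record with an empty `artifact` stays incomplete, so the statement cannot move to verified on the strength of that record.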
When filling a Verification Map or task brief, use this status language:
| Status | Meaning |
|---|---|
| generated_stub | Placeholder exists; no proof yet. |
| mapped | Intended evidence target is named. |
| traced | Non-generated code, test, guardrail, or manual note references the statement. |
| verified | Evidence ran or was manually recorded and satisfies the statement. |
| manual_only | Manual review is the accepted evidence for this statement. |
| blocked | Evidence path is known but cannot currently run. |
7. Implementation Verification Reflex
When the mode is Implementation, tests are not a later cleanup task. Treat each touched statement as an obligation that must be closed or explicitly left open.
Before editing application code:
- Copy the governing REQ-... or REQ-...:Sx IDs into the task notes.
- Mark which statements are already verified and which need new evidence.
- Choose the evidence type for each changed statement: unit, integration, API, UI, guardrail, smoke, or manual UX/runtime record.
- If no practical verification path exists, mark the statement blocked or manual_only with a reason before implementation continues.
During implementation:
- write or update the test/guardrail in the same change as the code;
- include the statement ID in the non-generated test, guardrail, or manual evidence record where practical;
- keep generated stubs as visible backlog only, never as completion evidence.
Before reporting done:
- run the relevant verification commands;
- list any touched statement still at generated_stub, mapped, or traced;
- do not call the implementation complete if an authoritative in-scope statement lacks executed or recorded evidence.
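The pre-report check can be sketched as two helpers over a mapping of statement ID to status. The function names and the dictionary shape are assumptions; the status values are the ones defined in the Verification Discipline section.

```python
from typing import Dict, List

# Intermediate states that must be listed before reporting done.
OPEN_STATUSES = {"generated_stub", "mapped", "traced"}


def open_statements(touched: Dict[str, str]) -> List[str]:
    """Return touched statement IDs still at generated_stub, mapped, or traced."""
    return sorted(sid for sid, status in touched.items() if status in OPEN_STATUSES)


def may_report_complete(touched: Dict[str, str]) -> bool:
    """Complete only when every touched statement is verified.

    blocked and manual_only statements are reported under their own completion status,
    not as complete.
    """
    return all(status == "verified" for status in touched.values())
```

Run `open_statements` on the task notes before writing the final report, and copy its result into the "Still unverified" line.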
8. Agent Report Template
Use this shape in the final report for non-trivial work:
Spec impact:
- L1:
- L2:
- L3:
- REQ IDs:
Evidence boundary:
- Spec reviewed:
- Code/test/runtime evidence reviewed:
- Unverified:
Edge cases:
- Accepted into spec:
- Candidate only:
- Rejected/non-goal:
Release impact:
- blocker / non_blocker / proposal_only / not_applicable / unknown
- Reason:
Verification:
- Commands:
- Generated stubs:
- Mapped evidence:
- Non-generated traces:
- Real evidence:
- Still unverified:
Method update propagation:
- Upstream rule/version:
- Specs reviewed:
- Specs excluded:
- Specs still pending:
- Residual gaps:
- Completion: complete / partial
Completion claim:
- Status: complete / partial / blocked / unverified / manual_only
- Mode-specific rule satisfied: yes / no
- Remaining work: