docs: add ADR-001 for verification output parsing (CG-62)
This commit is contained in:
91
project/decisions/ADR-001-verification-output-parsing.md
Normal file
91
project/decisions/ADR-001-verification-output-parsing.md
Normal file
@@ -0,0 +1,91 @@
|
||||
# ADR-001: Verification Output Parsing for Merge Gating
|
||||
|
||||
**Status:** Accepted
|
||||
**Date:** 2025-12-11
|
||||
**Issue:** CG-62
|
||||
|
||||
## Context
|
||||
|
||||
The Agent Runner system automates the development workflow, including code verification and merging feature branches to main. The QA Agent performs verification and reports pass/fail status in its output.
|
||||
|
||||
A critical bug was discovered where the QA Agent merged branch `issue/CG-19` to main despite reporting "Verification Failed" in its output. The root cause: the merge decision relied solely on the process exit code (0 = success), but the QA Agent could exit with code 0 while still reporting verification failure in its stdout.
|
||||
|
||||
### Evidence from CG-19
|
||||
|
||||
```
|
||||
Comment #5 (QA Agent Verification):
|
||||
> ## Verification Failed
|
||||
> The developer's comments claim comprehensive implementation but NONE of the
|
||||
> retry logic changes were actually implemented.
|
||||
|
||||
Comment #6 (QA Agent Merge) - 3 seconds later:
|
||||
> ## Branch Merged
|
||||
> Feature branch has been merged to main and deleted.
|
||||
```
|
||||
|
||||
## Decision
|
||||
|
||||
Implement a secondary verification check that parses agent stdout for explicit pass/fail markers, independent of exit code.
|
||||
|
||||
### Implementation
|
||||
|
||||
1. **Added `_detect_verification_result(stdout)` function** that:
|
||||
- Scans stdout for failure patterns (case-insensitive regex)
|
||||
- Returns `(passed: bool, reason: str)` tuple
|
||||
- Falls back to pass if no markers found (preserves exit-code-only behavior for agents without markers)
|
||||
|
||||
2. **Failure patterns detected:**
|
||||
- "verification failed"
|
||||
- "requirements not met"
|
||||
- "was not implemented" / "were not implemented"
|
||||
- "none of the ... changes were implemented"
|
||||
- "acceptance criteria not met"
|
||||
- Markdown header "## Verification Failed"
|
||||
|
||||
3. **Pass patterns detected:**
|
||||
- "verification passed" / "verification succeeded"
|
||||
- "all acceptance criteria met"
|
||||
- Markdown header "## Verification Passed"
|
||||
|
||||
4. **Modified merge logic:**
|
||||
```python
|
||||
if task.task_type == "verification":
|
||||
stdout_passed, detection_reason = _detect_verification_result(task.stdout or "")
|
||||
logger.info(f"Verification result for {task.issue_id}: exit_code=0, stdout_check={stdout_passed} ({detection_reason})")
|
||||
|
||||
if stdout_passed:
|
||||
self._merge_feature_branch(task)
|
||||
else:
|
||||
# Block merge, add comment, move to Triage
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- **Prevents unverified merges:** Failed verification now correctly blocks merge regardless of exit code
|
||||
- **Audit trail:** All merge decisions are logged with the detection reason
|
||||
- **User feedback:** Blocked merges add a "Merge Blocked" comment explaining why
|
||||
- **Backwards compatible:** Empty stdout or no markers falls back to exit code behavior
|
||||
|
||||
### Negative
|
||||
|
||||
- **Pattern maintenance:** New failure message formats may need to be added to the patterns list
|
||||
- **False positives possible:** If a passing verification mentions failure in a different context (e.g., "Fixed the verification failed issue"), it could be incorrectly blocked
|
||||
|
||||
### Neutral
|
||||
|
||||
- Issues blocked from merge are moved to Triage state for manual review
|
||||
- The original failure pattern from CG-19 ("none of the ... changes were actually implemented") is specifically covered
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
1. **Fix agent to always exit non-zero on failure:** Would require changes to all agent implementations and doesn't protect against agent bugs
|
||||
2. **Structured output format (JSON):** More robust but requires agent rewrite; stdout parsing is simpler and works with existing agents
|
||||
3. **Require explicit pass marker:** Would break backwards compatibility with agents that don't emit markers
|
||||
|
||||
## Related
|
||||
|
||||
- CG-62: Issue where this fix was implemented
|
||||
- CG-19: Issue where the bug originally manifested
|
||||
- `/opt/repos/agentrunner/runner.py`: Implementation location (lines 170-220, 429-450)
|
||||
Reference in New Issue
Block a user