Engineering review evidence

Evidence Reports show whether an AI coding-agent run is ready to trust.

An Evidence Report is the evidence bundle a coding agent produces before a human approves a branch push, PR creation, a merge proposal, a deployment, a spend increase, an escalation, or a safe-stop.
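
As a rough illustration of that gate, the sketch below lists the decisions named above as an enum and lets an action proceed only when a report exists and a named human has approved that specific action. `GatedAction`, `may_proceed`, and the dictionary shapes are hypothetical names chosen for this example, not a documented API.

```python
from enum import Enum
from typing import Dict, Optional


class GatedAction(Enum):
    """Decisions the text says a human approves (identifiers are illustrative)."""
    BRANCH_PUSH = "branch_push"
    PR_CREATION = "pr_creation"
    MERGE_PROPOSAL = "merge_proposal"
    DEPLOYMENT = "deployment"
    SPEND_INCREASE = "spend_increase"
    ESCALATION = "escalation"
    SAFE_STOP = "safe_stop"


def may_proceed(
    action: GatedAction,
    report: Optional[dict],
    approvals: Dict[GatedAction, str],
) -> bool:
    """Allow an action only if a non-empty report exists and this exact
    action has a non-blank approver recorded against it."""
    has_report = bool(report)
    approver = approvals.get(action, "")
    return has_report and bool(approver.strip())
```

An approval for one action does not carry over to another: approving PR creation in this sketch says nothing about deployment.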

Operational definition

Evidence Report = the review record for governed engineering-agent runs.

It answers the questions owners and reviewers care about: what was requested, which agent acted, under what authority, what changed, what passed, what failed, what risks remain, how to stop safely, what it cost, and who must approve the next step.

Owner review

The report turns agent output into a decision a human can govern.

AgentFoundry does not ask teams to trust a black box. The report surfaces the evidence a reviewer needs to approve PR creation or a deploy, or to retry, narrow, or stop the agent.

What exact engineering workflow did the agent run?
Which repos, files, tools, systems, or sources did it touch?
What evidence proves the output is useful enough?
Where did it fail or require human judgment?
What is the safe PR, deploy, retry, safe-stop, or stop decision?

Report contents

Requirement, WorkGraph, records, checks, changed files, approvals, risks, safe-stop, handoff.

The report compresses execution evidence, test results, the audit trail, execution records, policy state, review notes, cost, and safe-stop planning into one reviewable document.

Objective: The requirement, owner, target repo/service, engineering outcome, success criteria, allowed systems, and approval boundary.
Plan: The WorkGraph tasks, dependencies, selected agent lanes, tool permissions, expected outputs, and safe-action anchors.
Trace: Step-by-step agent actions, messages, tool calls, commands, sources, durations, outputs, timestamps, and replay notes.
Changed files: Patch diffs, tests, docs, scripts, config changes, screenshots, logs, or other concrete engineering deliverables created by the run.
Verification: Lint, typecheck, tests, evals, policy checks, regression checks, build checks, failure evidence, and reviewer notes.
Approvals: Human gates for external, destructive, production, deployment, merge, spend, or policy-sensitive actions.
Risks: Uncertainty, missing context, unresolved failures, misuse risk, quality gaps, cost risk, and required human review before merge or release use.
Safe-stop: Safe revert path, affected systems, production caution flags, abandoned paths, stop controls, and what should stay human-owned.
Recommendations: Approve PR, request changes, retry, route, escalate, narrow scope, improve checks, adjust policy, or keep the decision human-owned.
Handoff: A compact engineering summary a busy owner can accept, reject, route, merge, deploy, audit, or turn into the next run.
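
To make the ten sections concrete, here is a minimal sketch of a report as a Python dataclass with one field per section, plus a helper that lists which sections are still empty. The class name `EvidenceReport`, the field types, and `missing_sections` are assumptions made for illustration; the actual report schema is not specified on this page.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class EvidenceReport:
    """One field per report section listed above; value types are assumed."""
    objective: str = ""
    plan: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)
    changed_files: List[str] = field(default_factory=list)
    verification: List[str] = field(default_factory=list)
    approvals: List[str] = field(default_factory=list)
    risks: List[str] = field(default_factory=list)
    safe_stop: str = ""
    recommendations: List[str] = field(default_factory=list)
    handoff: str = ""


def missing_sections(report: EvidenceReport) -> List[str]:
    """Return the names of sections that are still empty, in declaration order."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


report = EvidenceReport(objective="Fix flaky login test", handoff="Ready for review")
print(missing_sections(report))
```

Treating an empty section as missing is a simplification. A real report would probably need to distinguish "no risks found" from "risks not yet assessed", for example by recording an explicit "none" entry.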
Run loop
01 Plan

Describe the issue, pick an engineering agent or blank agent, and define the needed outcome in plain language.

02 Configure

Set repo, tools, data, memory, escalation rules, allowed actions, checks, and review requirements.

03 Run

Launch the agent in a managed, private, or customer-controlled engineering environment.

04 Verify

Track tasks, commands, diffs, checks, approvals, failures, and escalations.

05 Approve

Approve sensitive steps, adjust permissions, stop, retry, route, or revert.

06 Improve

Improve templates, promote safe behaviors, keep ownership clear, and reuse proven engineering agents.
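
The six steps above can be sketched as an ordered enum with a function that advances one step at a time. This assumes the loop wraps from Improve back to Plan, which the "Run loop" heading suggests but the text does not state outright. `Stage` and `next_stage` are hypothetical names.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The six run-loop steps, numbered as in the list above."""
    PLAN = 1
    CONFIGURE = 2
    RUN = 3
    VERIFY = 4
    APPROVE = 5
    IMPROVE = 6


def next_stage(current: Stage) -> Stage:
    """Advance one step; after IMPROVE the loop returns to PLAN (assumed)."""
    return Stage(current % len(Stage) + 1)
```

A real run would not always move forward: the Approve step mentions stop, retry, route, and revert, so a fuller model would need transitions back to earlier stages or out of the loop entirely.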