AI Agent Code Review: PR Workflow Setup Guide for 2026

DevOS Platform Team · June 13, 2026 · 12 min read

The PR sat open for three days.

Not because anyone disagreed with the changes — they were fine, maybe even good. But the senior dev who owned that part of the codebase was heads-down on a deadline. The other reviewers had context on the frontend, not the API layer. So the PR just... sat.

Three days. Meanwhile, the feature branch drifted further from main. Merge conflicts accumulated. The author context-switched to something else and half-forgot what they'd written.

Sound familiar? Yeah. Me too. More times than I'd like to admit.

What if an AI agent handled the first pass? Not replacing human review — that's a recipe for disaster, and honestly I've seen teams try it and regret it within a month — but doing the tedious mechanical checks so humans can focus on architecture, business logic, and the stuff that actually requires judgment. This is the core philosophy behind DevOS's agent-as-teammate model.

That's what we're setting up here. By the end, you'll have an AI agent wired into your PR workflow as an automated first-pass reviewer. It'll run a checklist, flag issues, request changes when needed, and approve when everything looks clean. Humans make the final call. The agent handles the grunt work. (It won't complain about being assigned to review on a Friday afternoon, either.)

Prerequisites

You'll need:

  • A GitHub repo with pull request workflows (we're adding to your existing setup)
  • An Anthropic API key with credits loaded — $10 covers months of PR reviews for most teams
  • GitHub Actions enabled on the repo
  • A GH_PAT (personal access token) with repo scope for commenting on PRs
  • Basic YAML familiarity for GitHub Actions

Optional but helpful: an existing linter/formatter setup. The AI works better when it's not duplicating what ESLint or Prettier already catches. (I wasted a solid week debugging "AI keeps flagging semicolons" before realizing we should just... run Prettier first. Not my finest moment.) If you're tracking CI costs and workflow performance, JustAnalytics can surface these metrics across your pipelines.

What We're Building

The flow looks like this:

  1. Developer opens a PR (or pushes to an existing one)
  2. GitHub Actions workflow triggers
  3. Workflow extracts the diff, changed files, and PR description
  4. Sends context to Claude with a structured review checklist
  5. Claude returns findings in a parseable format
  6. Workflow posts a review comment with findings
  7. If blockers exist, requests changes. If clean, approves.
  8. Human reviewer sees the AI review, makes final decision

The agent doesn't merge. It doesn't bypass your branch protections. It's a first-pass filter that makes human review faster and more focused.

That's it. Nothing fancy. But it works.

Step 1: Create the Review Checklist

Before writing any code, define what the agent should check. This is your review contract — the things you care about on every PR.

Create .github/review-checklist.md:

## AI Code Review Checklist

### Must Pass (blockers)
- [ ] No hardcoded secrets, API keys, or credentials
- [ ] No console.log/print statements left in production code
- [ ] All new functions have error handling for failure cases
- [ ] Database queries use parameterized inputs (no SQL injection risk)
- [ ] New dependencies are justified in PR description

### Should Pass (warnings)
- [ ] New public functions have JSDoc/docstring comments
- [ ] Test coverage exists for new logic branches
- [ ] No TODO/FIXME comments without linked issue
- [ ] Variable names are descriptive (no single letters except loop indices)
- [ ] No duplicated code blocks over 10 lines

### Nice to Have (suggestions)
- [ ] Complex logic has inline comments explaining why
- [ ] PR description explains the "why", not just the "what"
- [ ] Related documentation updated if behavior changes

This checklist lives in your repo. The AI agent reads it. When requirements change, you update the file — no workflow changes needed.
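Because the checklist is plain Markdown, a script can also read it as data instead of passing it along as an opaque blob. Here's a minimal sketch, assuming the three-section layout shown above. `parseChecklist` is a hypothetical helper, not part of the workflow in this guide.

```javascript
// Hypothetical helper: group checklist items by the severity named in each heading.
// Assumes headings look like "### Must Pass (blockers)" and items like "- [ ] text".
function parseChecklist(markdown) {
  const sections = {};
  let current = null;

  for (const line of markdown.split("\n")) {
    const heading = line.match(/^###\s+.*\((\w+)\)\s*$/);
    if (heading) {
      current = heading[1].toLowerCase(); // "blockers", "warnings", "suggestions"
      sections[current] = [];
      continue;
    }
    const item = line.match(/^- \[[ xX]\]\s+(.*)$/);
    if (item && current) {
      sections[current].push(item[1].trim());
    }
  }
  return sections;
}
```

With the items grouped this way, you could log how many blockers are in force, or fail fast if someone accidentally empties the blockers section.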

Step 2: Create the Review Workflow

Create .github/workflows/ai-code-review.yml:

name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize, ready_for_review]

jobs:
  ai-review:
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get PR diff
        id: diff
        run: |
          # Get the diff between base and head
          git diff origin/${{ github.base_ref }}...HEAD > /tmp/pr.diff

          # Get list of changed files
          git diff origin/${{ github.base_ref }}...HEAD --name-only > /tmp/changed-files.txt

          # Count lines changed
          LINES=$(wc -l < /tmp/pr.diff)
          echo "lines_changed=$LINES" >> $GITHUB_OUTPUT

      - name: Skip large PRs
        if: steps.diff.outputs.lines_changed > 2000
        run: |
          echo "PR too large for AI review (${{ steps.diff.outputs.lines_changed }} lines). Skipping."

      # "exit 0" only ends the step it runs in, so every later step is gated explicitly.
      - name: Setup Node.js
        if: steps.diff.outputs.lines_changed <= 2000
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Run AI review
        id: review
        if: steps.diff.outputs.lines_changed <= 2000
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          PR_BODY: ${{ github.event.pull_request.body }}
        run: |
          npm install @anthropic-ai/sdk
          node .github/scripts/ai-review.js

      - name: Post review comment
        if: steps.diff.outputs.lines_changed <= 2000
        env:
          GH_TOKEN: ${{ secrets.GH_PAT }}
        run: |
          REVIEW_ACTION=$(cat /tmp/review-action.txt)
          REVIEW_BODY=$(cat /tmp/review-body.md)

          if [ "$REVIEW_ACTION" = "REQUEST_CHANGES" ]; then
            gh pr review ${{ github.event.pull_request.number }} \
              --request-changes \
              --body "$REVIEW_BODY"
          elif [ "$REVIEW_ACTION" = "APPROVE" ]; then
            gh pr review ${{ github.event.pull_request.number }} \
              --approve \
              --body "$REVIEW_BODY"
          else
            gh pr review ${{ github.event.pull_request.number }} \
              --comment \
              --body "$REVIEW_BODY"
          fi

A few things worth noting.

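We skip draft PRs, since there's no point reviewing work-in-progress. We skip PRs over 2,000 lines because the context window gets expensive and the review quality drops. (If your team regularly ships 2,000-line PRs, that's a different problem. Fix that first.) And we use a separate PAT because, by default, the built-in GITHUB_TOKEN isn't allowed to approve pull requests; that only works if you enable the repository setting "Allow GitHub Actions to create and approve pull requests". Create the PAT from a dedicated bot account rather than a developer's own, because GitHub won't let anyone approve or request changes on their own PR. I spent an hour debugging that one. The error message is utterly unhelpful.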
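One detail about the size gate: `wc -l` counts every line of the diff, including file headers and unchanged context lines, so the 2,000 threshold is looser than it sounds. If you'd rather gate on lines actually added or removed, something like this would do it. It's a sketch, and `countChangedLines` is a hypothetical name that isn't wired into the workflow above.

```javascript
// Hypothetical helper: count only added/removed lines in a unified diff.
// File headers ("--- a/x", "+++ b/x") sit before the first hunk of each file,
// so we only count lines once a hunk header ("@@") has been seen.
function countChangedLines(diff) {
  let added = 0;
  let removed = 0;
  let inHunk = false;

  for (const line of diff.split("\n")) {
    if (line.startsWith("diff --git")) {
      inHunk = false; // new file: headers follow until the next "@@"
      continue;
    }
    if (line.startsWith("@@")) {
      inHunk = true;
      continue;
    }
    if (!inHunk) continue;
    if (line.startsWith("+")) added++;
    else if (line.startsWith("-")) removed++;
  }
  return { added, removed, total: added + removed };
}
```

Tracking hunk state, rather than just skipping lines that start with `+++` or `---`, keeps an added line such as `++i;` from being mistaken for a file header.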

Step 3: The AI Review Script

Create .github/scripts/ai-review.js:

const Anthropic = require("@anthropic-ai/sdk");
const fs = require("fs");

const client = new Anthropic();

async function runReview() {
  const diff = fs.readFileSync("/tmp/pr.diff", "utf8");
  const changedFiles = fs.readFileSync("/tmp/changed-files.txt", "utf8");
  const checklist = fs.readFileSync(".github/review-checklist.md", "utf8");
  const prBody = process.env.PR_BODY || "(No PR description provided)";

  // Read changed file contents (limit to 10 files, 500 lines each)
  const fileContents = changedFiles
    .split("\n")
    .filter(Boolean)
    .slice(0, 10)
    .map((f) => {
      try {
        const content = fs.readFileSync(f, "utf8");
        const lines = content.split("\n").slice(0, 500).join("\n");
        return `--- ${f} ---\n${lines}`;
      } catch {
        return null;
      }
    })
    .filter(Boolean)
    .join("\n\n");

  const prompt = `You are a senior engineer conducting a code review. Review this PR against the checklist provided.

## PR Description
${prBody}

## Review Checklist
${checklist}

## Changed Files
${changedFiles}

## Diff
${diff.slice(0, 50000)}

## File Contents (for context)
${fileContents.slice(0, 30000)}

## Instructions
1. Evaluate each checklist item against the PR
2. For BLOCKERS: These must pass or you request changes
3. For WARNINGS: Flag them but don't block
4. For SUGGESTIONS: Mention if helpful, skip if not

Output format (follow exactly):
ACTION: [APPROVE|REQUEST_CHANGES|COMMENT]
---REVIEW---
## AI Code Review

### Summary
[1-2 sentence summary of the PR and overall assessment]

### Blockers
[List any blocking issues, or "None found"]

### Warnings
[List any warnings, or "None"]

### Suggestions
[List any suggestions, or "Looks good"]

### Checklist Status
[For each relevant checklist item, mark PASS/FAIL/N/A]
---END---`;

  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 2000,
    messages: [{ role: "user", content: prompt }],
  });

  const text = response.content[0].text;

  // Parse response
  const actionMatch = text.match(/ACTION:\s*(APPROVE|REQUEST_CHANGES|COMMENT)/);
  const reviewMatch = text.match(/---REVIEW---([\s\S]*?)---END---/);

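  // Only trust the action when the review body parsed too; otherwise just comment.
  const action = reviewMatch ? (actionMatch?.[1] || "COMMENT") : "COMMENT";
  const reviewBody = reviewMatch?.[1]?.trim() || "Review could not be parsed.";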

  fs.writeFileSync("/tmp/review-action.txt", action);
  fs.writeFileSync("/tmp/review-body.md", reviewBody);

  console.log(`Review action: ${action}`);
  console.log(`Tokens used: ${response.usage.input_tokens} in, ${response.usage.output_tokens} out`);
}

runReview().catch((err) => {
  console.error("Review failed:", err);
  fs.writeFileSync("/tmp/review-action.txt", "COMMENT");
  fs.writeFileSync("/tmp/review-body.md", "AI review encountered an error. Please review manually.");
});

The script reads your checklist, the PR diff, and file contents, then sends everything to Claude. The structured output format makes the response straightforward to parse and act on, and if the ACTION line is missing, the script falls back to a plain COMMENT.
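The parsing is worth pulling into its own function so you can test it without calling the API. The sketch below reuses the same two regexes and adds one rule of my own that is not in the script above: an APPROVE only stands if the Blockers section says "None found". Anything else gets downgraded to COMMENT, so a self-contradictory response never approves. `parseReview` and `sectionText` are hypothetical names.

```javascript
// Hypothetical helper: return the text under a "### <heading>" line,
// up to the next "###" heading, or null if the heading is missing.
function sectionText(body, heading) {
  const lines = body.split("\n");
  const start = lines.findIndex((l) => l.trim() === `### ${heading}`);
  if (start === -1) return null;
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((l) => l.startsWith("###"));
  return (end === -1 ? rest : rest.slice(0, end)).join("\n").trim();
}

function parseReview(text) {
  const actionMatch = text.match(/ACTION:\s*(APPROVE|REQUEST_CHANGES|COMMENT)/);
  const reviewMatch = text.match(/---REVIEW---([\s\S]*?)---END---/);

  // No parseable review body: never approve or block on it.
  if (!reviewMatch) {
    return { action: "COMMENT", body: "Review could not be parsed." };
  }

  const body = reviewMatch[1].trim();
  let action = actionMatch?.[1] || "COMMENT";

  // Assumption: an approval is only valid when Blockers reads "None found".
  const blockers = sectionText(body, "Blockers");
  const blockersClear = blockers !== null && /^[-*\s]*none found\.?$/i.test(blockers);
  if (action === "APPROVE" && !blockersClear) action = "COMMENT";

  return { action, body };
}
```

The downgrade is deliberately one-directional: a REQUEST_CHANGES is left alone, because a false block costs a human a minute while a false approval costs more.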

Step 4: AI Code Review Approval Gates and Human Override

Here's where teams mess up: they give the AI too much power.

I get it. The AI is fast. It doesn't take PTO. It doesn't get annoyed when you push 47 commits in one day. But trusting it with final merge authority? Bad idea. Still.

The AI should never be the final approval for merging to main. Set your branch protection rules to require at least one human approval even if the AI approves. The AI's approval is a signal, not a gate.

In your repo settings (Settings → Branches → Branch protection rules):

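  • Require at least 1 approval before merging, and make sure the agent's approval can't satisfy that on its own (for example, require 2 approvals if the bot account has write access)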
  • Don't count the AI bot as a "code owner" review
  • Keep "Require review from code owners" if you use CODEOWNERS

This way, the AI can approve — saving the human reviewer time on mechanical checks — but a human still has to click the final button.

When Humans Should Override

The AI review is a starting point. Humans override in these situations:

Context the AI can't see. The AI doesn't know your roadmap, your tech debt strategy, or that you're intentionally shipping a quick fix before a proper refactor next sprint. If the AI flags something that's intentional, dismiss the review and explain why.

Business logic calls. The AI can tell you the code compiles and handles errors. It can't tell you whether the feature actually solves the user's problem. That's a human judgment call.

Architectural fit. A function might be correct in isolation but wrong for your codebase's patterns. The AI doesn't have the context to evaluate "does this fit how we do things here."

Tradeoff decisions. Sometimes you ship code that's not perfect because the deadline matters more. The AI will flag it. You override it. That's fine — just make sure you're making the call consciously.

Look, I'm not saying ignore the AI when it's inconvenient. But know when you're making a deliberate tradeoff versus when you're just being lazy. There's a difference.

Common AI Code Review Errors and Fixes

"Resource not accessible by integration"

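The token the workflow is using doesn't have permission for the call. This message usually points at the default GITHUB_TOKEN, so first check that the post step really uses GH_PAT and that the secret is set. If it is the PAT, it needs repo scope at minimum. Regenerate with the right permissions.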

AI review is too strict / requests changes on everything

Tune your checklist. If every PR fails, your "blockers" section is too aggressive. Move items to "warnings" or "suggestions." The goal is catching real problems, not creating busywork.

This is, frankly, the hardest part to get right. Start lenient. Tighten over time.

Reviews are slow (30+ seconds)

You're probably sending too much context. The script limits to 10 files and 500 lines per file — adjust those down if reviews are slow. Also check if the diff is huge. For large PRs, consider skipping AI review entirely and flagging for human review.
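To make those limits easy to tune, you could pull the file-gathering logic into a function with the caps as parameters. This is a sketch under a couple of assumptions: it takes already-loaded `{ path, content }` objects rather than reading from disk, and unlike the script above it drops whole files once the character budget is spent instead of cutting the joined text mid-file. `buildFileContext` is a hypothetical name.

```javascript
// Hypothetical helper: build the "File Contents" section under explicit limits.
function buildFileContext(
  files,
  { maxFiles = 10, maxLinesPerFile = 500, maxChars = 30000 } = {}
) {
  const parts = [];
  let used = 0;

  for (const { path, content } of files.slice(0, maxFiles)) {
    const body = content.split("\n").slice(0, maxLinesPerFile).join("\n");
    const part = `--- ${path} ---\n${body}`;
    if (used + part.length > maxChars) break; // stop before exceeding the budget
    parts.push(part);
    used += part.length + 2; // account for the "\n\n" separator
  }
  return parts.join("\n\n");
}
```

Start by halving `maxLinesPerFile` and watch whether review quality changes; the diff usually carries most of the signal anyway.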

Cost is higher than expected

Track tokens. Add logging:

console.log(`Input: ${response.usage.input_tokens}, Output: ${response.usage.output_tokens}`);

A typical review should be 5K-10K input tokens. If you're hitting 50K+, you're sending too much file content. Use JustAnalytics or similar to track trends across all your CI workflows.

Next Steps

Once the basic flow works, some enhancements worth considering:

Inline comments. Instead of one big review comment, post comments on specific lines. GitHub's review API supports this — you'd have the AI output line numbers and comments in a structured format.
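As a rough sketch of what that could look like: assume you ask the model for a JSON array of `{ file, line, comment }` objects (that format is my assumption, not something the prompt above requests), then turn it into the request body for GitHub's "create a review" endpoint. `buildReviewPayload` is a hypothetical helper, and it only builds the payload; sending it is left out.

```javascript
// Hypothetical helper: convert model findings into a pull request review payload.
// Findings that point at files outside the PR, or that lack a valid line, are dropped.
function buildReviewPayload({ action, summary, findingsJson, changedFiles }) {
  let findings;
  try {
    findings = JSON.parse(findingsJson);
  } catch {
    findings = []; // malformed model output: post the summary with no inline comments
  }
  if (!Array.isArray(findings)) findings = [];

  const allowed = new Set(changedFiles);
  const comments = findings
    .filter(
      (f) =>
        f &&
        allowed.has(f.file) &&
        Number.isInteger(f.line) &&
        f.line > 0 &&
        typeof f.comment === "string"
    )
    .map((f) => ({ path: f.file, line: f.line, side: "RIGHT", body: f.comment }));

  return { event: action, body: summary, comments };
}
```

One caveat: GitHub rejects comments on lines that aren't part of the diff, and this sketch only checks the file name. You'd want to validate line numbers against the hunks before posting.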

Review memory. If the author pushes fixes after the AI requests changes, the next review should acknowledge what was fixed rather than reviewing from scratch. DevOS's three-tier memory system is designed for exactly this kind of cross-session context.
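If you want a lightweight version of this inside the CI script, one option is to compare the previous run's findings with the current ones. The sketch below assumes each finding is recorded as `{ item, file }`, where `item` is the checklist line it violated; where you store the previous findings is up to you and not shown. `diffFindings` is a hypothetical helper.

```javascript
// Hypothetical helper: classify findings as fixed, still open, or newly added
// by comparing the previous review's findings with the current one's.
function diffFindings(previous, current) {
  const key = (f) => `${f.item}::${f.file}`;
  const prevKeys = new Set(previous.map(key));
  const currKeys = new Set(current.map(key));

  return {
    fixed: previous.filter((f) => !currKeys.has(key(f))),
    stillOpen: current.filter((f) => prevKeys.has(key(f))),
    added: current.filter((f) => !prevKeys.has(key(f))),
  };
}
```

Keying on checklist item plus file path is crude, but it's stable across runs in a way that free-form model wording isn't.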

Codeowner routing. Different checklist items for different file paths. Frontend files get one checklist; backend files get another. Match the review focus to the code being changed. Teams using ClickzProtect for ad fraud detection often add security-specific checks for any code touching payment or tracking pixels.
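A simple way to start is prefix matching on file paths. In this sketch the route table, the checklist file names, and `selectChecklists` itself are all hypothetical; swap in whatever layout your repo uses.

```javascript
// Hypothetical helper: pick the checklist file(s) that apply to a set of changed files.
// The first route whose prefix matches wins; unmatched files use the fallback.
function selectChecklists(changedFiles, routes, fallback) {
  const selected = new Set();
  for (const file of changedFiles) {
    const route = routes.find((r) => file.startsWith(r.prefix));
    selected.add(route ? route.checklist : fallback);
  }
  return [...selected];
}

// Example route table (paths are placeholders).
const exampleRoutes = [
  { prefix: "src/frontend/", checklist: ".github/checklists/frontend.md" },
  { prefix: "src/api/", checklist: ".github/checklists/backend.md" },
];
```

A PR that touches both areas gets both checklists, which you can concatenate into the prompt or review in two separate calls.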

Metrics. Track how often AI reviews catch issues before humans, how often humans override, and how long PRs sit in review. Pipe that to your observability stack and look for patterns. You might be surprised — we found our AI was catching 40% of issues that would have made it to human review, which let senior devs focus on the meaty architectural stuff instead of pointing out missing error handlers for the hundredth time.
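If you log one record per reviewed PR, the summary numbers are a few lines of code. The record shape here (`aiAction`, `humanOverrode`, `hoursOpen`) is an assumption for illustration, and `summarizeReviews` is a hypothetical helper.

```javascript
// Hypothetical helper: summarize review records into the rates worth watching.
// Each record: { aiAction: string, humanOverrode: boolean, hoursOpen: number }
function summarizeReviews(records) {
  if (records.length === 0) {
    return { overrideRate: null, requestChangesRate: null, medianHoursOpen: null };
  }

  const share = (pred) => records.filter(pred).length / records.length;

  const hours = records.map((r) => r.hoursOpen).sort((a, b) => a - b);
  const mid = Math.floor(hours.length / 2);
  const median =
    hours.length % 2 === 1 ? hours[mid] : (hours[mid - 1] + hours[mid]) / 2;

  return {
    overrideRate: share((r) => r.humanOverrode),
    requestChangesRate: share((r) => r.aiAction === "REQUEST_CHANGES"),
    medianHoursOpen: median,
  };
}
```

A rising override rate is usually the first sign that the checklist has drifted away from what the team actually cares about.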

For teams that want AI agents as actual teammates — not just CI scripts — DevOS's QA agent is designed to take review tickets from the backlog, run automated checks, and request human approval when needed. Pre-launch; join the waitlist if that sounds interesting.

Frequently Asked Questions

Should an AI agent have merge permissions on a PR?

No. The AI agent should comment, request changes, and approve — but final merge should require a human. This keeps accountability clear and prevents runaway automation. Even if the agent is right 95% of the time, that 5% on a production branch isn't worth the risk.

How accurate are AI code reviews compared to human reviewers?

For surface-level issues — style violations, missing error handling, obvious bugs — AI catches roughly 70-80% of what a senior dev would catch. For architectural concerns, business logic flaws, or context-dependent decisions, AI misses most of them. Use AI for the tedious stuff; save human attention for the hard stuff.

What's the cost of running AI code review on every PR?

At Claude Sonnet pricing (the script above uses claude-sonnet-4-20250514), a typical PR review (diff plus context files, around 5K tokens input, 1K output) costs $0.02-0.04. A team pushing 200 PRs/month spends roughly $4-8/month. Add guardrails (skip drafts, limit file count, cap daily runs) and it stays predictable.
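The arithmetic behind that estimate is simple enough to keep in the script next to the token logging. The per-token prices below are assumptions (roughly $3 per million input tokens and $15 per million output tokens); check the current pricing page before relying on them. `estimateCost` is a hypothetical helper.

```javascript
// Assumed prices in USD per million tokens. Verify against current pricing.
const PRICE_PER_MTOK = { input: 3, output: 15 };

// Hypothetical helper: cost of one review from its token usage.
function estimateCost(inputTokens, outputTokens, prices = PRICE_PER_MTOK) {
  return (inputTokens * prices.input + outputTokens * prices.output) / 1_000_000;
}

// A 5K-in / 1K-out review under the assumed prices, and 200 of them per month.
const perReview = estimateCost(5000, 1000);
const perMonth = perReview * 200;
```

Under those assumed prices, a 5K-in, 1K-out review lands at about three cents, and 200 reviews a month at about $6, which sits inside the ranges quoted above.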

Can AI reviewers replace human code review entirely?

Not in 2026. Probably not in 2027 either, if I'm being honest.

AI handles mechanical checks — linting, type safety, common bug patterns, documentation gaps. But it can't evaluate whether the feature solves the user's problem, whether the abstraction will scale, or whether the approach fits your roadmap. Human review stays essential for those calls.


