AI Agent CI/CD Pipeline: GitHub Actions Walkthrough (2026)

DevOS Platform Team · June 2, 2026 · 10 min read

Last Tuesday at 2:47 AM, our staging build failed. Again.

A missing import. In a file someone touched 14 hours earlier. The kind of thing that takes a human 90 seconds to fix — but only if a human is awake to see the alert, context-switch into the codebase, and push a patch. I was not awake. I was absolutely not awake.

What if the CI pipeline just... fixed it itself?

That's what we're building here: an AI agent CI/CD pipeline that actually works. By the end, you'll have a GitHub Actions workflow that detects build failures, sends the error context to an AI agent, gets a proposed fix, opens a draft PR, and runs CI against the fix, with no human involved until it's time to review. We'll use Claude's API (swap in OpenAI if you prefer), and I'll show you the cost guardrails that keep this from becoming a financial disaster. Fair warning: I burned $87 in one weekend figuring this out so you don't have to.

Prerequisites

You'll need:

  • A GitHub repo with an existing CI workflow (we're modifying it, not replacing it)
  • Node.js 18+ (sample repo is TypeScript — pattern works for anything)
  • An Anthropic API key with credits loaded. $10 is plenty for testing.
  • Basic GitHub Actions YAML familiarity
  • gh CLI installed locally (optional, helpful for testing)

Sample repo: github.com/devos-examples/ci-agent-demo. Fork it to follow along.

What We're Building

The flow:

  1. Your normal CI runs (build, test, lint — whatever you have)
  2. Failure triggers a second workflow
  3. That workflow extracts error logs, relevant files, and the recent commit diff
  4. Sends context to Claude: "Here's a build failure. Propose a minimal fix."
  5. Claude responds with a diff
  6. Workflow applies the diff, commits to a new branch, opens a draft PR
  7. CI runs against the PR. Pass? You review and merge. Fail? PR stays open for human triage.

Pennies per failure. Capped so runaway loops don't drain your credits. (I learned the cap lesson the hard way.)

Step 1: Add Secrets to Your Repo

Go to your repo → Settings → Secrets and variables → Actions. Add:

  • ANTHROPIC_API_KEY: Your Claude API key
  • GH_PAT: A GitHub personal access token with repo and workflow permissions (needed to open PRs from Actions)

The PAT is annoying but necessary: events created with the default GITHUB_TOKEN don't start new workflow runs, and we need the fix PR to run CI. I spent two hours debugging this before realizing the issue. Don't be me.

Step 2: Create the Failure Handler Workflow

Create .github/workflows/ci-agent-fix.yml:

name: AI Agent - Auto-Fix Build Failures

on:
  workflow_run:
    workflows: ["CI"] # Your main CI workflow name
    types:
      - completed

jobs:
  auto-fix:
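    # Skip failures on the agent's own branches so a bad fix can't retrigger the agent in a loop
    if: ${{ github.event.workflow_run.conclusion == 'failure' && !startsWith(github.event.workflow_run.head_branch, 'auto-fix/') }}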
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          ref: ${{ github.event.workflow_run.head_branch }}
          fetch-depth: 10
          token: ${{ secrets.GH_PAT }}

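      - name: Download failure logs
        uses: actions/download-artifact@v4
        with:
          name: ci-logs
          # Keep logs outside the checkout so `git add -A` can't commit them later
          path: ${{ runner.temp }}/ci-logs
          github-token: ${{ secrets.GH_PAT }}
          run-id: ${{ github.event.workflow_run.id }}
        continue-on-error: true

      - name: Extract error context
        id: extract
        run: |
          # Get the last 200 lines of CI output. Artifacts built from several
          # paths can nest files in subdirectories, so search recursively.
          ERROR_LOG=$(find "$RUNNER_TEMP/ci-logs" -type f -name '*.log*' -exec cat {} + 2>/dev/null | tail -200)
          # tail exits 0 even with no input, so test for emptiness explicitly
          if [ -z "$ERROR_LOG" ]; then ERROR_LOG="No logs found"; fi

          # Get the diff from the failing commit
          DIFF=$(git diff HEAD~1 --unified=5)

          # Get the files that changed
          CHANGED_FILES=$(git diff HEAD~1 --name-only)

          # Save to files for the next step
          echo "$ERROR_LOG" > /tmp/error.log
          echo "$DIFF" > /tmp/diff.txt
          echo "$CHANGED_FILES" > /tmp/changed.txt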

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Call AI agent for fix
        id: ai-fix
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
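        run: |
          # Install the Anthropic SDK without touching package.json or the lockfile,
          # so the later `git add -A` commits only the fix (assumes node_modules is gitignored)
          npm install --no-save --no-package-lock @anthropic-ai/sdk

          # Run the fix script
          node .github/scripts/propose-fix.js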

      - name: Apply fix and create PR
        if: steps.ai-fix.outputs.has_fix == 'true'
        env:
          GH_TOKEN: ${{ secrets.GH_PAT }}
          # Pass event data through env vars instead of interpolating it into
          # the script, so an odd branch name can't inject shell commands
          HEAD_BRANCH: ${{ github.event.workflow_run.head_branch }}
          FAILURE_URL: ${{ github.event.workflow_run.html_url }}
        run: |
          BRANCH="auto-fix/${HEAD_BRANCH}-$(date +%s)"

          git config user.name "CI Agent"
          # Placeholder address; substitute your own bot email
          git config user.email "ci-agent@example.com"

          git checkout -b "$BRANCH"
          git apply /tmp/proposed-fix.patch
          git add -A
          git commit -m "fix: auto-fix for CI failure

          Proposed by AI agent based on build error analysis.
          Original failure: ${FAILURE_URL}"

          git push origin "$BRANCH"

          gh pr create \
            --title "fix: Auto-fix for CI failure on ${HEAD_BRANCH}" \
            --body "## Auto-generated fix

          The CI agent detected a build failure and proposed this fix.

          **Original failure:** ${FAILURE_URL}
          **Confidence:** $(cat /tmp/confidence.txt)

          Please review before merging." \
            --draft \
            --base "$HEAD_BRANCH"

This workflow triggers whenever your main CI workflow fails. The workflow_run event gives us access to the failure context. Honestly, I think GitHub's documentation for workflow_run is garbage — it took me three tries to get the ref checkout right.

Step 3: The AI Fix Script

Create .github/scripts/propose-fix.js:

const Anthropic = require("@anthropic-ai/sdk");
const fs = require("fs");

const client = new Anthropic();

async function proposeFix() {
  const errorLog = fs.readFileSync("/tmp/error.log", "utf8");
  const diff = fs.readFileSync("/tmp/diff.txt", "utf8");
  const changedFiles = fs.readFileSync("/tmp/changed.txt", "utf8").split("\n").filter(Boolean);

  // Read the actual content of changed files
  const fileContents = changedFiles
    .slice(0, 5) // Limit to 5 files to control token usage
    .map((f) => {
      try {
        return `--- ${f} ---\n${fs.readFileSync(f, "utf8")}`;
      } catch {
        return `--- ${f} --- (file not found)`;
      }
    })
    .join("\n\n");

  const prompt = `You are a senior engineer fixing a CI build failure. Analyze the error and propose a minimal fix.

## Error Log (last 200 lines)
${errorLog}

## Recent Commit Diff
${diff}

## Current File Contents
${fileContents}

## Instructions
1. Identify the root cause of the build failure
2. Propose a MINIMAL fix — only change what's necessary
3. Output your fix as a unified diff that can be applied with 'git apply'
4. Rate your confidence: HIGH (obvious fix), MEDIUM (likely correct), LOW (uncertain)

Respond in this exact format:
CONFIDENCE: [HIGH|MEDIUM|LOW]
EXPLANATION: [one sentence]
---PATCH---
[your unified diff here]
---END---`;

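  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    // Honor the workflow's MAX_TOKENS_PER_RUN cap when it is set; default to 2000
    max_tokens: Number(process.env.MAX_TOKENS_PER_RUN) || 2000,
    messages: [{ role: "user", content: prompt }],
  });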

  const text = response.content[0].text;

  // Parse the response
  const confidenceMatch = text.match(/CONFIDENCE:\s*(HIGH|MEDIUM|LOW)/);
  const patchMatch = text.match(/---PATCH---([\s\S]*?)---END---/);

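  // Step outputs are written to the file named by GITHUB_OUTPUT;
  // the old ::set-output command is deprecated
  const setOutput = (name, value) =>
    fs.appendFileSync(process.env.GITHUB_OUTPUT, `${name}=${value}\n`);

  if (patchMatch && patchMatch[1].trim()) {
    // git apply can reject a patch whose last line lacks a trailing newline
    fs.writeFileSync("/tmp/proposed-fix.patch", patchMatch[1].trim() + "\n");
    fs.writeFileSync("/tmp/confidence.txt", confidenceMatch?.[1] || "UNKNOWN");
    setOutput("has_fix", "true");
  } else {
    console.log("No valid patch found in AI response");
    setOutput("has_fix", "false");
  }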
}

proposeFix().catch((err) => {
  console.error("AI fix failed:", err);
  process.exit(1);
});

That's the core. Send error context to Claude, parse the response, save the patch.
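The parsing step is where real responses tend to go off-script, for example by wrapping the diff in a Markdown fence. As a rough sketch, the parsing could be pulled into a standalone function like the one below. The name parseAgentResponse and the fence-stripping behavior are my assumptions, not part of the script above.

```javascript
// Hypothetical helper: extracts confidence and patch from the model's reply.
// Returns { confidence, patch }, where patch is null when nothing usable is found.
function parseAgentResponse(text) {
  const confidenceMatch = text.match(/CONFIDENCE:\s*(HIGH|MEDIUM|LOW)/);
  const patchMatch = text.match(/---PATCH---([\s\S]*?)---END---/);
  const confidence = confidenceMatch ? confidenceMatch[1] : "UNKNOWN";

  if (!patchMatch) return { confidence, patch: null };

  // Drop Markdown code-fence lines (```diff ... ```) if the model added them
  const lines = patchMatch[1].split("\n").filter((line) => !/^```/.test(line));

  // Remove blank lines at the start and end, but keep interior lines untouched
  while (lines.length && lines[0].trim() === "") lines.shift();
  while (lines.length && lines[lines.length - 1].trim() === "") lines.pop();

  if (lines.length === 0) return { confidence, patch: null };

  // End with a newline so the patch file is well-formed
  return { confidence, patch: lines.join("\n") + "\n" };
}
```

Keeping this logic in a pure function means you can run it against saved responses locally instead of burning API calls to debug a regex.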

Step 4: Add AI Agent Cost Guardrails

Here's where people get burned.

Without guardrails, a failing CI with a retry loop can trigger the agent dozens of times in an hour. At $0.003-0.015 per 1K tokens (depending on the model and on whether the tokens are input or output), that adds up fast. I've seen teams hit $200+ in a weekend because a flaky test kept retriggering the workflow. Not fun.

Add this to your workflow, right after the if condition on the job:

    # Cost guardrails
    env:
      MAX_DAILY_RUNS: 20
      MAX_TOKENS_PER_RUN: 4000

And add this step before calling the AI:

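      - name: Check rate limits
        env:
          GH_TOKEN: ${{ secrets.GH_PAT }} # gh needs a token inside Actions
        run: |
          # Count how many times this workflow actually ran today. Runs whose job
          # was skipped (CI passed) don't count; --limit lifts gh's default cap of 20.
          TODAY=$(date -u +%Y-%m-%d)
          RUN_COUNT=$(gh run list --workflow=ci-agent-fix.yml --created="$TODAY" --limit 200 \
            --json conclusion --jq '[.[] | select(.conclusion != "skipped")] | length')

          if [ "$RUN_COUNT" -ge "$MAX_DAILY_RUNS" ]; then
            echo "Daily limit reached ($MAX_DAILY_RUNS runs). Skipping AI fix."
            # A zero exit would let the remaining steps run, so fail the job here
            exit 1
          fi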

Set a spending cap in your Anthropic dashboard too. I'd start at $50/month until you understand your failure patterns. Maybe lower if you're paranoid. (I'm paranoid now.)

Step 5: Modify Your Main CI to Upload Logs

For the failure handler to access logs, your main CI workflow needs to upload them as an artifact. Add this to your existing CI workflow:

      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: ci-logs
          path: |
            *.log
            npm-debug.log*
            /tmp/*.log
          retention-days: 1

Common Errors and Fixes

"Resource not accessible by integration"

Your GH_PAT doesn't have the right permissions. Need repo (full) and workflow scopes. Regenerate the token.

"No valid patch found in AI response"

The AI couldn't generate a parseable diff. Happens with complex failures or when context is too large. Check max_tokens — if the response is cut off, increase it. You might also need to trim input context (only send relevant files). In my experience, this happens about 20% of the time with real-world failures. It's annoying, but that's why we have human fallback.
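One way to trim input context is to give the file contents a fixed character budget. Here is a minimal sketch, assuming a budget in characters as a rough proxy for tokens. The function name and the truncation marker are illustrative, not part of the script above.

```javascript
// Hypothetical helper: joins file sections until a character budget is used up.
// files is an array of { path, content }; maxChars is a rough proxy for tokens.
function capFileContents(files, maxChars) {
  const sections = [];
  let remaining = maxChars;

  for (const { path, content } of files) {
    if (remaining <= 0) break;
    // Truncate the file that crosses the budget and mark the cut
    const body =
      content.length > remaining
        ? content.slice(0, remaining) + "\n[truncated]"
        : content;
    sections.push(`--- ${path} ---\n${body}`);
    remaining -= content.length;
  }

  return sections.join("\n\n");
}
```

In propose-fix.js you would map the changed files to { path, content } objects first, then pass the result into the prompt in place of the unbounded join.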

The fix PR doesn't trigger CI

Events created with the default GITHUB_TOKEN don't start new workflow runs (GitHub does this to avoid loops). That's why we use GH_PAT instead of GITHUB_TOKEN. Make sure the PAT has workflow scope.

Costs are higher than expected

Check how many files you're sending. Each file adds tokens. The script limits to 5 files, but it sends each one in full, so a few large files can still mean 10K+ tokens per call. Use JustAnalytics or similar to track token usage trends over time. Add logging:

console.log(`Input tokens: ${response.usage.input_tokens}`);
console.log(`Output tokens: ${response.usage.output_tokens}`);
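To turn those counts into dollars, a small helper can multiply them by per-token prices. This is a sketch only: the default prices below (USD per million tokens) are assumptions, so replace them with the current rates for whichever model you call.

```javascript
// Assumed prices in USD per million tokens; check your provider's pricing page
const ASSUMED_PRICING = { inputPerMTok: 3, outputPerMTok: 15 };

// Estimates the cost of one API call from the usage object on the response
function estimateCostUsd(usage, pricing = ASSUMED_PRICING) {
  const inputCost = (usage.input_tokens / 1_000_000) * pricing.inputPerMTok;
  const outputCost = (usage.output_tokens / 1_000_000) * pricing.outputPerMTok;
  return inputCost + outputCost;
}

// Example: a call with 10,000 input tokens and 2,000 output tokens
const exampleCost = estimateCostUsd({ input_tokens: 10000, output_tokens: 2000 });
console.log(`Estimated cost: $${exampleCost.toFixed(4)}`);
```

Logging the estimate next to the raw token counts makes it easier to spot the one failure type that costs ten times more than the rest.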

Next Steps

Once the basic flow works, some enhancements worth considering:

Selective triggering: Not every failure should hit the AI. Skip infrastructure failures, timeouts, or branches you don't care about. Your agent doesn't need to attempt fixes for dependabot branches — trust me on this.
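A rough sketch of what such a filter might look like, run before the API call. The branch prefixes and log patterns here are illustrative assumptions; tune them to the failures you actually see.

```javascript
// Branch prefixes the agent should leave alone (illustrative, adjust to taste)
const SKIPPED_BRANCH_PREFIXES = ["dependabot/", "auto-fix/"];

// Log patterns that suggest an infrastructure problem rather than a code problem
const INFRA_PATTERNS = [
  /ETIMEDOUT/,
  /ECONNRESET/,
  /timed out/i,
  /No space left on device/i,
];

// Returns true when a failure looks like something a code patch could fix
function shouldAttemptFix(branch, errorLog) {
  if (SKIPPED_BRANCH_PREFIXES.some((prefix) => branch.startsWith(prefix))) {
    return false;
  }
  return !INFRA_PATTERNS.some((pattern) => pattern.test(errorLog));
}
```

Calling this at the top of the fix script and exiting early on false means skipped failures cost nothing.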

Multi-model routing: Use a cheaper model (Claude 3.5 Haiku or GPT-4o-mini) for initial triage, escalate to a larger model only when the first pass fails. DevOS's AI agent orchestration aims to do this automatically — picking the cheapest capable model per task. (DevOS is pre-launch; join the waitlist if you want to try it when it ships.)
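As a sketch of the escalation idea (not how DevOS does it), a small loop can try models in order of cost and stop at the first one that returns a patch. The requestPatch callback is a hypothetical stand-in for the API call and parsing in propose-fix.js, and the cheaper model ID is an assumption.

```javascript
// Models ordered cheapest first. The first ID is an assumption; the second is
// the model used in propose-fix.js.
const MODEL_LADDER = ["claude-3-5-haiku-20241022", "claude-sonnet-4-20250514"];

// Tries each model in turn. requestPatch(model) is a caller-supplied async
// function that resolves to a patch string, or null when no patch was produced.
async function proposeWithEscalation(requestPatch, models = MODEL_LADDER) {
  for (const model of models) {
    const patch = await requestPatch(model);
    if (patch) return { model, patch };
  }
  return { model: null, patch: null };
}
```

Returning the model name alongside the patch lets you record which tier solved each failure, which is the number you need to judge whether the cheap tier is pulling its weight.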

Observability: Pipe your CI metrics to something like JustAnalytics so you can track fix success rates, resolution time, and cost per fix over time. Otherwise you're flying blind.

Human-in-the-loop approval: For production branches, require human approval before the agent even attempts a fix. GitHub's environment protection rules handle this.

Sample repo at github.com/devos-examples/ci-agent-demo includes all of these patterns.

Frequently Asked Questions About AI Agent CI/CD

How much does running an AI agent in CI/CD cost per month?

Depends on build volume and failure rate. A mid-size project with 500 builds/month and 15% failure rate hitting a Sonnet-class Claude model: roughly $30-60/month in API costs. The setup above includes guardrails (a daily run limit and a spending cap) so you don't get a $400 surprise after a bad week.

Can an AI agent actually fix build failures automatically?

For certain failure types, yes. Type errors, missing imports, linting violations, straightforward test failures — fixable 60-70% of the time without human intervention. Complex logic bugs, flaky tests, environment-specific issues? Still need a human. The workflow opens a PR with the proposed fix; you review before merging.

What happens if the AI agent proposes a bad fix?

Fix goes into a draft PR, not directly to main. CI runs against the proposed fix. If tests still fail, PR stays open for human review. You can add a 'max attempts' limit — sample workflow caps at 2 fix attempts before escalating to a human.

Does this work with monorepos or multiple languages?

Yes, but scope the agent's context carefully. The sample repo is a single Node.js project. For monorepos, modify the workflow to pass only the relevant package's files; otherwise you're burning tokens on irrelevant context and getting worse suggestions.
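For example, assuming a packages/<name>/ layout (an assumption; adjust the pattern for your repo), the changed files could be narrowed to the package with the most changes. That is a crude heuristic for "relevant", and the helper name is hypothetical.

```javascript
// Hypothetical helper: keeps only files from the package with the most changes.
// Assumes packages live under packages/<name>/.
function scopeToPackage(changedFiles) {
  const counts = new Map();
  for (const file of changedFiles) {
    const match = file.match(/^packages\/([^/]+)\//);
    if (match) counts.set(match[1], (counts.get(match[1]) || 0) + 1);
  }
  if (counts.size === 0) return changedFiles; // no package paths, leave as-is

  // Pick the package name with the highest count (first one wins ties)
  let best = null;
  for (const [name, count] of counts) {
    if (best === null || count > counts.get(best)) best = name;
  }
  return changedFiles.filter((file) => file.startsWith(`packages/${best}/`));
}
```

A stricter variant would pick the package whose path appears in the error log rather than the one with the most changed files.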


Join the DevOS Waitlist

If you're tired of babysitting AI coding tools that can't coordinate beyond a single task — DevOS is building something different. AI agents as actual employees inside your sprints, standups, and tickets. Planner, Developer, QA, and DevOps agents that pick up work from the backlog, ship in branches, request review. Linear-style backlog UI with agents underneath.

Pre-launch. No paying customers yet. But if the vision sounds interesting, get on the list.

Join the waitlist → · How agents-as-employees works
