AI Agent Glossary for Software Teams: 40 Essential Terms
I was on a call last month where three engineers used "agentic" to mean three different things. One meant "has a system prompt." Another meant "can use tools." The third meant "takes autonomous action toward goals." They argued for 15 minutes before realizing they weren't even disagreeing — they just had different dictionaries.
That's the state of AI agent vocabulary in 2026. The field moves faster than the terminology can stabilize. Marketing teams make up words. Researchers use precise definitions nobody outside academia knows. Engineers grab whatever sounds right and hope context fills in the gaps.
This AI agent glossary is the reference I wish existed when we started building DevOS. Forty terms. Defined for software teams. With context on how they matter for actual work — not academic definitions nobody uses in Slack.
Bookmark it. Link to it. Argue with me about definitions in the comments. (I'm probably wrong about at least three of these.)
Core Concepts
1. Agent
An AI system that takes actions toward goals. The defining characteristic isn't intelligence — it's agency. An agent decides what to do, does it, observes the result, and decides what to do next. A chatbot waits for your prompt. An agent takes tickets from your backlog and opens PRs at 3am.
Not all AI systems are agents. GPT-4 answering a question isn't an agent. GPT-4 embedded in a system that plans tasks, executes code, and iterates on failures is.
2. Agentic
Adjective form of agent. Describes behavior, not architecture. A system is agentic when it autonomously pursues goals through multiple steps without human intervention between each step.
This term has become marketing mush. Absolutely everyone slaps "agentic" on their chatbot now. When someone says their product is "agentic," ask: does it just respond to prompts with longer outputs, or does it actually take actions and adjust based on results? If they can't answer clearly, it's probably the former.
3. Multi-Agent System
Multiple specialized agents working together. Each agent handles a domain — one writes code, another writes tests, another handles deployment. The alternative is a single generalist agent doing everything (poorly, usually).
The tradeoff: multi-agent systems are more capable but harder to coordinate. Way harder. When your QA agent assumes the Developer agent already handled edge cases and vice versa, you get bugs. We've shipped embarrassing regressions this way. DevOS uses a four-agent architecture (Planner, Developer, QA, DevOps) with explicit handoff protocols — because we learned the hard way that implicit coordination doesn't work. See how agents work as sprint team members for more on the coordination challenges.
4. Orchestrator
The coordination layer in a multi-agent system. Assigns tasks to agents, manages handoffs, resolves conflicts, tracks progress. Without an orchestrator, agents step on each other — two agents editing the same file, three agents claiming the same ticket.
DevOS calls this the Super Orchestrator. It's the PM for your AI team.
5. Autonomy Level
How much an agent handles without human approval. We use an L1-L5 scale loosely borrowed from the driving-automation levels used for self-driving cars:
- L1: Agent suggests, human acts
- L2: Agent acts, human approves (most production systems today)
- L3: Agent handles specific categories end-to-end
- L4: Agent manages multiple tickets across a sprint
- L5: Full autonomy, no human in the loop (doesn't exist for production software in 2026)
See our deep dive on autonomy levels.
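Here's one way the scale could show up in code: a rough sketch of an approval gate, not how DevOS implements it. The `needs_human_approval` function, the `trusted` category set, and the reading of L3 as "trusted categories skip approval" are all assumptions made for illustration.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    L1 = 1  # agent suggests, human acts
    L2 = 2  # agent acts, human approves
    L3 = 3  # agent handles specific categories end-to-end
    L4 = 4  # agent manages multiple tickets across a sprint
    L5 = 5  # full autonomy, no human in the loop


def needs_human_approval(level: AutonomyLevel, category: str, trusted: set[str]) -> bool:
    """Decide whether a human must sign off before the agent's change lands."""
    if level <= AutonomyLevel.L2:
        return True
    if level == AutonomyLevel.L3:
        # Assumption: at L3 only pre-approved ticket categories skip review.
        return category not in trusted
    return False
```

With `trusted={"docs"}`, an L3 agent would ship a docs fix on its own but still wait for sign-off on a database migration.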
Tool Use & Capabilities
6. Tool Use
An agent invoking external capabilities. Reading files, calling APIs, executing code, querying databases. The tools an agent can use define what it can actually accomplish beyond generating text.
Tool use is what separates a coding agent from a chatbot that writes code. The chatbot gives you code to copy-paste. The agent writes it directly to your repo.
7. Function Calling
A specific implementation of tool use where the model outputs structured JSON that triggers predefined functions. OpenAI, Anthropic, and Google all support this natively. The model "calls" functions by outputting the function name and arguments; your code executes the actual function.
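A minimal sketch of the "your code executes the actual function" half. The tool name `get_ticket_status`, the ticket ID, and the bare `{"name": ..., "arguments": ...}` shape are made up for illustration. Each provider wraps tool calls in its own envelope, so treat this as the general idea rather than any vendor's API.

```python
import json


def get_ticket_status(ticket_id: str) -> str:
    """Hypothetical tool; a real one would query your tracker."""
    return f"{ticket_id}: in review"


# Registry of functions the model is allowed to call.
TOOLS = {"get_ticket_status": get_ticket_status}


def dispatch(model_output: str) -> str:
    """Parse the model's JSON tool call and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Stand-in for what a model might emit.
result = dispatch('{"name": "get_ticket_status", "arguments": {"ticket_id": "DEV-42"}}')
```

The registry doubles as an allowlist: anything the model names that isn't in `TOOLS` gets rejected instead of executed.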
8. MCP (Model Context Protocol)
An open standard, introduced by Anthropic, for connecting AI models to external tools and data sources. Think of it as USB for AI: a standardized way to plug in capabilities. An agent with an MCP client can use any tool exposed through the protocol without custom integration code for each one.
9. Computer Use
An agent controlling a GUI like a human would — clicking buttons, filling forms, navigating interfaces. Slower than API access but works with any software, including legacy systems without APIs.
Claude Computer Use can fill out a Jira ticket by actually navigating the Jira UI. Useful for automation where APIs don't exist or aren't exposed.
10. Code Execution
An agent running code it writes. Critical for debugging, testing, and iteration. The agent writes a function, runs it, sees the error, fixes it — same loop a human follows, but faster.
Sandboxed execution environments matter here. A lot. You don't want an agent with code execution running rm -rf / because its planning went sideways. (Yes, this happens. No, I'm not going to tell you which company demo'd that failure.)
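To make the write-run-observe step concrete, here's a small sketch that runs a snippet in a child process with a time limit and a throwaway working directory. The `run_snippet` helper is hypothetical, and this is emphatically not a security sandbox: it limits runtime and the starting directory, nothing more. Real isolation needs containers, VMs, or similar.

```python
import subprocess
import sys
import tempfile


def run_snippet(code: str, timeout_s: float = 5.0) -> tuple[int, str, str]:
    """Run Python code in a child process and return (exit code, stdout, stderr).

    Limits runtime and the starting directory only. NOT a security sandbox.
    """
    with tempfile.TemporaryDirectory() as scratch:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
            cwd=scratch,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
        )
    return proc.returncode, proc.stdout, proc.stderr


exit_code, out, err = run_snippet("print(1 + 1)")
```

The agent reads `exit_code` and `err` the same way you'd read a failed test run, then decides what to fix.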
Memory & Context
11. Context Window
How much information the model can process at once. Measured in tokens. Limits vary by model and version: Claude models have shipped with 200K-token windows (~150K words), and GPT-4 Turbo with 128K. Longer context lets agents reason over more code, more history, more documentation.
But bigger isn't always better. Inference gets slower and more expensive with more context. And honestly? Most agents don't know what to do with 200K tokens anyway — they get confused halfway through. The skill is knowing what to include, not including everything.
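"Knowing what to include" can be as simple as a budget. The sketch below greedily keeps the highest-scored snippets that fit. The 4-characters-per-token estimate is a crude assumption (use a real tokenizer in practice), and the relevance scores are presumed to come from somewhere else, such as a retrieval step.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token."""
    return max(1, len(text) // 4)


def pack_context(snippets: list[tuple[float, str]], budget: int) -> list[str]:
    """Keep the highest-scored snippets that fit within the token budget."""
    chosen: list[str] = []
    used = 0
    for _score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen


# (relevance score, text) pairs; the texts are filler of known length.
snippets = [(0.9, "a" * 40), (0.5, "b" * 40), (0.7, "c" * 80)]
picked = pack_context(snippets, budget=20)
```

Here the second-best snippet is too large to fit after the best one, so it gets skipped and the smaller third one takes its place.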
12. RAG (Retrieval-Augmented Generation)
Fetching relevant information before generating a response. Instead of hoping the model remembers something, you query a database for relevant context and include it in the prompt.
For coding agents, RAG means "search the codebase for relevant files before writing code." Essential for large repos where the agent can't hold the whole codebase in context.
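The retrieve-then-generate shape, sketched with plain keyword overlap standing in for a real search index. File names and contents are invented, and a production system would more likely rank with embeddings or a code-aware index.

```python
def score(query: str, document: str) -> int:
    """Count how many distinct query words appear in the document."""
    doc_words = set(document.lower().split())
    return sum(1 for word in set(query.lower().split()) if word in doc_words)


def retrieve(query: str, files: dict[str, str], k: int = 2) -> list[str]:
    """Return the paths of the k files that best match the query."""
    ranked = sorted(files, key=lambda path: score(query, files[path]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, files: dict[str, str], k: int = 2) -> str:
    """Put the retrieved files ahead of the task so the model sees them as context."""
    context = "\n\n".join(f"# {path}\n{files[path]}" for path in retrieve(query, files, k))
    return f"{context}\n\nTask: {query}"


repo = {
    "auth.py": "login checks the password hash for a user",
    "billing.py": "charge a card and create an invoice",
    "README.md": "project setup and install notes",
}
top = retrieve("fix password check in login", repo, k=1)
```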
13. Long-Term Memory
Information persisted across conversations. Session memory dies when the conversation ends. Long-term memory survives — your agent remembers decisions from last sprint, coding conventions you've established, bugs you've already fixed.
DevOS uses a three-tier memory system: Graphiti knowledge graphs for relationships, embeddings for semantic search, and automatic state recovery. We covered the full architecture in our deep dive on designing memory systems for coding agents.
14. Embeddings
Vector representations of text that capture semantic meaning. Similar concepts have similar embeddings, even with different words. Embeddings power semantic search — find code that does X, not just code containing the word X.
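A toy illustration of "similar concepts, similar vectors" using cosine similarity. The three-dimensional vectors are hand-made stand-ins. Real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made vectors; the first two describe the same idea in different words.
VECTORS = {
    "retry failed HTTP requests": [0.9, 0.1, 0.0],
    "re-send the call when the network drops": [0.8, 0.2, 0.1],
    "render the settings page": [0.0, 0.2, 0.9],
}


def most_similar(name: str) -> str:
    """Return the other entry whose vector is closest to the named one."""
    others = (other for other in VECTORS if other != name)
    return max(others, key=lambda other: cosine_similarity(VECTORS[name], VECTORS[other]))


match = most_similar("retry failed HTTP requests")
```

The two retry descriptions share almost no words, yet they land next to each other, which is the whole point.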
15. Knowledge Graph
Structured relationships between entities. "Function A calls Function B." "Module X depends on Module Y." "This bug was introduced in commit Z." Graphs capture explicit relationships that similarity search alone misses.
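A small sketch of the kind of question a graph answers well: "if this module changes, what else might break?" The module names and the plain-dict representation are assumptions for illustration. A real system would likely use a graph database or library.

```python
from collections import deque

# Edges read as "key depends on each value". Module names are made up.
DEPENDS_ON = {
    "api": ["auth", "billing"],
    "auth": ["db"],
    "billing": ["db"],
    "db": [],
}


def affected_by(changed: str, graph: dict[str, list[str]]) -> set[str]:
    """Return every module that depends on `changed`, directly or transitively."""
    # Invert the edges so we can walk from a module to its dependents.
    dependents: dict[str, list[str]] = {node: [] for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            dependents[dep].append(node)

    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        for parent in dependents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

Changing `db` touches everything above it, while changing `api` touches nothing else. A text search for "db" wouldn't tell you that.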
Workflow & Process
16. Agent-Employee Model
Treating AI agents as team members rather than tools. Agents take tickets. They appear on sprint boards. They have velocity metrics. They attend standups (via async status updates).
This is the model DevOS is building toward — agents as assignable resources inside your project management system. Solo founders are already running full sprints this way.
17. Ticket Autonomy
How much of a ticket's lifecycle an agent handles independently. Low autonomy: agent drafts code, human reviews every step. High autonomy: agent takes ticket from backlog, implements, tests, opens PR, responds to review comments.
18. Handoff
Transfer of work between agents (or between agent and human). A clean handoff includes context — what was done, what's left, what decisions were made, what blockers exist. Bad handoffs lose context and cause rework.
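One way to keep handoffs clean is to make the context a required structure instead of free text. This is a sketch with invented field names, not the DevOS handoff protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    ticket_id: str
    from_agent: str
    to_agent: str
    done: list[str] = field(default_factory=list)
    remaining: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    blockers: list[str] = field(default_factory=list)  # may legitimately be empty

    def missing_context(self) -> list[str]:
        """Name the empty required sections so a thin handoff can be rejected early."""
        required = {"done": self.done, "remaining": self.remaining, "decisions": self.decisions}
        return [name for name, items in required.items() if not items]


thin = Handoff("DEV-42", "developer", "qa", done=["implemented login endpoint"])
gaps = thin.missing_context()
```

The receiving agent (or the orchestrator) can bounce any handoff where `missing_context()` isn't empty.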
19. Escalation
When an agent recognizes it's stuck and asks for help. Good escalation includes what the agent tried, why it failed, and specific options for human decision. Bad escalation is "I don't know how to proceed" with no context.
Getting escalation right is annoyingly hard. Too sensitive and the agent escalates every decision. Too loose and it confidently drives off a cliff. We've tuned ours three times and it's still not perfect.
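A sketch of the sensitivity knob and the "good escalation" shape together. The `max_attempts` threshold and the message layout are assumptions. The right values depend on your agents and your tolerance for interruptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    approach: str
    error: str


def build_escalation(
    goal: str, attempts: list[Attempt], options: list[str], max_attempts: int = 3
) -> Optional[str]:
    """Return None while the agent should keep trying, or a message once it should stop."""
    if len(attempts) < max_attempts:
        return None
    tried = "\n".join(f"- {a.approach}: {a.error}" for a in attempts)
    choices = "\n".join(f"{i}. {option}" for i, option in enumerate(options, start=1))
    return f"Stuck on: {goal}\nTried:\n{tried}\nOptions:\n{choices}"
```

Lower `max_attempts` and the agent escalates constantly. Raise it and the agent burns tokens on a dead end. The message always carries what was tried and what the human can choose between.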
20. Human-in-the-Loop
A system where humans approve or intervene at defined checkpoints. HITL is how you get L2-L3 autonomy safely. The agent proposes; the human disposes.
21. Sprint Velocity (for Agents)
Story points completed per sprint — tracked per agent. Velocity data tells you which agent types are reliable and which ones need more guardrails. Our QA agent averages 18 points per sprint. Our DevOps agent averages 12. Different domains, different baselines.
Technical Architecture
22. Prompt Engineering
Designing inputs that get the outputs you want. For agents, this means system prompts, task descriptions, examples, and constraints.
I'll be honest: I hate that this is a skill. It feels like we're negotiating with a picky genie. But good prompt engineering is the difference between an agent that hallucinates and one that follows your architecture. So here we are.
23. System Prompt
Persistent instructions that shape agent behavior. Defines personality, capabilities, constraints, and goals. In multi-agent systems, each agent type has a different system prompt — the QA agent's system prompt emphasizes edge cases and test coverage; the Developer agent's emphasizes clean code and PR conventions.
24. Chain-of-Thought
Having the model reason step-by-step before answering. "Let me think through this..." Improves accuracy on complex tasks. Costs more tokens but catches errors earlier.
25. ReAct Pattern
Reason-Act-Observe loop. The agent reasons about what to do, takes an action, observes the result, then reasons again. Most modern coding agents use some variant of ReAct.
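The loop itself is short. In the sketch below, a scripted function stands in for the model's reasoning step and the only tool is a fake test runner, so every name here is illustrative rather than any framework's API.

```python
def react_loop(reason, tools, max_steps=5):
    """Run reason -> act -> observe until the reasoner returns the 'finish' action.

    reason(history) must return (thought, action, argument).
    """
    history = []
    for _ in range(max_steps):
        thought, action, arg = reason(history)
        if action == "finish":
            return arg, history
        observation = tools[action](arg)  # act, then observe the result
        history.append((thought, action, arg, observation))
    return None, history  # gave up after max_steps


def scripted_reason(history):
    """Stand-in for the model: run the tests, then finish once they pass."""
    if not history:
        return ("I should run the tests first", "run_tests", "tests/")
    if history[-1][3] == "2 passed":
        return ("Tests pass, done", "finish", "ready for review")
    return ("Tests failed, run again", "run_tests", "tests/")


tools = {"run_tests": lambda path: "2 passed"}  # fake tool
answer, trace = react_loop(scripted_reason, tools)
```

The `max_steps` cap matters: without it, an agent that never decides to finish loops forever.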
26. Tree of Thoughts
Exploring multiple reasoning paths before committing. Instead of one chain-of-thought, the agent considers several approaches and evaluates which looks most promising. More compute, better decisions. In theory. In practice, most production systems skip this because it's slow and the win is marginal for typical coding tasks.
27. Self-Reflection
Agent evaluating its own outputs. "Wait, that code doesn't handle null values. Let me fix that." Good agents critique themselves before you have to.
28. Guardrails
Constraints that prevent unwanted behavior. "Don't modify production databases." "Don't push directly to main." "Don't approve your own PRs." Guardrails are the safety net when agent judgment fails.
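The three example rules above can be sketched as a check that runs before any action executes. The `Action` shape and the "target starts with prod" convention are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str         # e.g. "git_push", "approve_pr", "sql_write"
    actor: str        # agent performing the action
    target: str       # branch, PR, or database name
    author: str = ""  # PR author, only meaningful for approvals


def violations(action: Action) -> list[str]:
    """Return the guardrails this action would break; an empty list means allowed."""
    found = []
    if action.kind == "sql_write" and action.target.startswith("prod"):
        found.append("don't modify production databases")
    if action.kind == "git_push" and action.target == "main":
        found.append("don't push directly to main")
    if action.kind == "approve_pr" and action.actor == action.author:
        found.append("don't approve your own PRs")
    return found
```

The check lives outside the model on purpose. A guardrail the agent can talk itself out of isn't a guardrail.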
Coordination & Communication
29. Agent Protocol
Standardized communication format between agents. How does the Planner agent tell the Developer agent what to build? How does the Developer agent tell the QA agent what to test? Protocols define the contract.
30. Message Bus
Infrastructure for agent-to-agent communication. Agents publish messages; other agents subscribe. Decouples agents so they don't need direct references to each other.
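An in-memory sketch of publish/subscribe. The topic name and message fields are invented, and a real deployment would sit on a broker rather than a Python dict.

```python
from collections import defaultdict
from typing import Callable


class MessageBus:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber and return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()
qa_inbox: list[dict] = []
bus.subscribe("pr.opened", qa_inbox.append)  # the QA agent listens for new PRs
delivered = bus.publish("pr.opened", {"pr": 17, "author": "developer"})
```

The Developer agent publishes without knowing who is listening, which is the decoupling the definition describes.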
31. Shared State
Information accessible to all agents. The codebase. The sprint board. The deployment status. Shared state enables coordination without explicit communication — agents see the same truth.
32. Conflict Resolution
What happens when agents disagree or overlap. Two agents edit the same file. One agent's change breaks another's assumption. The orchestrator needs rules for who wins.
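One possible "who wins" rule for the same-file case is optimistic versioning: the first edit lands, and the later one is rejected as stale and must re-read. This is a sketch of that single rule under an invented `SharedFiles` class, not a description of how any particular orchestrator resolves conflicts.

```python
class SharedFiles:
    """Tracks a version number per file; an edit must name the version it was based on."""

    def __init__(self) -> None:
        self._versions: dict[str, int] = {}

    def version(self, path: str) -> int:
        return self._versions.get(path, 0)

    def commit(self, path: str, based_on: int) -> bool:
        """Accept the edit only if nobody else changed the file in the meantime."""
        if based_on != self.version(path):
            return False  # stale edit: the agent must re-read and retry
        self._versions[path] = based_on + 1
        return True


files = SharedFiles()
seen_by_dev = files.version("app.py")  # both agents read version 0
seen_by_qa = files.version("app.py")
dev_ok = files.commit("app.py", based_on=seen_by_dev)
qa_ok = files.commit("app.py", based_on=seen_by_qa)
```

The rejected agent isn't blocked forever. It re-reads the file at the new version and tries again with the other agent's change in view.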
Agent Types
33. Planner Agent
Handles architecture, task breakdown, and prioritization. Reads the backlog, decomposes features into tickets, defines implementation approach. Doesn't write code — writes the plan for code.
34. Developer Agent
Writes code. Takes a ticket with clear requirements and produces working implementation. The most common agent type and the one people think of when they say "AI coding agent."
35. QA Agent
Writes tests, reviews code, catches bugs. Focuses on edge cases, coverage gaps, and regression risk. Works best with clear acceptance criteria — the vaguer the spec, the vaguer the tests. For teams managing test automation at scale, JustAnalytics can help track coverage metrics. We wrote more about AI agents for QA teams.
36. DevOps Agent
Handles infrastructure, deployment, and operations. Provisions databases, configures CI/CD, manages environment variables, monitors deployments. The agent that keeps the system running. See our guide on AI agents in CI/CD pipelines for practical examples.
37. Research Agent
Gathers information. Reads documentation, searches codebases, synthesizes findings. Often the first agent in a pipeline — understand the problem before solving it.
Emerging Concepts
38. Agent Marketplace
A curated collection of pre-built agents for specific tasks. Instead of building a QA agent from scratch, you install one from the marketplace, connect it to your repo, and assign it tickets. DevOS is building toward this — specialized agents as deployable packages.
39. Model Routing
Sending different tasks to different AI models based on cost, capability, or speed. Simple code completion goes to a fast, cheap model. Complex architecture decisions go to a more capable (and expensive) one. DevOS routes across Anthropic, Google, DeepSeek, and OpenAI based on task type.
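At its simplest, routing is a lookup table keyed by task type. The task types and model names below are placeholders, not real model identifiers and not the DevOS routing table.

```python
# Placeholder names; substitute the models you actually use.
ROUTES = {
    "completion": "fast-cheap-model",
    "refactor": "mid-tier-model",
    "architecture": "most-capable-model",
}


def route(task_type: str, default: str = "mid-tier-model") -> str:
    """Pick a model for the task, falling back to a middle option for unknown types."""
    return ROUTES.get(task_type, default)
```

Real routers usually add more signals (prompt length, latency budget, past failure rates), but the fallback matters even in the simple version: an unrecognized task type shouldn't crash the pipeline.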
40. Swarm Intelligence
Emergent behavior from multiple simple agents. No central orchestrator — agents follow local rules that produce coordinated global behavior. More common in research than production. Most real agent systems use explicit orchestration instead.
Honorable Mentions
Few-Shot Learning: Including examples in the prompt so the agent learns the pattern. "Here's how we write tests: [example]. Now write tests for this."
Zero-Shot: No examples — just the task description. Works for simple tasks; fails for anything requiring team-specific conventions.
Constitutional AI: Anthropic's approach to training models against a written set of principles rather than a long list of case-by-case rules. "Be helpful, harmless, and honest" rather than "never say X."
Agent Framework: Libraries like LangChain, AutoGen, CrewAI that provide scaffolding for building agents. Useful for prototypes; teams often outgrow them. (Hot take: most agent frameworks add more abstraction than they're worth. You'll spend more time fighting the framework than building features.)
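The few-shot versus zero-shot distinction above comes down to what goes in the prompt. A minimal sketch, assuming a simple Input/Output layout (the layout and the example strings are illustrative, not a required format):

```python
def build_prompt(task: str, examples: list[tuple[str, str]]) -> str:
    """With examples this is a few-shot prompt; with an empty list it is zero-shot."""
    shots = "".join(f"Input:\n{inp}\nOutput:\n{out}\n\n" for inp, out in examples)
    return f"{shots}Input:\n{task}\nOutput:\n"


zero_shot = build_prompt("write tests for parse_date()", [])
few_shot = build_prompt(
    "write tests for parse_date()",
    [("write tests for add(a, b)", "def test_add():\n    assert add(1, 2) == 3")],
)
```

The few-shot version is where team-specific conventions get in: the example shows the model what "tests" look like in your codebase.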
Quick Verdict
If you take one thing from this glossary: autonomy level is the metric that matters. When someone tells you their system is "agentic" or uses "multi-agent orchestration," ask what autonomy level they're operating at. L2 with human approval? L3 for specific categories? L4 in demos only?
The vocabulary doesn't matter if you can't deploy it. And deployable autonomy in 2026 lives at L2-L3. Everything else is either research or marketing.
DevOS is building the infrastructure to make L2-L3 practical for engineering teams — agents as sprint employees with real velocity metrics. We're still pre-launch (join the waitlist), but that's the direction.
Frequently Asked Questions
What does "agentic" mean in AI development?
Agentic describes AI systems that take autonomous action toward goals rather than just responding to prompts. An agentic system decides what steps to take, executes them, evaluates results, and adjusts its approach — without waiting for human input at each step. The key distinction: a chatbot answers questions, an agentic system completes tasks.
What is multi-agent orchestration?
Multi-agent orchestration is the coordination layer that manages multiple AI agents working together. It handles task assignment, handoffs between agents, conflict resolution, and progress tracking. Think of it as the project management layer for AI workers — deciding which agent handles which task and ensuring they don't step on each other.
What is the difference between tool use and function calling?
Tool use is the broader concept — an AI agent invoking external capabilities (APIs, databases, code execution). Function calling is a specific implementation where the model outputs structured JSON that triggers predefined functions. All function calling is tool use, but tool use also includes MCP servers, plugin systems, and direct API access.
What does ticket autonomy mean for AI agents?
Ticket autonomy measures how much of a ticket's lifecycle an agent handles without human intervention. Low autonomy: agent drafts code, human reviews every step. High autonomy: agent takes ticket from backlog, writes code, runs tests, opens PR, responds to review comments. The spectrum runs from "fancy autocomplete" to "junior engineer who never sleeps."