AI Agent Glossary: 40 Terms (2026)

I was on a call last month where three engineers used "agentic" to mean three different things. One meant "has a system prompt." Another meant "can use tools." The third meant "takes autonomous action toward goals." They argued for 15 minutes before realizing they weren't even disagreeing — they just had different dictionaries.

That's the state of AI agent vocabulary in 2026. The field moves faster than the terminology can stabilize. Marketing teams make up words. Researchers use precise definitions nobody outside academia knows. Engineers grab whatever sounds right and hope context fills in the gaps.

This AI agent glossary is the reference I wish existed when we started building DevOS. Forty terms. Defined for software teams. With context on how they matter for actual work — not academic definitions nobody uses in Slack.

Bookmark it. Link to it. Argue with me about definitions in the comments. (I'm probably wrong about at least three of these.)

Core Concepts

1. Agent

An AI system that takes actions toward goals. The defining characteristic isn't intelligence — it's agency. An agent decides what to do, does it, observes the result, and decides what to do next. A chatbot waits for your prompt. An agent takes tickets from your backlog and opens PRs at 3am.

Not all AI systems are agents. GPT-4 answering a question isn't an agent. GPT-4 embedded in a system that plans tasks, executes code, and iterates on failures is.

2. Agentic

Adjective form of agent. Describes behavior, not architecture. A system is agentic when it autonomously pursues goals through multiple steps without human intervention between each step.

This term has become marketing mush. Absolutely everyone slaps "agentic" on their chatbot now. When someone says their product is "agentic," ask: does it just respond to prompts with longer outputs, or does it actually take actions and adjust based on results? If they can't answer clearly, it's probably the former.

3. Multi-Agent System

Multiple specialized agents working together. Each agent handles a domain — one writes code, another writes tests, another handles deployment. The alternative is a single generalist agent doing everything (poorly, usually).

The tradeoff: multi-agent systems are more capable but harder to coordinate. Way harder. When your QA agent assumes the Developer agent already handled edge cases and vice versa, you get bugs. We've shipped embarrassing regressions this way. DevOS uses a four-agent architecture (Planner, Developer, QA, DevOps) with explicit handoff protocols — because we learned the hard way that implicit coordination doesn't work. See how agents work as sprint team members for more on the coordination challenges.

4. Orchestrator

The coordination layer in a multi-agent system. Assigns tasks to agents, manages handoffs, resolves conflicts, tracks progress. Without an orchestrator, agents step on each other — two agents editing the same file, three agents claiming the same ticket.

DevOS calls this the Super Orchestrator. It's the PM for your AI team.

5. Autonomy Level

How much an agent handles without human approval. We use an L1-L5 scale loosely modeled on the SAE driving-automation levels used for self-driving cars (those run 0 to 5; ours starts at 1):

L1: Agent suggests, human acts
L2: Agent acts, human approves (most production systems today)
L3: Agent handles specific categories end-to-end
L4: Agent manages multiple tickets across a sprint
L5: Full autonomy, no human in loop (doesn't exist for production software in 2026)

See our deep dive on autonomy levels.
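
To make the scale concrete, here is a minimal sketch of how an approval gate might key off the autonomy level. The enum member names, the `trusted_categories` parameter, and the decision to skip approval at L4 and above are all assumptions made for illustration, not how DevOS implements it.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    # Names are illustrative; the glossary only defines L1 through L5.
    L1_SUGGEST = 1
    L2_ACT_WITH_APPROVAL = 2
    L3_CATEGORY_END_TO_END = 3
    L4_MULTI_TICKET = 4
    L5_FULL = 5


def needs_human_approval(level, ticket_category, trusted_categories):
    """Return True when a human must sign off before the agent's change lands."""
    if level <= AutonomyLevel.L2_ACT_WITH_APPROVAL:
        # L1 and L2: a human acts on or approves everything.
        return True
    if level == AutonomyLevel.L3_CATEGORY_END_TO_END:
        # L3: only pre-agreed ticket categories run end-to-end.
        return ticket_category not in trusted_categories
    # Assumption: L4 and L5 proceed without a per-change approval step.
    return False


trusted = {"docs", "dependency-bump"}
print(needs_human_approval(AutonomyLevel.L3_CATEGORY_END_TO_END, "docs", trusted))
print(needs_human_approval(AutonomyLevel.L3_CATEGORY_END_TO_END, "auth", trusted))
```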

Tool Use & Capabilities

6. Tool Use

An agent invoking external capabilities. Reading files, calling APIs, executing code, querying databases. The tools an agent can use define what it can actually accomplish beyond generating text.

Tool use is what separates a coding agent from a chatbot that writes code. The chatbot gives you code to copy-paste. The agent writes it directly to your repo.

7. Function Calling

A specific implementation of tool use where the model outputs structured JSON that triggers predefined functions. OpenAI, Anthropic, and Google all support this natively. The model "calls" functions by outputting the function name and arguments; your code executes the actual function.
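
The split between "model names the function" and "your code runs it" can be sketched in a few lines. The JSON shape below is deliberately simplified and is not any vendor's actual wire format; `get_ticket_status` and the `TOOLS` registry are hypothetical.

```python
import json


def get_ticket_status(ticket_id: str) -> str:
    # Hypothetical tool; a real one would query your issue tracker.
    return f"{ticket_id}: in review"


# Registry of functions the model is allowed to call.
TOOLS = {"get_ticket_status": get_ticket_status}


def dispatch(model_output: str) -> str:
    """Parse the model's structured output and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# The model only produces this string; it never executes anything itself.
model_output = '{"name": "get_ticket_status", "arguments": {"ticket_id": "DEV-42"}}'
result = dispatch(model_output)
print(result)
```

Rejecting names that aren't in the registry matters: the model's output is untrusted input to your program.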

8. MCP (Model Context Protocol)

An open standard, introduced by Anthropic in late 2024, for connecting AI models to external tools and data sources. Think of it as USB for AI: a standardized way to plug in capabilities. An agent whose host application speaks MCP can use any tool exposed by an MCP server without per-tool integration code.

9. Computer Use

An agent controlling a GUI like a human would — clicking buttons, filling forms, navigating interfaces. Slower than API access but works with any software, including legacy systems without APIs.

Claude Computer Use can fill out a Jira ticket by actually navigating the Jira UI. Useful for automation where APIs don't exist or aren't exposed.

10. Code Execution

An agent running code it writes. Critical for debugging, testing, and iteration. The agent writes a function, runs it, sees the error, fixes it — same loop a human follows, but faster.

Sandboxed execution environments matter here. A lot. You don't want an agent with code execution running rm -rf / because its planning went sideways. (Yes, this happens. No, I'm not going to tell you which company demo'd that failure.)
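
As a rough illustration of the write-run-observe step, the sketch below runs generated code in a separate Python process with a timeout and a throwaway working directory. To be clear about the assumption: this is not a sandbox. It does nothing to restrict filesystem or network access, which is what a container or VM is for. The `run_untrusted` name is made up for this example.

```python
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_s: float = 5.0):
    """Run code in a child interpreter; return (exit code, stdout, stderr)."""
    with tempfile.TemporaryDirectory() as workdir:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
            cwd=workdir,              # scratch directory, deleted afterwards
            capture_output=True,
            text=True,
            timeout=timeout_s,        # raises subprocess.TimeoutExpired on runaway code
        )
    return proc.returncode, proc.stdout, proc.stderr


code, out, err = run_untrusted("print(1 + 1)")
print(code, out.strip())

# A failing run gives the agent an error message to react to.
fail_code, _, fail_err = run_untrusted("raise ValueError('boom')")
print(fail_code, fail_err.strip().splitlines()[-1])
```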

Memory & Context

11. Context Window

How much information the model can process at once. Measured in tokens. Limits vary by model and version: Claude models have shipped with 200K-token windows (~150K words), and GPT-4 Turbo with 128K. Longer context lets agents reason over more code, more history, more documentation.

But bigger isn't always better. Inference gets slower and more expensive with more context. And honestly? Most agents don't make good use of 200K tokens anyway: details buried in the middle of a long prompt tend to get missed. The skill is knowing what to include, not including everything.

12. RAG (Retrieval-Augmented Generation)

Fetching relevant information before generating a response. Instead of hoping the model remembers something, you query a database for relevant context and include it in the prompt.

For coding agents, RAG means "search the codebase for relevant files before writing code." Essential for large repos where the agent can't hold the whole codebase in context.
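
Here is a toy version of "search the codebase, then prompt." It ranks files by plain keyword overlap, which is an assumption made to keep the example dependency-free; production systems typically use embeddings or a code index instead. The file contents and function names are invented.

```python
import re


def tokens(text):
    """Lowercase alphabetic words in a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query, files, k=2):
    """Return the k file paths whose contents share the most words with the query."""
    q = tokens(query)
    ranked = sorted(files, key=lambda path: len(q & tokens(files[path])), reverse=True)
    return ranked[:k]


def build_prompt(task, files, k=2):
    """Put only the retrieved files into the prompt, then the task."""
    context = "\n\n".join(f"# {path}\n{files[path]}" for path in retrieve(task, files, k))
    return f"Relevant files:\n{context}\n\nTask: {task}"


FILES = {
    "auth/login.py": "def login(user, password): verify password hash and create session",
    "billing/invoice.py": "def create_invoice(customer): compute totals and tax",
    "auth/session.py": "def create_session(user): issue session token",
}

TASK = "fix session token expiry after login"
print(build_prompt(TASK, FILES))
```

The billing file never reaches the model, which is the whole point: the context budget goes to files that plausibly matter.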

13. Long-Term Memory

Information persisted across conversations. Session memory dies when the conversation ends. Long-term memory survives — your agent remembers decisions from last sprint, coding conventions you've established, bugs you've already fixed.

DevOS uses a three-tier memory system: Graphiti knowledge graphs for relationships, embeddings for semantic search, and automatic state recovery. We covered the full architecture in our deep dive on designing memory systems for coding agents.

14. Embeddings

Vector representations of text that capture semantic meaning. Similar concepts have similar embeddings, even with different words. Embeddings power semantic search — find code that does X, not just code containing the word X.
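
Semantic search boils down to "embed the query, find the nearest vectors." The sketch below uses hand-made three-dimensional vectors as stand-ins; real embeddings come from a model and have hundreds or thousands of dimensions. The snippet names and numbers are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors standing in for model-generated embeddings of code snippets.
INDEX = {
    "retry_with_backoff": [0.9, 0.1, 0.0],
    "parse_csv_row": [0.0, 0.2, 0.9],
}

# Pretend embedding of the query "handle flaky network calls".
query_vec = [0.8, 0.3, 0.1]

best = max(INDEX, key=lambda name: cosine(query_vec, INDEX[name]))
print(best)
```

Note that the query shares no words with `retry_with_backoff`; the match comes entirely from vector proximity.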

15. Knowledge Graph

Structured relationships between entities. "Function A calls Function B." "Module X depends on Module Y." "This bug was introduced in commit Z." Graphs capture explicit relationships that similarity-based retrieval tends to miss.
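
A small sketch of the kind of question a graph answers well: "what breaks if I change this function?" Facts are stored as (subject, relation, object) triples and walked transitively. The function names are invented, and a plain list stands in for a real graph store.

```python
from collections import deque

# Each triple is one fact: (subject, relation, object).
TRIPLES = [
    ("checkout", "calls", "charge_card"),
    ("charge_card", "calls", "payment_client"),
    ("checkout", "calls", "send_receipt"),
    ("report", "calls", "send_receipt"),
]


def callers_of(target, triples):
    """All functions that reach `target` directly or transitively via 'calls'."""
    reverse = {}
    for subject, relation, obj in triples:
        if relation == "calls":
            reverse.setdefault(obj, set()).add(subject)

    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for caller in reverse.get(node, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


# Changing payment_client affects charge_card directly and checkout indirectly.
print(sorted(callers_of("payment_client", TRIPLES)))
```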

Workflow & Process

16. Agent-Employee Model

Treating AI agents as team members rather than tools. Agents take tickets. They appear on sprint boards. They have velocity metrics. They attend standups (via async status updates).

This is the model DevOS is building toward — agents as assignable resources inside your project management system. Solo founders are already running full sprints this way.

17. Ticket Autonomy

How much of a ticket's lifecycle an agent handles independently. Low autonomy: agent drafts code, human reviews every step. High autonomy: agent takes ticket from backlog, implements, tests, opens PR, responds to review comments.

18. Handoff

Transfer of work between agents (or between agent and human). A clean handoff includes context — what was done, what's left, what decisions were made, what blockers exist. Bad handoffs lose context and cause rework.
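
One way to keep handoffs clean is to make the context fields explicit and refuse handoffs that leave them blank. The sketch below is an assumption about what such a record could look like; the field names mirror the list above, and the rule that only `blockers` may be empty is an arbitrary policy chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    ticket_id: str
    from_agent: str
    to_agent: str
    done: list = field(default_factory=list)       # what was done
    remaining: list = field(default_factory=list)  # what's left
    decisions: list = field(default_factory=list)  # what was decided, and why
    blockers: list = field(default_factory=list)   # may legitimately be empty

    def missing_context(self):
        """Names of required context fields that were left empty."""
        required = ("done", "remaining", "decisions")
        return [name for name in required if not getattr(self, name)]


good = Handoff(
    ticket_id="DEV-42",
    from_agent="developer",
    to_agent="qa",
    done=["added retry logic to payment client"],
    remaining=["cover timeout and 5xx paths with tests"],
    decisions=["capped retries at 3 to stay under the request deadline"],
)
bad = Handoff(ticket_id="DEV-43", from_agent="developer", to_agent="qa")

print(good.missing_context())
print(bad.missing_context())
```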

19. Escalation

When an agent recognizes it's stuck and asks for help. Good escalation includes what the agent tried, why it failed, and specific options for human decision. Bad escalation is "I don't know how to proceed" with no context.

Getting escalation right is annoyingly hard. Too sensitive and the agent escalates every decision. Too loose and it confidently drives off a cliff. We've tuned ours three times and it's still not perfect.

20. Human-in-the-Loop

A system where humans approve or intervene at defined checkpoints. HITL is how you get L2-L3 autonomy safely. The agent proposes; the human disposes.

21. Sprint Velocity (for Agents)

Story points completed per sprint — tracked per agent. Velocity data tells you which agent types are reliable and which ones need more guardrails. Our QA agent averages 18 points per sprint. Our DevOps agent averages 12. Different domains, different baselines.

Technical Architecture

22. Prompt Engineering

Designing inputs that get the outputs you want. For agents, this means system prompts, task descriptions, examples, and constraints.

I'll be honest: I hate that this is a skill. It feels like we're negotiating with a picky genie. But good prompt engineering is the difference between an agent that hallucinates and one that follows your architecture. So here we are.

23. System Prompt

Persistent instructions that shape agent behavior. Defines personality, capabilities, constraints, and goals. In multi-agent systems, each agent type has a different system prompt — the QA agent's system prompt emphasizes edge cases and test coverage; the Developer agent's emphasizes clean code and PR conventions.

24. Chain-of-Thought

Having the model reason step-by-step before answering. "Let me think through this..." Improves accuracy on complex tasks. Costs more tokens but catches errors earlier.

25. ReAct Pattern

Short for Reasoning + Acting: a Reason-Act-Observe loop. The agent reasons about what to do, takes an action, observes the result, then reasons again. Most modern coding agents use some variant of ReAct.
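
The loop is easier to see in code than in prose. In this sketch a scripted function stands in for the model call, and two fake tools stand in for a test runner and a patch step; all of the names are hypothetical. The step budget is there because a real model can loop forever.

```python
def react_loop(model, tools, max_steps=6):
    """Alternate reasoning, acting, and observing until the model says it is done."""
    history = []
    for _ in range(max_steps):
        thought, action = model(history)               # Reason
        history.append(("thought", thought))
        if action == "finish":
            return history
        observation = tools[action]()                  # Act
        history.append(("observation", observation))   # Observe
    raise RuntimeError("step budget exhausted")


def scripted_model(history):
    """Stand-in for an LLM: picks the next action from the latest observation."""
    last = next((text for kind, text in reversed(history) if kind == "observation"), None)
    if last is None or last == "patched":
        return ("need fresh test results", "run_tests")
    if last == "1 failed":
        return ("a test is failing, apply a fix", "apply_fix")
    return ("tests pass, nothing left to do", "finish")


state = {"fixed": False}


def run_tests():
    return "all passed" if state["fixed"] else "1 failed"


def apply_fix():
    state["fixed"] = True
    return "patched"


trace = react_loop(scripted_model, {"run_tests": run_tests, "apply_fix": apply_fix})
observations = [text for kind, text in trace if kind == "observation"]
print(observations)
```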

26. Tree of Thoughts

Exploring multiple reasoning paths before committing. Instead of one chain-of-thought, the agent considers several approaches and evaluates which looks most promising. More compute, better decisions. In theory. In practice, most production systems skip this because it's slow and the win is marginal for typical coding tasks.

27. Self-Reflection

Agent evaluating its own outputs. "Wait, that code doesn't handle null values. Let me fix that." Good agents critique themselves before you have to.

28. Guardrails

Constraints that prevent unwanted behavior. "Don't modify production databases." "Don't push directly to main." "Don't approve your own PRs." Guardrails are the safety net when agent judgment fails.
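
Guardrails work best as checks that run outside the model, before an action executes. The sketch below encodes the three example rules against a made-up action format (the `type`, `environment`, `branch`, `actor`, and `pr_author` keys are assumptions). The SQL check is intentionally naive and would need a real parser in practice.

```python
READ_ONLY_SQL = ("select", "explain", "show")


def violations(action: dict) -> list[str]:
    """Return the guardrails a proposed action would break (empty list = allowed)."""
    found = []
    kind = action.get("type")

    if kind == "sql" and action.get("environment") == "production":
        statement = action.get("statement", "").strip().lower()
        # Naive prefix check; it only illustrates where the rule sits.
        if not statement.startswith(READ_ONLY_SQL):
            found.append("don't modify production databases")

    if kind == "git_push" and action.get("branch") == "main":
        found.append("don't push directly to main")

    if kind == "approve_pr" and action.get("actor") == action.get("pr_author"):
        found.append("don't approve your own PRs")

    return found


print(violations({"type": "git_push", "branch": "main"}))
print(violations({"type": "git_push", "branch": "feature/retry"}))
```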

Coordination & Communication

29. Agent Protocol

Standardized communication format between agents. How does the Planner agent tell the Developer agent what to build? How does the Developer agent tell the QA agent what to test? Protocols define the contract.

30. Message Bus

Infrastructure for agent-to-agent communication. Agents publish messages; other agents subscribe. Decouples agents so they don't need direct references to each other.
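
A minimal in-process sketch of publish/subscribe follows. A production bus would be a separate service with persistence and delivery guarantees; this version only shows the decoupling. The topic name and message fields are invented.

```python
from collections import defaultdict


class MessageBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable to receive every message published on a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver a message to all current subscribers of the topic."""
        for handler in list(self._subscribers.get(topic, [])):
            handler(message)


bus = MessageBus()
qa_inbox = []

# The QA agent subscribes without knowing who will publish.
bus.subscribe("pr.opened", qa_inbox.append)

# The Developer agent publishes without knowing who is listening.
bus.publish("pr.opened", {"ticket": "DEV-42", "branch": "feature/retry"})
bus.publish("deploy.finished", {"env": "staging"})  # no subscribers, silently dropped

print(qa_inbox)
```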

31. Shared State

Information accessible to all agents. The codebase. The sprint board. The deployment status. Shared state enables coordination without explicit communication — agents see the same truth.

32. Conflict Resolution

What happens when agents disagree or overlap. Two agents edit the same file. One agent's change breaks another's assumption. The orchestrator needs rules for who wins.
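
The simplest "who wins" rule is first claim wins. The sketch below applies it to file edits; the `FileClaims` class is hypothetical, and a real orchestrator would also need timeouts for abandoned claims and a shared store rather than in-memory state.

```python
class FileClaims:
    """Tracks which agent currently owns each file. First claim wins."""

    def __init__(self):
        self._owner = {}

    def claim(self, agent, path):
        """Return True if `agent` now holds (or already held) the file."""
        owner = self._owner.setdefault(path, agent)
        return owner == agent

    def release(self, agent, path):
        """Give the file back; only the current owner can release it."""
        if self._owner.get(path) == agent:
            del self._owner[path]


claims = FileClaims()
print(claims.claim("developer", "src/payments.py"))  # developer gets the file
print(claims.claim("qa", "src/payments.py"))         # qa has to wait
claims.release("developer", "src/payments.py")
print(claims.claim("qa", "src/payments.py"))         # now qa can take it
```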

Agent Types

33. Planner Agent

Handles architecture, task breakdown, and prioritization. Reads the backlog, decomposes features into tickets, defines implementation approach. Doesn't write code — writes the plan for code.

34. Developer Agent

Writes code. Takes a ticket with clear requirements and produces working implementation. The most common agent type and the one people think of when they say "AI coding agent."

35. QA Agent

Writes tests, reviews code, catches bugs. Focuses on edge cases, coverage gaps, and regression risk. Works best with clear acceptance criteria — the vaguer the spec, the vaguer the tests. For teams managing test automation at scale, JustAnalytics can help track coverage metrics. We wrote more about AI agents for QA teams.

36. DevOps Agent

Handles infrastructure, deployment, and operations. Provisions databases, configures CI/CD, manages environment variables, monitors deployments. The agent that keeps the system running. See our guide on AI agents in CI/CD pipelines for practical examples.

37. Research Agent

Gathers information. Reads documentation, searches codebases, synthesizes findings. Often the first agent in a pipeline — understand the problem before solving it.

Emerging Concepts

38. Agent Marketplace

A curated collection of pre-built agents for specific tasks. Instead of building a QA agent from scratch, you install one from the marketplace, connect it to your repo, and assign it tickets. DevOS is building toward this — specialized agents as deployable packages.

39. Model Routing

Sending different tasks to different AI models based on cost, capability, or speed. Simple code completion goes to a fast, cheap model. Complex architecture decisions go to a more capable (and expensive) one. DevOS routes across Anthropic, Google, DeepSeek, and OpenAI based on task type.
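
At its simplest, routing is a lookup table with a fallback. The task types and model names below are placeholders, not real model identifiers and not DevOS's actual routing rules.

```python
# Placeholder names; substitute whichever models you actually use.
ROUTES = {
    "code_completion": "small-fast-model",
    "architecture_review": "large-reasoning-model",
}
DEFAULT_MODEL = "mid-tier-model"


def route(task_type: str) -> str:
    """Pick a model for a task type, falling back to a general-purpose default."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


print(route("code_completion"))
print(route("write_changelog"))
```

Real routers usually add signals beyond task type, such as input length or a cost budget, but the table-plus-default shape tends to survive.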

40. Swarm Intelligence

Emergent behavior from multiple simple agents. No central orchestrator — agents follow local rules that produce coordinated global behavior. More common in research than production. Most real agent systems use explicit orchestration instead.

Honorable Mentions

Few-Shot Learning: Including examples in the prompt so the agent picks up the pattern in context, with no retraining. "Here's how we write tests: [example]. Now write tests for this."

Zero-Shot: No examples — just the task description. Works for simple tasks; fails for anything requiring team-specific conventions.

Constitutional AI: Anthropic's approach to training models against a written set of principles (a "constitution") rather than an ever-growing list of rules. "Be helpful, harmless, and honest" rather than "never say X."

Agent Framework: Libraries like LangChain, AutoGen, CrewAI that provide scaffolding for building agents. Useful for prototypes; teams often outgrow them. (Hot take: most agent frameworks add more abstraction than they're worth. You'll spend more time fighting the framework than building features.)

Quick Verdict

If you take one thing from this glossary: autonomy level is the metric that matters. When someone tells you their system is "agentic" or uses "multi-agent orchestration," ask what autonomy level they're operating at. L2 with human approval? L3 for specific categories? L4 in demos only?

The vocabulary doesn't matter if you can't deploy it. And deployable autonomy in 2026 lives at L2-L3. Everything else is either research or marketing.

DevOS is building the infrastructure to make L2-L3 practical for engineering teams — agents as sprint employees with real velocity metrics. We're still pre-launch (join the waitlist), but that's the direction.

Frequently Asked Questions

What does "agentic" mean in AI development?

Agentic describes AI systems that take autonomous action toward goals rather than just responding to prompts. An agentic system decides what steps to take, executes them, evaluates results, and adjusts its approach — without waiting for human input at each step. The key distinction: a chatbot answers questions, an agentic system completes tasks.

What is multi-agent orchestration?

Multi-agent orchestration is the coordination layer that manages multiple AI agents working together. It handles task assignment, handoffs between agents, conflict resolution, and progress tracking. Think of it as the project management layer for AI workers — deciding which agent handles which task and ensuring they don't step on each other.

What is the difference between tool use and function calling?

Tool use is the broader concept — an AI agent invoking external capabilities (APIs, databases, code execution). Function calling is a specific implementation where the model outputs structured JSON that triggers predefined functions. All function calling is tool use, but tool use also includes MCP servers, plugin systems, and direct API access.

What does ticket autonomy mean for AI agents?

Ticket autonomy measures how much of a ticket's lifecycle an agent handles without human intervention. Low autonomy: agent drafts code, human reviews every step. High autonomy: agent takes ticket from backlog, writes code, runs tests, opens PR, responds to review comments. The spectrum runs from "fancy autocomplete" to "junior engineer who never sleeps."

