Anatomy of a plain AI Agent
Real agents aren't LLM wrappers. They are durable services that take action.
A Practical View of AI Agents
The phrase "agentic AI" has gained significant popularity over the last few years. However, I have observed that many platforms use the term to describe what is effectively a traditional workflow automation. When I think about agentic AI or AI agents, I classify them into three distinct categories based on how they operate within critical software systems: Conversational agents, Workflow agents, and Autonomous agents.
Conversational agents respond to natural language interactions initiated by humans or by other systems. These agents typically act as an interface to underlying services and internal tools, with one or more AI models deciding which tools to invoke based on the context of the request. Examples include customer support agents that retrieve account information, issue refunds, or update tickets, and IT agents that reset user passwords or provision access across multiple systems. Their value does not come from conversation alone, but from their ability to safely access data and perform real actions.
Workflow agents are responsible for carrying out multi-step processes where the exact path is not fully predetermined. Unlike traditional workflows that follow a fixed, pre-programmed sequence, workflow agents use language models and contextual information to decide what needs to happen next. An onboarding agent is a typical example. Depending on whether the user is an individual or an enterprise customer, it may choose different steps, apply a promotion, provision resources, or guide the user to the appropriate documentation.
Autonomous agents operate independently with a well-defined mission. They run on a schedule or in response to events and continue working without direct human input. A trading agent that monitors markets and executes trades based on high-level goals such as maximizing returns while managing risk fits into this category. Because these agents are designed to run continuously and make decisions on their own, they require strong guarantees around durability, safety, and observability.
When I started building agents as part of the plain platform, I adhered to a few foundational principles. Regardless of how an agent is triggered or what role it plays, a production-grade agent on plain needs the following:
- Access to real tools such as APIs and SDKs so it can take real actions
- Durable deployment as a long-running service that can scale with demand and recover from failures
- Authentication and authorization to ensure only approved users and systems can invoke the agent and its capabilities
- Consistency and repeatability so every agent generated by the system behaves predictably and meets the same guarantees
These principles shaped how plain AI agents are designed and built, and they apply uniformly across conversational, workflow, and autonomous agents.
The Anatomy of a Production-Grade plain AI Agent
For the purposes of this post, I will focus on conversational agents and how they are built and operated on the plain platform. The same underlying principles and structural components apply to workflow agents and autonomous agents, with differences primarily in how they are triggered, orchestrated, and constrained. To make this concrete, the sections below break down the core components that make up a production-grade plain conversational agent.
Deployment and API
A plain conversational agent is deployed as a long-running service. Each agent exposes an OpenAI-compatible chat completions endpoint, making it easy to integrate into existing AI products and systems without custom adapters. Because the agent runs as a service rather than a transient process, it can scale with demand, maintain state, and recover from failures.
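As an illustration, the snippet below calls an agent through the standard openai Python client. The base URL, agent identifier, and the use of a JWT as the API key are assumptions for the sketch; the point is that no custom adapter is required.

```python
# Minimal sketch of calling an agent's OpenAI-compatible endpoint.
# Base URL, agent name, and the JWT-as-API-key convention are illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="https://agents.example.com/support/v1",  # hypothetical agent endpoint
    api_key="<JWT issued by your OIDC provider>",
)

response = client.chat.completions.create(
    model="support-agent",  # agent identifier, not a raw model name
    messages=[{"role": "user", "content": "Why was my last invoice higher than usual?"}],
)
print(response.choices[0].message.content)
```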
Authentication and Authorization
Every agent invocation is authenticated and authorized. plain agents use JWT-based authentication and are compatible with any OIDC provider. Authorization is enforced through role-based policies, allowing fine-grained control over which users or systems can invoke an agent and what capabilities they are allowed to access. This makes it possible to support different permission levels across user tiers while maintaining strict isolation.
Tool Integration
Agents derive their usefulness from their ability to take action through tools. plain agents support an extensible tool system that includes built-in utility tools, third-party API integrations, and custom code execution. Tool access is governed by explicit policies, ensuring that agents can only invoke tools they are authorized to use. This prevents accidental or malicious access to sensitive systems while keeping the agent flexible.
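As a rough sketch, a custom code tool might look like the following if expressed as a LangChain tool. This is an assumption about the mechanism; in plain, tools are referenced by registered identifiers in the agent specification, as described later.

```python
# Minimal sketch of a custom tool, assuming it can be written as a LangChain
# tool; the function, its arguments, and the returned fields are hypothetical.
from langchain_core.tools import tool

@tool
def lookup_account(account_id: str) -> dict:
    """Fetch basic account details for a given account ID."""
    # Placeholder for an internal API call.
    return {"account_id": account_id, "tier": "enterprise", "status": "active"}
```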
Model Flexibility
Each agent is backed by a configurable language model. Different agents within the same service can use different models depending on their responsibilities, and model selection can be adjusted based on task requirements. This allows teams to balance capability, cost, and latency without changing the surrounding system.
Memory System
plain agents include a persistent memory layer with configurable scoping. Memory can be partitioned at the user level for individual context, at the organization level for shared team knowledge, or at a global level for system-wide information. This structure allows agents to retain relevant context over time while maintaining proper boundaries between users and organizations.
Multi-Agent Architecture
A single plain service can host multiple specialized sub-agents. Each sub-agent has a clearly defined responsibility, its own tool access, and its own operating constraints. A coordinator agent manages orchestration across these sub-agents when a task spans multiple domains. This separation of concerns keeps agent behavior understandable, testable, and easier to evolve over time.
From Specification to a Deployed Agent
plain agents are built from declarative specifications rather than imperative code. Instead of wiring together infrastructure, runtimes, and guardrails by hand, you describe what your agents are allowed to do and how they should behave. plain handles everything required to turn those specifications into a deployed, production-ready service.
What You Provide
To create a plain agent service, you define your agents using YAML specifications. These specs describe the role of each agent, the tools it can access, and the policies that govern its behavior. Infrastructure, deployment, and runtime concerns are handled automatically by the platform.
A typical project structure looks like this:
Project Structure
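The layout below is an illustrative sketch; directory and file names are assumptions based on the components described in this post (YAML agent and input specs per route, JSON for service-level settings).

```
my-agent-service/
├── service.json              # service-level settings
└── routes/
    └── support/
        ├── agents.yaml       # agent definitions for this route
        └── inputs.yaml       # input specifications
```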
Each route represents an independent agent endpoint with its own configuration, permissions, and execution boundaries.
Defining an Agent
Agents are defined in an agents.yaml file. Each agent specification captures intent, scope, and constraints explicitly.
At a minimum, an agent definition includes:
- a role that describes who the agent is
- a goal that defines what it is trying to accomplish
- a backstory that provides behavioral guidance
- a list of tools the agent is allowed to use
Optional fields allow you to further constrain execution and add critical instructions.
Example:
Agent Specification
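The sketch below shows what an agents.yaml entry might look like. The role, goal, backstory, and tools fields mirror the structure described above; the agent name, tool identifiers, and optional constraint fields are assumptions about the exact schema.

```yaml
agents:
  - name: billing_support_agent        # hypothetical agent name
    role: Customer support specialist for the billing domain
    goal: Resolve billing questions and issue refunds within policy limits
    backstory: >
      You are precise and cautious. Always confirm the customer's identity
      before taking any action that modifies billing data.
    tools:                             # registered tool identifiers
      - billing_api
      - ticketing_system
    # Optional fields (illustrative names)
    max_iterations: 6
    critical_instructions:
      - Never issue a refund above the approved limit without confirmation.
```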
This specification is the source of truth. plain uses it to generate the agent runtime, enforce constraints, and expose a stable API.
Tool Access and Control
Agents can only interact with tools that are explicitly listed in their specification. Tools are referenced by their registered identifiers and can include internal APIs, third-party services, or custom code execution. Tool behavior can be further controlled using a tool mode:
- append: adds tools to any inherited defaults
- replace: restricts the agent to only the tools you specify
This ensures that agents cannot access systems outside their intended scope.
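A minimal sketch of the mode setting, assuming a tool block with a mode field (the exact field names may differ):

```yaml
tools:
  mode: replace         # the agent may use only the tools listed here;
                        # "append" would add them to inherited defaults instead
  items:
    - billing_api
    - ticketing_system
```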
Policies and Guardrails
Agent behavior can be constrained using optional policies that limit execution and reduce risk. These include limits on iterations, tool calls, and conversation turns, as well as safeguards around write operations and network access.
Examples of supported policies include:
- maximum LLM-to-tool iterations
- maximum total tool invocations
- confirmation requirements for write actions
- explicit allow or deny lists for tools and network domains
These policies are enforced at runtime and apply consistently across all agents.
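A policy block might look like the sketch below; the constraint types mirror the list above, while the field names and values are assumptions.

```yaml
policies:
  max_iterations: 8            # LLM-to-tool loops per request
  max_tool_calls: 20           # total tool invocations per conversation
  max_turns: 30                # conversation turns per session
  confirm_write_actions: true  # require explicit confirmation before writes
  tools:
    deny:
      - delete_account
  network:
    allow:
      - api.internal.example.com
```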
Multiple Routes and Services
For services that expose multiple endpoints, you can define separate agent configurations per route. Each route operates independently with its own agents, tools, and permissions.
Project Structure
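An illustrative multi-route layout (route and file names are hypothetical):

```
my-agent-service/
├── service.json
└── routes/
    ├── support/
    │   └── agents.yaml
    ├── onboarding/
    │   └── agents.yaml
    └── it-helpdesk/
        └── agents.yaml
```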
This makes it possible to run multiple agent-driven capabilities within a single deployment while maintaining clear separation of concerns.
The Tech Stack Behind plain Agents
plain agents are built on a production-grade tech stack designed to support long-running execution, secure access to tools, and predictable behavior under real load. Each layer of the stack maps directly to a core responsibility in the agent lifecycle, from request handling to orchestration, memory, and security.
Runtime and API Layer
plain agents run as Python-based microservices. FastAPI provides the HTTP layer, offering a lightweight and well-understood framework for building scalable services. Each agent exposes an OpenAI-compatible chat completions endpoint with support for streaming responses and standard request and response formats. This allows agents to integrate easily into existing applications without requiring custom client logic.
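For intuition, a stripped-down FastAPI route with the OpenAI chat completions shape might look like the sketch below. It is illustrative only and omits streaming, authentication, and the agent execution itself.

```python
# Minimal sketch of an OpenAI-compatible endpoint shape; not plain's implementation.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: list[Message]
    stream: bool = False

@app.post("/v1/chat/completions")
async def chat_completions(req: ChatRequest) -> dict:
    # In the real service, this is where the agent graph would be invoked.
    reply = f"Received: {req.messages[-1].content}"
    return {
        "object": "chat.completion",
        "model": req.model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": reply},
                "finish_reason": "stop",
            }
        ],
    }
```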
Agent Orchestration and Execution
Agent behavior is modeled explicitly rather than embedded in prompts. plain agents use LangGraph to define agent state machines and control flow, with each agent represented as a graph of nodes. For multi-agent scenarios, a coordinator pattern is used to route work between specialized agents. This graph-based approach makes execution paths explicit, debuggable, and easier to evolve as agent behavior becomes more complex.
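To make the coordinator pattern concrete, here is a minimal LangGraph sketch in which a coordinator node routes a request to one of two specialized nodes. The node names and routing logic are hypothetical; in plain, the graph is generated from the agent specifications.

```python
# Minimal coordinator sketch with LangGraph; node names and routing are illustrative.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class AgentState(TypedDict):
    request: str
    result: str

def coordinator(state: AgentState) -> AgentState:
    # In practice an LLM decides the route; here we just pass state through.
    return state

def billing_agent(state: AgentState) -> AgentState:
    return {**state, "result": "handled by billing agent"}

def it_agent(state: AgentState) -> AgentState:
    return {**state, "result": "handled by IT agent"}

def route(state: AgentState) -> str:
    return "billing" if "refund" in state["request"].lower() else "it"

builder = StateGraph(AgentState)
builder.add_node("coordinator", coordinator)
builder.add_node("billing", billing_agent)
builder.add_node("it", it_agent)
builder.add_edge(START, "coordinator")
builder.add_conditional_edges("coordinator", route, {"billing": "billing", "it": "it"})
builder.add_edge("billing", END)
builder.add_edge("it", END)
graph = builder.compile()

print(graph.invoke({"request": "I need a refund", "result": ""}))
```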
Authentication and Security
All invocations of plain agents are authenticated using JWTs and are compatible with any OIDC-compliant identity provider. Authorization is enforced through a policy layer that governs both route access and tool usage. This ensures that agents can only be invoked by approved users or systems and can only access the tools and data they are explicitly permitted to use.
Memory and Persistence
plain agents include a persistent memory layer backed by Mem0. Memory can be partitioned by scope, including user-level context, organization-level shared knowledge, and global system memory. This structure allows agents to retain relevant context across runs while maintaining strict isolation boundaries between users and organizations.
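A minimal sketch of scoped memory using Mem0 directly is shown below; in plain this sits behind the platform's memory layer, and the configuration and IDs here are illustrative.

```python
from mem0 import Memory

# Default configuration; production deployments would configure the vector
# store and embedding model explicitly.
memory = Memory()

# User-scoped memory: visible only when querying with the same user_id.
memory.add("Prefers email over phone for support follow-ups", user_id="user-123")

# Retrieve context relevant to this user only.
results = memory.search("How should we contact this user?", user_id="user-123")
print(results)
```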
LLM Integration
Language models are accessed through LangChain, which provides a consistent abstraction across multiple model providers. Each plain agent can be configured with its own model, and different agents within the same service can use different models depending on their responsibilities. Model selection is defined declaratively in agent specifications and can be adjusted without changing the surrounding system.
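As a sketch of what provider-agnostic selection looks like with LangChain's init_chat_model helper (model names are examples, and in plain the choice comes from the agent specification rather than code):

```python
from langchain.chat_models import init_chat_model

# Two agents in the same service backed by different providers and models.
support_llm = init_chat_model("gpt-4o-mini", model_provider="openai")
drafting_llm = init_chat_model("claude-3-5-sonnet-latest", model_provider="anthropic")

print(support_llm.invoke("Summarize our refund policy in one sentence.").content)
```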
Configuration and Specifications
Agent behavior and capabilities are defined through declarative configuration. YAML is used for agent definitions and input specifications, while JSON is used for service-level settings. This approach keeps agent configuration explicit, versionable, and reproducible across environments.
Tech Stack Summary
| Layer | Responsibility | Technology |
|---|---|---|
| Runtime | Long-running agent service | Python |
| API Layer | HTTP and streaming interface | FastAPI |
| Orchestration | Agent state machines and control flow | LangGraph |
| Authentication | Request authentication | JWT with OIDC |
| Authorization | Tool and route access control | Policy enforcement layer |
| Memory | Persistent agent memory | Mem0 |
| LLM Abstraction | Model integration | LangChain |
| Model Providers | Task-specific model selection | Multi-provider support |
| Configuration | Agent and service configuration | YAML, JSON, environment variables |
| API Compatibility | Client integration | OpenAI-compatible endpoints |
The Possibilities for Agents Are Endless
This architecture makes it possible to build AI agents that can safely perform real operational tasks inside production systems. Below are two examples that illustrate what becomes feasible when agents are deployed as durable services with explicit permissions and controlled access to tools.
An IT support agent that can safely disable access for a terminated employee is a good example. Offboarding typically requires coordinated changes across multiple systems, such as identity providers, internal tools, and SaaS applications. A plain agent can authenticate the request, determine which systems apply to a specific user, and invoke the appropriate APIs to revoke access in a controlled sequence. Policies ensure that sensitive actions require confirmation, while guardrails prevent the agent from accessing systems outside its authorization. Every action is auditable, and failures can be retried without leaving systems in an inconsistent state.
An onboarding agent that adapts flows based on user context is another example. Onboarding rarely follows a single fixed path. Depending on the user type, the agent may provision resources, apply promotions, assign roles, or route users to the appropriate documentation. A plain agent can evaluate context at runtime and decide which steps to take, while policies constrain what it is allowed to create or modify. This allows onboarding to remain flexible without sacrificing predictability or control.
