Ensemble is an agent harness — the operational platform that takes AI models and makes them production-ready agents. This guide covers the full picture: building, deploying, securing, and governing agents at enterprise scale.
An AI agent is software that can reason and take action — not just generate text. It uses a large language model (LLM) as its reasoning engine and wraps it with the ability to call external systems, execute multi-step processes, and adapt based on what it learns along the way.
Three things define an agent: it uses an LLM to reason, it takes real action through external systems, and it adapts based on what it learns along the way.
Ask a standard LLM chatbot "What's the status of my order?" and it generates a plausible-sounding answer from training data. Ask an agent the same question and it connects to your order management system, queries the actual record, and returns the real status. The difference is real action vs. generated text.
Building an agent that works reliably in production — with the right tools, guardrails, observability, and governance — requires more than a model. It requires a harness: the operational layer that wraps the model and makes it enterprise-ready. That's what Ensemble provides.
An agent that works in a demo is not the same as one that works reliably in production. ALM (agent lifecycle management) is the discipline of managing an agent through each stage of its life — from initial definition through continuous improvement. Ensemble is built to make each stage straightforward.
As a full-stack agent harness, Ensemble is organized in five stacked layers — channels at the top through to security at the base — with a continuous observability loop running alongside. Each layer builds on the one below it; together they handle everything between a user's request and a reliable, governed agent response.
Every Ensemble agent is assembled from the same set of building blocks: a system prompt, a model, knowledge bases, workflows, and optionally sub-agents. Together these determine what the agent knows, how it reasons, and what it can do.
The system prompt is the foundation of every agent — it defines the agent's role, objective, and rules. A well-crafted prompt specifies not just what the agent does, but what it must not do: "always verify eligibility before recommending a provider," "escalate if the customer expresses dissatisfaction," "never quote prices without checking current inventory." Think of it as the agent's standing instructions.
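A system prompt with role, objective, and explicit prohibitions can be assembled programmatically. The sketch below is illustrative — the function name and prompt wording are assumptions, not part of Ensemble's API:

```typescript
// Illustrative sketch: assemble a system prompt from a role, an
// objective, and explicit "must not" rules, one rule per line.
function buildSystemPrompt(role: string, objective: string, rules: string[]): string {
  return [
    `You are ${role}. Objective: ${objective}.`,
    "Rules:",
    ...rules.map((r) => `- ${r}`),
  ].join("\n");
}
```

Treating the rules as a structured list (rather than free prose) makes them easier to review and version as formal policy, as the governance section recommends.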
Ensemble supports all major model providers out of the box — Anthropic, OpenAI, Google, Mistral, DeepSeek, and others — along with open-source models via self-hosted inference (vLLM, Ollama, or any OpenAI-compatible endpoint). Custom or fine-tuned models can be registered via API and used exactly like built-in ones. Model versions are managed centrally so you can pin agents to specific versions, set organization-wide defaults, and control which models are available in production.
Match the model tier to the task. In multi-agent setups, different agents in the same hierarchy can use different tiers — a supervisor routing requests might use a fast model while a sub-agent doing financial analysis uses a reasoning model.
| Tier | Examples | Best for |
|---|---|---|
| Fast | Claude Haiku, GPT-4.1 Mini, Gemini Flash, Mistral Small | Routing, intent classification, FAQ answering, high-volume low-latency tasks |
| Standard | Claude Sonnet, GPT-4.1, Gemini Pro, Llama 3.3 70B | Customer support, multi-tool synthesis, report generation — most production workloads |
| Reasoning | Claude Opus, o3, DeepSeek R1 | Financial analysis, complex planning, ambiguous requests requiring careful judgment |
| Custom / self-hosted | Fine-tuned models, private vLLM or Ollama deployments | Domain-specific tasks, data residency requirements, cost optimization at scale |
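Tier selection can be made explicit in code. A minimal sketch, assuming a simple task-category taxonomy (the category names and the default-to-standard rule are illustrative, not Ensemble configuration):

```typescript
// Hypothetical tier-routing helper: map a task category to a model tier.
// Category names are illustrative; they are not Ensemble API identifiers.
type Tier = "fast" | "standard" | "reasoning";

const TIER_BY_TASK: Record<string, Tier> = {
  routing: "fast",
  intent_classification: "fast",
  customer_support: "standard",
  report_generation: "standard",
  financial_analysis: "reasoning",
  complex_planning: "reasoning",
};

function selectTier(task: string): Tier {
  // Default to the standard tier for unclassified tasks.
  return TIER_BY_TASK[task] ?? "standard";
}
```

In a supervisor hierarchy, a mapping like this lets the routing agent stay on a fast model while delegating analysis-heavy tasks to a reasoning-tier sub-agent.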
Knowledge bases give agents access to your internal documentation — SOPs, policy docs, product manuals, support articles. Upload documents and Ensemble handles chunking, embedding, and indexing automatically.
At runtime, agents use RAG (Retrieval-Augmented Generation) to answer from your content. When a user asks a question, the agent converts it into a vector (a numerical representation of meaning; semantically similar questions produce similar vectors) and searches for the document sections closest in meaning. Those sections are injected into the agent's context, grounding its response in your actual documentation rather than base model training data.
An HR agent connected to your benefits documentation can answer "Does my plan cover orthodontics?" by searching the actual plan document and quoting the relevant clause — not by guessing. Update the document, and the knowledge base re-indexes it immediately.
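The retrieval step at the core of RAG can be sketched in a few lines: rank document chunks by cosine similarity to the query vector and keep the top matches. This is a generic illustration, not Ensemble's internal implementation — a real system would produce the vectors with an embedding model:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Chunk { text: string; vector: number[]; }

// Return the k chunks closest in meaning to the query vector.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.vector) - cosine(query, x.vector))
    .slice(0, k);
}
```

The returned chunks are what gets injected into the agent's context window before the model generates its answer.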
Workflows define multi-step processes agents can execute. In Ensemble, you describe workflows in plain natural language — no code or flow diagrams needed. Ensemble generates the workflow structure from your description, which you can then refine, test, and publish like any other configuration.
Workflow execution runs on a massively scalable engine that handles high volumes of concurrent workflows without any infrastructure management on your part. Workflows support three step types, freely mixed within a single flow:
Agentic step: parse the claim narrative and extract key details → Deterministic step: verify policy coverage and run compliance checks → Agentic step: draft a response for the claimant → Deterministic step: route for human approval if the claim exceeds the threshold. Pure rule-based automation breaks on exceptions; pure LLM is unpredictable for compliance steps. The hybrid handles both cleanly.
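The claims flow above can be sketched as ordinary code, with agentic steps as model calls and deterministic steps as plain functions. Everything here is illustrative: `callLLM` is a stand-in, the coverage rule is a placeholder, and the $500 threshold mirrors the example, not a platform default:

```typescript
type Claim = { narrative: string; amount: number };

// Stand-in for a real model call.
function callLLM(prompt: string): string {
  return `LLM output for: ${prompt.slice(0, 40)}`;
}

function processClaim(claim: Claim): { draft: string; needsApproval: boolean } {
  // Agentic: extract key details from the free-text narrative.
  const details = callLLM(`Extract key details: ${claim.narrative}`);
  // Deterministic: rule-based coverage check (placeholder rule).
  if (claim.amount <= 0) throw new Error("nothing to process");
  // Agentic: draft the claimant response.
  const draft = callLLM(`Draft a response based on: ${details}`);
  // Deterministic: route for human approval above a fixed threshold.
  return { draft, needsApproval: claim.amount > 500 };
}
```

The compliance check and the approval routing are ordinary deterministic code, which is exactly why they stay predictable where a pure LLM step would not.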
A single agent works well for focused tasks. As scope grows, it's better to split work across specialized agents. Ensemble supports three composable patterns.
Why not put everything in one agent? Three reasons: context limits — an agent with 40 tools reasons worse than one with 8, because it struggles to choose between them; isolation — a prompt change or bug in one monolithic agent affects all its use cases simultaneously; and parallelism — a supervisor can run multiple sub-agents concurrently, a single agent cannot.
A procurement agent handles most requests itself — vendor status, contract terms, policy questions. When a request involves complex multi-quarter financial analysis, it delegates to a specialized financial analysis sub-agent with deeper data access and a reasoning-tier model. The procurement agent doesn't need to know how to do the analysis; it just knows when to ask for help.
A user says: "I was double-charged and now I can't log in." The supervisor identifies two distinct issues and routes them in parallel — a billing sub-agent handles the refund; an account sub-agent diagnoses the login failure. Each specialist accesses only the systems it needs. The supervisor synthesizes both results into a single response.
A tool is a single discrete action (query a database, call an API). A sub-agent is a full reasoning unit — it can use multiple tools, follow a workflow, and make multi-step decisions. Use a tool when the task is one action. Use a sub-agent when the task requires judgment, multiple steps, or specialized context that would bloat the parent agent.
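The supervisor pattern from the double-charge example can be sketched as routing each identified issue to a specialist and synthesizing the results. The sub-agent names and handler shape are illustrative, not Ensemble's orchestration API:

```typescript
// Each sub-agent is modeled as a handler for one kind of issue.
type Handler = (issue: string) => string;

const subAgents: Record<string, Handler> = {
  billing: (issue) => `billing: refund initiated for "${issue}"`,
  account: (issue) => `account: login diagnostics run for "${issue}"`,
};

// The supervisor routes each issue to its specialist and merges results;
// unrecognized issues are escalated rather than guessed at.
function supervise(issues: { kind: string; detail: string }[]): string {
  const results = issues.map(({ kind, detail }) =>
    (subAgents[kind] ?? ((d: string) => `escalated: ${d}`))(detail)
  );
  return results.join(" | ");
}
```

In production the specialist calls would run concurrently; the synchronous version here just keeps the routing logic visible.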
Agents are only as useful as the systems they can reach. Ensemble handles this through connections (authentication profiles for external systems) and tools (the actions agents can take once connected).
A connection is a reusable, pre-configured authentication profile for an external system — your CRM, ERP, HRIS, or database. Credentials are stored encrypted server-side and are never exposed to the browser or the LLM. When agents are promoted across environments, credentials are automatically re-encrypted for the target environment.
Tools are what agents and workflows use to take action. The same tool can be invoked by an agent reasoning about what to do next, or as an explicit step within a workflow.
Ensemble ships with a large library of pre-built connectors — Salesforce, HubSpot, Jira, ServiceNow, Zendesk, Workday, SAP, and many others — requiring no custom integration work. Utility tools (web search, geocoding, weather) are available with no setup. For everything else, Ensemble provides flexible building blocks:
Keep tools narrowly focused. An agent chooses tools based on their names and descriptions — get_order_status leads to better selection than query_db. Always return useful error information on failure so the agent can decide whether to retry, ask the user for clarification, or try an alternative path.
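Both recommendations — a narrow, descriptive name and structured error information — can be seen in one sketch. The result shape and the retryable flag are illustrative conventions, not an Ensemble tool contract:

```typescript
// A tool result that tells the agent what happened and whether retrying
// could help; this shape is an illustrative convention.
interface ToolResult { ok: boolean; data?: unknown; error?: string; retryable?: boolean }

function get_order_status(orderId: string, db: Map<string, string>): ToolResult {
  if (!/^ORD-\d+$/.test(orderId)) {
    // Invalid input: not retryable — the agent should ask the user to clarify.
    return { ok: false, error: `invalid order id: ${orderId}`, retryable: false };
  }
  const status = db.get(orderId);
  if (status === undefined) {
    return { ok: false, error: `order ${orderId} not found`, retryable: false };
  }
  return { ok: true, data: { orderId, status } };
}
```

Because failures come back as data rather than opaque exceptions, the agent can reason about its next step instead of simply giving up.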
Ensemble agents can be embedded in applications, invoked directly via API, or exposed as an MCP server. One agent configuration powers all three — build once, integrate anywhere.
A React-based chat widget that embeds in any web or mobile application, with real-time response streaming. Authentication flows through JWT tokens (a secure, compact standard for passing user identity between systems); API keys and secrets never reach the browser. The widget is highly customizable: it supports fully custom UI themes, and agents can return structured JSON rendered as rich interactive components — tables, charts, booking forms, product cards — built as custom React components in your codebase.
Every agent and every workflow is individually available as a REST API endpoint. Any system that can make an HTTP request can invoke an agent or trigger a workflow — backend services, mobile apps, third-party platforms, or scheduled jobs. This makes Ensemble agents first-class citizens in any existing architecture, no chat interface required.
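A backend service invoking an agent over HTTP might build a request like the following. The endpoint path, header names, and payload shape are assumptions for illustration — consult the actual API reference for the real contract:

```typescript
// Hypothetical request builder for invoking an agent endpoint.
// URL path and body shape are illustrative, not documented Ensemble API.
function buildAgentRequest(baseUrl: string, agentId: string, message: string, token: string) {
  return {
    url: `${baseUrl}/api/agents/${agentId}/invoke`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ message }),
    },
  };
}

// Usage: const { url, init } = buildAgentRequest(base, id, msg, token);
//        const response = await fetch(url, init);
```

Any scheduler, queue worker, or third-party platform that can issue this kind of request can trigger the agent — no chat surface involved.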
The entire Ensemble platform is exposed as an MCP server (Model Context Protocol, an open standard for AI models to securely discover and call external tools and data sources). This means any MCP-compatible AI client — Claude Desktop, Cursor, and others — can discover and invoke your agents, workflows, and tools directly. Organizations building AI-native internal tooling can expose their entire agent catalog through a single endpoint.
The same agent configuration powers all communication channels: embedded web widget, Slack (webhook), SMS and WhatsApp (Twilio / AWS SNS), and Voice (Twilio / WebRTC with real-time TTS and STT).
| Option | Description | Best for |
|---|---|---|
| Public cloud | Ensemble-hosted, internet-accessible | Teams wanting fast time-to-value with no infrastructure management |
| Private cloud | Dedicated infrastructure, not shared with other tenants | Organizations requiring network isolation or custom security configuration |
| Self-hosted | Deployed in your own data center; all data stays within your perimeter | Healthcare, financial services, government, defense — wherever data cannot leave the organization |
The Ensemble chat widget is more than an embeddable chatbox. It's a fully programmable UI runtime — designed so that every aspect of the experience, from styling to data to component rendering, is under your control. Agents communicate through it, but so does your application: passing context, injecting components, and controlling session isolation all happen at the integration layer.
Agents can return structured JSON payloads that the widget renders as fully custom React components — inline, in the conversation flow, in real time. These aren't pre-built templates: they're components you write and register. A vendor recommendation surfaces as a rich card with a map, distance, and action buttons. A product search returns an interactive grid. A booking confirmation renders a form. The agent handles the reasoning; your components handle the presentation.
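The mechanism behind this is a component registry: the agent returns typed JSON, and the host app maps each payload type to a renderer it registered. The registration API and payload shape below are illustrative sketches, not the widget's actual interface:

```typescript
// Renderers are modeled as functions returning a string for testability;
// in a real React host they would return JSX elements.
type Renderer = (props: Record<string, unknown>) => string;

const registry = new Map<string, Renderer>();

function register(type: string, render: Renderer): void {
  registry.set(type, render);
}

function renderPayload(payload: { type: string; props: Record<string, unknown> }): string {
  const render = registry.get(payload.type);
  // Fall back to raw JSON when no component is registered for the type.
  return render ? render(payload.props) : JSON.stringify(payload.props);
}

register("vendor_card", (p) => `VendorCard(${p.name}, ${p.distanceKm} km)`);
```

The agent only decides *which* type and *what* data to emit; presentation stays entirely in your codebase.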
Every visual aspect of the widget is overridable. CSS variables control colors, typography, border radii, and spacing — override any of them to match your application's design system exactly. The widget ships with a live configurator where style changes reflect instantly in a preview, making it straightforward to tune the experience before embedding. The widget works in two modes: inline, rendered as a persistent interface in a designated page area, and popup, triggered as an overlay — switchable via configuration with no code changes.
The Ensemble SDK handles initialization, authentication, and session management. At runtime, your application can pass context directly into the agent's session — the current user, their role, the page they're on, any relevant application state. The agent receives this context and can use it to personalize responses, filter results, or apply role-appropriate guardrails, without the user having to re-explain their situation.
A healthcare application passes the current patient's ID and care plan status into the widget on initialization. The agent immediately knows who it's talking to and what their care status is — without the patient needing to identify themselves or repeat information already in their record.
Each user session runs in its own isolated thread. Conversation history, context, and state are scoped to that session and never bleed across users — enforced at the platform level, not just the application level. The caller controls session lifecycle via the SDK: create a new thread, resume an existing one, or clear a session entirely. This makes the widget suitable for multi-user environments, shared devices, and applications where strict per-user isolation is a compliance requirement.
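Context passing and thread lifecycle come together at initialization time. A minimal sketch, assuming illustrative field names and init shape (not the actual Ensemble SDK surface):

```typescript
// Context the host app passes at init so the agent can personalize
// without re-asking; field names are illustrative.
interface SessionContext { userId: string; role: string; page?: string; }

function buildSession(ctx: SessionContext, threadId?: string) {
  return {
    // Resume an existing thread, or start a fresh isolated one per user.
    threadId: threadId ?? `thread-${ctx.userId}-${Date.now()}`,
    context: ctx,
  };
}
```

On a shared device, the caller would clear or replace the thread between users, so history and context never carry over.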
Most embedded chat widgets give you a styled chatbox. Ensemble's widget gives you a rendering engine: the agent decides what to say and what data to return; your application decides how to present it. The result is agent-powered experiences that feel native to your product, not bolted on.
Ensemble's evaluations module tests the full agent pipeline — not just the LLM's text output, but tool calls, knowledge base retrievals, and workflow execution. An agent can produce a plausible-sounding response while calling the wrong API or misinterpreting the data it retrieved. Evals catch these failures before users encounter them. You define test cases with sample inputs and expected outputs; the eval framework verifies the agent produces the right results through the right steps.
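An eval case that checks steps as well as output might look like the following. The case shape and runner are illustrative, not the evals module's actual schema:

```typescript
// An eval case asserts both the tools invoked (in order) and the answer.
interface EvalCase {
  input: string;
  expectToolCalls: string[];   // tools the agent must invoke, in order
  expectInAnswer: string;      // substring the final answer must contain
}

// A trace of what the agent actually did for one turn.
interface Trace { toolCalls: string[]; answer: string; }

function evaluate(c: EvalCase, trace: Trace): boolean {
  const toolsOk =
    c.expectToolCalls.length === trace.toolCalls.length &&
    c.expectToolCalls.every((t, i) => trace.toolCalls[i] === t);
  return toolsOk && trace.answer.includes(c.expectInAnswer);
}
```

A case like this fails an agent that produced a fluent answer through the wrong tool — precisely the failure mode plain text-output evals miss.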
Monitoring agents are a second line of defense that runs continuously in production. These are purpose-built agents configured to observe live conversations across multiple dimensions — automatically, at scale, without requiring human review of every exchange:
When a monitoring agent detects an issue, it can trigger automated actions: flag the conversation for human review, alert the agent owner, or escalate to a live agent. This creates a continuous quality loop without manual auditing at scale.
A financial services company deploys a customer support agent for account inquiries. A monitoring agent observes every conversation, checking for PII in responses, compliance with disclosure requirements, and any discussion of products outside the agent's authorized scope. When a conversation is flagged, a compliance team member receives an alert with the full context, the specific policy triggered, and a recommended action.
Every agent and tool exists in one of two states: Draft — the editable working copy used for development and testing — and Published (v1, v2…) — an immutable snapshot. Once published, a version never changes; previous versions remain accessible for instant rollback. The workflow is always: edit in draft → run evals → publish when ready. A published agent should always reference published tools, not drafts — a published agent referencing a draft tool can break silently when someone edits that tool.
Configurations sync one-way across environments: dev → staging → prod. All dependencies — tools, knowledge bases, connections, sub-agents — are included in the sync, preventing partial deployments. Credentials are re-encrypted for the target environment automatically. The sync API integrates directly with CI/CD pipelines.
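A CI/CD pipeline calling the sync API would typically enforce the promotion rules as a guard before triggering a sync. A minimal sketch of those rules (the guard itself is illustrative, not part of the sync API):

```typescript
// One-way promotion: dev → staging → prod, published versions only.
const ORDER = ["dev", "staging", "prod"];

function canPromote(from: string, to: string, state: "draft" | "published"): boolean {
  const i = ORDER.indexOf(from);
  const j = ORDER.indexOf(to);
  // Only published configs move, and only one environment forward.
  return state === "published" && i >= 0 && j === i + 1;
}
```

Encoding the rule in the pipeline means a draft, a backward sync, or a skipped environment fails the build rather than reaching production.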
Ensemble provides full production visibility through message tracing (a step-by-step view of each turn — tools called, knowledge retrieved, reasoning steps taken), performance metrics (token usage, latency, and cost per agent and workflow), user feedback (thumbs up/down built into the chat widget), and AI-powered improvement suggestions generated from patterns in feedback and monitoring data.
Deploy → monitoring agents surface issues in live conversations → patterns appear in the observability dashboard → agent owner updates configuration → evals verify the fix → promote to production. Agents improve continuously with use, not just at launch.
Ensemble enforces role-based access control (RBAC) with three roles: Owner, Admin, and standard user. User identity flows through JWT tokens from your existing auth system — agent API keys and secrets are managed entirely server-side and never reach the browser or the LLM's context window.
All conversations, tool inputs/outputs, and knowledge base contents are isolated at the tenant level. One organization's data is never accessible to another. Credentials in connections are encrypted at rest and re-encrypted when synced across environments.
Ensemble maintains the following certifications. These apply to the platform itself — organizations handling regulated data should additionally establish their own policies around data access, conversation retention, and agent behavior auditing.
AI agent governance differs from traditional software governance in one fundamental way: agent behavior is not fully determined by code. It's shaped by instructions, model choices, retrieved data, and runtime context — all of which can vary. A prompt edit changes how an agent handles thousands of edge cases. A model upgrade can subtly shift reasoning. A knowledge base update changes what the agent believes.
This creates two distinct challenges: operational governance — how changes are made, tested, and deployed safely — and AI governance — what agents are permitted to do, say, and decide. Both are essential.
Behavioral guardrails. Every production agent needs explicit written rules for what it must not do: topics to decline, actions requiring human approval, how to handle sensitive requests. These live in the system prompt and should be treated as formal policy documents — reviewed and versioned just like code. A poorly written guardrail is as dangerous as a missing one: overly broad restrictions make agents useless; overly narrow ones leave gaps.
"Never commit to a refund or credit over $500 without routing to a human agent." "Do not recommend products outside the approved catalog." "If a customer mentions legal action, acknowledge and transfer immediately — do not attempt to resolve."
Human-in-the-loop (HITL) policies. Define which decisions agents can make autonomously and which require a human sign-off. The threshold should be based on reversibility and impact. Low-stakes, reversible actions (answering questions, looking up records) can be fully autonomous. High-stakes or irreversible actions (initiating payments, sending external communications, modifying critical records) should have a human approval step built into the workflow.
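A HITL policy based on reversibility and impact can be made executable. The sketch below reuses the $500 refund threshold from the guardrail examples; the action shape and rules are otherwise illustrative:

```typescript
// An action the agent wants to take, classified for the HITL check.
interface Action { kind: string; reversible: boolean; amountUsd?: number; }

function requiresHumanApproval(a: Action): boolean {
  // Irreversible actions (payments sent, external emails) are always gated.
  if (!a.reversible) return true;
  // Reversible but high-impact: refunds over the policy threshold.
  if (a.kind === "refund" && (a.amountUsd ?? 0) > 500) return true;
  // Low-stakes, reversible actions run autonomously.
  return false;
}
```

Keeping the policy in one function gives it a single, testable, versionable home — the same treatment the guardrail text recommends for prompts.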
Model governance. Define which model providers and versions are approved for production use, establish tier-by-use-case guidelines (not every workflow needs the most powerful model), and treat model upgrades as significant changes requiring testing. A model update can alter agent behavior in ways that aren't obvious until they affect real conversations.
Continuous behavioral monitoring. Monitoring agents (Section 7) are the operational arm of AI governance — they actively watch live conversations for policy violations, data exposure, and quality issues. This creates an audit trail of agent behavior, not just configuration, and surfaces the gap between what you intended the agent to do and what it's actually doing.
AI behavioral incidents. AI agents introduce a new category of incident beyond outages and breaches: behavioral incidents — cases where an agent does something unexpected, harmful, or contrary to policy. Define a response process: who is notified, what triggers an immediate rollback vs. a configuration patch, how affected users are identified and communicated with, and how root cause is analyzed.
Versioning and promotion. Production always runs a published version — never a draft. Changes flow one-way through dev → staging → prod, initiated only by Admin or Owner roles. All dependencies are promoted together. The sync API integrates with CI/CD for programmatic, auditable promotion. (Full versioning details in Section 7.)
Role-based permissions. Owners and Admins can build agents, configure connections, and promote to production. Standard users can interact with agents but cannot modify configurations.
Audit trails. All agent interactions, tool invocations, configuration changes, and environment promotions are logged — a complete record of what each agent did, what data it accessed, and who changed its configuration. Essential for compliance reporting and incident investigation.
Apply least privilege at the agent level. Each agent is configured with only the connections and tools it genuinely needs for its role.
| Agent | Should have access to | Should not have access to |
|---|---|---|
| Customer support | CRM, order management, product knowledge base | HR database, financial systems, internal pricing |
| HR benefits | Benefits documents, employee records (read-only) | Payroll write access, performance review data |
| Procurement | Vendor database, budget system, approval workflows | Customer data, employee compensation records |
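Least privilege reduces to an allowlist per agent: anything not explicitly granted is denied. A sketch using names from the table above (the connection identifiers are illustrative):

```typescript
// Each agent lists only the connections it genuinely needs.
const agentConnections: Record<string, Set<string>> = {
  customer_support: new Set(["crm", "order_management", "product_kb"]),
  hr_benefits: new Set(["benefits_docs", "employee_records_readonly"]),
  procurement: new Set(["vendor_db", "budget_system", "approval_workflows"]),
};

function canAccess(agent: string, connection: string): boolean {
  // Deny by default: unknown agents and ungranted connections both fail.
  return agentConnections[agent]?.has(connection) ?? false;
}
```

The deny-by-default shape matters: a new connection added to the platform is invisible to every agent until someone deliberately grants it.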
Think of a production agent like a new employee with direct access to your systems and customer relationships. You wouldn't give that person unrestricted access and no guidelines. The same logic applies: define what they're allowed to do, scope their access to what they need, monitor their work, and have a clear plan for when things go wrong.
Starting with the wrong use case wastes time and erodes organizational confidence. Here's a practical framework for identifying where to start.
Ask: where are skilled employees spending hours on tasks that feel like they should take minutes? Start narrow — one well-scoped workflow — measure before and after, and use the evidence to expand.
| Alternative | What it gives you | Where it falls short | Ensemble's approach |
|---|---|---|---|
| DIY on cloud (AWS Bedrock, Azure OpenAI, GCP Vertex) | Maximum flexibility; you choose every component | You build and maintain everything: orchestration, versioning, evals, security, multi-tenancy, deployment pipelines | All of that is built in. Focus on the agent's purpose, not the infrastructure. |
| Traditional RPA (UiPath, Automation Anywhere) | Proven for deterministic, rule-based automation; strong legacy system integration | Breaks on variation. Bolting LLMs onto RPA retrofits intelligence onto a rules engine. | AI-native from the ground up. Hybrid workflows combine LLM reasoning and deterministic rules naturally. |
| Vertical point solutions (domain-specific chatbots, copilots) | Fast time-to-value for one specific use case | Fragmentation — each use case is its own vendor, integration, and data silo. No shared infrastructure or learnings. | One platform for all agent types. Build connections, security, and tooling once; reuse across every use case. |
| Raw LLM APIs (direct OpenAI, Anthropic, Google) | Simplest path for simple use cases; no intermediary layer | Everything else: tool orchestration, knowledge management, multi-agent coordination, versioning, observability, security | Ensemble is the harness layer — the orchestration, tooling, observability, and governance infrastructure that turns a model into a production-ready agent. |