Ensemble
Platform Guide

Building and Running AI Agents at Scale

Ensemble is an agent harness — the operational platform that takes AI models and makes them production-ready agents. This guide covers the full picture: building, deploying, securing, and governing agents at enterprise scale.

Contents
  1. What is an AI agent?
  2. Agent Lifecycle Management
  3. Platform architecture
  4. Building agents
  5. Connecting to systems
  6. Deploying agents
  7. The Ensemble chat widget
  8. Testing and observability
  9. Security and data protection
  10. Governance and policies
  11. Picking the right use cases
  12. How Ensemble compares

1. What is an AI agent?

An AI agent is software that can reason and take action — not just generate text. It uses a large language model (LLM) as its reasoning engine and wraps it with the ability to call external systems, execute multi-step processes, and adapt based on what it learns along the way.

Three things define an agent: it reasons with an LLM rather than following fixed rules, it takes real action by calling external systems, and it adapts its approach based on what it learns along the way.

Agent vs. chatbot

Ask a standard LLM chatbot "What's the status of my order?" and it generates a plausible-sounding answer from training data. Ask an agent the same question and it connects to your order management system, queries the actual record, and returns the real status. The difference is real action vs. generated text.

Building an agent that works reliably in production — with the right tools, guardrails, observability, and governance — requires more than a model. It requires a harness: the operational layer that wraps the model and makes it enterprise-ready. That's what Ensemble provides.

2. Agent Lifecycle Management (ALM)

An agent that works in a demo is not the same as one that works reliably in production. ALM is the discipline of managing an agent through each stage of its life — from initial definition through continuous improvement. Ensemble is built to make each stage straightforward.

1. Define: What should the agent do? What systems does it need? What are its guardrails?
2. Build: Write the system prompt, select a model, connect tools and knowledge bases, define workflows.
3. Test: Run evaluations across real scenarios — not just happy paths. Check tool calls, retrieval, and edge cases.
4. Deploy: Promote from dev → staging → production. Embed the agent in the apps and channels where users need it.
5. Secure: Ensure the agent cannot leak data, exceed its authority, or create compliance risk.
6. Monitor: Track performance, cost, and failures in production. Run monitoring agents to catch issues automatically.
7. Improve: Collect feedback, surface failure patterns, and update the agent's configuration on a continuous basis.
8. Govern: Manage multiple agents across environments with version control, access policies, and compliance standards.

3. Platform architecture

As a full-stack agent harness, Ensemble is organized in five stacked layers — channels at the top through to security at the base — with a continuous observability loop running alongside. Each layer builds on the one below it; together they handle everything between a user's request and a reliable, governed agent response.

[Architecture diagram] Layers, top to bottom: Channels & interfaces (rich UI widget, Slack, WhatsApp, Voice, SMS, direct API, MCP server); Orchestration (agent hierarchies, context passing, memory, state management); Agents (supervisor agent with sub-agents; LLM model tiers: fast / standard / reasoning); Context (MCP tools (API, SQL, MQL…), knowledge bases, workflows, prompts); Security & compliance (identity / RBAC, tenant isolation, environment promotion, audit & compliance, key store). Running alongside: Observability & feedback (evals, tracing & metrics, user feedback, AI analysis, monitoring & governance, improve & redeploy).
Ensemble platform architecture — user requests enter at the top through any channel and flow down through orchestration, agents, and context layers, all underpinned by security. The observability column spans the full stack with a continuous improvement loop.

4. Building agents

Every Ensemble agent is assembled from the same set of building blocks: a system prompt, a model, knowledge bases, workflows, and optionally sub-agents. Together these determine what the agent knows, how it reasons, and what it can do.

System prompt

The system prompt is the foundation of every agent — it defines the agent's role, objective, and rules. A well-crafted prompt specifies not just what the agent does, but what it must not do: "always verify eligibility before recommending a provider," "escalate if the customer expresses dissatisfaction," "never quote prices without checking current inventory." Think of it as the agent's standing instructions.

Model selection

Ensemble supports all major model providers out of the box — Anthropic, OpenAI, Google, Mistral, DeepSeek, and others — along with open-source models via self-hosted inference (vLLM, Ollama, or any OpenAI-compatible endpoint). Custom or fine-tuned models can be registered via API and used exactly like built-in ones. Model versions are managed centrally so you can pin agents to specific versions, set organization-wide defaults, and control which models are available in production.

Match the model tier to the task. In multi-agent setups, different agents in the same hierarchy can use different tiers — a supervisor routing requests might use a fast model while a sub-agent doing financial analysis uses a reasoning model.

| Tier | Examples | Best for |
| --- | --- | --- |
| Fast | Claude Haiku, GPT-4.1 Mini, Gemini Flash, Mistral Small | Routing, intent classification, FAQ answering, high-volume low-latency tasks |
| Standard | Claude Sonnet, GPT-4.1, Gemini Pro, Llama 3.3 70B | Customer support, multi-tool synthesis, report generation — most production workloads |
| Reasoning | Claude Opus, o3, DeepSeek R1 | Financial analysis, complex planning, ambiguous requests requiring careful judgment |
| Custom / self-hosted | Fine-tuned models, private vLLM or Ollama deployments | Domain-specific tasks, data residency requirements, cost optimization at scale |
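The tier-matching guidance above can be sketched as a small routing function. The task categories, tier names, and default choice here are illustrative assumptions, not an Ensemble API:

```typescript
// Illustrative tier routing: map a task category to a model tier,
// defaulting to the standard tier for unknown categories.
type Tier = "fast" | "standard" | "reasoning";

const TIER_FOR_TASK: Record<string, Tier> = {
  routing: "fast",             // intent classification, high volume
  faq: "fast",
  support: "standard",         // most production workloads
  report_generation: "standard",
  financial_analysis: "reasoning",
  complex_planning: "reasoning",
};

function pickTier(task: string): Tier {
  // Unknown categories fall back to the standard tier.
  return TIER_FOR_TASK[task] ?? "standard";
}
```

In a multi-agent hierarchy, a supervisor would call something like `pickTier` per sub-agent role rather than per request, keeping the routing agent on the fast tier.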
Ensemble LLM model management screen
The Ensemble LLM Models screen — all configured models from Anthropic, OpenAI, Cerebras, and others, with per-token pricing and version details. Admins set organization-wide defaults and can register custom or self-hosted models alongside built-ins.

Knowledge bases

Knowledge bases give agents access to your internal documentation — SOPs, policy docs, product manuals, support articles. Upload documents and Ensemble handles chunking, embedding, and indexing automatically.

At runtime, agents use RAG (Retrieval-Augmented Generation) to answer from your content. When a user asks a question, the agent converts it into a vector (a numerical representation of meaning — semantically similar questions produce similar vectors) and searches for the document sections closest in meaning. Those sections are injected into the agent's context, grounding its response in your actual documentation rather than base model training data.

Example

An HR agent connected to your benefits documentation can answer "Does my plan cover orthodontics?" by searching the actual plan document and quoting the relevant clause — not by guessing. Update the document, and the knowledge base re-indexes it immediately.
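The retrieval step can be illustrated with a toy example. The hand-made two-dimensional vectors below stand in for real embeddings (which an embedding model would produce); the nearest-chunk selection is the part that mirrors what happens at query time:

```typescript
// Toy RAG retrieval: rank chunks by cosine similarity to the query
// vector and pick the closest one to inject into the agent's context.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

interface Chunk {
  text: string;
  embedding: number[]; // produced at indexing time in a real system
}

function topChunk(query: number[], chunks: Chunk[]): Chunk {
  return chunks.reduce((best, c) =>
    cosine(query, c.embedding) > cosine(query, best.embedding) ? c : best
  );
}

const chunks: Chunk[] = [
  { text: "Orthodontics is covered at 50% up to $2,000.", embedding: [0.9, 0.1] },
  { text: "Vision exams are covered once per year.", embedding: [0.1, 0.9] },
];

// A question about orthodontics embeds near the first chunk.
const hit = topChunk([0.8, 0.2], chunks);
```

Real deployments add chunk overlap, similarity thresholds, and reranking, which is exactly what the configuration screen below exposes.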

Ensemble Knowledge Base configuration screen showing embedding model, chunking settings, query parameters, and document stats
The Ensemble knowledge base configuration screen — embedding model, chunk size, overlap, similarity threshold, and reranking are all configurable, with sensible defaults that work out of the box. Shown: a financial documents knowledge base with 11 uploaded documents processed into 397 searchable chunks.

Workflows

Workflows define multi-step processes agents can execute. In Ensemble, you describe workflows in plain natural language — no code or flow diagrams needed. Ensemble generates the workflow structure from your description, which you can then refine, test, and publish like any other configuration.

Workflow execution runs on a massively scalable engine that handles high volumes of concurrent workflows without any infrastructure management on your part. Workflows support three step types, freely mixed within a single flow: agentic steps, where an LLM reasons over the input; deterministic steps, which apply fixed rules and tool calls; and human approval steps.

Example — insurance claims workflow

Agentic step: parse the claim narrative and extract key details → Deterministic step: verify policy coverage and run compliance checks → Agentic step: draft a response for the claimant → Deterministic step: route for human approval if the claim exceeds the threshold. Pure rule-based automation breaks on exceptions; pure LLM is unpredictable for compliance steps. The hybrid handles both cleanly.
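The claims flow above can be sketched as a list of typed steps run in order. The step shapes, threshold value, and runner are illustrative assumptions, not Ensemble's workflow format:

```typescript
// Hybrid workflow sketch: agentic and deterministic steps mixed in
// one flow, with a deterministic routing rule at the end.
interface Claim {
  amount: number;
  covered?: boolean;
  needsHumanApproval?: boolean;
}

type Step = { kind: "agentic" | "deterministic"; run: (c: Claim) => Claim };

const APPROVAL_THRESHOLD = 10_000; // assumed threshold, not from the source

const claimsWorkflow: Step[] = [
  { kind: "agentic", run: (c) => c },        // parse the claim narrative (LLM)
  { kind: "deterministic", run: (c) => ({ ...c, covered: true }) }, // coverage check
  { kind: "agentic", run: (c) => c },        // draft a response (LLM)
  {
    kind: "deterministic",                    // route large claims to a human
    run: (c) => ({ ...c, needsHumanApproval: c.amount > APPROVAL_THRESHOLD }),
  },
];

function execute(claim: Claim): Claim {
  return claimsWorkflow.reduce((c, step) => step.run(c), claim);
}
```

The design point is that the compliance-sensitive steps stay deterministic while the language-heavy steps stay agentic, so neither is forced into the other's mold.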

Ensemble workflow builder showing Client Credit Approval
The Ensemble workflow builder — a visual canvas where deterministic steps, LLM steps, human approvals, and tool calls are composed into a flow. Shown: the Client Credit Approval workflow with parallel approval paths and conditional routing.

Agent hierarchies

A single agent works well for focused tasks. As scope grows, it's better to split work across specialized agents. Ensemble supports three composable patterns: a standalone agent, a default agent that delegates specialized work to sub-agents, and a supervisor that routes requests across a team of domain specialists.

Why not put everything in one agent? Three reasons: context limits — an agent with 40 tools reasons worse than one with 8, because it struggles to choose between them; isolation — a prompt change or bug in one monolithic agent affects all its use cases simultaneously; and parallelism — a supervisor can run multiple sub-agents concurrently, a single agent cannot.

Example — default agent with sub-agent delegation

A procurement agent handles most requests itself — vendor status, contract terms, policy questions. When a request involves complex multi-quarter financial analysis, it delegates to a specialized financial analysis sub-agent with deeper data access and a reasoning-tier model. The procurement agent doesn't need to know how to do the analysis; it just knows when to ask for help.

Example — supervisor with domain specialists

A user says: "I was double-charged and now I can't log in." The supervisor identifies two distinct issues and routes them in parallel — a billing sub-agent handles the refund; an account sub-agent diagnoses the login failure. Each specialist accesses only the systems it needs. The supervisor synthesizes both results into a single response.
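The fan-out in this example can be sketched as follows. The issue detection, sub-agent names, and canned responses are all illustrative stand-ins for real routing and real specialist agents:

```typescript
// Supervisor sketch: detect distinct issues, dispatch each to its
// specialist sub-agent concurrently, then synthesize one response.
type SubAgent = (issue: string) => Promise<string>;

const subAgents: Record<string, SubAgent> = {
  billing: async () => "Refund for the duplicate charge has been initiated.",
  account: async () => "Password reset link sent; login should work now.",
};

function detectIssues(message: string): string[] {
  // A real supervisor would use the LLM for this; a regex keeps the
  // sketch self-contained.
  const issues: string[] = [];
  if (/charg|refund|bill/i.test(message)) issues.push("billing");
  if (/log ?in|password|account/i.test(message)) issues.push("account");
  return issues;
}

async function supervise(message: string): Promise<string> {
  const issues = detectIssues(message);
  // Specialists run in parallel, which a single agent cannot do.
  const results = await Promise.all(issues.map((i) => subAgents[i](message)));
  return results.join(" ");
}
```

Each specialist only sees the systems it needs; the supervisor's job is routing and synthesis, not domain work.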

Sub-agents vs. tools

A tool is a single discrete action (query a database, call an API). A sub-agent is a full reasoning unit — it can use multiple tools, follow a workflow, and make multi-step decisions. Use a tool when the task is one action. Use a sub-agent when the task requires judgment, multiple steps, or specialized context that would bloat the parent agent.

Ensemble agent hierarchy UI
The Ensemble agent hierarchy view — supervisors and their sub-agents shown in an expandable tree, with each agent's tools, workflows, and knowledge bases visible inline. Shown: the Client Operations Supervisor with three specialist sub-agents.
Ensemble agent configuration screen showing the Credit and Resolution Agent with system prompt, model, description, and four tools
A fully configured Ensemble agent — system prompt, model selection, description, and tools (issue_credit, send_client_notification, log_audit_event, send_slack_notification) all in one view. The "unpublished changes" indicator shows the draft/publish versioning in action: changes are isolated in draft until deliberately promoted.

5. Connecting agents to systems

Agents are only as useful as the systems they can reach. Ensemble handles this through connections (authentication profiles for external systems) and tools (the actions agents can take once connected).

Connections

A connection is a reusable, pre-configured authentication profile for an external system — your CRM, ERP, HRIS, or database. Credentials are stored encrypted server-side and are never exposed to the browser or the LLM. When agents are promoted across environments, credentials are automatically re-encrypted for the target environment.

Tools

Tools are what agents and workflows use to take action. The same tool can be invoked by an agent reasoning about what to do next, or as an explicit step within a workflow.

Ensemble ships with a large library of pre-built connectors — Salesforce, HubSpot, Jira, ServiceNow, Zendesk, Workday, SAP, and many others — requiring no custom integration work. Utility tools (web search, geocoding, weather) are available with no setup. For everything else, Ensemble provides flexible building blocks for defining custom tools (API, SQL, MQL, and MCP-based integrations).

Tool design tip

Keep tools narrowly focused. An agent chooses tools based on their names and descriptions — get_order_status leads to better selection than query_db. Always return useful error information on failure so the agent can decide whether to retry, ask the user for clarification, or try an alternative path.
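The tip above can be made concrete with a sketch of a narrowly scoped tool. The result shape, in-memory order table, and field names are illustrative assumptions:

```typescript
// A focused tool with a descriptive name and structured errors the
// agent can act on (retry, re-ask the user, or try another path).
interface ToolResult {
  ok: boolean;
  data?: unknown;
  error?: string;
  retryable?: boolean;
}

const getOrderStatus = {
  name: "get_order_status",
  description: "Look up the current status of a single order by its ID.",
  run(orderId: string): ToolResult {
    // Stand-in for the real order management system.
    const orders: Record<string, string> = { "A-100": "shipped" };
    if (!(orderId in orders)) {
      // A structured, non-retryable error tells the agent to ask the
      // user to re-check the ID instead of retrying blindly.
      return { ok: false, error: `No order found with ID ${orderId}`, retryable: false };
    }
    return { ok: true, data: { orderId, status: orders[orderId] } };
  },
};
```

Compare this to a generic `query_db` tool: the agent would have to guess both when to call it and how to phrase the query, and a bare exception would give it nothing to reason about on failure.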

6. Deploying agents

Ensemble agents can be embedded in applications, invoked directly via API, or exposed as an MCP server. One agent configuration powers all three — build once, integrate anywhere.

Chat widget

A React-based chat widget that embeds in any web or mobile application, with real-time response streaming. Authentication flows through JWT tokens (a secure, compact standard for passing user identity between systems) — API keys and secrets never reach the browser. The widget is highly customizable: it supports fully custom UI themes, and agents can return structured JSON rendered as rich interactive components — tables, charts, booking forms, product cards — built as custom React components in your codebase.

REST APIs

Every agent and every workflow is individually available as a REST API endpoint. Any system that can make an HTTP request can invoke an agent or trigger a workflow — backend services, mobile apps, third-party platforms, or scheduled jobs. This makes Ensemble agents first-class citizens in any existing architecture, no chat interface required.

MCP server

The entire Ensemble platform is exposed as an MCP server (Model Context Protocol: an open standard for AI models to securely discover and call external tools and data sources). This means any MCP-compatible AI client — Claude Desktop, Cursor, and others — can discover and invoke your agents, workflows, and tools directly. Organizations building AI-native internal tooling can expose their entire agent catalog through a single endpoint.

Channels

The same agent configuration powers all communication channels: embedded web widget, Slack (webhook), SMS and WhatsApp (Twilio / AWS SNS), and Voice (Twilio / WebRTC with real-time TTS and STT).

Deployment options

| Option | Description | Best for |
| --- | --- | --- |
| Public cloud | Ensemble-hosted, internet-accessible | Teams wanting fast time-to-value with no infrastructure management |
| Private cloud | Dedicated infrastructure, not shared with other tenants | Organizations requiring network isolation or custom security configuration |
| Self-hosted | Deployed in your own data center; all data stays within your perimeter | Healthcare, financial services, government, defense — wherever data cannot leave the organization |

7. The Ensemble chat widget

The Ensemble chat widget is more than an embeddable chatbox. It's a fully programmable UI runtime — designed so that every aspect of the experience, from styling to data to component rendering, is under your control. Agents communicate through it, but so does your application: passing context, injecting components, and controlling session isolation all happen at the integration layer.

Custom React components rendered at runtime

Agents can return structured JSON payloads that the widget renders as fully custom React components — inline, in the conversation flow, in real time. These aren't pre-built templates: they're components you write and register. A vendor recommendation surfaces as a rich card with a map, distance, and action buttons. A product search returns an interactive grid. A booking confirmation renders a form. The agent handles the reasoning; your components handle the presentation.
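The dispatch from structured payload to registered component can be sketched as below. Renderers return strings here to keep the example self-contained; in the real widget they would be React components. The payload shapes and type names are assumptions:

```typescript
// Structured-payload rendering sketch: the agent returns typed JSON,
// the widget dispatches to whichever component is registered for it.
type Payload =
  | { type: "text"; text: string }
  | { type: "vendor-card"; name: string; distanceKm: number };

const renderers: Record<string, (p: any) => string> = {
  "text": (p) => p.text,
  "vendor-card": (p) => `[VendorCard ${p.name}, ${p.distanceKm} km]`,
};

function render(p: Payload): string {
  const renderer = renderers[p.type];
  // Unknown payload types fall back to raw JSON so the chat never breaks.
  return renderer ? renderer(p) : JSON.stringify(p);
}
```

The agent decides what data to return; the registry decides how it looks, which is what keeps presentation in your codebase rather than in the prompt.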

Ensemble Chat Configurator showing custom widget types alongside a live preview rendering rich vendor recommendation cards
The Ensemble Chat Configurator — custom widget types (person-card, map-widget, vendor-cards) registered alongside built-ins. The live preview shows an agent response rendered as rich, interactive vendor cards with location data, verification badges, and action buttons — not plain text.

Fully configurable

Every visual aspect of the widget is overridable. CSS variables control colors, typography, border radii, and spacing — override any of them to match your application's design system exactly. The widget ships with a live configurator where style changes reflect instantly in a preview, making it straightforward to tune the experience before embedding. The widget works in two modes: inline, rendered as a persistent interface in a designated page area, and popup, triggered as an overlay — switchable via configuration with no code changes.

SDK and runtime context

The Ensemble SDK handles initialization, authentication, and session management. At runtime, your application can pass context directly into the agent's session — the current user, their role, the page they're on, any relevant application state. The agent receives this context and can use it to personalize responses, filter results, or apply role-appropriate guardrails, without the user having to re-explain their situation.

Example

A healthcare application passes the current patient's ID and care plan status into the widget on initialization. The agent immediately knows who it's talking to and what their care status is — without the patient needing to identify themselves or repeat information already in their record.
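Context injection at initialization can be sketched as follows. The function name, session shape, and field names are illustrative, not the Ensemble SDK's actual surface:

```typescript
// Sketch of passing application state into a session at init time,
// so the agent never has to ask the user who they are.
interface SessionContext {
  userId: string;
  role: string;
  [key: string]: unknown; // arbitrary app state, e.g. carePlanStatus
}

interface Session {
  id: string;
  context: SessionContext;
}

function initSession(id: string, context: SessionContext): Session {
  // Copy the context so later app-side mutations don't leak into
  // an already-running session.
  return { id, context: { ...context } };
}

const session = initSession("sess-1", {
  userId: "patient-42",
  role: "patient",
  carePlanStatus: "active",
});
```

On each turn, this context travels with the request, letting the agent personalize responses and apply role-appropriate guardrails.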

Multi-threaded and user-isolated

Each user session runs in its own isolated thread. Conversation history, context, and state are scoped to that session and never bleed across users — enforced at the platform level, not just the application level. The caller controls session lifecycle via the SDK: create a new thread, resume an existing one, or clear a session entirely. This makes the widget suitable for multi-user environments, shared devices, and applications where strict per-user isolation is a compliance requirement.
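The isolation guarantee can be sketched as a thread store keyed by thread ID, where clearing one user's history cannot touch another's. The class shape is an illustrative stand-in for the platform-level enforcement described above:

```typescript
// Per-thread isolation sketch: history is scoped to a thread key,
// and lifecycle operations on one thread never affect another.
class ThreadStore {
  private threads = new Map<string, string[]>();

  append(threadId: string, message: string): void {
    const history = this.threads.get(threadId) ?? [];
    history.push(message);
    this.threads.set(threadId, history);
  }

  history(threadId: string): string[] {
    // Return a copy so callers cannot mutate stored state directly.
    return [...(this.threads.get(threadId) ?? [])];
  }

  clear(threadId: string): void {
    this.threads.delete(threadId);
  }
}
```

The point of enforcing this at the platform level rather than in application code is that a bug in one integration cannot accidentally leak another user's conversation.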

What this enables

Most embedded chat widgets give you a styled chatbox. Ensemble's widget gives you a rendering engine: the agent decides what to say and what data to return; your application decides how to present it. The result is agent-powered experiences that feel native to your product, not bolted on.

8. Testing and observability

Evaluations

Ensemble's evaluations module tests the full agent pipeline — not just the LLM's text output, but tool calls, knowledge base retrievals, and workflow execution. An agent can produce a plausible-sounding response while calling the wrong API or misinterpreting the data it retrieved. Evals catch these failures before users encounter them. You define test cases with sample inputs and expected outputs; the eval framework verifies the agent produces the right results through the right steps.
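A full-pipeline eval can be sketched as checking both the answer and the tool calls made along the way. The case and trace shapes are illustrative, not Ensemble's eval format:

```typescript
// Eval sketch: "right result through the right steps" means the
// final answer AND the sequence of tool calls must both match.
interface Trace {
  answer: string;
  toolCalls: string[];
}

interface EvalCase {
  input: string;
  expectAnswerIncludes: string;
  expectToolCalls: string[];
}

function runEval(c: EvalCase, trace: Trace): boolean {
  const rightAnswer = trace.answer.includes(c.expectAnswerIncludes);
  const rightSteps =
    c.expectToolCalls.length === trace.toolCalls.length &&
    c.expectToolCalls.every((t, i) => trace.toolCalls[i] === t);
  return rightAnswer && rightSteps;
}

const testCase: EvalCase = {
  input: "What's the status of order A-100?",
  expectAnswerIncludes: "shipped",
  expectToolCalls: ["get_order_status"],
};
```

A plausible answer produced via the wrong tool call fails this check, which is exactly the failure mode a text-only eval would miss.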

Monitoring agents

Monitoring agents are a second line of defense that runs continuously in production. These are purpose-built agents configured to observe live conversations across multiple dimensions (data exposure, policy compliance, conversation quality) — automatically, at scale, without requiring human review of every exchange.

When a monitoring agent detects an issue, it can trigger automated actions: flag the conversation for human review, alert the agent owner, or escalate to a live agent. This creates a continuous quality loop without manual auditing at scale.

Example

A financial services company deploys a customer support agent for account inquiries. A monitoring agent observes every conversation, checking for PII in responses, compliance with disclosure requirements, and any discussion of products outside the agent's authorized scope. When a conversation is flagged, a compliance team member receives an alert with the full context, the specific policy triggered, and a recommended action.

Versioning

Every agent and tool exists in one of two states: Draft — the editable working copy used for development and testing — and Published (v1, v2…) — an immutable snapshot. Once published, a version never changes; previous versions remain accessible for instant rollback. The workflow is always: edit in draft → run evals → publish when ready. A published agent should always reference published tools, not drafts — a published agent referencing a draft tool can break silently when someone edits that tool.
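The draft/publish model can be sketched as a small state container: edits only ever touch the draft, and publishing appends an immutable snapshot. The class shape is illustrative:

```typescript
// Versioning sketch: one mutable draft, an append-only list of
// published versions that never change after publication.
class AgentConfig {
  private draft: string;
  private published: string[] = []; // published[n] is version n+1, frozen

  constructor(initial: string) {
    this.draft = initial;
  }

  edit(prompt: string): void {
    this.draft = prompt; // only the draft is ever mutated
  }

  publish(): number {
    this.published.push(this.draft); // immutable snapshot
    return this.published.length;    // version number (v1, v2, ...)
  }

  version(n: number): string {
    return this.published[n - 1]; // always available for rollback
  }
}

const agent = new AgentConfig("You are a support agent.");
const v1 = agent.publish();
agent.edit("You are a support agent. Escalate refunds over $500.");
const v2 = agent.publish();
```

Because v1 is a frozen snapshot, rolling back is just pointing production at it again; nothing in later drafts can have altered it.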

Environment promotion

Configurations sync one-way across environments: dev → staging → prod. All dependencies — tools, knowledge bases, connections, sub-agents — are included in the sync, preventing partial deployments. Credentials are re-encrypted for the target environment automatically. The sync API integrates directly with CI/CD pipelines.

Observability

Ensemble provides full production visibility through message tracing (a step-by-step view of each turn — tools called, knowledge retrieved, reasoning steps taken), performance metrics (token usage, latency, and cost per agent and workflow), user feedback (thumbs up/down built into the chat widget), and AI-powered improvement suggestions generated from patterns in feedback and monitoring data.

Ensemble message tracing UI
Ensemble message tracing — conversation threads listed on the left, full turn-by-turn detail on the right. Each message shows the agent, timestamp, and exact content, making it straightforward to trace unexpected behavior back to its source.
The improvement loop

Deploy → monitoring agents surface issues in live conversations → patterns appear in the observability dashboard → agent owner updates configuration → evals verify the fix → promote to production. Agents improve continuously with use, not just at launch.

Ensemble user feedback dashboard
The Ensemble feedback dashboard — total responses, satisfaction rate, positive/negative counts, and a filterable feed of individual ratings by agent and time period.

9. Security and data protection

Identity and access

Ensemble enforces role-based access control (RBAC) with three roles: Owner, Admin, and standard user. User identity flows through JWT tokens from your existing auth system — agent API keys and secrets are managed entirely server-side and never reach the browser or the LLM's context window.

Data isolation

All conversations, tool inputs/outputs, and knowledge base contents are isolated at the tenant level. One organization's data is never accessible to another. Credentials in connections are encrypted at rest and re-encrypted when synced across environments.

Preventing data exposure

Several mechanisms work together to keep data where it belongs: API keys and secrets are managed server-side and never reach the browser or the LLM's context window; tenant isolation keeps each organization's data separate; and monitoring agents watch live conversations for PII and policy violations.

Compliance

Ensemble maintains the following certifications. These apply to the platform itself — organizations handling regulated data should additionally establish their own policies around data access, conversation retention, and agent behavior auditing.

SOC 2 Type II
Security, availability & confidentiality
ISO 27001
Information security management

10. Governance and policies

AI agent governance differs from traditional software governance in one fundamental way: agent behavior is not fully determined by code. It's shaped by instructions, model choices, retrieved data, and runtime context — all of which can vary. A prompt edit changes how an agent handles thousands of edge cases. A model upgrade can subtly shift reasoning. A knowledge base update changes what the agent believes.

This creates two distinct challenges: operational governance — how changes are made, tested, and deployed safely — and AI governance — what agents are permitted to do, say, and decide. Both are essential.

AI governance

Behavioral guardrails. Every production agent needs explicit written rules for what it must not do: topics to decline, actions requiring human approval, how to handle sensitive requests. These live in the system prompt and should be treated as formal policy documents — reviewed and versioned just like code. A poorly written guardrail is as dangerous as a missing one: overly broad restrictions make agents useless; overly narrow ones leave gaps.

Example — guardrails for a financial support agent

"Never commit to a refund or credit over $500 without routing to a human agent." "Do not recommend products outside the approved catalog." "If a customer mentions legal action, acknowledge and transfer immediately — do not attempt to resolve."

Human-in-the-loop (HITL) policies. Define which decisions agents can make autonomously and which require a human sign-off. The threshold should be based on reversibility and impact. Low-stakes, reversible actions (answering questions, looking up records) can be fully autonomous. High-stakes or irreversible actions (initiating payments, sending external communications, modifying critical records) should have a human approval step built into the workflow.
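The reversibility-and-impact threshold can be sketched as a gate function, reusing the $500 refund guardrail from the example above. The action kinds and the always-approve rule for payments are illustrative assumptions:

```typescript
// HITL gate sketch: autonomy for low-stakes reversible actions,
// human approval for high-stakes or irreversible ones.
interface Action {
  kind: "answer" | "lookup" | "refund" | "payment";
  amount?: number;
}

function needsHumanApproval(a: Action): boolean {
  switch (a.kind) {
    case "answer":
    case "lookup":
      return false; // reversible, low impact: fully autonomous
    case "refund":
      return (a.amount ?? 0) > 500; // over $500: route to a human
    case "payment":
      return true; // irreversible: always require sign-off
  }
}
```

Encoding the policy as a deterministic step (rather than a prompt instruction) means the guardrail holds even when the model's reasoning drifts.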

Model governance. Define which model providers and versions are approved for production use, establish tier-by-use-case guidelines (not every workflow needs the most powerful model), and treat model upgrades as significant changes requiring testing. A model update can alter agent behavior in ways that aren't obvious until they affect real conversations.

Continuous behavioral monitoring. Monitoring agents (Section 8) are the operational arm of AI governance — they actively watch live conversations for policy violations, data exposure, and quality issues. This creates an audit trail of agent behavior, not just configuration, and surfaces the gap between what you intended the agent to do and what it's actually doing.

AI behavioral incidents. AI agents introduce a new category of incident beyond outages and breaches: behavioral incidents — cases where an agent does something unexpected, harmful, or contrary to policy. Define a response process: who is notified, what triggers an immediate rollback vs. a configuration patch, how affected users are identified and communicated with, and how root cause is analyzed.

Operational governance

Versioning and promotion. Production always runs a published version — never a draft. Changes flow one-way through dev → staging → prod, initiated only by Admin or Owner roles. All dependencies are promoted together. The sync API integrates with CI/CD for programmatic, auditable promotion. (Full versioning details in Section 8.)

Role-based permissions. Owners and Admins can build agents, configure connections, and promote to production. Standard users can interact with agents but cannot modify configurations.

Audit trails. All agent interactions, tool invocations, configuration changes, and environment promotions are logged — a complete record of what each agent did, what data it accessed, and who changed its configuration. Essential for compliance reporting and incident investigation.

Data access policies

Apply least privilege at the agent level. Each agent is configured with only the connections and tools it genuinely needs for its role.

| Agent | Should have access to | Should not have access to |
| --- | --- | --- |
| Customer support | CRM, order management, product knowledge base | HR database, financial systems, internal pricing |
| HR benefits | Benefits documents, employee records (read-only) | Payroll write access, performance review data |
| Procurement | Vendor database, budget system, approval workflows | Customer data, employee compensation records |

Operational policies to establish

The governing principle

Think of a production agent like a new employee with direct access to your systems and customer relationships. You wouldn't give that person unrestricted access and no guidelines. The same logic applies: define what they're allowed to do, scope their access to what they need, monitor their work, and have a clear plan for when things go wrong.

11. Picking the right use cases

Starting with the wrong use case wastes time and erodes organizational confidence. Here's a practical framework for identifying where to start.

Strong candidates

Poor candidates

Where to start

Ask: where are skilled employees spending hours on tasks that feel like they should take minutes? Start narrow — one well-scoped workflow — measure before and after, and use the evidence to expand.

12. How Ensemble compares

| Alternative | What it gives you | Where it falls short | Ensemble's approach |
| --- | --- | --- | --- |
| DIY on cloud (AWS Bedrock, Azure OpenAI, GCP Vertex) | Maximum flexibility; you choose every component | You build and maintain everything: orchestration, versioning, evals, security, multi-tenancy, deployment pipelines | All of that is built in. Focus on the agent's purpose, not the infrastructure. |
| Traditional RPA (UiPath, Automation Anywhere) | Proven for deterministic, rule-based automation; strong legacy system integration | Breaks on variation. Bolting LLMs onto RPA retrofits intelligence onto a rules engine. | AI-native from the ground up. Hybrid workflows combine LLM reasoning and deterministic rules naturally. |
| Vertical point solutions (domain-specific chatbots, copilots) | Fast time-to-value for one specific use case | Fragmentation — each use case is its own vendor, integration, and data silo. No shared infrastructure or learnings. | One platform for all agent types. Build connections, security, and tooling once; reuse across every use case. |
| Raw LLM APIs (direct OpenAI, Anthropic, Google) | Simplest path for simple use cases; no intermediary layer | Everything else: tool orchestration, knowledge management, multi-agent coordination, versioning, observability, security | Ensemble is the harness layer — the orchestration, tooling, observability, and governance infrastructure that turns a model into a production-ready agent. |