
Ecosystem

AgentV is the evaluation layer in the AI agent lifecycle. It works alongside runtime governance and observability tools, each of which addresses a distinct concern.

Layer                      Tool           Question it answers
Evaluate (pre-production)  AgentV         "Is this agent good enough to deploy?"
Govern (runtime)           Agent Control  "Should this action be allowed?"
Observe (runtime)          Langfuse       "What is the agent doing in production?"

AgentV: offline evaluation and testing. Run eval cases against agents, score them with deterministic code graders and LLM judges, detect regressions, and gate CI/CD pipelines. Everything lives in Git.

agentv eval evals/my-agent.yaml
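An eval file for the command above might look like the following. This is an illustrative sketch, not AgentV's actual schema; the field names (`cases`, `graders`, `rubric`) are assumptions.

```yaml
# evals/my-agent.yaml -- hypothetical schema for illustration
name: my-agent-smoke-tests
agent: my-agent
cases:
  - id: refund-policy
    input: "Can I get a refund after 30 days?"
    graders:
      - type: code          # deterministic string check
        contains: "30-day"
      - type: llm-judge     # LLM-scored rubric
        rubric: "Answer states the refund policy accurately."
```

Because the file is plain YAML in Git, eval changes are reviewed, diffed, and versioned like any other code.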

Agent Control: runtime guardrails. It intercepts agent actions (tool calls, API requests) and evaluates them against configurable policies. It can deny, steer, warn, or log without changing agent code, using pluggable evaluators with confidence scoring.
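The deny/steer/warn/log decision can be sketched as a function from an agent action to a verdict with a confidence score. This is a toy illustration, not Agent Control's API; the tool names and policy rules are invented.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    STEER = "steer"
    WARN = "warn"
    LOG = "log"

@dataclass
class Decision:
    verdict: Verdict
    confidence: float  # 0.0-1.0
    reason: str

def evaluate_tool_call(tool: str, args: dict) -> Decision:
    """Toy policy set: block shell access, warn on destructive SQL."""
    if tool == "shell":
        return Decision(Verdict.DENY, 0.99, "shell access is disallowed")
    query = args.get("query", "").lstrip().upper()
    if tool == "db" and query.startswith("DELETE"):
        return Decision(Verdict.WARN, 0.8, "destructive SQL detected")
    return Decision(Verdict.ALLOW, 1.0, "no policy matched")
```

In a real deployment the policies would be pluggable evaluators (regex, JSON, SQL, LLM-based) configured centrally rather than hard-coded.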

Langfuse: production observability. It traces agent execution with explicit Tool/LLM/Retrieval observation types, ingests evaluation scores, and provides dashboards for debugging and monitoring. It is self-hostable.
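Conceptually, a trace is a tree of typed observations with scores attached. The sketch below models that idea only; it is not the Langfuse SDK, and all class and field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class Observation:
    kind: Literal["tool", "llm", "retrieval"]  # agent-native observation types
    name: str
    children: list["Observation"] = field(default_factory=list)

@dataclass
class Trace:
    session_id: str
    root: Observation
    scores: dict[str, float] = field(default_factory=dict)  # ingested eval scores

# One agent turn: an LLM planning step that fans out to retrieval and a tool call.
trace = Trace(
    session_id="sess-1",
    root=Observation("llm", "plan", children=[
        Observation("retrieval", "docs-search"),
        Observation("tool", "calculator"),
    ]),
)
trace.scores["helpfulness"] = 0.9  # score ingested from an external evaluator
```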

Define evals (YAML in Git)
   |
   v
Run evals locally or in CI (AgentV)
   |
   v
Deploy agent to production
   |
   v
Enforce policies on tool calls (Agent Control)
   |                           |
   v                           v
Trace execution (Langfuse)   Log violations (Agent Control)
   |
   v
Feed production traces back into evals (AgentV)

The feedback loop is key: Langfuse traces surface real-world failures that become new AgentV eval cases. Agent Control deny/steer events identify safety gaps that become new test scenarios.
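The feedback step above can be sketched as a filter over production traces: low-scoring runs become new eval cases. This is illustrative pseudocode-in-Python; the dict shape and threshold are assumptions, not a real AgentV or Langfuse export format.

```python
def traces_to_eval_cases(traces: list[dict], threshold: float = 0.5) -> list[dict]:
    """Turn low-scoring production traces into candidate eval cases."""
    cases = []
    for t in traces:
        if t["score"] < threshold:  # real-world failure surfaced in tracing
            cases.append({
                "id": f"regression-{t['trace_id']}",
                "input": t["input"],
                "note": "replay of a production failure; grade against expected behavior",
            })
    return cases

new_cases = traces_to_eval_cases([
    {"trace_id": "a1", "input": "cancel my order", "score": 0.2},
    {"trace_id": "b2", "input": "track my order", "score": 0.9},
])
# only the failing trace becomes an eval case
```

Agent Control deny/steer events could feed the same function, turning blocked actions into safety test scenarios.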

This maps to how traditional software works:

Traditional                 AI Agent Equivalent
Test suite (Jest, pytest)   AgentV
WAF / auth middleware       Agent Control
APM / logging (Datadog)     Langfuse

AgentV handles:

  • Eval definition and execution
  • Code + LLM graders
  • Regression detection and CI/CD gating
  • Multi-provider A/B comparison
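Regression detection and CI/CD gating reduce to comparing current eval scores against a baseline and failing the pipeline on a drop. A minimal sketch, assuming scores are name-to-float maps and a fixed tolerance (both assumptions, not AgentV's behavior):

```python
def gate(baseline: dict[str, float],
         current: dict[str, float],
         tolerance: float = 0.02) -> bool:
    """Return False (fail the pipeline) if any score regresses beyond tolerance."""
    regressions = {
        name: (baseline[name], score)
        for name, score in current.items()
        if name in baseline and score < baseline[name] - tolerance
    }
    for name, (old, new) in regressions.items():
        print(f"REGRESSION {name}: {old:.2f} -> {new:.2f}")
    return not regressions

# safety dropped 0.99 -> 0.95, beyond the 0.02 tolerance, so the gate fails
ok = gate({"accuracy": 0.91, "safety": 0.99},
          {"accuracy": 0.93, "safety": 0.95})
```

In CI, a non-True result would map to a nonzero exit code so the deploy step never runs.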

Agent Control handles:

  • Runtime policy enforcement (deny/steer/warn/log)
  • Pre/post execution evaluation of agent actions
  • Pluggable evaluators (regex, JSON, SQL, LLM-based)
  • Centralized control plane with dashboard

Langfuse handles:

  • Production tracing with agent-native observation types
  • Live evaluation automation on trace ingestion
  • Score ingestion from external evaluators
  • Team dashboards and debugging