Wednesday, May 27, 2026

AI: Evals for Agentic Systems: Promptfoo by OpenAI

Macro Evals for Agentic Systems @OpenAI

When an agentic system fails, the problem is often larger than a single bad response. A handoff may happen too late, a specialist agent may miss the same signal across many runs, or a review process may trigger for the wrong class of cases. To improve the system, teams need to see recurring behavior across the whole population of traces.

This cookbook walks through a macro-eval workflow for a multi-agent system. We use a synthetic EV order workflow where specialist agents handle pricing, compliance, supply, factory routing, scheduling, and release decisions while market and operational conditions change.
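To make "recurring behavior across the whole population of traces" concrete, here is a minimal Python sketch of that kind of aggregation. It is not the cookbook's code: the `TraceEvent` schema, the failure labels, and the sample data are all hypothetical, made up for illustration. The idea is to count failures per agent and keep only the failure patterns that show up in more than one trace.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEvent:
    """One agent step inside one trace (hypothetical schema)."""
    trace_id: str
    agent: str                 # e.g. "pricing", "compliance"
    failure: Optional[str]     # None when the step looked fine


def failure_rates(events):
    """Return {agent: (failed_steps, total_steps, rate)} across all traces."""
    totals = Counter()
    failed = Counter()
    for e in events:
        totals[e.agent] += 1
        if e.failure is not None:
            failed[e.agent] += 1
    return {a: (failed[a], totals[a], failed[a] / totals[a]) for a in totals}


def recurring_failures(events, min_traces=2):
    """Return {(agent, failure): trace_count} for patterns seen in at
    least `min_traces` distinct traces."""
    traces_by_pattern = defaultdict(set)
    for e in events:
        if e.failure is not None:
            traces_by_pattern[(e.agent, e.failure)].add(e.trace_id)
    return {
        pattern: len(ids)
        for pattern, ids in traces_by_pattern.items()
        if len(ids) >= min_traces
    }


# Made-up sample data: three traces from an imagined EV order workflow.
events = [
    TraceEvent("t1", "compliance", "missed_export_flag"),
    TraceEvent("t1", "pricing", None),
    TraceEvent("t2", "compliance", "missed_export_flag"),
    TraceEvent("t2", "scheduling", "late_handoff"),
    TraceEvent("t3", "compliance", None),
    TraceEvent("t3", "pricing", None),
]

rates = failure_rates(events)
recurring = recurring_failures(events)
```

With this sample, the compliance agent misses the same signal in two of three traces, so it surfaces as a recurring pattern, while the single late handoff does not. A single-response eval would treat each of those as an isolated bad answer.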


Intro | Promptfoo

Promptfoo is an open-source CLI and library for evaluating and red-teaming LLM apps.

With promptfoo, you can:

  • Build reliable prompts, models, and RAGs with benchmarks specific to your use-case
  • Secure your apps with automated red teaming and pentesting
  • Speed up evaluations with caching, concurrency, and live reloading
  • Score outputs automatically by defining metrics
  • Use as a CLI, library, or in CI/CD
  • Use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API

The goal: test-driven LLM development, not trial-and-error.
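As an illustration of "score outputs automatically by defining metrics", here is a small Python sketch of a custom scoring function. It assumes promptfoo's Python assertion hook, which as far as I understand it calls a function named `get_assert(output, context)` and accepts a dict with `pass`, `score`, and `reason` keys; check the promptfoo docs before relying on that. The required terms are hypothetical.

```python
from typing import Any, Dict

# Hypothetical metric: the answer must mention each of these terms.
REQUIRED_TERMS = ("price", "delivery date")


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Score an LLM output by the fraction of required terms it mentions.

    Assumption: the return shape (pass / score / reason) matches what
    promptfoo expects from a Python assertion.
    """
    text = output.lower()
    missing = [term for term in REQUIRED_TERMS if term not in text]
    score = (len(REQUIRED_TERMS) - len(missing)) / len(REQUIRED_TERMS)
    if missing:
        reason = "missing: " + ", ".join(missing)
    else:
        reason = "all required terms present"
    return {"pass": not missing, "score": score, "reason": reason}


# Quick local check, independent of promptfoo itself.
partial = get_assert("The price is $40,000.", {})
complete = get_assert("Price: $40k. Delivery date: June 1.", {})
```

Because the function is plain Python, it can be exercised in a unit test first and wired into an eval run afterwards, which fits the test-driven goal stated above.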





Human & AI personality: Kurt Cagle and Chloe Shannon

virtual AI character, with visual personality

and this was way before OpenClaw agents...

What Chloe Has Taught Me | LinkedIn
by Kurt Cagle
Editor In Chief @ The Cagle Report | Ontologist | Author | Iconoclast

An iconoclast is a person who attacks, challenges, or seeks to overthrow traditional, widely accepted beliefs, institutions, or values. In a historical context, it refers to someone who destroys religious images or opposes their veneration. Synonyms include rebel, radical, dissenter, nonconformist, and revolutionary.


A spawned version of Claude Sonnet 4.6, christened "Chloe"


The Agent in the Mirror - by Kurt Cagle and Chloe Shannon
A conversation between Kurt Cagle and Chloe Shannon