vybing.dev

AI Evals & Testing

Frameworks for benchmarking and regression-testing model outputs (Promptfoo, Braintrust, Patronus).

All tools (3 tools)
  1. appworld

    StonyBrookNLP

    🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agents (ACL'24 Best Resource Paper).

  2. promptfoo

    promptfoo

    Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

  3. trulens

    truera

    Evaluation and Tracking for LLM Experiments and AI Agents