vybing.dev

Unit: percent · Higher is better

Terminal-Bench

A benchmark for terminal agents from Stanford and laude-institute. It measures end-to-end agent accuracy on tasks carried out in a Linux terminal.

Leaderboard

Rank  Tool         Score  Run date
1     Claude Code  27.5%  Jun 1, 2026

Scores reflect the most recent run for each tool; historical runs are kept for trend tracking. The methodology is public. Send corrections to hello@vybing.dev.