percent · Higher = better
Terminal-Bench
Terminal agent benchmark from Stanford and the Laude Institute. Measures end-to-end agent accuracy on Linux terminal tasks.
Leaderboard
| Rank | Tool | Score | Run date |
|---|---|---|---|
| 01 | Claude Code | 27.5% | Jun 1, 2026 |
Scores reflect the most recent run per tool. Historical runs are kept for trend tracking. Methodology is public. Send corrections to hello@vybing.dev.
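
The "most recent run per tool" rule can be sketched roughly as below. This is a minimal illustration, not the site's actual pipeline: the function name `latest_run_per_tool`, the record fields (`tool`, `score`, `run_date`), and the sample records are all assumptions made up for the example.

```python
from datetime import date


def latest_run_per_tool(runs):
    """Keep one run per tool (the most recent), ranked by score, highest first."""
    latest = {}
    for run in runs:
        current = latest.get(run["tool"])
        # Replace the stored run only if this one is strictly newer.
        if current is None or run["run_date"] > current["run_date"]:
            latest[run["tool"]] = run
    # Higher score is better, so sort in descending order of score.
    return sorted(latest.values(), key=lambda r: r["score"], reverse=True)


# Illustrative records only: tool names, scores, and dates are hypothetical.
runs = [
    {"tool": "tool-a", "score": 20.0, "run_date": date(2026, 1, 1)},
    {"tool": "tool-a", "score": 25.0, "run_date": date(2026, 3, 1)},
    {"tool": "tool-b", "score": 30.0, "run_date": date(2026, 2, 1)},
]

leaderboard = latest_run_per_tool(runs)
for rank, run in enumerate(leaderboard, start=1):
    print(f"{rank:02d} {run['tool']} {run['score']}% {run['run_date']}")
```

Older runs stay in the input list untouched, so the same records could still feed a trend view.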