percent · Higher = better

Terminal-Bench

Terminal agent benchmark from Stanford and the Laude Institute. Measures end-to-end agent accuracy on Linux terminal tasks.

Leaderboard

Rank	Tool	Score	Run date
01	Claude Code	27.5%	Jun 1, 2026

Scores reflect the most recent run per tool. Historical runs are kept for trend tracking. Methodology is public. Send corrections to hello@vybing.dev.
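The "most recent run per tool" rule can be sketched in a few lines. This is only an illustration, not the site's actual pipeline: the record layout, the function names `latest_per_tool` and `leaderboard`, and the sample rows (`tool-a`, `tool-b`) are all hypothetical and are not real leaderboard data.

```python
from datetime import date

# Hypothetical run records: (tool, score in percent, run date).
# These rows are made up for illustration; they are not real results.
runs = [
    ("tool-a", 20.0, date(2026, 1, 15)),
    ("tool-a", 25.0, date(2026, 3, 1)),
    ("tool-b", 30.0, date(2026, 2, 10)),
]

def latest_per_tool(runs):
    """Keep only the most recent run for each tool."""
    latest = {}
    for tool, score, run_date in runs:
        if tool not in latest or run_date > latest[tool][1]:
            latest[tool] = (score, run_date)
    return latest

def leaderboard(runs):
    """Rank tools by the score of their most recent run, highest first."""
    latest = latest_per_tool(runs)
    ordered = sorted(latest.items(), key=lambda kv: kv[1][0], reverse=True)
    return [
        (rank, tool, score, run_date)
        for rank, (tool, (score, run_date)) in enumerate(ordered, start=1)
    ]

for rank, tool, score, run_date in leaderboard(runs):
    print(f"{rank:02d}\t{tool}\t{score:.1f}%\t{run_date:%b %d, %Y}")
```

Older runs stay in `runs` untouched, so the same list could also feed a trend view; only the ranking step filters down to the latest entry per tool.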