SWE-bench Verified
Score: percent · Higher = better
A human-verified subset of SWE-bench (500 issues) and the current standard for serious coding-agent claims. First results expected: Q3 2026.
Leaderboard
No runs yet.
Scores will appear once the first run completes. Vendors are notified 48 hours before their scores are published, and the methodology is public.
Read the methodology.
Scores reflect the most recent run per tool; historical runs are kept for trend tracking. Send corrections to hello@vybing.dev.
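The ranking rule above (only the latest run per tool counts, while the full history is retained) can be sketched as follows. This is a minimal illustration and not the site's actual implementation. The `Run` record, its fields, and the assumption that a score is the percentage of the 500 issues resolved are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

TOTAL_ISSUES = 500  # size of SWE-bench Verified


@dataclass(frozen=True)
class Run:
    tool: str       # hypothetical tool identifier
    run_date: date  # date the run completed
    resolved: int   # assumed: issues resolved out of TOTAL_ISSUES


def score_percent(run: Run) -> float:
    """Assumed scoring: percent of issues resolved (higher is better)."""
    return 100.0 * run.resolved / TOTAL_ISSUES


def leaderboard(runs: list[Run]) -> list[tuple[str, float]]:
    """Keep only the most recent run per tool, then rank by score, descending.

    The input history is not modified, so it stays available for trend tracking.
    """
    latest: dict[str, Run] = {}
    for run in runs:
        current = latest.get(run.tool)
        # A newer run replaces an older one even if it scores lower.
        if current is None or run.run_date > current.run_date:
            latest[run.tool] = run
    rows = [(tool, score_percent(run)) for tool, run in latest.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Made-up inputs for illustration only; no real runs exist yet.
    history = [
        Run("tool-a", date(2026, 7, 1), 300),
        Run("tool-a", date(2026, 8, 1), 250),
        Run("tool-b", date(2026, 7, 15), 320),
    ]
    print(leaderboard(history))  # [('tool-b', 64.0), ('tool-a', 50.0)]
```

In this made-up example, `tool-a` is ranked on its August run (50.0%), not its better July run (60.0%). That matches the stated rule of using the most recent run rather than the best one. The older run still remains in `history` for trend charts.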