vybing.dev

Standardised testing

Benchmarks

We run the same suite of tasks against every tool. The methodology is public, and vendors are notified 48 hours before their results are published. Scores update as new runs complete.


Aider Polyglot

Code-editing accuracy across C++, Go, Java, JavaScript, Python, and Rust. Methodology by Paul Gauthier (Aider AI), licensed for public reference. Scores are the percentage of exercises each model solves correctly in Aider's polyglot test harness.

Unit: percent · Higher is better · Last run: May 11, 2026

SWE-bench Verified

A human-verified subset of SWE-bench (500 issues) and the current standard for serious coding-agent claims.

Unit: percent · Higher is better

First run · Q3 2026

Scores will be published once the methodology is locked and a complete cross-tool run has finished.