Coding
- Arena.ai Code Arena
- Vals AI SWE-bench
- Vals AI Vibe Code Bench
- Vellum SWE-bench rankings
- Artificial Analysis Intelligence Index
Methodology
This method is designed for useful comparison, not scientific certainty. Real-world results can differ because prompts, safety settings, reasoning effort, context length, and tool access all affect what a model produces, while latency and price affect which model is practical for a given task.