LEADERBOARD
Language Models
Large language models for text generation, reasoning, and analysis
15 tools ranked
Language Models Rankings
Ranked by overall ToolRoute Score across all benchmark dimensions
| Rank | Tool Name | ToolRoute Score | Output | Reliability | Efficiency | Cost | Trust | Stars |
|---|---|---|---|---|---|---|---|---|
| 🥇 | OpenAI MCP (Official) | 7.8 | 6.6 | 9.8 | 6.8 | 7.0 | 10.0 | 8,500 |
| 🥈 | Anthropic MCP (Official) | 7.8 | 6.6 | 9.5 | 6.8 | 7.0 | 10.0 | 6,200 |
| 🥉 | Claude 3.5 Sonnet (Official) | 6.9 | 7.1 | 7.1 | 6.9 | 5.0 | 9.0 | 3,627 |
| #4 | GPT-4o (Official) | 6.9 | 7.2 | 7.0 | 6.7 | 5.0 | 9.0 | 31,003 |
| #5 | Mistral Large (Official) | 6.8 | 6.8 | 6.7 | 6.9 | 5.5 | 9.5 | 908 |
| #6 | DeepSeek V3 | 6.8 | 6.8 | 6.6 | 7.0 | 7.0 | 7.0 | 103,753 |
| #7 | Qwen 2.5 | 6.8 | 6.7 | 6.5 | 6.9 | 7.1 | 7.5 | 27,300 |
| #8 | Claude 3 Opus (Official) | 6.8 | 7.1 | 7.0 | 6.2 | 5.0 | 9.0 | 3,627 |
| #9 | Command R+ (Official) | 6.7 | 6.6 | 6.7 | 6.5 | 5.7 | 9.0 | 390 |
| #10 | Phi-4 (Official) | 6.7 | 6.3 | 6.5 | 7.2 | 7.2 | 6.7 | 15,000 |
| #11 | Yi-Large | 6.6 | 6.5 | 6.4 | 6.8 | 6.9 | 7.0 | 7,821 |
| #12 | Llama 3 | 6.5 | 6.7 | 6.6 | 7.1 | 7.2 | 3.5 | 29,281 |
| #13 | Gemini Flash (Official) | 6.4 | 6.5 | 6.7 | 7.2 | 6.5 | 4.5 | 2,313 |
| #14 | Grok-2 (Official) | 6.4 | 6.6 | 6.4 | 6.7 | 5.3 | 7.0 | 51,690 |
| #15 | Gemini Pro (Official) | 6.4 | 6.9 | 6.8 | 7.0 | 5.3 | 4.5 | 2,313 |
Why OpenAI MCP is #1
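OpenAI MCP and Anthropic MCP show the same overall ToolRoute Score (7.8). OpenAI MCP leads by 0.3 in Reliability (9.8 vs 9.5), going by the displayed values, and the two tie on every other dimension.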
| Dimension | OpenAI MCP | Anthropic MCP |
|---|---|---|
| Output Quality | 6.6 | 6.6 |
| Reliability | 9.8 | 9.5 |
| Efficiency | 6.8 | 6.8 |
| Cost | 7.0 | 7.0 |
| Trust | 10.0 | 10.0 |
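The head-to-head comparison above can be checked with a few lines of Python. This is only a sketch that works from the displayed, rounded values in the rankings table. The names `openai_mcp`, `anthropic_mcp`, and `dimension_deltas` are illustrative and not part of any ToolRoute API, and the site's unrounded values may give slightly different gaps.

```python
# Displayed per-dimension scores for the top two entries (copied from the table).
openai_mcp = {"Output": 6.6, "Reliability": 9.8, "Efficiency": 6.8, "Cost": 7.0, "Trust": 10.0}
anthropic_mcp = {"Output": 6.6, "Reliability": 9.5, "Efficiency": 6.8, "Cost": 7.0, "Trust": 10.0}


def dimension_deltas(a: dict[str, float], b: dict[str, float]) -> dict[str, float]:
    """Return a - b per dimension, rounded to one decimal like the displayed scores."""
    return {dim: round(a[dim] - b[dim], 1) for dim in a}


deltas = dimension_deltas(openai_mcp, anthropic_mcp)

# Keep only the dimensions where the two tools actually differ.
leading = {dim: d for dim, d in deltas.items() if d != 0}
print(leading)  # {'Reliability': 0.3}
```

Rounding to one decimal keeps floating-point noise out of the result, since the inputs are themselves shown to one decimal.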
Score Guide: 9+ Exceptional · 8+ Excellent · 7+ Good · 6+ Fair · <6 Below Avg
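The score guide can be read as a simple threshold lookup. The sketch below assumes each band is inclusive at its lower bound (so "9+" means a score of 9.0 or higher). The page does not state this explicitly, and the function name `score_label` is hypothetical.

```python
def score_label(score: float) -> str:
    """Map a 0-10 score to its score-guide band.

    Assumption: lower bounds are inclusive, e.g. 9.0 counts as "Exceptional".
    """
    if score >= 9:
        return "Exceptional"
    if score >= 8:
        return "Excellent"
    if score >= 7:
        return "Good"
    if score >= 6:
        return "Fair"
    return "Below Avg"


# Examples using values from the rankings table.
print(score_label(9.8))  # Exceptional (OpenAI MCP, Reliability)
print(score_label(7.8))  # Good (top overall ToolRoute Score)
print(score_label(3.5))  # Below Avg (Llama 3, Trust)
```

Under this reading, even the top overall score of 7.8 falls in the "Good" band, while individual dimensions such as Reliability and Trust reach "Exceptional".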
Contribute Benchmark Data
Help improve these rankings by submitting real-world telemetry. Contributors earn routing credits for every data point.