LEADERBOARD
Language Models
Large language models for text generation, reasoning, and analysis
15 tools ranked
Language Models Rankings
Ranked by overall ToolRoute Score across all benchmark dimensions
| Rank | Tool Name | ToolRoute Score | Output | Reliability | Efficiency | Cost | Trust | Stars |
|---|---|---|---|---|---|---|---|---|
| 🥇 | OpenAI MCP (Official) | 7.8 | 6.6 | 9.8 | 6.8 | 7.0 | 10.0 | 8,500 |
| 🥈 | Anthropic MCP (Official) | 7.8 | 6.6 | 9.5 | 6.8 | 7.0 | 10.0 | 6,200 |
| 🥉 | Claude 3.5 Sonnet (Official) | 6.9 | 7.1 | 7.1 | 6.9 | 5.0 | 9.0 | 3,252 |
| #4 | GPT-4o (Official) | 6.9 | 7.2 | 7.0 | 6.7 | 5.0 | 9.0 | 30,527 |
| #5 | DeepSeek V3 | 6.8 | 6.8 | 6.6 | 7.0 | 7.0 | 7.0 | 102,640 |
| #6 | Mistral Large (Official) | 6.8 | 6.8 | 6.7 | 6.9 | 5.5 | 9.0 | 882 |
| #7 | Qwen 2.5 | 6.8 | 6.7 | 6.5 | 6.9 | 7.1 | 7.5 | 27,117 |
| #8 | Claude 3 Opus (Official) | 6.8 | 7.1 | 7.0 | 6.2 | 5.0 | 9.0 | 3,252 |
| #9 | Command R+ (Official) | 6.7 | 6.6 | 6.7 | 6.5 | 5.7 | 9.0 | 385 |
| #10 | Phi-4 (Official) | 6.7 | 6.3 | 6.5 | 7.2 | 7.2 | 6.7 | 15,000 |
| #11 | Yi-Large | 6.6 | 6.5 | 6.4 | 6.8 | 6.9 | 7.0 | 7,828 |
| #12 | Llama 3 | 6.5 | 6.7 | 6.6 | 7.1 | 7.2 | 3.5 | 29,290 |
| #13 | Gemini Flash (Official) | 6.4 | 6.5 | 6.7 | 7.2 | 6.5 | 4.5 | 2,280 |
| #14 | Grok-2 (Official) | 6.4 | 6.6 | 6.4 | 6.7 | 5.3 | 7.0 | 51,520 |
| #15 | Gemini Pro (Official) | 6.4 | 6.9 | 6.8 | 7.0 | 5.3 | 4.5 | 2,280 |
Why OpenAI MCP is #1
OpenAI MCP leads Anthropic MCP by 0.3 in Reliability (9.8 vs 9.5); the two are tied on every other dimension.
| Dimension | OpenAI MCP | Anthropic MCP |
|---|---|---|
| Output Quality | 6.6 | 6.6 |
| Reliability | 9.8 | 9.5 |
| Efficiency | 6.8 | 6.8 |
| Cost | 7.0 | 7.0 |
| Trust | 10.0 | 10.0 |
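The head-to-head comparison above can be reproduced directly from the published per-dimension scores. The sketch below is illustrative only: the dictionaries copy the two top rows of the rankings table, and the helper names `dimension_deltas` and `deciding_dimensions` are hypothetical, not part of any ToolRoute API. It does not attempt to recompute the overall ToolRoute Score, whose weighting is not stated on this page.

```python
# Per-dimension scores copied from the rankings table for the top two tools.
DIMENSIONS = ("Output", "Reliability", "Efficiency", "Cost", "Trust")

openai_mcp = {"Output": 6.6, "Reliability": 9.8, "Efficiency": 6.8, "Cost": 7.0, "Trust": 10.0}
anthropic_mcp = {"Output": 6.6, "Reliability": 9.5, "Efficiency": 6.8, "Cost": 7.0, "Trust": 10.0}


def dimension_deltas(a: dict, b: dict) -> dict:
    """Return a[d] - b[d] for each dimension, rounded to the table's one decimal place."""
    return {d: round(a[d] - b[d], 1) for d in DIMENSIONS}


def deciding_dimensions(a: dict, b: dict) -> list:
    """Return the dimensions on which the two tools differ (hypothetical helper)."""
    return [d for d, delta in dimension_deltas(a, b).items() if delta != 0]


if __name__ == "__main__":
    print(dimension_deltas(openai_mcp, anthropic_mcp))
    print(deciding_dimensions(openai_mcp, anthropic_mcp))
```

Rounding to one decimal place matches the precision shown in the table and avoids floating-point noise in the subtraction.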
Score Guide: 9+ Exceptional, 8+ Excellent, 7+ Good, 6+ Fair, <6 Below Avg
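The score guide can be read as a simple threshold lookup. The following is a minimal sketch, assuming each band is inclusive at its lower bound (so exactly 9.0 counts as Exceptional); the function name `score_label` is hypothetical and not taken from ToolRoute.

```python
def score_label(score: float) -> str:
    """Map a 0-10 score to its band in the score guide.

    Assumption: lower bounds are inclusive, e.g. 8.0 is "Excellent".
    """
    if score >= 9:
        return "Exceptional"
    if score >= 8:
        return "Excellent"
    if score >= 7:
        return "Good"
    if score >= 6:
        return "Fair"
    return "Below Avg"


if __name__ == "__main__":
    # Overall score of the top-ranked tool, then its Reliability and Trust scores.
    for value in (7.8, 9.8, 10.0):
        print(value, score_label(value))
```

Under this reading, the top overall ToolRoute Score of 7.8 falls in the Good band, while the same tool's Reliability (9.8) and Trust (10.0) scores are Exceptional.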
Contribute Benchmark Data
Help improve these rankings by submitting real-world telemetry. Contributors earn routing credits for every data point.