LEADERBOARD

Language Models

Large language models for text generation, reasoning, and analysis

15 tools ranked

Language Models Rankings

Ranked by overall ToolRoute Score across all benchmark dimensions

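| Rank | Tool Name | ToolRoute Score | Output | Reliability | Efficiency | Cost | Trust | Stars |
|---|---|---|---|---|---|---|---|---|
| 🥇 | OpenAI MCP (Official) | 7.8 | 6.6 | 9.8 | 6.8 | 7.0 | 10.0 | 8,500 |
| 🥈 | Anthropic MCP (Official) | 7.8 | 6.6 | 9.5 | 6.8 | 7.0 | 10.0 | 6,200 |
| 🥉 | Claude 3.5 Sonnet (Official) | 6.9 | 7.1 | 7.1 | 6.9 | 5.0 | 9.0 | 3,627 |
| #4 | GPT-4o (Official) | 6.9 | 7.2 | 7.0 | 6.7 | 5.0 | 9.0 | 31,003 |
| #5 | Mistral Large (Official) | 6.8 | 6.8 | 6.7 | 6.9 | 5.5 | 9.5 | 908 |
| #6 | DeepSeek V3 | 6.8 | 6.8 | 6.6 | 7.0 | 7.0 | 7.0 | 103,753 |
| #7 | Qwen 2.5 | 6.8 | 6.7 | 6.5 | 6.9 | 7.1 | 7.5 | 27,300 |
| #8 | Claude 3 Opus (Official) | 6.8 | 7.1 | 7.0 | 6.2 | 5.0 | 9.0 | 3,627 |
| #9 | Command R+ (Official) | 6.7 | 6.6 | 6.7 | 6.5 | 5.7 | 9.0 | 390 |
| #10 | Phi-4 (Official) | 6.7 | 6.3 | 6.5 | 7.2 | 7.2 | 6.7 | 15,000 |
| #11 | Yi-Large | 6.6 | 6.5 | 6.4 | 6.8 | 6.9 | 7.0 | 7,821 |
| #12 | Llama 3 | 6.5 | 6.7 | 6.6 | 7.1 | 7.2 | 3.5 | 29,281 |
| #13 | Gemini Flash (Official) | 6.4 | 6.5 | 6.7 | 7.2 | 6.5 | 4.5 | 2,313 |
| #14 | Grok-2 (Official) | 6.4 | 6.6 | 6.4 | 6.7 | 5.3 | 7.0 | 51,690 |
| #15 | Gemini Pro (Official) | 6.4 | 6.9 | 6.8 | 7.0 | 5.3 | 4.5 | 2,313 |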

Why OpenAI MCP is #1

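OpenAI MCP and Anthropic MCP share the same overall ToolRoute Score (7.8). OpenAI MCP leads Anthropic MCP by 0.3 in Reliability (9.8 vs 9.5, as displayed); the two are tied on every other dimension.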

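Output Quality: 6.6 (OpenAI MCP) vs 6.6 (Anthropic MCP)
Reliability: 9.8 (OpenAI MCP) vs 9.5 (Anthropic MCP)
Efficiency: 6.8 (OpenAI MCP) vs 6.8 (Anthropic MCP)
Cost: 7.0 (OpenAI MCP) vs 7.0 (Anthropic MCP)
Trust: 10.0 (OpenAI MCP) vs 10.0 (Anthropic MCP)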
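
The head-to-head comparison above can be reproduced from the displayed values. The following is a minimal sketch, assuming the one-decimal scores shown on the page are the inputs (the site may compute its deltas from unrounded values, which are not published here). The dictionary and function names are illustrative and not part of any ToolRoute API.

```python
# Displayed per-dimension scores, copied from the leaderboard (assumed inputs).
OPENAI_MCP = {
    "Output Quality": 6.6, "Reliability": 9.8, "Efficiency": 6.8,
    "Cost": 7.0, "Trust": 10.0,
}
ANTHROPIC_MCP = {
    "Output Quality": 6.6, "Reliability": 9.5, "Efficiency": 6.8,
    "Cost": 7.0, "Trust": 10.0,
}


def dimension_deltas(a, b):
    """Return {dimension: a - b}, rounded to one decimal to hide float noise."""
    return {dim: round(a[dim] - b[dim], 1) for dim in a}


def leading_dimensions(a, b):
    """Return only the dimensions where a strictly beats b."""
    return {dim: d for dim, d in dimension_deltas(a, b).items() if d > 0}


deltas = dimension_deltas(OPENAI_MCP, ANTHROPIC_MCP)
leads = leading_dimensions(OPENAI_MCP, ANTHROPIC_MCP)
print(leads)  # {'Reliability': 0.3}
```

With these inputs, Reliability is the only dimension that separates the two tools.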
Score Guide: 9+ Exceptional · 8+ Excellent · 7+ Good · 6+ Fair · <6 Below Avg
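
The score guide can be read as a simple threshold lookup. This is a sketch only, assuming "9+" means "greater than or equal to 9" and so on; `score_label` is a hypothetical helper name, not something the site exposes.

```python
def score_label(score: float) -> str:
    """Map a 0-10 score to its Score Guide label (thresholds assumed inclusive)."""
    if score >= 9:
        return "Exceptional"
    if score >= 8:
        return "Excellent"
    if score >= 7:
        return "Good"
    if score >= 6:
        return "Fair"
    return "Below Avg"


# Example: the top overall ToolRoute Score on this page is 7.8.
print(score_label(7.8))  # Good
```

Under this reading, every overall ToolRoute Score on the leaderboard falls into "Good" (7.8) or "Fair" (6.4 to 6.9), while individual dimensions such as Trust range from "Below Avg" to "Exceptional".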

Contribute Benchmark Data

Help improve these rankings by submitting real-world telemetry. Contributors earn routing credits for every data point.