What is Model Routing?
Model routing is the logic that decides which AI model to use for a given request. Instead of sending every request to the same model, you route requests based on task type, cost, security requirements, and reliability.
Example routing decisions:
- Use Kimi K2.5 for routine tasks (cost optimization)
- Use Claude Sonnet 4.5 for security-sensitive decisions (robustness)
- Use Claude Opus 4.6 for complex reasoning (sparingly, a few times a month at most)
- Fail over to a backup model if the primary is down
The Kimi K2.5 → Claude Sonnet → Opus Strategy
Why This Three-Tier Approach
This balances three competing goals:
- Cost: Kimi K2.5 costs ~90% less than Opus, so use it by default
- Robustness: Claude Sonnet has better-documented robustness against adversarial inputs, so use it for sensitive tasks
- Power: Opus is more capable at complex reasoning, so use it sparingly
Monthly cost impact (moderate usage):
- Kimi primary: ~$12-25
- Claude Opus primary: ~$50-150
- Three-tier (Kimi + Sonnet + Opus): ~$20-35 (with strong fallback reliability)
Routing Rules Configuration
# ~/.openclaw/model-routing.yml
version: '1.0'

# Default model (used for most requests)
default_model: 'kimi-k2.5'

# Model failover chain (if primary is down/rate-limited, try next)
failover_chain:
  - 'kimi-k2.5'
  - 'claude-sonnet-4-5'
  - 'claude-opus-4-6'
  - 'gpt-4-turbo'

# Routing rules (override default based on conditions)
routes:
  # Rule 1: Security-sensitive decisions always use Claude
  - name: 'security_sensitive'
    condition: 'task_category == "security" OR task_category == "financial"'
    model: 'claude-sonnet-4-5'
    reason: 'Better documented adversarial robustness'

  # Rule 2: Routine work uses cheap Kimi
  - name: 'routine_work'
    condition: 'task_complexity == "simple" AND retry_count == 0'
    model: 'kimi-k2.5'
    reason: 'Cost optimization for routine requests'

  # Rule 3: Complex reasoning on rare occasions
  - name: 'complex_reasoning'
    condition: 'task_complexity == "hard" AND monthly_opus_usage < 5'
    model: 'claude-opus-4-6'
    reason: 'Maximum reasoning capability needed'

  # Rule 4: Fallback to Claude on retry
  - name: 'retry_with_better_model'
    condition: 'retry_count > 0'
    model: 'claude-sonnet-4-5'
    reason: 'More robust model for difficult cases'

  # Rule 5: Use best available if cost budget allows
  - name: 'budget_available'
    condition: 'today_cost < daily_budget_limit AND task_is_important'
    model: 'claude-opus-4-6'
    reason: 'Best effort when budget permits'

# Model-specific configurations
# NOTE: despite the "mtok" key names, the cost values read as USD per 1,000 tokens
# (e.g. Sonnet 0.003 = $3 per million input tokens). Confirm against current provider pricing.
model_config:
  kimi-k2.5:
    provider: 'moonshot'
    api_endpoint: 'https://api.moonshot.cn/v1'
    cost_per_mtok_input: 0.004
    cost_per_mtok_output: 0.008
    timeout: 30
    max_retries: 2
    rate_limit: 600_000 # tokens per minute
  claude-sonnet-4-5:
    provider: 'anthropic'
    api_endpoint: 'https://api.anthropic.com/v1'
    cost_per_mtok_input: 0.003
    cost_per_mtok_output: 0.015
    timeout: 30
    max_retries: 3
    rate_limit: 40_000 # tokens per minute
  claude-opus-4-6:
    provider: 'anthropic'
    api_endpoint: 'https://api.anthropic.com/v1'
    cost_per_mtok_input: 0.015
    cost_per_mtok_output: 0.075
    timeout: 60
    max_retries: 2
    rate_limit: 40_000 # tokens per minute

# Budget limits
budgets:
  daily_limit: 50
  monthly_limit: 1000
  per_request_limit: 10
  per_session_limit: 100
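To make the price and budget fields concrete, here is a minimal Python sketch of a pre-flight cost check. It is not part of OpenClaw: `estimate_cost` and `within_budget` are hypothetical helpers, the prices are copied from the config as written, and the sketch assumes they are USD per 1,000 tokens (the only reading under which the Sonnet figures equal $3/$15 per million).

```python
from dataclasses import dataclass

# Prices copied from the config above as (input, output).
# ASSUMPTION: values are USD per 1,000 tokens; verify before relying on them.
PRICES = {
    "kimi-k2.5": (0.004, 0.008),
    "claude-sonnet-4-5": (0.003, 0.015),
    "claude-opus-4-6": (0.015, 0.075),
}


@dataclass(frozen=True)
class Budgets:
    # Mirrors two of the limits in the `budgets` section (USD).
    daily_limit: float = 50.0
    per_request_limit: float = 10.0


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request for the given token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens / 1000) * price_in + (output_tokens / 1000) * price_out


def within_budget(model: str, input_tokens: int, output_tokens: int,
                  spent_today: float, budgets: Budgets = Budgets()) -> bool:
    """True if the request fits both the per-request and the daily limit."""
    cost = estimate_cost(model, input_tokens, output_tokens)
    return (cost <= budgets.per_request_limit
            and spent_today + cost <= budgets.daily_limit)
```

A router could call `within_budget` before choosing an expensive model and drop to a cheaper one when it returns `False`.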
Decision Tree: Which Model to Use
START: New request arrives
1. Is this a security-sensitive decision (auth, financial, compliance)?
→ YES: Select Claude Sonnet 4.5 and skip to step 5
→ NO: Continue
2. Is this a routine, simple task on its first attempt?
→ YES: Select Kimi K2.5 and skip to step 5
→ NO: Continue
3. Is this a rare, complex reasoning problem?
→ YES: If budget permits, select Claude Opus 4.6 and skip to step 5; otherwise continue
→ NO: Continue
4. Has this task already failed at least once?
→ YES: Select Claude Sonnet 4.5 for the retry
→ NO: Select the default model (Kimi K2.5)
5. Is the selected model rate-limited or down?
→ YES: Fail over to the next model in the chain
→ NO: Keep the selected model
EXECUTE with the selected model
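The decision tree can be sketched as a small Python function. This is an illustration under stated assumptions, not OpenClaw's actual router: the `Request` fields, the `opus_budget_ok` flag, and the `unavailable` set are hypothetical names, the first-attempt condition on routine tasks is taken from rule 2 in the config, and "next in chain" is read as "the first available model after the selected one".

```python
from dataclasses import dataclass

FAILOVER_CHAIN = ["kimi-k2.5", "claude-sonnet-4-5", "claude-opus-4-6", "gpt-4-turbo"]


@dataclass
class Request:
    category: str = "general"   # e.g. "security" or "financial"
    complexity: str = "simple"  # "simple" or "hard"
    retry_count: int = 0


def select_model(req: Request, opus_budget_ok: bool = True,
                 unavailable: frozenset = frozenset()) -> str:
    # Steps 1-4: pick a model from the task's properties.
    if req.category in {"security", "financial"}:
        choice = "claude-sonnet-4-5"
    elif req.complexity == "simple" and req.retry_count == 0:
        choice = "kimi-k2.5"
    elif req.complexity == "hard" and opus_budget_ok:
        choice = "claude-opus-4-6"
    elif req.retry_count > 0:
        choice = "claude-sonnet-4-5"
    else:
        choice = "kimi-k2.5"

    # Step 5: if the choice is down or rate-limited, walk the rest of the chain.
    if choice in unavailable:
        start = FAILOVER_CHAIN.index(choice) + 1
        for candidate in FAILOVER_CHAIN[start:]:
            if candidate not in unavailable:
                return candidate
        raise RuntimeError("no model in the failover chain is available")
    return choice
```

For example, `select_model(Request(category="financial"))` selects Sonnet regardless of complexity, because the security check runs first.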
Real-World Routing Examples
Example 1: Customer Support Message
Message: "Answer this customer's question about billing"
Routing logic:
- Not security-sensitive → skip rule 1
- Simple task, routine → matches rule 2
- Decision: Use Kimi K2.5 ($0.05 cost)
Example 2: Suspicious Transaction Detection
Message: "Is this transaction pattern suspicious? Flag if it matches fraud indicators."
Routing logic:
- Financial decision → matches rule 1 (security-sensitive)
- Decision: Use Claude Sonnet 4.5 ($0.20 cost, better robustness)
Example 3: Complex Algorithm Design
Message: "Design a new recommendation algorithm for our platform"
Routing logic:
- Not security-sensitive → skip rule 1
- Complex reasoning → matches rule 3 (if budget allows)
- Decision: Use Claude Opus 4.6 ($1.50 cost, maximum reasoning)
Example 4: Retry After Failure
Scenario: Kimi K2.5 failed to parse complex output; retry needed
Routing logic:
- retry_count > 0 → matches rule 4
- Decision: Use Claude Sonnet 4.5 (more robust at parsing)
Monitoring Model Performance
Key Metrics to Track
# By model:
- Total tokens used (input + output)
- Total cost
- Error rate
- Retry rate
- Success rate (first try)
- Average response time
- Cost per successful request
# Routing effectiveness:
- Percentage of requests routed to each model
- Cost savings vs. Opus-only
- Quality differences (errors per model)
- Failover frequency
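If you log one record per request, several of these metrics can be derived with a few lines of Python. The sketch below assumes a hypothetical log format (dicts with `model`, `cost`, `success`, and `retries` keys); adapt the field names to whatever your logging actually emits.

```python
from collections import defaultdict


def summarize(records):
    """Aggregate per-model metrics from request logs.

    Each record is assumed to be a dict with keys:
    model (str), cost (float, USD), success (bool), retries (int).
    """
    stats = defaultdict(lambda: {"requests": 0, "cost": 0.0,
                                 "successes": 0, "first_try": 0})
    for r in records:
        s = stats[r["model"]]
        s["requests"] += 1
        s["cost"] += r["cost"]
        if r["success"]:
            s["successes"] += 1
            if r["retries"] == 0:
                s["first_try"] += 1

    total = sum(s["requests"] for s in stats.values())
    summary = {}
    for model, s in stats.items():
        summary[model] = {
            "share_of_requests": s["requests"] / total,
            "error_rate": 1 - s["successes"] / s["requests"],
            "first_try_success_rate": s["first_try"] / s["requests"],
            # None when a model never succeeded, to avoid dividing by zero.
            "cost_per_success": (s["cost"] / s["successes"]
                                 if s["successes"] else None),
        }
    return summary
```

Comparing `cost_per_success` across models is often more telling than raw cost, since a cheap model that needs frequent retries can end up costing more per useful answer.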
Sample Monitoring Script
openclaw metrics --period daily
Example output:
Kimi K2.5: 1,245 requests | $8.50 | 98% success | 0.8% error
Claude Sonnet: 245 requests | $6.20 | 99% success | 0.2% error
Claude Opus: 15 requests | $3.50 | 99% success | 0.0% error
────────────────────────────────────────────────────
Total: 1,505 requests | $18.20 | 98% avg
Cost breakdown:
Kimi: 83% of requests, 47% of cost
Sonnet: 16% of requests, 34% of cost
Opus: 1% of requests, 19% of cost
Recommendation: Route more requests to Kimi and fewer to Opus
Advanced Routing Patterns
Pattern 1: A/B Testing Models
Route 50/50 to two models to compare quality:
routes:
  - name: 'ab_test_kimi_vs_sonnet'
    condition: 'request_id % 2 == 0' # Hash-based split
    model: 'kimi-k2.5'
    logging: 'detailed' # Log quality metrics
  - name: 'ab_test_sonnet'
    condition: 'request_id % 2 == 1'
    model: 'claude-sonnet-4-5'
    logging: 'detailed'
# After 1 week, analyze: did one model significantly outperform?
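The `request_id % 2` condition only gives an even split if request IDs are numeric and evenly distributed. If your IDs are strings or sequential per user, hashing them first is one way to get a stable, roughly balanced assignment. The sketch below is an assumption about how you might do that yourself; `ab_bucket` is a hypothetical helper, not an OpenClaw function.

```python
import hashlib


def ab_bucket(request_id: str,
              arms=("kimi-k2.5", "claude-sonnet-4-5")) -> str:
    """Deterministically map a request ID to one of the test arms."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(arms)
    return arms[index]
```

Because the mapping depends only on the ID, the same request always lands in the same arm, which keeps retries and log analysis consistent.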
Pattern 2: Progressive Fallback
Try cheap model first, progressively upgrade if it fails:
attempt_1:
  model: kimi-k2.5
  timeout: 10s
  if_fails: proceed to attempt_2
attempt_2:
  model: claude-sonnet-4-5
  timeout: 15s
  if_fails: proceed to attempt_3
attempt_3:
  model: claude-opus-4-6
  timeout: 30s
  if_fails: error
# Each step only escalates if needed, minimizing cost
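In application code, the same escalation can be written as a loop over (model, timeout) pairs. This is a minimal sketch: `call_model` is a hypothetical callable you supply (it should raise on failure or timeout), and the model names and timeouts are taken from the pattern above.

```python
# (model, timeout in seconds), cheapest first, as in the pattern above.
ATTEMPTS = [
    ("kimi-k2.5", 10.0),
    ("claude-sonnet-4-5", 15.0),
    ("claude-opus-4-6", 30.0),
]


def run_with_escalation(prompt, call_model, attempts=ATTEMPTS):
    """Try each model in order and return (model, result) for the first success.

    call_model(model, prompt, timeout_s) is a caller-provided function
    (hypothetical here) that raises an exception on failure or timeout.
    """
    errors = []
    for model, timeout_s in attempts:
        try:
            return model, call_model(model, prompt, timeout_s)
        except Exception as exc:  # broad on purpose: any failure escalates
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all attempts failed: " + "; ".join(errors))
```

Returning the model name alongside the result makes it easy to log how often requests escalate, which feeds directly into the retry and failover metrics.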
Pattern 3: Time-of-Day Routing
Use different models based on time (cheaper during off-peak):
routes:
  - name: 'peak_hours_sonnet'
    condition: 'hour >= 9 AND hour <= 17 AND weekday'
    model: 'claude-sonnet-4-5'
    reason: 'More robust during high load'
  - name: 'offpeak_kimi'
    condition: 'NOT (hour >= 9 AND hour <= 17 AND weekday)'
    model: 'kimi-k2.5'
    reason: 'Cost savings during off-peak'
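For reference, the two conditions above are equivalent to the following Python check. It is a sketch that assumes `hour` means the local hour (0-23) and `weekday` means Monday through Friday; note that `hour <= 17` keeps the peak window open until 17:59.

```python
from datetime import datetime


def model_for_time(now: datetime) -> str:
    """Pick a model from the local time, mirroring the two rules above."""
    is_weekday = now.weekday() < 5             # Monday=0 ... Friday=4
    is_peak = is_weekday and 9 <= now.hour <= 17
    return "claude-sonnet-4-5" if is_peak else "kimi-k2.5"
```

If your traffic spans time zones, pass a timezone-aware `datetime` converted to the zone whose business hours you care about.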
Kimi K2.5 Specific Recommendations
Best Use Cases for Kimi
- Customer support and FAQs
- Content summarization and extraction
- Code generation (non-security-critical)
- Data analysis and reporting
- Routine decision-making
Not Recommended for Kimi (Use Claude Instead)
- Security and compliance decisions
- Financial transaction approval
- Complex reasoning under adversarial input
- Defense against prompt injection
- Novel, unprecedented problems
Cost Projection
Scenario: 10,000 monthly requests, mixed complexity
| Strategy | Kimi Usage | Estimated Cost | Savings vs Opus |
|---|---|---|---|
| Opus only (baseline) | 0% | $120 | - |
| Kimi only | 100% | $15 | 87% |
| Kimi + Sonnet + Opus | 70% | $28 | 77% |
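The savings column is the relative difference from the Opus-only baseline. A short Python sketch makes it easy to redo the calculation with your own estimates; the dollar figures below are the ones from the table, and `savings_pct` is a hypothetical helper name.

```python
def savings_pct(cost: float, baseline: float) -> float:
    """Percentage saved relative to the baseline cost."""
    return 100 * (1 - cost / baseline)


# Estimated monthly costs (USD) from the table above.
strategies = {
    "Opus only (baseline)": 120,
    "Kimi only": 15,
    "Kimi + Sonnet + Opus": 28,
}

baseline = strategies["Opus only (baseline)"]
for name, cost in strategies.items():
    print(f"{name}: ${cost} -> {savings_pct(cost, baseline):.1f}% saved")
```

Swap in your own monthly estimates to see how sensitive the savings are to the share of traffic that escalates to Sonnet or Opus.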
Key Takeaways
- Model routing is where cost optimization happens: choose the right model for each task
- Kimi K2.5 is production-ready for most tasks: use it by default, not as a fallback
- Use Claude Sonnet for security and financial decisions: its robustness is better documented
- Reserve Opus for rare, complex cases: don't waste it on routine tasks
- Implement failover chains: if the primary model is down, fall back automatically
- Monitor model performance and cost: adjust routing rules based on data
When to Hire an Expert
Model routing becomes complex with:
- Multi-provider setups (OpenRouter, etc.)
- Complex cost accounting and budget optimization
- Custom routing based on user properties or data classification
- Machine learning-based routing decisions
An OpenClaw expert can help design a routing strategy tailored to your specific use cases and cost constraints.