
OpenClaw Model Routing Strategies: Kimi K2.5 Primary + Fallback Configuration

OpenClaw Experts

What is Model Routing?

Model routing is the logic that decides which AI model to use for a given request. Instead of sending every request to the same model, you route requests based on task type, cost, security requirements, and reliability.

Example routing decisions:

  • Use Kimi K2.5 for routine tasks (cost optimization)
  • Use Claude Sonnet 4.5 for security-sensitive decisions (robustness)
  • Use Claude Opus 4.6 for complex reasoning (rarely, and only when needed)
  • Fail over to a backup model if the primary is down

The Kimi K2.5 → Claude Sonnet → Opus Strategy

Why This Three-Tier Approach

This balances three competing goals:

  • Cost: Kimi K2.5 costs ~90% less than Opus, so use it by default
  • Robustness: Claude has better documented robustness against adversarial inputs, so use it for sensitive tasks
  • Power: Opus is more capable at complex reasoning, so use it sparingly

Monthly cost impact (moderate usage):

  • Kimi primary: ~$12-25
  • Claude Opus primary: ~$50-150
  • Three-tier (Kimi + Sonnet + Opus): ~$20-35 (with strong fallback reliability)

Routing Rules Configuration


# ~/.openclaw/model-routing.yml
version: '1.0'

# Default model (used for most requests)
default_model: 'kimi-k2.5'

# Model failover chain (if primary is down/rate-limited, try next)
failover_chain:
  - 'kimi-k2.5'
  - 'claude-sonnet-4-5'
  - 'claude-opus-4-6'
  - 'gpt-4-turbo'

# Routing rules (override default based on conditions)
routes:
  # Rule 1: Security-sensitive decisions always use Claude
  - name: 'security_sensitive'
    condition: 'task_category == "security" OR task_category == "financial"'
    model: 'claude-sonnet-4-5'
    reason: 'Better documented adversarial robustness'

  # Rule 2: Routine work uses cheap Kimi
  - name: 'routine_work'
    condition: 'task_complexity == "simple" AND retry_count == 0'
    model: 'kimi-k2.5'
    reason: 'Cost optimization for routine requests'

  # Rule 3: Complex reasoning on rare occasions
  - name: 'complex_reasoning'
    condition: 'task_complexity == "hard" AND monthly_opus_usage < 5'
    model: 'claude-opus-4-6'
    reason: 'Maximum reasoning capability needed'

  # Rule 4: Fallback to Claude on retry
  - name: 'retry_with_better_model'
    condition: 'retry_count > 0'
    model: 'claude-sonnet-4-5'
    reason: 'More robust model for difficult cases'

  # Rule 5: Use best available if cost budget allows
  - name: 'budget_available'
    condition: 'today_cost < daily_budget_limit AND task_is_important'
    model: 'claude-opus-4-6'
    reason: 'Best effort when budget permits'

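# Model-specific configurations
# Cost values are illustrative; confirm the unit (per 1K vs. per million tokens)
# and current prices with each provider before relying on them.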
model_config:
  kimi-k2.5:
    provider: 'moonshot'
    api_endpoint: 'https://api.moonshot.cn/v1'
    cost_per_mtok_input: 0.004
    cost_per_mtok_output: 0.008
    timeout: 30
    max_retries: 2
    rate_limit: 600_000  # tokens per minute

  claude-sonnet-4-5:
    provider: 'anthropic'
    api_endpoint: 'https://api.anthropic.com/v1'
    cost_per_mtok_input: 0.003
    cost_per_mtok_output: 0.015
    timeout: 30
    max_retries: 3
    rate_limit: 40_000  # tokens per minute

  claude-opus-4-6:
    provider: 'anthropic'
    api_endpoint: 'https://api.anthropic.com/v1'
    cost_per_mtok_input: 0.015
    cost_per_mtok_output: 0.075
    timeout: 60
    max_retries: 2
    rate_limit: 40_000  # tokens per minute

# Budget limits
budgets:
  daily_limit: 50
  monthly_limit: 1000
  per_session_limit: 100
  per_request_limit: 10
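
The budget limits at the end of this configuration can be checked before a request is dispatched. The following is a minimal Python sketch of that idea, not OpenClaw's implementation: the `Spend` dataclass, the `within_budget` function, and the notion of an estimated request cost are assumptions made for illustration. Only the four limit values come from the config above.

```python
from dataclasses import dataclass

# Limits copied from the `budgets` section above (assumed to be USD).
BUDGETS = {
    "per_request_limit": 10,
    "per_session_limit": 100,
    "daily_limit": 50,
    "monthly_limit": 1000,
}


@dataclass
class Spend:
    """Hypothetical running totals tracked by the caller."""
    session: float = 0.0
    today: float = 0.0
    month: float = 0.0


def within_budget(estimated_cost: float, spend: Spend, budgets: dict = BUDGETS) -> bool:
    """Return True only if the request fits under every limit."""
    return (
        estimated_cost <= budgets["per_request_limit"]
        and spend.session + estimated_cost <= budgets["per_session_limit"]
        and spend.today + estimated_cost <= budgets["daily_limit"]
        and spend.month + estimated_cost <= budgets["monthly_limit"]
    )


print(within_budget(1.50, Spend(session=5.0, today=20.0, month=300.0)))  # True
print(within_budget(1.50, Spend(session=5.0, today=49.0, month=300.0)))  # False: exceeds daily limit
```

A router could call a check like this before selecting an expensive model and drop to a cheaper one when it returns False.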

Decision Tree: Which Model to Use


START: New request arrives

1. Is this a security-sensitive decision (auth, financial, compliance)?
   → YES: Use Claude Sonnet 4.5
   → NO: Continue

2. Is this a routine, simple task on its first attempt?
   → YES: Use Kimi K2.5
   → NO: Continue

3. Is this a complex reasoning problem that rarely occurs?
   → YES: Use Claude Opus 4.6 (if budget permits)
   → NO: Continue

4. Did this task fail already?
   → YES: Retry with Claude Sonnet 4.5
   → NO: Use the default model (Kimi K2.5)

5. Is the selected model rate-limited or down?
   → YES: Fail over to the next model in the chain
   → NO: Use the selected model

EXECUTE with selected model
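
The same decision logic can be written as a single function. This is a minimal Python sketch under stated assumptions: the `Request` fields, the `select_model` and `first_available` names, and the `is_available` callback are hypothetical. The checks run in the order of the routing rules (security, routine first attempt, complex reasoning, retry), then fall back to the default model, and the failover check runs last.

```python
from dataclasses import dataclass
from typing import Callable

FAILOVER_CHAIN = ["kimi-k2.5", "claude-sonnet-4-5", "claude-opus-4-6", "gpt-4-turbo"]
DEFAULT_MODEL = "kimi-k2.5"
MONTHLY_OPUS_CAP = 5  # from the rule `monthly_opus_usage < 5`


@dataclass
class Request:
    """Hypothetical request metadata; field names mirror the rule conditions."""
    task_category: str = "general"
    task_complexity: str = "simple"
    retry_count: int = 0
    monthly_opus_usage: int = 0


def select_model(req: Request) -> str:
    """Apply the checks in order; the first match wins."""
    if req.task_category in ("security", "financial"):
        return "claude-sonnet-4-5"
    if req.task_complexity == "simple" and req.retry_count == 0:
        return "kimi-k2.5"
    if req.task_complexity == "hard" and req.monthly_opus_usage < MONTHLY_OPUS_CAP:
        return "claude-opus-4-6"
    if req.retry_count > 0:
        return "claude-sonnet-4-5"
    return DEFAULT_MODEL


def first_available(selected: str, is_available: Callable[[str], bool]) -> str:
    """Return the selected model, or the next healthy model in the chain."""
    if is_available(selected):
        return selected
    start = FAILOVER_CHAIN.index(selected) + 1 if selected in FAILOVER_CHAIN else 0
    for model in FAILOVER_CHAIN[start:]:
        if is_available(model):
            return model
    raise RuntimeError("no model in the failover chain is available")


choice = select_model(Request(task_category="financial"))
print(first_available(choice, lambda model: True))  # claude-sonnet-4-5
```

Keeping selection and availability as separate functions means the routing rules can be tested without any network access.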

Real-World Routing Examples

Example 1: Customer Support Message

Message: "Answer this customer's question about billing"

Routing logic:

  • Not security-sensitive → skip rule 1
  • Simple task, routine → matches rule 2
  • Decision: Use Kimi K2.5 ($0.05 cost)

Example 2: Suspicious Transaction Detection

Message: "Is this transaction pattern suspicious? Flag if it matches fraud indicators."

Routing logic:

  • Financial decision → matches rule 1 (security-sensitive)
  • Decision: Use Claude Sonnet 4.5 ($0.20 cost, better robustness)

Example 3: Complex Algorithm Design

Message: "Design a new recommendation algorithm for our platform"

Routing logic:

  • Not security-sensitive → skip rule 1
  • Complex reasoning → matches rule 3 (if budget allows)
  • Decision: Use Claude Opus 4.6 ($1.50 cost, maximum reasoning)

Example 4: Retry After Failure

Scenario: Kimi K2.5 failed to parse complex output; retry needed

Routing logic:

  • retry_count > 0 → matches rule 4
  • Decision: Use Claude Sonnet 4.5 (more robust at parsing)

Monitoring Model Performance

Key Metrics to Track


# By model:
- Total tokens used (input + output)
- Total cost
- Error rate
- Retry rate
- Success rate (first try)
- Average response time
- Cost per successful request

# Routing effectiveness:
- Percentage of requests routed to each model
- Cost savings vs. Opus-only
- Quality differences (errors per model)
- Failover frequency
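
Most of these metrics can be derived from a per-request log. The sketch below shows one way to aggregate them in Python. The record fields (`model`, `tokens`, `cost`, `ok`, `retries`, `seconds`) and the sample values are hypothetical and do not reflect an OpenClaw log format.

```python
from collections import defaultdict

# Hypothetical log records: one dict per request.
records = [
    {"model": "kimi-k2.5", "tokens": 1200, "cost": 0.01, "ok": True, "retries": 0, "seconds": 1.2},
    {"model": "kimi-k2.5", "tokens": 900, "cost": 0.01, "ok": False, "retries": 1, "seconds": 2.0},
    {"model": "claude-sonnet-4-5", "tokens": 1500, "cost": 0.03, "ok": True, "retries": 0, "seconds": 1.8},
]


def summarize(records):
    """Aggregate per-model metrics from a list of request records."""
    by_model = defaultdict(list)
    for r in records:
        by_model[r["model"]].append(r)

    summary = {}
    for model, rows in by_model.items():
        n = len(rows)
        successes = sum(r["ok"] for r in rows)
        total_cost = sum(r["cost"] for r in rows)
        summary[model] = {
            "requests": n,
            "share_of_requests": n / len(records),
            "total_tokens": sum(r["tokens"] for r in rows),
            "total_cost": total_cost,
            "error_rate": (n - successes) / n,
            "retry_rate": sum(r["retries"] > 0 for r in rows) / n,
            "first_try_success_rate": sum(r["ok"] and r["retries"] == 0 for r in rows) / n,
            "avg_seconds": sum(r["seconds"] for r in rows) / n,
            # Undefined when nothing succeeded, so report None instead of dividing by zero.
            "cost_per_success": total_cost / successes if successes else None,
        }
    return summary


for model, stats in summarize(records).items():
    print(model, stats)
```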

Sample Monitoring Script


openclaw metrics --period daily

Example output:
  Kimi K2.5:         1,245 requests | $8.50 | 98% success | 0.8% error
  Claude Sonnet:       245 requests | $6.20 | 99% success | 0.2% error
  Claude Opus:          15 requests | $3.50 | 99% success | 0.0% error
  ────────────────────────────────────────────────────
  Total:            1,505 requests | $18.20 | 98% avg

Cost breakdown:
  Kimi:   83% of requests, 47% of cost
  Sonnet: 16% of requests, 34% of cost
  Opus:    1% of requests, 19% of cost

Recommendation: Route more requests to Kimi and fewer to Opus
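
The shares in a breakdown like this follow directly from the per-model request counts and costs. A short Python check, using only the figures printed in the example output above:

```python
# Request counts and costs (USD) copied from the example output above.
usage = {
    "Kimi": (1245, 8.50),
    "Sonnet": (245, 6.20),
    "Opus": (15, 3.50),
}

total_requests = sum(n for n, _ in usage.values())
total_cost = sum(c for _, c in usage.values())

# Percentage of requests and percentage of cost, rounded to whole numbers.
shares = {
    name: (round(100 * n / total_requests), round(100 * c / total_cost))
    for name, (n, c) in usage.items()
}

for name, (req_pct, cost_pct) in shares.items():
    print(f"{name}: {req_pct}% of requests, {cost_pct}% of cost")
```

Opus handles about 1% of requests but accounts for about 19% of cost, which is the imbalance behind the recommendation.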

Advanced Routing Patterns

Pattern 1: A/B Testing Models

Route 50/50 to two models to compare quality:


routes:
  - name: 'ab_test_kimi'
    condition: 'request_id % 2 == 0'  # Even/odd split on request ID
    model: 'kimi-k2.5'
    logging: 'detailed'  # Log quality metrics

  - name: 'ab_test_sonnet'
    condition: 'request_id % 2 == 1'
    model: 'claude-sonnet-4-5'
    logging: 'detailed'

# After 1 week, analyze: did one model significantly outperform?
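
The `request_id % 2` condition assumes numeric IDs that are evenly distributed. Where request IDs are strings, or are not guaranteed to alternate evenly, a stable hash gives the same 50/50 behavior. The sketch below is one way to do that in Python; `assign_variant` and the `req-N` ID format are hypothetical.

```python
import hashlib

VARIANTS = ["kimi-k2.5", "claude-sonnet-4-5"]


def assign_variant(request_id: str, variants=VARIANTS) -> str:
    """Map a request ID to a variant deterministically.

    A cryptographic hash is used instead of Python's built-in hash(),
    which is salted per process for strings and would not give stable
    assignments across runs.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Count how 10,000 synthetic IDs are distributed across the two models.
counts = {v: 0 for v in VARIANTS}
for i in range(10_000):
    counts[assign_variant(f"req-{i}")] += 1
print(counts)  # roughly even split
```

Because the assignment depends only on the ID, a retried request lands on the same model, which keeps the comparison clean.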

Pattern 2: Progressive Fallback

Try cheap model first, progressively upgrade if it fails:


attempt_1:
  model: kimi-k2.5
  timeout: 10s
  if_fails: proceed to attempt_2

attempt_2:
  model: claude-sonnet-4-5
  timeout: 15s
  if_fails: proceed to attempt_3

attempt_3:
  model: claude-opus-4-6
  timeout: 30s
  if_fails: error

# Each step only escalates if needed, minimizing cost
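
The escalation steps above can be sketched as a loop over (model, timeout) pairs. This is an illustrative Python sketch, not OpenClaw's API: `call_with_escalation` is a hypothetical name, and `call_model(model, prompt, timeout)` stands in for whatever client function is in use, assumed to return the response text and raise an exception on error or timeout.

```python
from typing import Callable

# (model, timeout in seconds) pairs, taken from the three attempts above.
ATTEMPTS = [
    ("kimi-k2.5", 10),
    ("claude-sonnet-4-5", 15),
    ("claude-opus-4-6", 30),
]


def call_with_escalation(prompt: str, call_model: Callable[[str, str, float], str]):
    """Try each model in order; return (model, response) from the first success."""
    errors = []
    for model, timeout in ATTEMPTS:
        try:
            return model, call_model(model, prompt, timeout)
        except Exception as exc:  # escalate to the next model on any failure
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all attempts failed: " + "; ".join(errors))


def fake_client(model: str, prompt: str, timeout: float) -> str:
    """Stand-in client for demonstration: the first model always times out."""
    if model == "kimi-k2.5":
        raise TimeoutError("timed out")
    return f"answer from {model}"


print(call_with_escalation("Summarize this ticket", fake_client))
# ('claude-sonnet-4-5', 'answer from claude-sonnet-4-5')
```

Collecting the error from each failed attempt makes the final exception useful for debugging when every model fails.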

Pattern 3: Time-of-Day Routing

Use different models based on time (cheaper during off-peak):


routes:
  - name: 'peak_hours_sonnet'
    condition: 'hour >= 9 AND hour <= 17 AND weekday'
    model: 'claude-sonnet-4-5'
    reason: 'More robust during high load'

  - name: 'offpeak_kimi'
    condition: 'NOT (hour >= 9 AND hour <= 17 AND weekday)'
    model: 'kimi-k2.5'
    reason: 'Cost savings during off-peak'
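
The two conditions above can be expressed as a small function for testing. The sketch below assumes that `hour` is the local hour (0 to 23) and that `weekday` means Monday through Friday; `model_for_time` is a hypothetical name.

```python
from datetime import datetime


def model_for_time(now: datetime) -> str:
    """Mirror the two rules above: weekdays 09:00-17:59 use Sonnet, otherwise Kimi."""
    is_weekday = now.weekday() < 5      # Monday is 0, Sunday is 6
    is_peak_hour = 9 <= now.hour <= 17  # `hour <= 17` includes 17:00-17:59
    if is_weekday and is_peak_hour:
        return "claude-sonnet-4-5"
    return "kimi-k2.5"


print(model_for_time(datetime(2025, 1, 6, 10, 0)))  # Monday morning: claude-sonnet-4-5
print(model_for_time(datetime(2025, 1, 4, 10, 0)))  # Saturday morning: kimi-k2.5
```

Note that `hour <= 17` keeps the peak rule active until 17:59. If peak hours should end at 17:00 sharp, the condition would need to be `hour < 17`.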

Kimi K2.5 Specific Recommendations

Best Use Cases for Kimi

  • Customer support and FAQs
  • Content summarization and extraction
  • Code generation (non-security-critical)
  • Data analysis and reporting
  • Routine decision-making

Not Recommended for Kimi (Use Claude Instead)

  • Security and compliance decisions
  • Financial transaction approval
  • Complex reasoning under adversarial input
  • Defense against prompt injection
  • Novel, unprecedented problems

Cost Projection

Scenario: 10,000 monthly requests, mixed complexity

Strategy                Kimi Usage   Estimated Cost   Savings vs Opus
Opus only (baseline)    0%           $120             -
Kimi only               100%         $15              87%
Kimi + Sonnet + Opus    70%          $28              77%
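
The savings column is each strategy's cost relative to the Opus-only baseline. A short Python check, using only the estimated costs from the table above:

```python
# Estimated monthly costs (USD) copied from the table above.
costs = {
    "Opus only (baseline)": 120,
    "Kimi only": 15,
    "Kimi + Sonnet + Opus": 28,
}

baseline = costs["Opus only (baseline)"]


def savings_vs_baseline(cost: float, baseline: float) -> float:
    """Percentage saved relative to the baseline cost."""
    return 100 * (1 - cost / baseline)


for strategy, cost in costs.items():
    print(f"{strategy}: ${cost} -> {savings_vs_baseline(cost, baseline):.1f}% saved")
```

The results agree with the table to within a percentage point: 87.5% for Kimi only and about 76.7% for the three-tier setup.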

Key Takeaways

  1. Model routing is where cost optimization happens: choose the right model for each task
  2. Kimi K2.5 is production-ready for most tasks: use it by default, not as a fallback
  3. Use Claude Sonnet for security and financial decisions: it has better documented robustness
  4. Reserve Opus for rare, complex cases: don't waste it on routine tasks
  5. Implement failover chains: if the primary model is down, fall back automatically
  6. Monitor model performance and cost: adjust routing rules based on data

When to Hire an Expert

Model routing becomes complex with:

  • Multi-provider setups (OpenRouter, etc.)
  • Complex cost accounting and budget optimization
  • Custom routing based on user properties or data classification
  • Machine learning-based routing decisions

An OpenClaw expert can help design a routing strategy tailored to your specific use cases and cost constraints.