
Overview

Moonshot AI (Kimi) provides advanced language models with thinking capabilities, powered by a trillion-parameter Mixture-of-Experts (MoE) architecture and specializing in agentic reasoning and long-context processing.
Key Features:
  • Thinking mode with multi-step reasoning capabilities
  • 256K token context window for long documents
  • Fully OpenAI-compatible API
  • Tool calling support for agentic applications
  • Competitive pricing with high performance
Official Documentation: Moonshot AI Platform

Authentication

Moonshot AI uses Bearer token authentication in the OpenAI-compatible format.
Header:
Authorization: Bearer YOUR_MOONSHOT_API_KEY
Lava Forward Token:
${LAVA_SECRET_KEY}.${CONNECTION_SECRET}.${PRODUCT_SECRET}
For BYOK (Bring Your Own Key):
${LAVA_SECRET_KEY}.${CONNECTION_SECRET}.${PRODUCT_SECRET}.${YOUR_MOONSHOT_KEY}
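As a sketch, assembling the forward token is just dot-joining the segments. The helper name and key values below are illustrative, not part of Lava's SDK:

```javascript
// Minimal sketch: build a Lava forward token from its dot-separated segments.
// All names here are placeholders; use your real secrets from env vars.
function buildForwardToken({ lavaSecretKey, connectionSecret, productSecret, providerKey }) {
  const segments = [lavaSecretKey, connectionSecret, productSecret];
  // BYOK: append your own Moonshot API key as an optional 4th segment.
  if (providerKey) segments.push(providerKey);
  return segments.join('.');
}

// Managed-key token (3 segments):
const token = buildForwardToken({
  lavaSecretKey: process.env.LAVA_SECRET_KEY,
  connectionSecret: process.env.CONNECTION_SECRET,
  productSecret: process.env.PRODUCT_SECRET
});
```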

Supported Models

Model | Context | Description | Key Feature
kimi-k2-thinking | 128K-256K | Advanced reasoning model with thinking mode | Multi-step reasoning with reasoning_content
kimi-k2-turbo-preview | 128K-256K | Fast inference variant | Optimized for speed and efficiency
Pricing (per 1M tokens):
Model | Input (Cache Hit) | Input (Cache Miss) | Output
kimi-k2-thinking | $0.15 | $0.60 | $2.50
kimi-k2-turbo-preview | $0.15 | $1.15 | $8.00
Cache Pricing: Moonshot AI uses automatic context caching to reduce costs. Cache hits (repeated context) are billed at the lower input rate, while cache misses (new context) are billed at the higher input rate. This can significantly reduce costs for applications with repeated prompts or large context windows.
Context Window: Up to 256K tokens, ideal for long documents, extensive code analysis, and multi-turn conversations.
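To make the cache discount concrete, here is a hedged back-of-the-envelope cost estimator using the per-1M-token rates from the table above (the function and rate table are illustrative, not an official SDK):

```javascript
// Estimate request cost (USD) from token counts and the per-1M-token
// rates in the pricing table above.
const RATES = {
  'kimi-k2-thinking':      { cacheHit: 0.15, cacheMiss: 0.60, output: 2.50 },
  'kimi-k2-turbo-preview': { cacheHit: 0.15, cacheMiss: 1.15, output: 8.00 }
};

function estimateCost(model, { cachedTokens, uncachedTokens, outputTokens }) {
  const r = RATES[model];
  return (cachedTokens * r.cacheHit + uncachedTokens * r.cacheMiss + outputTokens * r.output) / 1_000_000;
}

// A 100K-token prompt with 1K output on kimi-k2-thinking:
estimateCost('kimi-k2-thinking', { cachedTokens: 100_000, uncachedTokens: 0, outputTokens: 1000 }); // ≈ $0.0175 fully cached
estimateCost('kimi-k2-thinking', { cachedTokens: 0, uncachedTokens: 100_000, outputTokens: 1000 }); // ≈ $0.0625 fully uncached
```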

Quick Start Example

// 1. Set up your environment variables
const LAVA_FORWARD_TOKEN = process.env.LAVA_FORWARD_TOKEN;

// 2. Define the Moonshot AI endpoint
const MOONSHOT_ENDPOINT = 'https://api.moonshot.ai/v1/chat/completions';

// 3. Make the request through Lava
const response = await fetch(
  `https://api.lavapayments.com/v1/forward?u=${encodeURIComponent(MOONSHOT_ENDPOINT)}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${LAVA_FORWARD_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'kimi-k2-turbo-preview',
      messages: [
        {
          role: 'user',
          content: 'Explain the concept of mixture-of-experts in AI.'
        }
      ],
      temperature: 0.6,
      max_tokens: 1000
    })
  }
);

// 4. Parse response and extract usage
const data = await response.json();
console.log('Response:', data.choices[0].message.content);

// 5. Track usage (from response body)
const usage = data.usage;
console.log('Tokens used:', usage.total_tokens);

// 6. Get Lava request ID (from headers)
const requestId = response.headers.get('x-lava-request-id');
console.log('Lava Request ID:', requestId);

Available Endpoints

Moonshot AI supports standard OpenAI-compatible endpoints:
Endpoint | Method | Description
/v1/chat/completions | POST | Text generation with conversation context
/v1/models | GET | List available models
Additional Moonshot Endpoints: Moonshot’s API also offers /v1/files (file upload/parsing) and /v1/completions (standard text completion) endpoints. These are not currently routed through Lava’s proxy. For file-based Q&A or document analysis, refer to Moonshot’s file API documentation.

Usage Tracking

Usage data is returned in the response body (OpenAI format):
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
Location: data.usage
Format: Standard OpenAI usage object
Lava Tracking: Automatically tracked via the x-lava-request-id response header
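A small helper can gather these fields in one place. This is a minimal sketch (the function name is illustrative), where `data` is the parsed response body and `response` is the Fetch API Response:

```javascript
// Minimal sketch: collect billing-relevant fields from a completed request.
// `data` is the parsed JSON body; `response` is the Fetch API Response.
function summarizeUsage(data, response) {
  return {
    requestId: response.headers.get('x-lava-request-id'),
    promptTokens: data.usage.prompt_tokens,
    completionTokens: data.usage.completion_tokens,
    totalTokens: data.usage.total_tokens
  };
}
```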

Features & Capabilities

Thinking Mode

The kimi-k2-thinking model provides multi-step reasoning with a dedicated reasoning_content field:
const response = await fetch(
  `https://api.lavapayments.com/v1/forward?u=${encodeURIComponent(MOONSHOT_ENDPOINT)}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${LAVA_FORWARD_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'kimi-k2-thinking',
      messages: [
        {
          role: 'user',
          content: 'Solve this logic puzzle: Three friends have different jobs...'
        }
      ],
      temperature: 1.0
    })
  }
);

const data = await response.json();
// Extract reasoning process
const reasoning = data.choices[0].message.reasoning_content;
console.log('Reasoning steps:', reasoning);

// Final answer
const answer = data.choices[0].message.content;
console.log('Answer:', answer);

Tool Calling

Moonshot AI supports OpenAI-compatible function calling for agentic applications:
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": { /* ... */ }
      }
    }
  ],
  "tool_choice": "auto" // or "none"
}
Note: tool_choice: "required" is not supported. Use "auto" or "none".
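One round of the tool-calling loop can be sketched as follows. `handleToolCalls` and the `tools` map are illustrative names; the message shape follows the OpenAI format returned in `data.choices[0].message`:

```javascript
// Hedged sketch: handle one assistant turn that may request tools.
// `message` is data.choices[0].message; `tools` maps function names to
// local implementations (your own code, not part of any SDK).
function handleToolCalls(message, tools) {
  if (!message.tool_calls) return null; // no tool requested; message.content is the final answer
  // Build one `tool` role message per requested call, echoing its id.
  return message.tool_calls.map((call) => ({
    role: 'tool',
    tool_call_id: call.id,
    content: JSON.stringify(tools[call.function.name](JSON.parse(call.function.arguments)))
  }));
}
```

If tool messages come back, append the assistant message and these tool messages to the conversation and send another /v1/chat/completions request so the model can produce its final answer.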

Streaming

{
  "stream": true
}
Moonshot AI streams responses as standard OpenAI-style server-sent events (SSE), delivering tokens in real time.
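As a rough sketch, each streamed event arrives as a `data: {json}` line with the token in `choices[0].delta.content`, ending with `data: [DONE]`. The parser below assumes whole events per chunk (real streams can split events across network chunks, so production code should buffer):

```javascript
// Minimal sketch of parsing OpenAI-style SSE stream text. Assumes each
// event is a complete `data: {json}` line; the stream ends with [DONE].
function parseSSEChunk(text) {
  const tokens = [];
  for (const line of text.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') break;
    const delta = JSON.parse(payload).choices[0].delta;
    if (delta.content) tokens.push(delta.content);
  }
  return tokens.join('');
}
```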

Long Context (256K)

With up to 256K token context window, Kimi excels at:
  • Long document analysis
  • Extensive code review
  • Multi-turn conversations
  • Research paper summarization

BYOK Support

Status: ✅ Supported (managed keys + BYOK)
BYOK Implementation:
  • Append your Moonshot API key to the forward token: ${TOKEN}.${YOUR_MOONSHOT_KEY}
  • Lava tracks usage and billing while you maintain key control
  • No additional Lava API key costs (metering-only mode available)
Getting a Moonshot API Key:
  1. Sign up at Moonshot AI Platform
  2. Navigate to API Keys section
  3. Create a new API key
  4. Use in Lava forward token (4th segment)

Best Practices

  1. Model Selection:
    • Use kimi-k2-thinking for complex reasoning tasks requiring step-by-step analysis
    • Use kimi-k2-turbo-preview for faster inference and general chat applications
  2. Thinking Mode Usage:
    • Set temperature: 1.0 for thinking models to enable diverse reasoning paths
    • Extract both reasoning_content and content for full understanding
  3. Context Management:
    • Leverage 256K context for long documents and extended conversations
    • Use conversation history effectively for multi-turn interactions
  4. Tool Calling:
    • Use tool_choice: "auto" for flexible function calling
    • Note that "required" is not supported (use "auto" instead)
  5. Temperature Settings:
    • Thinking models: 1.0 for diverse reasoning
    • Turbo models: 0.6-0.8 for balanced creativity and coherence

Additional Resources