
Overview

Moonshot AI (Kimi) provides advanced language models with thinking capabilities, powered by a trillion-parameter Mixture-of-Experts (MoE) architecture and specializing in agentic reasoning and long-context processing.
Key Features:
  • Thinking mode with multi-step reasoning capabilities
  • 256K token context window for long documents
  • Fully OpenAI-compatible API
  • Tool calling support for agentic applications
  • Competitive pricing with high performance
Official Documentation: Moonshot AI Platform

Authentication

Moonshot AI uses Bearer token authentication in the OpenAI-compatible format.
Header:
Authorization: Bearer YOUR_MOONSHOT_API_KEY
Lava Forward Token:
${LAVA_SECRET_KEY}.${CONNECTION_SECRET}.${PRODUCT_SECRET}
For BYOK (Bring Your Own Key):
${LAVA_SECRET_KEY}.${CONNECTION_SECRET}.${PRODUCT_SECRET}.${YOUR_MOONSHOT_KEY}
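As a sketch, assembling the forward token is just dot-joining the segments. The helper name and key values below are illustrative, not part of Lava's SDK:

```javascript
// Minimal sketch: build a Lava forward token from its dot-separated segments.
// All names here are placeholders; use your real secrets from env vars.
function buildForwardToken({ lavaSecretKey, connectionSecret, productSecret, providerKey }) {
  const segments = [lavaSecretKey, connectionSecret, productSecret];
  // BYOK: append your own Moonshot API key as an optional 4th segment.
  if (providerKey) segments.push(providerKey);
  return segments.join('.');
}

// Managed-key token (3 segments):
const token = buildForwardToken({
  lavaSecretKey: process.env.LAVA_SECRET_KEY,
  connectionSecret: process.env.CONNECTION_SECRET,
  productSecret: process.env.PRODUCT_SECRET
});
```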

Supported Models

Model | Context | Description | Key Feature
kimi-k2-thinking | 128K-256K | Advanced reasoning model with thinking mode | Multi-step reasoning with reasoning_content
kimi-k2-turbo-preview | 128K-256K | Fast inference variant | Optimized for speed and efficiency
Pricing (per 1M tokens):
Model | Input (Cache Hit) | Input (Cache Miss) | Output
kimi-k2-thinking | $0.15 | $0.60 | $2.50
kimi-k2-turbo-preview | $0.15 | $1.15 | $8.00
Cache Pricing: Moonshot AI uses automatic context caching to reduce costs. Cache hits (repeated context) are billed at the lower input rate, while cache misses (new context) are billed at the higher input rate. This can significantly reduce costs for applications with repeated prompts or large context windows.
Context Window: Up to 256K tokens, ideal for long documents, extensive code analysis, and multi-turn conversations.
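To make the cache discount concrete, here is a hedged back-of-the-envelope cost estimator using the per-1M-token rates from the table above (the function and rate table are illustrative, not an official SDK):

```javascript
// Estimate request cost (USD) from token counts and the per-1M-token
// rates in the pricing table above.
const RATES = {
  'kimi-k2-thinking':      { cacheHit: 0.15, cacheMiss: 0.60, output: 2.50 },
  'kimi-k2-turbo-preview': { cacheHit: 0.15, cacheMiss: 1.15, output: 8.00 }
};

function estimateCost(model, { cachedTokens, uncachedTokens, outputTokens }) {
  const r = RATES[model];
  return (cachedTokens * r.cacheHit + uncachedTokens * r.cacheMiss + outputTokens * r.output) / 1_000_000;
}

// A 100K-token prompt with 1K output on kimi-k2-thinking:
estimateCost('kimi-k2-thinking', { cachedTokens: 100_000, uncachedTokens: 0, outputTokens: 1000 }); // ≈ $0.0175 fully cached
estimateCost('kimi-k2-thinking', { cachedTokens: 0, uncachedTokens: 100_000, outputTokens: 1000 }); // ≈ $0.0625 fully uncached
```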

Quick Start Example

// 1. Set up your environment variables
const LAVA_FORWARD_TOKEN = process.env.LAVA_FORWARD_TOKEN;

// 2. Define the Moonshot AI endpoint
const MOONSHOT_ENDPOINT = 'https://api.moonshot.ai/v1/chat/completions';

// 3. Make the request through Lava
const response = await fetch(
  `https://api.lavapayments.com/v1/forward?u=${encodeURIComponent(MOONSHOT_ENDPOINT)}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${LAVA_FORWARD_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'kimi-k2-turbo-preview',
      messages: [
        {
          role: 'user',
          content: 'Explain the concept of mixture-of-experts in AI.'
        }
      ],
      temperature: 0.6,
      max_tokens: 1000
    })
  }
);

// 4. Parse response and extract usage
const data = await response.json();
console.log('Response:', data.choices[0].message.content);

// 5. Track usage (from response body)
const usage = data.usage;
console.log('Tokens used:', usage.total_tokens);

// 6. Get Lava request ID (from headers)
const requestId = response.headers.get('x-lava-request-id');
console.log('Lava Request ID:', requestId);

Available Endpoints

Moonshot AI supports standard OpenAI-compatible endpoints:
Endpoint | Method | Description
/v1/chat/completions | POST | Text generation with conversation context
/v1/models | GET | List available models
Additional Moonshot Endpoints: Moonshot’s API also offers /v1/files (file upload/parsing) and /v1/completions (standard text completion) endpoints. These are not currently routed through Lava’s proxy. For file-based Q&A or document analysis, refer to Moonshot’s file API documentation.

Usage Tracking

Usage data is returned in the response body (OpenAI format):
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
Location: data.usage
Format: Standard OpenAI usage object
Lava Tracking: Automatically tracked via the x-lava-request-id response header
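A small helper can gather these fields in one place. This is a minimal sketch (the function name is illustrative), where `data` is the parsed response body and `response` is the Fetch API Response:

```javascript
// Minimal sketch: collect billing-relevant fields from a completed request.
// `data` is the parsed JSON body; `response` is the Fetch API Response.
function summarizeUsage(data, response) {
  return {
    requestId: response.headers.get('x-lava-request-id'),
    promptTokens: data.usage.prompt_tokens,
    completionTokens: data.usage.completion_tokens,
    totalTokens: data.usage.total_tokens
  };
}
```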

Features & Capabilities

Thinking Mode

The kimi-k2-thinking model provides multi-step reasoning with a dedicated reasoning_content field:
const response = await fetch(
  `https://api.lavapayments.com/v1/forward?u=${encodeURIComponent(MOONSHOT_ENDPOINT)}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${LAVA_FORWARD_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'kimi-k2-thinking',
      messages: [
        {
          role: 'user',
          content: 'Solve this logic puzzle: Three friends have different jobs...'
        }
      ],
      temperature: 1.0
    })
  }
);

const data = await response.json();
// Extract reasoning process
const reasoning = data.choices[0].message.reasoning_content;
console.log('Reasoning steps:', reasoning);

// Final answer
const answer = data.choices[0].message.content;
console.log('Answer:', answer);

Tool Calling

Moonshot AI supports OpenAI-compatible function calling for agentic applications:
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": { /* ... */ }
      }
    }
  ],
  "tool_choice": "auto" // or "none"
}
Note: tool_choice: "required" is not supported. Use "auto" or "none".
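One round of the tool-calling loop can be sketched as follows. `handleToolCalls` and the `tools` map are illustrative names; the message shape follows the OpenAI format returned in `data.choices[0].message`:

```javascript
// Hedged sketch: handle one assistant turn that may request tools.
// `message` is data.choices[0].message; `tools` maps function names to
// local implementations (your own code, not part of any SDK).
function handleToolCalls(message, tools) {
  if (!message.tool_calls) return null; // no tool requested; message.content is the final answer
  // Build one `tool` role message per requested call, echoing its id.
  return message.tool_calls.map((call) => ({
    role: 'tool',
    tool_call_id: call.id,
    content: JSON.stringify(tools[call.function.name](JSON.parse(call.function.arguments)))
  }));
}
```

If tool messages come back, append the assistant message and these tool messages to the conversation and send another /v1/chat/completions request so the model can produce its final answer.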

Streaming

{
  "stream": true
}
Moonshot AI streams responses as standard OpenAI-style server-sent events (SSE), delivering tokens in real time.
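As a rough sketch, each streamed event arrives as a `data: {json}` line with the token in `choices[0].delta.content`, ending with `data: [DONE]`. The parser below assumes whole events per chunk (real streams can split events across network chunks, so production code should buffer):

```javascript
// Minimal sketch of parsing OpenAI-style SSE stream text. Assumes each
// event is a complete `data: {json}` line; the stream ends with [DONE].
function parseSSEChunk(text) {
  const tokens = [];
  for (const line of text.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') break;
    const delta = JSON.parse(payload).choices[0].delta;
    if (delta.content) tokens.push(delta.content);
  }
  return tokens.join('');
}
```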

Long Context (256K)

With up to 256K token context window, Kimi excels at:
  • Long document analysis
  • Extensive code review
  • Multi-turn conversations
  • Research paper summarization

BYOK Support

Status: ✅ Supported (managed keys + BYOK)
BYOK Implementation:
  • Append your Moonshot API key to the forward token: ${TOKEN}.${YOUR_MOONSHOT_KEY}
  • Lava tracks usage and billing while you maintain key control
  • No additional Lava API key costs (metering-only mode available)
Getting a Moonshot API Key:
  1. Sign up at Moonshot AI Platform
  2. Navigate to API Keys section
  3. Create a new API key
  4. Use in Lava forward token (4th segment)

Best Practices

  1. Model Selection:
    • Use kimi-k2-thinking for complex reasoning tasks requiring step-by-step analysis
    • Use kimi-k2-turbo-preview for faster inference and general chat applications
  2. Thinking Mode Usage:
    • Set temperature: 1.0 for thinking models to enable diverse reasoning paths
    • Extract both reasoning_content and content for full understanding
  3. Context Management:
    • Leverage 256K context for long documents and extended conversations
    • Use conversation history effectively for multi-turn interactions
  4. Tool Calling:
    • Use tool_choice: "auto" for flexible function calling
    • Note that "required" is not supported (use "auto" instead)
  5. Temperature Settings:
    • Thinking models: 1.0 for diverse reasoning
    • Turbo models: 0.6-0.8 for balanced creativity and coherence

Additional Resources