
Language Model Providers

Lava supports all major LLM providers with full streaming support, function calling, and automatic usage tracking.

OpenAI

Models: GPT-4, GPT-4o, GPT-3.5-Turbo
Key Features:
  • Chat completions with streaming
  • Function calling and tool use
  • Vision support (GPT-4 Vision)
  • Embeddings (text-embedding-3)
  • DALL-E image generation
Endpoint: https://api.openai.com/v1/chat/completions
Usage Example:
const response = await fetch('https://api.lavapayments.com/v1/forward?u=https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${forwardToken}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true
  })
});
Billing: Token-based (input + output tokens)

Anthropic

Models: Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku
Key Features:
  • Long context windows (up to 200K tokens)
  • Vision support (analyze images)
  • Tool use (function calling)
  • System prompts and extended thinking
Endpoint: https://api.anthropic.com/v1/messages
Billing: Token-based (input + output tokens, cached prompt tokens discounted)
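A request sketch mirroring the OpenAI example above. The anthropic-version header and the required max_tokens field come from Anthropic's Messages API; that Lava passes provider-specific headers through unchanged is an assumption here.
const response = await fetch('https://api.lavapayments.com/v1/forward?u=https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${forwardToken}`,
    'anthropic-version': '2023-06-01', // required by Anthropic's Messages API
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'claude-3-sonnet-20240229',
    max_tokens: 1024, // the Messages API requires an explicit max_tokens
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});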

Google

Models: Gemini Pro, Gemini Flash
Key Features:
  • Multimodal inputs (text, images, video)
  • Large context windows
  • Code generation and execution
  • Function calling
Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
Billing: Token-based (input + output characters converted to tokens)
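A sketch of the same pattern for Gemini. The contents/parts request shape is Gemini's generateContent format; authenticating through the Lava forward token rather than Google's usual key parameter is assumed here.
const response = await fetch('https://api.lavapayments.com/v1/forward?u=https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${forwardToken}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    // Gemini's generateContent shape: a list of contents, each with parts
    contents: [{ parts: [{ text: 'Hello!' }] }]
  })
});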

xAI

Models: Grok, Grok-beta
Key Features:
  • Real-time data access
  • Advanced reasoning capabilities
  • OpenAI-compatible API
Endpoint: https://api.x.ai/v1/chat/completions
Billing: Token-based (input + output tokens)
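Because the API is OpenAI-compatible, the OpenAI example above works with only the target URL and model name changed (a sketch; model name per xAI's published models):
const response = await fetch('https://api.lavapayments.com/v1/forward?u=https://api.x.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${forwardToken}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'grok-beta', // request shape identical to OpenAI chat completions
    messages: [{ role: 'user', content: 'Hello!' }]
  })
});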

Mistral AI

Models: Mistral Large, Mistral Medium, Mistral Small, Mixtral
Key Features:
  • Efficient inference
  • Multilingual support (French, Spanish, German, Italian)
  • Function calling
  • JSON mode
Endpoint: https://api.mistral.ai/v1/chat/completions
Billing: Token-based (input + output tokens)
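For example, JSON mode is enabled via the response_format field (a request-body sketch; the json_object type is Mistral's documented mechanism, and the model alias is illustrative):
{
  "model": "mistral-large-latest",
  "messages": [
    { "role": "user", "content": "List three colors as JSON." }
  ],
  "response_format": { "type": "json_object" }
}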

DeepSeek

Models: DeepSeek-Chat, DeepSeek-Coder
Key Features:
  • Cost-effective inference
  • High performance on coding tasks
  • Long context support
Endpoint: https://api.deepseek.com/v1/chat/completions
Billing: Token-based (input + output tokens, very competitive pricing)

Other LLM Providers

  • Groq: Ultra-fast LLM inference with low latency
  • Cohere: Enterprise-focused with RAG and embeddings
  • together.ai: Open-source models (Llama, Mixtral, etc.)
  • Fireworks: Fast inference platform for open models
  • DeepInfra: Serverless AI inference
All use token-based billing and support streaming responses.

Streaming Support

All LLM providers support Server-Sent Events (SSE) streaming:
{
  "model": "gpt-4",
  "messages": [...],
  "stream": true  // Enable streaming
}
How Lava handles streaming (a client-side consumption sketch follows this list):
  1. Request includes "stream": true
  2. Lava forwards request to provider
  3. Lava streams response chunks back in real-time
  4. Usage data extracted from final SSE message
  5. Billing happens after stream completes
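A minimal sketch of consuming the stream client-side (Node 18+; the data: framing is standard SSE, and the [DONE] sentinel and delta shape follow OpenAI's streaming format):
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // SSE frames are separated by blank lines; each frame is "data: <json>"
  const frames = buffer.split('\n\n');
  buffer = frames.pop(); // keep any partial frame for the next read
  for (const frame of frames) {
    const data = frame.replace(/^data: /, '').trim();
    if (!data || data === '[DONE]') continue; // end-of-stream sentinel
    const chunk = JSON.parse(data);
    process.stdout.write(chunk.choices?.[0]?.delta?.content ?? '');
  }
}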
Response headers:
x-lava-request-id: req_01234567890abcdef
The x-lava-request-id header is added to all responses for request tracking and debugging. Usage data comes from the response body (data.usage), not headers.
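For non-streaming requests, both can be read like this (a sketch; the usage field names follow OpenAI's response format):
const requestId = response.headers.get('x-lava-request-id'); // for tracking/debugging
const data = await response.json();
console.log(requestId, data.usage); // e.g. { prompt_tokens, completion_tokens, total_tokens }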

Provider-Specific Features

Function Calling

Supported by: OpenAI, Anthropic (tool use), Google, Mistral
Example (OpenAI):
{
  "model": "gpt-4",
  "messages": [...],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": { ... }
      }
    }
  ]
}
Lava tracks function call usage and includes it in billing.
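A fuller round-trip sketch (the tools schema and tool_calls field follow OpenAI's chat completions format; get_weather itself is a hypothetical tool):
const response = await fetch('https://api.lavapayments.com/v1/forward?u=https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${forwardToken}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [{ role: 'user', content: "What's the weather in Paris?" }],
    tools: [{
      type: 'function',
      function: {
        name: 'get_weather', // hypothetical tool
        description: 'Get the current weather for a city',
        parameters: {
          type: 'object',
          properties: { city: { type: 'string' } },
          required: ['city']
        }
      }
    }]
  })
});

const data = await response.json();
// When the model calls the tool, the message carries tool_calls instead of
// text content; the arguments arrive as a JSON-encoded string.
const call = data.choices[0].message.tool_calls?.[0];
if (call) {
  const args = JSON.parse(call.function.arguments); // e.g. { city: 'Paris' }
}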

Vision (Multimodal)

Supported by: OpenAI (GPT-4 Vision), Anthropic (Claude 3), Google (Gemini)
Example (OpenAI Vision):
{
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What's in this image?" },
        { "type": "image_url", "image_url": { "url": "https://..." } }
      ]
    }
  ]
}
Vision inputs are metered with separate token costs for image processing.

Next Steps