Language Model Providers
Lava supports all major LLM providers with full streaming support, function calling, and automatic usage tracking.
OpenAI
Models: GPT-4, GPT-4o, GPT-3.5-Turbo
Key Features:
- Chat completions with streaming
- Function calling and tool use
- Vision support (GPT-4 Vision)
- Embeddings (text-embedding-3)
- DALL-E image generation
https://api.openai.com/v1/chat/completions
Usage Example:
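Below is a minimal sketch of a non-streaming chat completion. The gateway endpoint and API key are placeholders (assumptions for illustration, not Lava's actual values); the request body follows the standard OpenAI chat-completions schema.

```python
import requests

# Placeholder endpoint and key: substitute your Lava gateway URL and credential.
url = "https://<your-lava-endpoint>/v1/chat/completions"
headers = {
    "Authorization": "Bearer <your-key>",
    "Content-Type": "application/json",
}
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
}

resp = requests.post(url, headers=headers, json=payload)
resp.raise_for_status()
data = resp.json()

print(data["choices"][0]["message"]["content"])
# Usage data (input/output tokens) is reported in the response body.
print(data["usage"])
```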
Anthropic
Models: Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku
Key Features:
- Long context windows (up to 200K tokens)
- Vision support (analyze images)
- Tool use (function calling)
- System prompts and extended thinking
https://api.anthropic.com/v1/messages
Billing: Token-based (input + output tokens, cached prompt tokens discounted)
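A sketch of the Messages API request shape, including a top-level system prompt; the key is a placeholder, and the model name and version header reflect Anthropic's published API rather than anything Lava-specific.

```python
import requests

resp = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": "<your-key>",          # placeholder credential
        "anthropic-version": "2023-06-01",  # required API version header
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-sonnet-20240229",
        "max_tokens": 1024,                 # required by the Messages API
        "system": "You are a concise assistant.",  # top-level system prompt
        "messages": [{"role": "user", "content": "Summarize SSE in one line."}],
    },
)
data = resp.json()

print(data["content"][0]["text"])
print(data["usage"])  # input_tokens / output_tokens used for billing
```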
Google Gemini
Models: Gemini Pro
Key Features:
- Multimodal inputs (text, images, video)
- Large context windows
- Code generation and execution
- Function calling
https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent
Billing: Token-based (input + output characters converted to tokens)
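A sketch of the generateContent request, which uses Gemini's contents/parts structure rather than OpenAI-style messages; the API key is a placeholder, passed as a query parameter per Google's public API.

```python
import requests

resp = requests.post(
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent",
    params={"key": "<your-key>"},  # placeholder API key
    json={
        # Gemini expects a list of contents, each with a list of parts.
        "contents": [{"parts": [{"text": "Explain function calling in one sentence."}]}],
    },
)
data = resp.json()

print(data["candidates"][0]["content"]["parts"][0]["text"])
print(data.get("usageMetadata"))  # token counts used for billing
```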
xAI
Models: Grok, Grok-beta
Key Features:
- Real-time data access
- Advanced reasoning capabilities
- OpenAI-compatible API
https://api.x.ai/v1/chat/completions
Billing: Token-based (input + output tokens)
Mistral AI
Models: Mistral Large, Mistral Medium, Mistral Small, Mixtral
Key Features:
- Efficient inference
- Multilingual support (French, Spanish, German, Italian)
- Function calling
- JSON mode
https://api.mistral.ai/v1/chat/completions
Billing: Token-based (input + output tokens)
DeepSeek
Models: DeepSeek-Chat, DeepSeek-Coder
Key Features:
- Cost-effective inference
- High performance on coding tasks
- Long context support
https://api.deepseek.com/v1/chat/completions
Billing: Token-based (input + output tokens, very competitive pricing)
Other LLM Providers
- Groq: Ultra-fast LLM inference with low latency
- Cohere: Enterprise-focused with RAG and embeddings
- together.ai: Open-source models (Llama, Mixtral, etc.)
- Fireworks: Fast inference platform for open models
- DeepInfra: Serverless AI inference

All use token-based billing and support streaming responses.
Streaming Support
All LLM providers support Server-Sent Events (SSE) streaming:
- Request includes "stream": true
- Lava forwards request to provider
- Lava streams response chunks back in real-time
- Usage data extracted from final SSE message
- Billing happens after stream completes
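A sketch of consuming the stream, assuming OpenAI-style "data: {...}" SSE chunks with a "[DONE]" sentinel; the endpoint and key are placeholders, and the usage check reflects the flow above (usage arriving in the final SSE message).

```python
import json
import requests

url = "https://<your-lava-endpoint>/v1/chat/completions"  # placeholder endpoint
headers = {"Authorization": "Bearer <your-key>"}          # placeholder credential
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Stream a haiku."}],
    "stream": True,  # request Server-Sent Events
}

with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank separators and non-data lines
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(chunk)
        # Content arrives as incremental deltas in each chunk.
        if event.get("choices"):
            delta = event["choices"][0]["delta"].get("content")
            if delta:
                print(delta, end="", flush=True)
        # Usage data arrives in the final SSE message.
        if event.get("usage"):
            print("\nusage:", event["usage"])
```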
The x-lava-request-id header is added to all responses for request tracking and debugging. Usage data comes from the response body (data.usage), not from headers.
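For example, reading both from a completed response (endpoint and key again placeholders):

```python
import requests

resp = requests.post(
    "https://<your-lava-endpoint>/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer <your-key>"},      # placeholder credential
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]},
)

# The request ID comes from the response headers...
print("request id:", resp.headers.get("x-lava-request-id"))

# ...while usage comes from the response body, not headers.
usage = resp.json().get("usage", {})
print("prompt tokens:", usage.get("prompt_tokens"))
print("completion tokens:", usage.get("completion_tokens"))
```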