Quick Reference
- Base URL: `https://generativelanguage.googleapis.com/v1beta`
- Authentication: Bearer token (OpenAI endpoint) OR `x-goog-api-key` (native endpoint)
- API Format: Dual support - OpenAI-compatible AND native Google format
- Usage Tracking: `data.usage` (OpenAI endpoint) OR `data.usageMetadata` (native)
- BYOK Support: ✓ Supported
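The two authentication styles differ only in the header carrying the token. A minimal sketch (the helper names are mine; the token is whatever Lava issues you):

```python
# Both endpoints share the base URL from the quick reference above.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def openai_style_headers(token: str) -> dict:
    """OpenAI-compatible endpoint: standard Bearer authentication."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def native_style_headers(token: str) -> dict:
    """Native Google endpoint: the key goes in the x-goog-api-key header."""
    return {"x-goog-api-key": token, "Content-Type": "application/json"}
```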
Current Models (October 2025)
Gemini 2.5 Series (Latest)
- gemini-2.5-pro - State-of-the-art thinking model with 1M token context
- gemini-2.5-flash - Best price/performance balance, 1M context
- gemini-2.5-flash-lite - Fastest, most cost-efficient option
- gemini-2.5-flash-image - Image generation support
- gemini-2.5-pro-preview-tts - Text-to-speech capabilities
- gemini-2.5-flash-preview-tts - TTS (faster variant)
Gemini 2.0 Series
- gemini-2.0-flash - Second generation workhorse model
- gemini-2.0-flash-lite - Small, efficient workhorse
- gemini-2.0-flash-preview-image-generation - Image generation
Integration Example
Google Gemini supports two API formats: OpenAI-compatible (recommended for simplicity) and native Google format (for advanced features).

Prerequisites
1. Get your Lava forward token:
   - Navigate to Build > Secret Keys
   - Find the “Self Forward Token” section
   - Click the copy icon
2. Set up environment variables in `.env.local`
3. Run requests from your backend server (CORS blocks frontend requests for security)
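Step 2 might look like the following; the variable name `LAVA_FORWARD_TOKEN` is illustrative, not a name Lava prescribes:

```shell
# .env.local — keep this server-side; never expose the token to the browser
LAVA_FORWARD_TOKEN=your-forward-token-here
```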
Option 1: OpenAI-Compatible Endpoint (Recommended)
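A minimal sketch using Python's `requests`, assuming Google's OpenAI-compatibility layer at the `/openai` path under the base URL above and a `LAVA_FORWARD_TOKEN` environment variable (both worth verifying for your setup):

```python
import os

import requests  # third-party: pip install requests

# Google's OpenAI-compatibility layer lives under the /openai path of the
# base URL from the quick reference; confirm the path for your deployment.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"


def build_chat_request(prompt: str, model: str = "gemini-2.5-flash") -> dict:
    """Standard OpenAI chat-completions payload."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def chat(prompt: str) -> str:
    """POST the payload with Bearer auth and return the first choice's text."""
    resp = requests.post(
        f"{BASE_URL}/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['LAVA_FORWARD_TOKEN']}"},
        json=build_chat_request(prompt),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Run this server-side only, per the CORS note in the prerequisites.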
Option 2: Native Google Format
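A comparable sketch against the native `generateContent` endpoint, with the `x-goog-api-key` header; the model name and environment variable are placeholders:

```python
import os

import requests  # third-party: pip install requests

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(prompt: str) -> dict:
    """Native generateContent payload: a list of contents, each with parts."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def generate(prompt: str, model: str = "gemini-2.5-flash") -> str:
    """POST to models/{model}:generateContent and return the first candidate's text."""
    resp = requests.post(
        f"{BASE_URL}/models/{model}:generateContent",
        headers={"x-goog-api-key": os.environ["LAVA_FORWARD_TOKEN"]},
        json=build_generate_request(prompt),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["candidates"][0]["content"]["parts"][0]["text"]
```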
Request/Response Formats
OpenAI-Compatible Format
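A representative OpenAI-compatible request body (values illustrative):

```json
{
  "model": "gemini-2.5-flash",
  "messages": [
    {"role": "user", "content": "Explain quantum entanglement in one sentence."}
  ]
}
```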
Native Google Format
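A representative native-format request body (values illustrative):

```json
{
  "contents": [
    {"parts": [{"text": "Explain quantum entanglement in one sentence."}]}
  ]
}
```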
Key Features
Multi-Modal Capabilities
- Text, images, video, audio: Send multiple content types in single request
- 1M token context: Process extremely long documents (Gemini 2.5 Pro/Flash)
- Vision understanding: Analyze images and diagrams
Advanced Features
- Thinking mode: Extended reasoning for complex tasks
- Code execution: Built-in code interpreter for mathematical computations
- Google Search grounding: Real-time web search integration
- Google Maps grounding: Location-based context and information
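Grounding and the code interpreter are enabled per-request via the native format's `tools` field. A sketch, assuming the public v1beta tool names (verify against current Google docs):

```python
def build_grounded_request(prompt: str) -> dict:
    """Native payload with Google Search grounding enabled.

    Swap in {"code_execution": {}} to enable the built-in code interpreter
    instead; tool field names here follow the public v1beta native API.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "tools": [{"google_search": {}}],
    }
```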
Context and Limits
- Input: Up to 1,048,576 tokens (1M context window)
- Output: 8,192 to 65,536 tokens (model-dependent)
Usage Tracking
OpenAI-Compatible Endpoint
Usage data is available in the response body at `data.usage`.
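Reading those counters from a parsed response might look like this (field names follow the OpenAI schema; the helper is mine):

```python
def extract_openai_usage(data: dict) -> dict:
    """Pull token counts from an OpenAI-compatible response body."""
    usage = data.get("usage", {})
    return {
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```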
Native Endpoint
Usage data is available at `data.usageMetadata`.
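The native endpoint uses camelCase counters under `usageMetadata`; a defensive reader (helper name is mine):

```python
def extract_native_usage(data: dict) -> dict:
    """Pull token counts from a native generateContent response body."""
    meta = data.get("usageMetadata", {})
    return {
        "prompt_tokens": meta.get("promptTokenCount", 0),
        "output_tokens": meta.get("candidatesTokenCount", 0),
        "total_tokens": meta.get("totalTokenCount", 0),
    }
```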