
Overview

Cerebras delivers record-breaking AI inference speeds on the world's largest processor, the Wafer-Scale Engine (WSE).

Key features:
  • Wafer-scale computing (2.6 trillion transistors)
  • Industry-leading throughput (up to 1,800 tokens/sec)
  • OpenAI-compatible API
  • Zero cold-start latency
Official Documentation: Cerebras Inference Docs

Authentication

Cerebras uses Bearer token authentication in an OpenAI-compatible format.

Lava Forward Token:
${LAVA_SECRET_KEY}.${CONNECTION_SECRET}.${PRODUCT_SECRET}

For BYOK (bring your own key):
${TOKEN}.${YOUR_CEREBRAS_KEY}
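The two token shapes above are plain dot-separated concatenations. A small sketch makes that explicit; the helper names here are illustrative, not part of any official SDK:

```javascript
// Build a Lava forward token from its dot-separated segments.
// (Hypothetical helpers for illustration; read the real values
// from your own secret storage, never hard-code them.)
function buildForwardToken(lavaSecretKey, connectionSecret, productSecret) {
  return [lavaSecretKey, connectionSecret, productSecret].join('.');
}

// BYOK variant: append your own Cerebras API key to the token.
function buildByokToken(token, cerebrasKey) {
  return `${token}.${cerebrasKey}`;
}
```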
| Model | Context | Speed |
| --- | --- | --- |
| llama3.3-70b | 128K | ~1,300 tokens/sec |
| llama3.1-8b | 128K | ~1,800 tokens/sec |
Endpoint: https://api.cerebras.ai/v1/chat/completions
Usage Tracking: data.usage (OpenAI format)
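Requests reach this endpoint through Lava's forward proxy, which takes the upstream URL as a percent-encoded `u` query parameter. A minimal sketch of the URL construction (the helper name is illustrative):

```javascript
// Wrap an upstream provider URL in the Lava forward endpoint.
// encodeURIComponent is required so the upstream URL's slashes and
// query characters survive as a single `u` parameter value.
function buildForwardUrl(upstreamUrl) {
  return `https://api.lavapayments.com/v1/forward?u=${encodeURIComponent(upstreamUrl)}`;
}

const url = buildForwardUrl('https://api.cerebras.ai/v1/chat/completions');
```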

Quick Start

// Forward a chat completion to Cerebras through Lava's proxy.
// LAVA_FORWARD_TOKEN: ${LAVA_SECRET_KEY}.${CONNECTION_SECRET}.${PRODUCT_SECRET}
const response = await fetch(
  `https://api.lavapayments.com/v1/forward?u=${encodeURIComponent('https://api.cerebras.ai/v1/chat/completions')}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.LAVA_FORWARD_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'llama3.3-70b',
      messages: [{ role: 'user', content: 'Hello!' }]
    })
  }
);

if (!response.ok) {
  throw new Error(`Forward request failed: ${response.status}`);
}

const data = await response.json();
console.log('Usage:', data.usage.total_tokens); // OpenAI-format usage object
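Since the API is OpenAI-compatible, streaming should follow the usual server-sent-events convention (`stream: true` in the request body, `data: {...}` lines, a `data: [DONE]` sentinel); that is an assumption here, not something the section above states. A small parser sketch for one SSE chunk:

```javascript
// Extract text deltas from one SSE chunk of an OpenAI-compatible
// streaming response. Assumes the `data: {...}` / `data: [DONE]`
// convention used by OpenAI-style chat completion streams.
function extractDeltas(sseChunk) {
  const deltas = [];
  for (const line of sseChunk.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue; // skip blanks/comments
    const payload = trimmed.slice('data:'.length).trim();
    if (payload === '[DONE]') break;            // end-of-stream sentinel
    const content = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (content) deltas.push(content);
  }
  return deltas;
}
```

To use it, add `stream: true` to the request body above and feed each decoded chunk from `response.body` through this function as it arrives.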

BYOK Support

Supported. Get an API key from Cerebras Cloud and append it to your forward token as described under Authentication.

Resources