Overview
Cerebras delivers record-breaking AI inference speeds using the world's largest processor, the Wafer-Scale Engine (WSE).

Key Features:
- Wafer-scale computing (2.6 trillion transistors)
- Industry-leading throughput (up to 1,800 tokens/sec)
- OpenAI-compatible API
- Zero cold-start latency
Authentication
Cerebras uses Bearer token authentication in the OpenAI-compatible format.

Lava Forward Token: `${TOKEN}.${YOUR_CEREBRAS_KEY}`
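The composite token is sent as a standard Bearer header. A minimal sketch (both values below are hypothetical placeholders, not real credentials):

```python
# Build the Authorization header from a Lava forward token and a Cerebras API key.
# LAVA_TOKEN and CEREBRAS_KEY are placeholder values for illustration only.
LAVA_TOKEN = "lava-abc123"
CEREBRAS_KEY = "csk-xyz789"

headers = {
    # ${TOKEN}.${YOUR_CEREBRAS_KEY} joined with a dot, per the format above
    "Authorization": f"Bearer {LAVA_TOKEN}.{CEREBRAS_KEY}",
    "Content-Type": "application/json",
}
```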
Popular Models
| Model | Context | Speed |
|---|---|---|
| llama3.3-70b | 128K | ~1,300 tokens/sec |
| llama3.1-8b | 128K | ~1,800 tokens/sec |
Endpoint: `https://api.cerebras.ai/v1/chat/completions`
Usage Tracking: token counts are returned in the response's `usage` field (OpenAI format).
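Putting the endpoint, the composite token, and usage tracking together, here is a stdlib-only request sketch. The token and model name are placeholders, and the network call is left commented out so the snippet can be inspected without credentials:

```python
import json
import urllib.request

ENDPOINT = "https://api.cerebras.ai/v1/chat/completions"
# Hypothetical composite token in the ${TOKEN}.${YOUR_CEREBRAS_KEY} format
TOKEN = "lava-abc123.csk-xyz789"

payload = {
    "model": "llama3.1-8b",  # fastest model in the table above
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/json",
    },
)
# Uncomment to actually send the request:
# with urllib.request.urlopen(req) as resp:
#     data = json.load(resp)
#     print(data["choices"][0]["message"]["content"])
#     print(data["usage"])  # prompt_tokens, completion_tokens, total_tokens
```

Because the response follows the OpenAI schema, any OpenAI-compatible client can be pointed at the same endpoint instead of hand-rolling the request.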