
What You’ll Learn

This guide shows you how to work with streaming responses from AI providers through Lava’s forward proxy. You’ll learn to:
  • Enable streaming for LLM requests
  • Parse Server-Sent Events (SSE) in JavaScript/TypeScript
  • Extract usage data from completed streams
  • Handle provider-specific streaming formats (OpenAI vs Anthropic)
Streaming provides real-time UX. Instead of waiting for the entire response, users see tokens appear incrementally, creating a more interactive chat-like experience. Lava fully supports streaming for all LLM providers.

Enabling Streaming

Basic Streaming Request

Enable streaming by adding "stream": true to your request body:
const response = await fetch('https://api.lavapayments.com/v1/forward/openai/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${forwardToken}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [
      { role: 'user', content: 'Explain quantum computing in simple terms' }
    ],
    stream: true  // Enable streaming
  })
});

// Response will be a stream of Server-Sent Events (SSE)

How Lava Handles Streaming

  1. Request Detection: Lava detects "stream": true in the request body
  2. Forward to Provider: the request is forwarded to the AI provider with streaming enabled
  3. Real-time Proxy: Lava streams response chunks back to your client as they arrive
  4. Usage Extraction: usage data is extracted from the final SSE message (provider-specific format)
  5. Billing Completion: billing happens after the stream completes, and usage headers are added to the final chunk
Streaming adds no latency. Lava’s proxy forwards Server-Sent Events in real-time without buffering, maintaining the same performance as calling providers directly.
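Before parsing, a defensive check can confirm the response really is an event stream; buffered errors typically come back as plain JSON instead. This is a minimal sketch based on standard SSE behavior, not a Lava-specific guarantee:
// Sketch: verify the response is an SSE stream before parsing.
// Non-streaming errors usually arrive as a buffered JSON body instead.
async function assertEventStream(response: Response): Promise<void> {
  const contentType = response.headers.get('content-type') ?? '';

  if (!contentType.includes('text/event-stream')) {
    const body = await response.text();  // read the buffered error body for context
    throw new Error(`Expected text/event-stream, got "${contentType}": ${body}`);
  }
}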

Parsing Server-Sent Events

The modern approach uses fetch with a response body reader:
async function streamCompletion(messages: Array<{ role: string; content: string }>) {
  const response = await fetch('https://api.lavapayments.com/v1/forward/openai/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: messages,
      stream: true
    })
  });

  if (!response.ok) {
    throw new Error(`HTTP error! status: ${response.status}`);
  }

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();

  if (!reader) {
    throw new Error('Response body is null');
  }

  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();

    if (done) {
      break;
    }

    // Decode chunk and add to buffer
    buffer += decoder.decode(value, { stream: true });

    // Process complete SSE messages
    const lines = buffer.split('\n');
    buffer = lines.pop() || '';  // Keep incomplete line in buffer

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);  // Remove 'data: ' prefix

        if (data === '[DONE]') {
          console.log('Stream complete');
          continue;
        }

        try {
          const parsed = JSON.parse(data);
          const content = parsed.choices[0]?.delta?.content;

          if (content) {
            process.stdout.write(content);  // Print token incrementally
          }
        } catch (err) {
          console.error('Failed to parse SSE data:', err);
        }
      }
    }
  }
}

// Usage
await streamCompletion([
  { role: 'user', content: 'Write a haiku about code' }
]);

EventSource (Browser Only): Not Recommended

In browsers, EventSource offers a simpler API for consuming SSE, but it only issues GET requests and cannot send custom headers, so it can't carry the Authorization header Lava requires:
// Note: EventSource doesn't support custom headers or POST requests
// Use fetch API for Lava streaming (requires Authorization header)

// EventSource is NOT recommended for Lava due to header limitations
// Use fetch API approach shown above instead
Use fetch API, not EventSource. EventSource doesn’t support custom headers (needed for Authorization), making it unsuitable for authenticated Lava requests. Always use fetch with response body reader.

Extracting Usage from Streams

Usage Headers (Final Chunk)

Lava adds usage headers to the final streamed chunk:
async function streamWithUsage(messages: any[]) {
  const response = await fetch('https://api.lavapayments.com/v1/forward/openai/chat/completions', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${forwardToken}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: messages,
      stream: true
    })
  });

  const reader = response.body?.getReader();
  const decoder = new TextDecoder();

  let finalChunkHeaders: Headers | null = null;
  let buffer = '';

  while (true) {
    const { done, value } = await reader!.read();

    if (done) {
      // Extract usage from response headers (available after stream completes)
      finalChunkHeaders = response.headers;
      break;
    }

    buffer += decoder.decode(value, { stream: true });
    // ... process chunks ...
  }

  // Access request tracking
  const requestId = finalChunkHeaders?.get('x-lava-request-id');

  console.log('Request tracking:', {
    requestId
  });
}

Provider Usage Data (SSE Messages)

Some providers include usage in the final SSE message.

OpenAI Format:
data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":120,"total_tokens":135}}
Anthropic Format:
data: {"type":"message_stop","usage":{"input_tokens":15,"output_tokens":120}}
Extracting provider usage:
for (const line of lines) {
  if (line.startsWith('data: ')) {
    const data = line.slice(6);

    if (data === '[DONE]') continue;

    const parsed = JSON.parse(data);

    // OpenAI usage
    if (parsed.usage) {
      console.log('Provider usage:', parsed.usage);
    }

    // Anthropic usage
    if (parsed.type === 'message_stop' && parsed.usage) {
      console.log('Provider usage:', parsed.usage);
    }
  }
}
Lava headers are authoritative. While provider SSE messages may include usage data, always use Lava’s X-Lava-Usage-* headers for billing calculations. These reflect actual charges including merchant fees and service charges.
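The exact usage header names aren't listed in this section, so the sketch below simply collects everything with the x-lava- prefix once the stream has finished; consult the headers documentation for the specific field names:
// Sketch: collect Lava's response headers without hard-coding names.
// Assumes billing headers share the x-lava- prefix described above;
// confirm exact names before relying on specific fields.
function extractLavaHeaders(response: Response): Record<string, string> {
  const lavaHeaders: Record<string, string> = {};

  response.headers.forEach((value, key) => {
    if (key.startsWith('x-lava-')) {
      lavaHeaders[key] = value;
    }
  });

  return lavaHeaders;
}

// After the stream has been fully read:
// console.log('Lava headers:', extractLavaHeaders(response));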

Provider-Specific Considerations

OpenAI Streaming Format

Chunk Structure:
data: {"id":"chatcmpl-xyz","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
Key Differences:
  • Content in choices[0].delta.content
  • Final chunk includes usage object
  • Stream ends with data: [DONE]
Example:
const content = parsed.choices[0]?.delta?.content;
const finishReason = parsed.choices[0]?.finish_reason;

if (content) {
  displayToken(content);
}

if (finishReason === 'stop') {
  console.log('Generation complete');
}

Anthropic Streaming Format

Chunk Structure:
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
Key Differences:
  • Content in delta.text (not delta.content)
  • Multiple event types: message_start, content_block_delta, message_stop
  • No [DONE] marker
Example:
if (parsed.type === 'content_block_delta') {
  const content = parsed.delta?.text;
  if (content) {
    displayToken(content);
  }
}

if (parsed.type === 'message_stop') {
  console.log('Generation complete');
  const usage = parsed.usage;  // Final usage stats
}

Google Gemini Streaming Format

Chunk Structure:
data: {"candidates":[{"content":{"parts":[{"text":"Hello"}],"role":"model"}}],"usageMetadata":{"promptTokenCount":15,"candidatesTokenCount":5}}
Key Differences:
  • Content in candidates[0].content.parts[0].text
  • Usage in usageMetadata (appears in chunks, not just final)
Example:
const content = parsed.candidates?.[0]?.content?.parts?.[0]?.text;

if (content) {
  displayToken(content);
}

const usage = parsed.usageMetadata;
if (usage) {
  console.log('Cumulative usage:', usage);
}
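Because each provider nests its text delta differently, one option is a small normalizer that tries the three shapes shown above. This is a sketch covering only the formats documented here; real streams may include event types it ignores:
// Sketch: normalize a parsed SSE chunk to its text delta across the
// three provider formats documented above. Unknown shapes yield ''.
function extractDelta(parsed: any): string {
  // OpenAI: choices[0].delta.content
  const openaiText = parsed.choices?.[0]?.delta?.content;
  if (typeof openaiText === 'string') return openaiText;

  // Anthropic: content_block_delta events carry delta.text
  if (parsed.type === 'content_block_delta' && typeof parsed.delta?.text === 'string') {
    return parsed.delta.text;
  }

  // Gemini: candidates[0].content.parts[0].text
  const geminiText = parsed.candidates?.[0]?.content?.parts?.[0]?.text;
  if (typeof geminiText === 'string') return geminiText;

  return '';
}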

React Integration Example

'use client';

import { useState } from 'react';

export function StreamingChat() {
  const [messages, setMessages] = useState<Array<{ role: string; content: string }>>([]);
  const [streaming, setStreaming] = useState(false);
  const [currentResponse, setCurrentResponse] = useState('');

  async function sendMessage(userMessage: string) {
    // Add user message
    const newMessages = [...messages, { role: 'user', content: userMessage }];
    setMessages(newMessages);
    setStreaming(true);
    setCurrentResponse('');

    try {
      const response = await fetch('https://api.lavapayments.com/v1/forward/openai/chat/completions', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${process.env.NEXT_PUBLIC_FORWARD_TOKEN}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          model: 'gpt-4',
          messages: newMessages,
          stream: true
        })
      });

      const reader = response.body?.getReader();
      const decoder = new TextDecoder();
      let buffer = '';
      let fullResponse = '';

      while (true) {
        const { done, value } = await reader!.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() || '';

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = line.slice(6);
            if (data === '[DONE]') continue;

            try {
              const parsed = JSON.parse(data);
              const content = parsed.choices[0]?.delta?.content || '';

              if (content) {
                fullResponse += content;
                setCurrentResponse(fullResponse);
              }
            } catch (err) {
              // Skip parse errors
            }
          }
        }
      }

      // Add assistant response
      setMessages([...newMessages, { role: 'assistant', content: fullResponse }]);
      setCurrentResponse('');
    } catch (error) {
      console.error('Streaming error:', error);
    } finally {
      setStreaming(false);
    }
  }

  return (
    <div>
      {messages.map((msg, i) => (
        <div key={i} className={msg.role}>
          {msg.content}
        </div>
      ))}
      {streaming && currentResponse && (
        <div className="assistant streaming">
          {currentResponse}
          <span className="cursor">|</span>
        </div>
      )}
      <input
        onKeyDown={(e) => {
          if (e.key === 'Enter' && !streaming) {
            sendMessage(e.currentTarget.value);
            e.currentTarget.value = '';
          }
        }}
        disabled={streaming}
        placeholder="Type a message..."
      />
    </div>
  );
}

Troubleshooting

Stream stops or cuts off mid-response

Common causes:
  • Network timeout (connection dropped)
  • Browser tab backgrounded (some browsers throttle background tabs)
  • Provider rate limit hit mid-stream
  • Wallet balance insufficient (stream terminates when funds depleted)
Solutions:
  • Implement reconnection logic for network failures (see the retry sketch after this list)
  • Keep tab active during streaming
  • Check provider rate limits and add exponential backoff
  • Monitor wallet balance before streaming requests
  • Add error handlers for stream interruptions
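For the reconnection point above, a simple pattern is to wrap your streaming call in a retry loop with exponential backoff. This is a sketch: runStream stands in for your own streaming function (such as the fetch-based reader shown earlier):
// Sketch: retry an interrupted stream with exponential backoff.
// runStream is a placeholder for your own streaming call.
async function streamWithRetry(
  runStream: () => Promise<void>,
  maxAttempts = 3
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await runStream();
      return;  // stream finished successfully
    } catch (err) {
      if (attempt === maxAttempts) throw err;

      const delayMs = 1000 * 2 ** (attempt - 1);  // 1s, 2s, 4s, ...
      console.warn(`Stream attempt ${attempt} failed, retrying in ${delayMs}ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
Because a retried request restarts generation from the beginning, discard any partial output collected during the failed attempt.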

SSE chunks fail to parse as JSON

Reasons:
  • Incomplete chunks in buffer (line split mid-JSON)
  • Provider-specific format differences (OpenAI vs Anthropic)
  • Malformed JSON from provider (rare)
Solutions:
  • Always buffer incomplete lines (see fetch API example)
  • Check provider-specific delta structure
  • Wrap JSON.parse in try/catch to skip bad chunks
  • Log unparseable data for debugging

Lava headers are missing or null

Check:
  • Headers only available AFTER stream completes (not during)
  • Accessing response.headers before reader.read() finishes
  • Header names are lowercase: x-lava-request-id (Headers.get() matches case-insensitively)
Solution:
// Wait for stream to complete
while (true) {
  const { done } = await reader.read();
  if (done) break;
  // ... process chunks
}

// NOW headers are available
const requestId = response.headers.get('x-lava-request-id');

Streaming works locally but fails in production

Possible issues:
  • Reverse proxy buffering responses (Nginx, Cloudflare)
  • Edge functions timeout before stream completes
  • CORS headers blocking stream in browser
  • Compression middleware breaking SSE format
Solutions:
  • Disable response buffering: Nginx proxy_buffering off; (see the header sketch after this list)
  • Use longer function timeouts for streaming routes
  • Ensure CORS allows streaming: Access-Control-Allow-Origin
  • Disable compression for SSE routes
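For the buffering point above, it can also help to set anti-buffering headers on your own streaming route in addition to the Nginx directive. This sketch assumes a framework where you return a standard Response object (for example, a Next.js route handler); X-Accel-Buffering: no is an Nginx convention for disabling proxy buffering per response:
// Sketch: anti-buffering headers for an SSE route behind a reverse proxy.
// Adapt to your framework's handler signature.
function sseResponse(stream: ReadableStream): Response {
  return new Response(stream, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache, no-transform',  // avoid caching or transforming chunks
      'X-Accel-Buffering': 'no'  // Nginx: do not buffer this response
    }
  });
}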

What’s Next