
Overview

Together AI provides high-performance inference for open-source language models, offering competitive pricing and extensive model selection with a focus on speed and reliability.

Key Features:
  • 100+ open-source models available
  • Fast inference with optimized infrastructure
  • Fully OpenAI-compatible API
  • Fine-tuning and custom model deployment
Official Documentation: Together AI Docs

Authentication

Together AI uses Bearer token authentication in the OpenAI-compatible format.

Header:
Authorization: Bearer YOUR_TOGETHER_API_KEY
Lava Forward Token:
${LAVA_SECRET_KEY}.${CONNECTION_SECRET}.${PRODUCT_SECRET}
For BYOK (Bring Your Own Key):
${LAVA_SECRET_KEY}.${CONNECTION_SECRET}.${PRODUCT_SECRET}.${YOUR_TOGETHER_API_KEY}
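The dot-separated token formats above can be assembled programmatically. A minimal sketch (the helper name `buildForwardToken` is illustrative, not part of any Lava SDK):

```javascript
// Build a Lava forward token from its dot-separated segments.
// The fourth segment (your Together AI key) is only used for BYOK.
function buildForwardToken(lavaSecretKey, connectionSecret, productSecret, togetherApiKey) {
  const segments = [lavaSecretKey, connectionSecret, productSecret];
  if (togetherApiKey) segments.push(togetherApiKey); // BYOK: append your own provider key
  return segments.join('.');
}

// Managed-key token: three segments
const managedToken = buildForwardToken('sk_lava', 'conn_abc', 'prod_xyz');

// BYOK token: four segments
const byokToken = buildForwardToken('sk_lava', 'conn_abc', 'prod_xyz', 'together_key');
```

Keeping the segments in named variables makes it harder to accidentally swap the connection and product secrets, which produce a token of the same shape but fail authentication.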

Supported Models

| Model | Context | Description | Use Case |
|---|---|---|---|
| meta-llama/Meta-Llama-3.3-70B-Instruct-Turbo | 128K | Meta’s latest flagship | General reasoning, coding |
| mistralai/Mixtral-8x7B-Instruct-v0.1 | 32K | Mistral’s MoE model | Balanced performance/cost |
| Qwen/Qwen2.5-72B-Instruct | 128K | Alibaba’s multilingual model | Multilingual, math, coding |
Pricing: See Together AI Pricing for current rates.

Quick Start Example

// 1. Set up your environment variables
const LAVA_FORWARD_TOKEN = process.env.LAVA_FORWARD_TOKEN;

// 2. Define the Together AI endpoint
const TOGETHER_ENDPOINT = 'https://api.together.xyz/v1/chat/completions';

// 3. Make the request through Lava
const response = await fetch(
  `https://api.lavapayments.com/v1/forward?u=${encodeURIComponent(TOGETHER_ENDPOINT)}`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${LAVA_FORWARD_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'meta-llama/Meta-Llama-3.3-70B-Instruct-Turbo',
      messages: [
        {
          role: 'user',
          content: 'Explain the benefits of open-source AI.'
        }
      ],
      temperature: 0.7,
      max_tokens: 512
    })
  }
);

// 4. Parse response and extract usage
const data = await response.json();
console.log('Response:', data.choices[0].message.content);

// 5. Track usage (from response body)
const usage = data.usage;
console.log('Tokens used:', usage.total_tokens);

// 6. Get Lava request ID (from headers)
const requestId = response.headers.get('x-lava-request-id');
console.log('Lava Request ID:', requestId);

Available Endpoints

Together AI supports OpenAI-compatible endpoints:
| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | Text generation with conversation context |
| /v1/completions | POST | Direct text completion (no chat format) |
| /v1/models | GET | List available models |
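Any of these endpoints can be reached through the same Lava forward URL pattern used in the Quick Start. A small sketch (the helper name `buildForwardUrl` is illustrative):

```javascript
// Wrap a Together AI endpoint in the Lava forward URL, as in the Quick Start.
function buildForwardUrl(endpoint) {
  return `https://api.lavapayments.com/v1/forward?u=${encodeURIComponent(endpoint)}`;
}

const modelsUrl = buildForwardUrl('https://api.together.xyz/v1/models');
// Then: fetch(modelsUrl, { headers: { Authorization: `Bearer ${LAVA_FORWARD_TOKEN}` } })
```

The `encodeURIComponent` call matters: the target endpoint contains `://` and `/` characters that would otherwise break the `u=` query parameter.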

Usage Tracking

Usage data is returned in the response body (OpenAI format):
{
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 180,
    "total_tokens": 205
  }
}
Location: data.usage
Format: Standard OpenAI usage object
Lava Tracking: Automatically tracked via the x-lava-request-id header
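Because usage lives in the response body, a small defensive helper avoids crashes if the field is ever missing (e.g. on an error response). The helper name is illustrative:

```javascript
// Safely read an OpenAI-format usage object, defaulting missing fields to 0.
function extractUsage(data) {
  const { prompt_tokens = 0, completion_tokens = 0, total_tokens = 0 } = data.usage ?? {};
  return { prompt_tokens, completion_tokens, total_tokens };
}
```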

Features & Capabilities

Streaming:
{
  "stream": true
}
JSON Mode:
{
  "response_format": { "type": "json_object" }
}
Stop Sequences:
{
  "stop": ["</response>", "\n\n"]
}
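With `stream: true`, OpenAI-compatible APIs return server-sent events: `data:` lines carrying JSON chunks, ending with a `data: [DONE]` sentinel. A minimal parser sketch, assuming that wire format:

```javascript
// Extract text deltas from one SSE chunk of an OpenAI-style stream.
function parseSseChunk(chunk) {
  const deltas = [];
  for (const line of chunk.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice('data: '.length).trim();
    if (payload === '[DONE]') break; // end-of-stream sentinel
    const event = JSON.parse(payload);
    const text = event.choices?.[0]?.delta?.content;
    if (text) deltas.push(text);
  }
  return deltas;
}
```

In practice you would read `response.body` with a `TextDecoder` and call this on each decoded chunk; a production parser would also buffer partial lines that span chunk boundaries.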

BYOK Support

Status: ✅ Supported (managed keys + BYOK)

BYOK Implementation:
  • Append your Together AI API key to the forward token: ${TOKEN}.${YOUR_TOGETHER_KEY}
  • Lava tracks usage and billing while you maintain key control
  • No additional Lava API key costs (metering-only mode available)
Getting a Together AI API Key:
  1. Sign up at Together AI Console
  2. Navigate to API Keys section
  3. Create a new API key
  4. Use in Lava forward token (4th segment)

Best Practices

  1. Model Selection: Use Llama 3.3 for reasoning, Qwen2.5 for multilingual, Mixtral for cost efficiency
  2. Temperature: 0.7-0.9 for creative tasks, 0.1-0.3 for factual outputs
  3. Context Management: Leverage 128K context models for long documents
  4. Error Handling: Together AI returns standard OpenAI error formats
  5. Model Naming: Use full model paths (e.g., meta-llama/Meta-Llama-3.3-70B-Instruct-Turbo)
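For point 4, OpenAI-format errors arrive as a body like `{ "error": { "message", "type", "code" } }`, with the HTTP status indicating whether a retry makes sense. A hedged sketch (field names follow the OpenAI convention; verify against Together AI's actual responses):

```javascript
// Rate limits (429) and server errors (5xx) are typically safe to retry.
function isRetryable(status) {
  return status === 429 || status >= 500;
}

// Pull a human-readable message from an OpenAI-format error body.
function errorMessage(body) {
  return body?.error?.message ?? 'Unknown error';
}
```

Pairing these with exponential backoff on retryable statuses keeps transient rate-limit errors from surfacing to users.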

Model Categories

Chat Models: Instruction-tuned for conversation (Llama 3.3, Qwen2.5, Mixtral)
Code Models: Specialized for programming (CodeLlama, WizardCoder)
Vision Models: Multi-modal support (Llama-Vision, Qwen-VL)
Embedding Models: Text embeddings (all-MiniLM, UAE-Large)
Full Catalog: Together AI Models

Additional Resources