Together AI Is Cheaper Than OpenAI And Faster

If you've been paying OpenAI prices for an LLM that mostly does string formatting, behold: Together AI hosts open-weight models for cents on the dollar and serves them at 200+ tokens/sec. Do not @ me about lock-in: the API is OpenAI-compatible.

The Setup

Together is a managed inference cloud running Llama 3, Mixtral, Qwen, DeepSeek, and every other open model worth caring about. $0.20 per million input tokens for a 70B, no GPU to rent, no Kubernetes to babysit. Plot twist: the curl call is exactly what you think it is.

curl -X POST https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3-70b-chat-hf",
    "messages": [{"role": "user", "content": "explain RLS in one paragraph"}],
    "max_tokens": 300
  }'
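
To put the quoted rate in perspective, here is a back-of-the-envelope sketch. It assumes the $0.20 per million input tokens figure from this post and a made-up workload; it ignores output tokens, which may be billed differently, so check the current price list before trusting the result.

```python
# Back-of-the-envelope input cost at a flat per-million-token rate.
# The $0.20 figure is the one quoted in this post, not a looked-up price.
PRICE_PER_MILLION_INPUT_USD = 0.20


def input_cost_usd(input_tokens: int,
                   price_per_million: float = PRICE_PER_MILLION_INPUT_USD) -> float:
    """Return the input-token cost in USD for a given token count."""
    return input_tokens / 1_000_000 * price_per_million


# Hypothetical workload: 10,000 requests a day at 500 input tokens each.
daily_tokens = 10_000 * 500
print(f"{daily_tokens:,} input tokens/day costs about ${input_cost_usd(daily_tokens):.2f}")
```

Swap in your own request volume and prompt size; the function is just tokens divided by a million, times the rate.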

The Money Pattern

It's a drop-in for the OpenAI SDK. Change the base URL, the API key, and the model name, and you're billed by Together instead of Sam Altman. I swapped a Rebuild Relief internal classifier over in 90 seconds and the bill dropped 80%.

import os

from openai import OpenAI

# Point the OpenAI SDK at Together's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
)

# A low temperature keeps classification output more consistent.
resp = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "classify this hail damage report"}],
    temperature=0.2,
)
print(resp.choices[0].message.content)
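
If you want the swap to be a config flag rather than a code change, one way to sketch it is a small provider table. The `PROVIDERS` dict, the `client_kwargs` helper, and the `OPENAI_API_KEY` variable name are my own illustration, not part of either SDK; only the Together base URL comes from the snippet above.

```python
import os

# Hypothetical provider table. The Together base URL is the one used in this
# post; the env var names and the "openai" entry are assumptions.
PROVIDERS = {
    "together": {"base_url": "https://api.together.xyz/v1", "key_env": "TOGETHER_API_KEY"},
    "openai": {"base_url": None, "key_env": "OPENAI_API_KEY"},  # None: keep the SDK default
}


def client_kwargs(provider: str, env=os.environ) -> dict:
    """Build keyword arguments for OpenAI(...) for the chosen provider."""
    cfg = PROVIDERS[provider]
    kwargs = {"api_key": env[cfg["key_env"]]}
    if cfg["base_url"] is not None:
        kwargs["base_url"] = cfg["base_url"]
    return kwargs


# Usage (needs the openai package and a real key in the environment):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("together"))
```

Model names differ between providers too, so in practice you would carry a default model alongside each entry.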

The Catch

Rate limits on the free tier are real and you will hit them in dev within an afternoon. Model selection lags new Hugging Face releases by a few days, so don't expect day-zero support for whatever model just trended on X. And the streaming endpoint occasionally hiccups under load, so wrap your client in retries.
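
For the retry part, a minimal sketch of exponential backoff with jitter looks something like this. `with_retries` is a hypothetical helper of mine, not a Together or OpenAI API, and the attempt count and delays are arbitrary defaults. The openai Python SDK also has its own `max_retries` client option, which may be enough on its own for non-streaming calls.

```python
import random
import time


def with_retries(fn, attempts=4, base_delay=0.5, retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a listed exception, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits of 0.5s, 1s, 2s, ... plus up to 100 ms of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


# Usage, assuming `client` is the OpenAI-compatible client built earlier:
#   resp = with_retries(lambda: client.chat.completions.create(
#       model="meta-llama/Llama-3-70b-chat-hf",
#       messages=[{"role": "user", "content": "classify this hail damage report"}],
#   ))
```

In real code, narrow `retry_on` to the rate-limit and connection errors you actually see, so genuine bugs fail fast instead of being retried.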

The Verdict

For bursty workloads, side projects, and anything that doesn't justify a dedicated GPU, Together is genuinely the best deal on the open web. I default to it for prototyping anything that doesn't ship with Claude in the loop. Move your non-critical inference today.

