$0.20 per million tokens for Llama 3 70B and it's faster than your self-hosted box
If you've been paying OpenAI prices for an LLM that mostly does string formatting, behold: Together AI hosts the same open-weight models you'd otherwise self-host, for cents on the dollar, and serves them at 200+ tokens/sec. Do not @ me about lock-in: the API is OpenAI-compatible.
The Setup
Together is a managed inference cloud running Llama 3, Mixtral, Qwen, DeepSeek, and every other open model worth caring about. $0.20 per million input tokens for a 70B, no GPU to rent, no Kubernetes to babysit. Plot twist: the curl call is exactly what you think it is.
curl -X POST https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3-70b-chat-hf",
    "messages": [{"role": "user", "content": "explain RLS in one paragraph"}],
    "max_tokens": 300
  }'
The Money Pattern
Together's API is a drop-in for the OpenAI SDK. Change the base URL and you're billed by Together instead of Sam Altman. I swapped a Rebuild Relief internal classifier over in 90 seconds and the bill dropped 80%.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.together.xyz/v1",  # point the OpenAI SDK at Together
    api_key=os.environ["TOGETHER_API_KEY"],
)
resp = client.chat.completions.create(
    model="meta-llama/Llama-3-70b-chat-hf",
    messages=[{"role": "user", "content": "classify this hail damage report"}],
    temperature=0.2,  # low temperature for more consistent labels
)
print(resp.choices[0].message.content)
The Catch
Rate limits on the free tier are real and you will hit them in dev within an afternoon. Model selection lags new Hugging Face releases by a few days, so don't expect day-zero support for whatever model just trended on X. And the streaming endpoint occasionally hiccups under load, so wrap your client calls in retries.
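Here's roughly what "wrap it in retries" can look like: a minimal sketch of exponential backoff with jitter, using only the standard library. The helper name `with_retries` and all of its parameters are my own invention, not part of the Together or OpenAI SDKs, and in real code you'd probably narrow `retry_on` to rate-limit and connection errors rather than every exception.

```python
import random
import time


def with_retries(call, max_attempts=4, base_delay=0.5,
                 retry_on=(Exception,), sleep=time.sleep):
    """Run call() and retry on failure with exponential backoff plus jitter.

    Hypothetical helper: every name here is an assumption for illustration.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (base, 2x, 4x, ...) plus up to 10% jitter
            # so that parallel workers don't all retry at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


# Usage, assuming the `client` from the SDK snippet earlier in this post:
# resp = with_retries(lambda: client.chat.completions.create(
#     model="meta-llama/Llama-3-70b-chat-hf",
#     messages=[{"role": "user", "content": "classify this hail damage report"}],
# ))
```

The OpenAI Python SDK also accepts a `max_retries` argument on the client constructor, which may be enough for plain request failures. A wrapper like this is mostly useful when you want to retry a whole unit of work, such as a stream that dies halfway through.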
The Verdict
For bursty workloads, side projects, and anything that doesn't justify a dedicated GPU, Together is genuinely the best deal on the open web. I default to it for prototyping anything that doesn't ship with Claude in the loop. Move your non-critical inference today.
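If you want to sanity-check the math before you move anything, here's a back-of-envelope sketch. The $0.20 rate is the figure quoted in this post; the monthly token volume and the "old bill" are made-up placeholders, so plug in your own numbers and the current rate from the pricing page.

```python
def estimate_cost_usd(tokens, usd_per_million):
    """Cost of processing `tokens` tokens at a flat per-million-token rate."""
    return tokens / 1_000_000 * usd_per_million


def savings_fraction(old_bill, new_bill):
    """Fraction of the old bill saved, e.g. 0.8 means the bill dropped 80%."""
    return (old_bill - new_bill) / old_bill


RATE_70B = 0.20              # USD per million tokens, as quoted in this post
monthly_tokens = 50_000_000  # hypothetical workload
old_bill = 50.00             # hypothetical previous monthly bill in USD

new_bill = estimate_cost_usd(monthly_tokens, RATE_70B)
print(f"new bill: ${new_bill:.2f}")
print(f"savings: {savings_fraction(old_bill, new_bill):.0%}")
```

With these placeholder numbers the new bill comes to $10.00 a month, an 80% drop from the hypothetical $50.00.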