Inference

Groq Hits 800 Tokens Per Second


Inference Just Crossed The Speed Of Thought

The first time you watch Groq stream a 500-token response in under a second, your brain refuses to believe it. 800 tokens per second on Llama 3 70B, which puts 500 tokens at roughly 0.6 seconds. The internet collectively lost its mind, then quietly started rewriting its agents to use it.

The Setup

Groq isn't a GPU company. The LPU (Language Processing Unit) is a custom chip designed for one thing: sequential token generation with deterministic latency. Model weights sit in on-chip SRAM rather than external memory, so there's no memory bandwidth bottleneck and no batching gymnastics, just brutally fast inference. The benchmarks are not a typo.

pip install groq

# Set your API key
export GROQ_API_KEY="gsk_..."

The Money Pattern

The API is OpenAI-shaped, so existing clients port over in five minutes. Pricing on small models actually undercuts OpenAI's equivalents. For voice agents, autocomplete, and any UX where latency is the product, Groq is now the default choice.

from groq import Groq

client = Groq()

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": "You're a concise assistant."},
        {"role": "user", "content": "Plan a Gold Coast surf trip for next weekend."}
    ],
    stream=True,
)

for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
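
To make the "OpenAI-shaped" claim concrete, here is a minimal sketch of the raw HTTP request behind a chat completion, using only the standard library. It assumes Groq's OpenAI-compatible base URL is https://api.groq.com/openai/v1; the helper names build_chat_request and send are hypothetical, made up for this illustration. It shows the wire format only. In practice you would keep using an SDK.

```python
import json
import urllib.request

# Assumed OpenAI-compatible base URL; the route mirrors OpenAI's chat completions path.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"


def build_chat_request(api_key, messages, model="llama-3.3-70b-versatile"):
    """Build (but do not send) an OpenAI-style chat completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GROQ_BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (needs a real key and network access):
# import os
# req = build_chat_request(os.environ["GROQ_API_KEY"],
#                          [{"role": "user", "content": "Hi"}])
# print(send(req))
```

The same body and Bearer header work against any OpenAI-compatible endpoint, which is why porting an existing client is mostly a matter of swapping the base URL, the key, and the model name.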

The Catch

The model selection is limited. Llama 3 variants, a handful of Mistrals, some Whisper for audio — that's basically it. No fine-tuning hosting. No image models. No long-context monsters. If you need GPT-5-Opus-Pro-Max-Turbo, Groq isn't it.

Rate limits also get aggressive on the free tier. The good news is paid tiers are reasonable. The bad news is you'll find out where the limits are the hard way.
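
One way to soften the rate-limit pain is to retry with exponential backoff. The sketch below is generic, not Groq-specific: call_with_backoff is a hypothetical helper, and you pass in whichever exception your client raises on a rate limit (with the groq SDK that is likely groq.RateLimitError, but treat that name as an assumption and check your installed version).

```python
import random
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on a retry_on exception, wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Usage (hypothetical; assumes `client` is the Groq client from earlier and
# RateLimitError is the rate-limit exception your SDK raises):
# reply = call_with_backoff(
#     lambda: client.chat.completions.create(model="llama-3.3-70b-versatile",
#                                            messages=[{"role": "user", "content": "Hi"}]),
#     retry_on=RateLimitError,
# )
```

The sleep parameter exists so the wait can be replaced in tests. The defaults here are illustrative, not tuned against any published limits.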

The Verdict

For anything where streaming latency is the actual product — voice apps, agent step-throughs, live editing UIs — Groq is the obvious move. For everything else, it's a great backup provider when you need speed over flexibility. The LPU bet aged better than anyone expected.
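
If latency is the product, it is worth measuring on your own prompts rather than trusting headline numbers. Here is a minimal sketch that times any stream of text pieces. measure_stream is a hypothetical helper, and it counts streamed pieces, which only approximate tokens, so read pieces_per_s as a rough proxy for tokens per second.

```python
import time


def measure_stream(pieces, clock=time.perf_counter):
    """Consume an iterable of text pieces and return timing stats for the stream."""
    start = clock()
    first_at = None
    count = 0
    text = []
    for piece in pieces:
        if not piece:
            continue  # skip empty or None deltas
        if first_at is None:
            first_at = clock()  # moment the first visible text arrived
        count += 1
        text.append(piece)
    elapsed = clock() - start
    return {
        "text": "".join(text),
        "pieces": count,
        "time_to_first_s": None if first_at is None else first_at - start,
        "total_s": elapsed,
        "pieces_per_s": count / elapsed if elapsed > 0 else 0.0,
    }


# Usage with the streaming response from earlier (assumes `response` was
# created with stream=True):
# stats = measure_stream(chunk.choices[0].delta.content for chunk in response)
# print(stats["time_to_first_s"], stats["pieces_per_s"])
```

For voice and autocomplete, time to first piece usually matters more than total throughput, so track both.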
