Schema in, valid JSON out. Every time. No retries.
If you've ever shipped a "parse the LLM output" try/except with three fallbacks, Outlines is here to make you feel slightly silly.
The Setup
Outlines does constrained decoding: at every generation step it masks the logits of any token that would break your schema, so the model literally cannot emit one. Pydantic in, valid JSON out, no retries, no regex hacks.
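To make the logit trick concrete, here is a toy sketch of the idea, not Outlines' actual implementation. The vocabulary, the hand-written state machine, and `fake_model_logits` are all made up for illustration; a real setup derives the state machine from your schema and runs over the tokenizer's full vocabulary.

```python
import math

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ["Sure", "!", "{", "}", '"severity"', ":", '"minor"', '"severe"', "maybe"]

# Hand-written state machine for the schema {"severity": "minor" | "severe"}.
# Maps state -> {allowed token: next state}.
FSM = {
    0: {"{": 1},
    1: {'"severity"': 2},
    2: {":": 3},
    3: {'"minor"': 4, '"severe"': 4},
    4: {"}": 5},
}
FINAL_STATE = 5


def mask_logits(logits, state):
    """Set the logit of every token the FSM forbids in this state to -inf."""
    allowed = FSM[state]
    return [x if tok in allowed else -math.inf for tok, x in zip(VOCAB, logits)]


def fake_model_logits(step):
    """Stand-in for a model that always prefers the chatty token "Sure"."""
    return [5.0] + [((i * 7 + step * 3) % 5) / 2 for i in range(1, len(VOCAB))]


def constrained_greedy_decode():
    """Greedy decoding where only schema-legal tokens can win the argmax."""
    state, out, step = 0, [], 0
    while state != FINAL_STATE:
        masked = mask_logits(fake_model_logits(step), state)
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        token = VOCAB[best]
        out.append(token)
        state = FSM[state][token]
        step += 1
    return "".join(out)


print(constrained_greedy_decode())
```

Left to itself, the fake model would open with "Sure". With the mask applied, the only tokens that can win are the ones the state machine allows, so the output always parses.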
pip install outlines transformers torch
# Or wire it to vLLM / llama.cpp for prod

The Money Pattern
Give it a Pydantic model. Get back an instance. The constraint runs at the logit level, so even small open models suddenly produce machine-parseable output instead of the usual "here is your JSON: ```json" preamble.
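# Written against the pre-1.0 Outlines API (outlines.models / outlines.generate);
# newer releases changed these entry points, so check the docs for your version.
from typing import Literal

import outlines
from pydantic import BaseModel


class ClaimVerdict(BaseModel):
    severity: Literal["minor", "moderate", "severe"]
    payout_aud: float
    needs_inspection: bool


# Load a local model, then compile the schema into a constrained generator.
model = outlines.models.transformers("meta-llama/Llama-3.2-3B-Instruct")
gen = outlines.generate.json(model, ClaimVerdict)

# The result is a ClaimVerdict instance, not a raw string.
result = gen("Claim 4821: cracked skylights, dented north roof, Gold Coast.")
print(result.severity, result.payout_aud, result.needs_inspection)

The Catch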
You need logit-level access to the model, which means local or self-hosted. Hosted APIs like OpenAI's and Anthropic's don't let you mask logits at each decoding step, so they're out. There's also a speed cost: compiling a schema into a finite-state machine isn't free, and very large unions can get slow. Cache your generators: build each one once and reuse it across requests.
The Verdict
If you're running local models in production and you're not using structured generation, you're paying a tax in retries and parsing errors. Outlines isn't optional anymore — it's table stakes for serious open-model deployments.