Constrained decoding at the token level — your model cannot lie
Plot twist: while everyone's writing JSON validators that retry three times, Microsoft's Guidance library just forces the model to emit valid tokens. The model literally cannot break the schema.
The Setup
Guidance hooks into the logit stream of a local model and masks out tokens that would violate your constraint. Want JSON? Mask anything non-JSON. Want a regex? Mask anything that doesn't match. Output is guaranteed-valid by construction.
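The masking trick itself is small enough to sketch in plain Python. This is a toy, not Guidance's implementation. The vocabulary, the fixed scores, and the helper names (`mask_logits`, `constrained_greedy`, `toy_model`) are all made up, and the constraint is a `select`-style list of allowed strings. What matters is the order of operations: score, mask to `-inf`, then pick.

```python
import math

EOS = "<eos>"

def mask_logits(logits, vocab, prefix, options):
    """Copy `logits`, setting -inf on every token that would take `prefix`
    off the path to one of `options` (a select()-style constraint)."""
    masked = []
    for logit, tok in zip(logits, vocab):
        if tok == EOS:
            ok = prefix in options  # may only stop on a complete option
        else:
            ok = any(opt.startswith(prefix + tok) for opt in options)
        masked.append(logit if ok else -math.inf)
    return masked

def constrained_greedy(score_fn, vocab, options, max_steps=20):
    """Greedy decoding where the mask is applied before every argmax."""
    prefix = ""
    for _ in range(max_steps):
        logits = mask_logits(score_fn(prefix), vocab, prefix, options)
        best = max(range(len(vocab)), key=lambda i: logits[i])
        if logits[best] == -math.inf:
            raise ValueError("no token can satisfy the constraint")
        if vocab[best] == EOS:
            return prefix
        prefix += vocab[best]
    raise ValueError("ran out of steps before the constraint was satisfied")

# Hypothetical toy vocabulary and scores: left alone, this "model" answers "maybe".
VOCAB = ["maybe", "emer", "gency", "roof", "ing", "general", EOS]
SCORES = [5.0, 3.0, 2.5, 2.0, 1.0, 0.5, 0.1]

def toy_model(prefix):
    # Ignores the prefix and returns fixed scores; a real model wouldn't.
    return list(SCORES)

print(constrained_greedy(toy_model, VOCAB, ["roofing", "general", "emergency"]))  # → emergency
```

Unmasked, the top token is "maybe". With the mask, the best token that's still allowed wins at every step, and stopping is only legal once the text is a complete option.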
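```python
from guidance import models, gen, select

# Load a local GGUF model through llama.cpp (needs the llama-cpp-python package)
llm = models.LlamaCpp("./ghost-pepper-7b.Q4_K_M.gguf")

# gen(regex=...) can only emit a single digit 1-5; select() can only emit one of the listed strings
lm = llm + f"""
Customer: my roof is leaking after the storm
Severity (1-5): {gen("severity", regex=r"[1-5]")}
Department: {select(["roofing", "general", "emergency"], name="dept")}
"""

# Captured values are read back by name
print(lm["severity"], lm["dept"])
```
The Money Pattern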
Combine `gen` with regex and `select` from a literal list, and you've replaced an entire validation layer. No retries, no Pydantic catch blocks, no "the model returned malformed JSON" Slack pings at 2am.
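Regexes are the fiddlier case because tokens span several characters: `"4870"` might be one token, or `"48"` followed by `"70"`. Here is a rough sketch under stated assumptions. It uses a hand-written DFA for the postcode pattern `[0-9]{4}` and an invented vocabulary. A token survives the mask only if every one of its characters keeps the DFA alive. Guidance's real machinery is more general and works against the model's actual tokenizer; this only shows the idea.

```python
EOS = "<eos>"
DIGITS = "0123456789"

def advance(state, token, length=4):
    """Run each character of `token` through a DFA for [0-9]{length}.
    The state is the number of digits read so far; None means the
    text can no longer match."""
    for ch in token:
        if state is None or state >= length or ch not in DIGITS:
            return None
        state += 1
    return state

def allowed_tokens(state, vocab, length=4):
    """Tokens that keep the partial output on a path to a full match."""
    ok = []
    for tok in vocab:
        if tok == EOS:
            if state == length:  # stopping is legal only in the accepting state
                ok.append(tok)
        elif advance(state, tok, length) is not None:
            ok.append(tok)
    return ok

# Hypothetical tokenizer pieces; real vocabularies are far larger.
VOCAB = ["48", "70", "4870", "487", "0a", "48701", " 4870", EOS]

print(allowed_tokens(0, VOCAB))  # nothing emitted yet → ['48', '70', '4870', '487']
print(allowed_tokens(2, VOCAB))  # after "48"         → ['48', '70']
print(allowed_tokens(4, VOCAB))  # after "4870"       → ['<eos>']
```

Note that `"487"` is fine at the start but masked after `"48"`, because it would push the output to five digits.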
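```python
import guidance
from guidance import models, gen, select

llm = models.LlamaCpp("./ghost-pepper-7b.Q4_K_M.gguf")

# @guidance lets you call claim_form(...) without lm and add the result to a model
@guidance
def claim_form(lm, transcript: str):
    lm += f"Transcript: {transcript}\n"
    # Exactly four digits
    lm += f"Postcode: {gen('postcode', regex=r'[0-9]{4}')}\n"
    # One of four labels, nothing else
    lm += f"Damage: {select(['hail', 'wind', 'flood', 'fire'], name='damage')}\n"
    # Free text, capped at 80 tokens and cut at the first newline. Kept out of an
    # f-string: a backslash inside an f-string expression is a SyntaxError before Python 3.12.
    lm += "Notes: "
    lm += gen("notes", max_tokens=80, stop="\n")
    return lm

result = llm + claim_form("caller at 4870, hail, garage shed flattened")
print(result["postcode"], result["damage"], result["notes"])
```
The Catch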
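Guidance needs logit access. It has to edit the next-token distribution before each token is sampled, so it needs a local model or a server that applies the constraint for you; reading logprobs back after the fact isn't enough. OpenAI and Anthropic don't give you that token-mask hook, so this is a llama.cpp / Transformers / vLLM play. On hosted APIs you're stuck with retry loops.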
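For contrast, this is roughly the loop you end up maintaining on a hosted API. It's a minimal sketch. `call_model` is a hypothetical stand-in for whatever client you use (any function from prompt string to reply string). The checks mirror the severity and department fields from the first example.

```python
import json

DEPARTMENTS = ("roofing", "general", "emergency")

def parse_triage(raw: str) -> dict:
    """Validate one reply; raise ValueError if it breaks the schema."""
    data = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    severity = data.get("severity")
    if type(severity) is not int or not 1 <= severity <= 5:  # rejects bools too
        raise ValueError(f"bad severity: {severity!r}")
    if data.get("dept") not in DEPARTMENTS:
        raise ValueError(f"bad dept: {data.get('dept')!r}")
    return data

def extract_with_retries(call_model, prompt: str, max_attempts: int = 3) -> dict:
    """Ask, validate, and re-ask with the error appended until a reply parses."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)  # hypothetical client call: str -> str
        try:
            return parse_triage(raw)
        except ValueError as err:
            last_error = err
            prompt += f"\n\nYour last reply was invalid ({err}). Reply with JSON only."
    raise RuntimeError(f"no valid reply after {max_attempts} attempts") from last_error

# Fake hosted model: the first reply is chatty and truncated, the second is valid.
replies = iter(['Sure! {"severity": 4', '{"severity": 4, "dept": "roofing"}'])
print(extract_with_retries(lambda prompt: next(replies), "Customer: my roof is leaking"))
# → {'severity': 4, 'dept': 'roofing'}
```

Every failed attempt costs a full round trip, and after `max_attempts` you still need a plan for the exception.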
The Verdict
If you're running local models, which on an M4 Mac is finally pleasant, Guidance is the difference between a flaky JSON pipeline and one whose output always parses. It guarantees the shape, not the truth (the model can still pick the wrong severity), so pair it with Pydantic for typing on the outside and you've got an extractor that never breaks on format. Quietly essential.
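Here is a minimal sketch of the "Pydantic on the outside" half, assuming Pydantic v2 (`Field(pattern=...)`; v1 spelled it `regex=`). `Claim` and `to_claim` are illustrative names. The field rules repeat the constraints from `claim_form`. On Guidance output they mostly act as a typed wrapper, plus a tripwire if someone later loosens the prompt.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class Claim(BaseModel):
    # Hypothetical schema mirroring claim_form's constraints
    postcode: str = Field(pattern=r"^[0-9]{4}$")
    damage: Literal["hail", "wind", "flood", "fire"]
    notes: str

def to_claim(captures) -> Claim:
    """Build a typed Claim from anything indexable by capture name,
    e.g. the `result` of the claim_form call or a plain dict."""
    return Claim(
        postcode=captures["postcode"],
        damage=captures["damage"],
        notes=captures["notes"].strip(),
    )

# A plain dict stands in for the Guidance result here.
claim = to_claim({"postcode": "4870", "damage": "hail", "notes": " garage shed flattened"})
print(claim)
```

With a model loaded, `to_claim(result)` does the same against the real captures.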