14B params, MIT license, trained on data that doesn't exist
Microsoft has been quietly cooking the Phi series for two years and Phi-4 is where it finally clicks. 14B parameters, MIT licensed, and trained largely on synthetic data. The result is a model that's freakishly smart at reasoning and freakishly weird everywhere else.
The Setup
Phi-4 lives in the "absurdly capable for its size" tier. MMLU around 84, MATH around 80, GSM8K basically saturated. Quantized to 4-bit, it runs comfortably on a 16GB M4 Mac; the full bf16 weights need about 28 GB, so the unquantized checkpoint wants a bigger box. This is the model you put behind an internal agent and forget about.
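A quick sanity check on the memory claim, as a back-of-envelope sketch. It counts only the weights (no KV cache, no runtime overhead) and assumes a round 14B parameters and idealized bit widths, so treat the numbers as a floor rather than a measurement.

```python
# Back-of-envelope weight memory for a 14B-parameter model.
PARAMS = 14e9  # parameter count from the text, rounded

# Idealized bits per weight; real 4-bit formats carry some extra metadata.
BITS_PER_WEIGHT = {"fp32": 32, "bf16": 16, "int8": 8, "int4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Size of the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>5}: {weight_gb(PARAMS, bits):5.1f} GB")
```

On these assumptions only the 4-bit row (about 7 GB) leaves real headroom on a 16GB machine once the OS and context cache take their share.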
# the boring way
huggingface-cli download microsoft/phi-4 --local-dir ./phi-4
# the lazy way
ollama pull phi4:14b
ollama run phi4:14b
The Money Pattern
Wire it into an agent loop with structured outputs. It's small enough to run multiple parallel instances, smart enough to actually solve the task.
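import json
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("microsoft/phi-4")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-4", device_map="auto", torch_dtype="auto",
)

def agent_step(state):
    msgs = [
        {"role": "system", "content": "Return JSON: {action, args, reasoning}"},
        {"role": "user", "content": json.dumps(state)},
    ]
    # add_generation_prompt=True appends the assistant header so the model replies
    inputs = tok.apply_chat_template(
        msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt"
    ).to(model.device)
    # temperature is ignored unless sampling is switched on
    out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2)
    # decode only the newly generated tokens, not the echoed prompt
    reply = out[0][inputs["input_ids"].shape[-1]:]
    return json.loads(tok.decode(reply, skip_special_tokens=True))
The Catch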
The synthetic-data training shows up in odd places. Ask it about a recent news event and you get confident, beautifully-structured fiction. Ask about niche libraries and it invents API surfaces. The reasoning is sharp; the world knowledge is patchy and overconfident.
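In an agent loop, that same habit of inventing things can show up as a tool call that doesn't exist. One cheap defense is to check every reply against an allowlist before acting on it. The sketch below is a minimal version; the `ALLOWED_ACTIONS` registry and its tool names are hypothetical, and the `{action, args, reasoning}` shape is assumed from the system prompt used earlier.

```python
import json

# Hypothetical tool registry: action name -> argument names it requires.
ALLOWED_ACTIONS = {
    "search_docs": {"query"},
    "run_tests": {"path"},
    "finish": set(),
}


def parse_action(raw: str) -> dict:
    """Parse a model reply and reject anything outside the known tool set."""
    # Tolerate chatter around the JSON by taking the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object in reply")
    step = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError

    action = step.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    args = step.get("args")
    if not isinstance(args, dict):
        raise ValueError("args must be a JSON object")

    missing = ALLOWED_ACTIONS[action] - set(args)
    if missing:
        raise ValueError(f"missing args: {sorted(missing)}")
    return step
```

Every failure surfaces as a `ValueError`, so the caller can catch one exception type and either retry the step or feed the error message back to the model.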
The Verdict
For pure reasoning, code review, math, and structured agent workflows, Phi-4 is the small-model king. For anything requiring real-world facts, pair it with RAG or just use a bigger model. Run it on a Mac mini at the edge: it's the most cost-efficient brain per gigabyte we've ever had.
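For the RAG half of that advice, the shape is simple: fetch a few relevant snippets, then tell the model to answer only from them. Here is a toy sketch of that shape. The word-overlap `score` function is a stand-in for a real retriever (embeddings, BM25, whatever you already run), and both function names are made up for illustration.

```python
import re


def score(query: str, doc: str) -> int:
    """Toy relevance: number of distinct words shared by query and document."""
    def words(text: str) -> set:
        return set(re.findall(r"[\w-]+", text.lower()))
    return len(words(query) & words(doc))


def build_grounded_messages(query: str, docs: list, k: int = 2) -> list:
    """Pick the k best-matching docs and wrap them in a grounded chat prompt."""
    top = sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(top))
    return [
        {
            "role": "system",
            "content": "Answer only from the numbered context. "
                       "If the answer is not there, say you don't know.",
        },
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ]
```

The returned list has the same role/content shape as the messages passed to `apply_chat_template`, so it can go straight into the generation call.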