UAE-built, Apache-flavored, actually downloadable
In case you tuned out after Falcon 2 launched and underwhelmed: TII just shipped Falcon 3 in 7B, 40B, and 180B sizes under a permissive, Apache-style license. The UAE is back in the open-weight game, and this time the benchmarks don't make you laugh.
The Setup
Falcon 3 is on Hugging Face, the weights downloaded to my M4 Mac without a clickwrap dance, and the tokenizer finally handles code without exploding. The 7B is the daily-driver size: quantized to Q4, it fits in 8 GB of VRAM.
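The 8 GB claim is easy to sanity-check with napkin math. Here is a minimal sketch, assuming exactly 7e9 parameters and about 4.5 effective bits per weight (what a Q4_0-style quant works out to once you count the per-block scales). The function name is mine, and it counts weights only, so the KV cache and runtime overhead still have to fit in whatever is left.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumptions: 7e9 parameters, ~4.5 effective bits per weight for a Q4_0-style quant.
q4_gb = quantized_weight_gb(7e9, 4.5)
# Same model left in bf16 (16 bits per weight), for comparison.
bf16_gb = quantized_weight_gb(7e9, 16)

print(f"Q4 weights:   ~{q4_gb:.1f} GB")
print(f"bf16 weights: ~{bf16_gb:.1f} GB")
```

Roughly 4 GB of weights leaves about half of an 8 GB card for context, which is why Q4 is the comfortable setting here and unquantized bf16 is not.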
huggingface-cli download tiiuae/falcon-3-7b-instruct \
    --local-dir ./models/falcon-3-7b
# or pull a GGUF for llama.cpp / ollama
ollama pull falcon3:7b
ollama run falcon3:7b "write a postgres rls policy for tenant_id"
The Money Pattern
Behold: the chat template just works in transformers. No custom tokenizer surgery, no Jinja gymnastics. I wired it into a tiny Astro 5 + Netlify Functions endpoint for a Rebuild Relief internal tool and shipped it in an afternoon.
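If "chat template" sounds like magic, it is really just a function from a list of messages to one prompt string. Here is a rough pure-Python sketch of the idea. The `<|role|>` markers and the `render_chat` helper are made up for illustration and are not Falcon 3's actual tokens; the real template ships with the tokenizer.

```python
def render_chat(messages: list[dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Flatten chat messages into a single prompt string.

    The <|role|> markers are hypothetical; a real model's template
    defines its own special tokens.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}\n" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model answers rather than
        # continuing the user's message.
        parts.append("<|assistant|>\n")
    return "".join(parts)


msgs = [
    {"role": "system", "content": "You are a SQL expert."},
    {"role": "user", "content": "write me a CTE for daily active tenants"},
]
print(render_chat(msgs))
```

In transformers, `apply_chat_template` does the real version of this for you, which is why there is nothing to hand-roll.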
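from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tiiuae/falcon-3-7b-instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half the memory of fp32
    device_map="auto",           # needs accelerate; places layers on available devices
)
msgs = [
    {"role": "system", "content": "You are a SQL expert."},
    {"role": "user", "content": "write me a CTE for daily active tenants"},
]
# add_generation_prompt=True opens the assistant turn so the model answers
# instead of continuing the user's message
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# decode only the newly generated tokens, not the echoed prompt
reply = out[0][inputs["input_ids"].shape[-1]:]
print(tok.decode(reply, skip_special_tokens=True))
The Catch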
Plot twist: Falcon 3 trails Llama 3.3 70B and Qwen 3 on the standard benchmarks. The 7B is competitive with Llama 3 8B, not the 70B class. The 180B is a beast, but you need an 8xH100 box to serve it usefully, which kills the hobbyist story.
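To put a number on the hardware problem, here is a back-of-the-envelope sketch. It assumes bf16 weights (2 bytes per parameter), 80 GB per H100, and that only about 80% of each card is usable for weights once activations and KV cache get their share. That fraction is my guess, not a measured figure, and `min_gpus` is a name I made up.

```python
import math


def min_gpus(
    n_params: float,
    bytes_per_param: float,
    gpu_gb: float,
    usable_fraction: float = 0.8,  # assumption: share of each card left for weights
) -> int:
    """Smallest number of GPUs whose usable memory holds the weights."""
    weights_gb = n_params * bytes_per_param / 1e9
    return math.ceil(weights_gb / (gpu_gb * usable_fraction))


# 180e9 parameters in bf16 is 360 GB of weights before any KV cache.
print(min_gpus(180e9, 2, 80))  # prints 6
```

Six cards is the floor just to hold the weights under these assumptions. Servers tend to come in 8-GPU nodes, and the extra headroom is what buys longer contexts and batching, so an 8xH100 box is the realistic starting point.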
The Verdict
Falcon 3 is the most credible TII release in two years. The 7B is a fine daily driver, the 40B is genuinely useful for self-hosted RAG, and the license actually lets you ship. It won't dethrone Llama or Qwen on raw benchmarks, but for sovereign deployments or multilingual workloads it's a real option.