Big model writes the textbook. Small model learns it.
Plot twist: the cheapest way to make a great small model is to spend an afternoon making a great dataset. Llama 4 is happy to write that dataset for you. Distilabel is happy to orchestrate the pipeline.
The Setup
Distilabel from Argilla is a declarative pipeline runner: you wire Steps together, point the LLM-backed ones at a model, and hit run. It handles batching, retries, and caching, ships steps for deduplication, and pushes the result straight to a Hugging Face dataset. Way cleaner than a folder of one-off scripts.
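Deduping matters more than it sounds, because synthetic generators love to repeat themselves. Below is a minimal sketch of exact-match dedup over normalized text, assuming rows are plain dicts with an `instruction` key. It is a simpler stand-in, not what distilabel ships (its dedup steps target near-duplicates), and `normalize` and `dedup_exact` are hypothetical helpers.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so trivial variants collide."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def dedup_exact(rows: list[dict], key: str = "instruction") -> list[dict]:
    """Keep the first row for each normalized value of `key`, preserving order."""
    seen: set[str] = set()
    kept = []
    for row in rows:
        # Hash the normalized text so the seen-set stays small for long prompts.
        digest = hashlib.sha256(normalize(row[key]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(row)
    return kept


rows = [
    {"instruction": "How do I reset my password?"},
    {"instruction": "how do I reset my  password"},
    {"instruction": "How do I cancel my plan?"},
]
print([r["instruction"] for r in dedup_exact(rows)])
# → ['How do I reset my password?', 'How do I cancel my plan?']
```

Exact matching only catches copy-paste repeats; paraphrased duplicates slip through, which is why fuzzier dedup is worth it at scale.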
{`pip install "distilabel[hf-inference-endpoints,openai]"
# .env (export these before running; the pipeline reads them from the environment)
HF_TOKEN=hf_xxx
OPENAI_API_KEY=sk-xxx # or any compatible endpoint`}
The Money Pattern
The shape I keep reusing: seed prompts → generate with a big model → critique with a different big model → keep only the high-scoring pairs. That last step is the difference between training data and noise.
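Distilabel gets you the ratings; the keep-the-winners cut is a plain threshold you apply to the rows afterwards. A minimal sketch, assuming each row's `generation` holds a list of candidate responses and `ratings` a parallel list of judge scores, with `None` wherever a rating failed to parse. `best_pairs` and the `min_score=4` cutoff are hypothetical choices, not distilabel API.

```python
def best_pairs(rows: list[dict], min_score: float = 4.0) -> list[dict]:
    """For each prompt, keep its top-rated response, but only if it clears min_score."""
    pairs = []
    for row in rows:
        # Pair each candidate with its rating, dropping ones the judge failed to score.
        scored = [
            (rating, text)
            for rating, text in zip(row["ratings"], row["generation"])
            if rating is not None
        ]
        if not scored:
            continue
        # Compare on rating only; ties go to the earlier candidate.
        rating, text = max(scored, key=lambda pair: pair[0])
        if rating >= min_score:
            pairs.append({"instruction": row["instruction"], "response": text, "score": rating})
    return pairs


rows = [
    {"instruction": "Reset my password?", "generation": ["Go to Settings.", "Use the reset link."], "ratings": [3, 5]},
    {"instruction": "Cancel my plan?", "generation": ["No idea.", "Maybe?"], "ratings": [2, None]},
    {"instruction": "Export my data?", "generation": ["Use Export.", "Ask support."], "ratings": [None, None]},
]
print(best_pairs(rows))
# → [{'instruction': 'Reset my password?', 'response': 'Use the reset link.', 'score': 5}]
```

Keeping one winner per prompt gives clean SFT pairs; keeping the winner and a low-rated loser instead would give preference pairs for DPO-style training.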
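{`from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromHub, KeepColumns
from distilabel.steps.tasks import TextGeneration, UltraFeedback
from distilabel.llms import InferenceEndpointsLLM  # distilabel.models in newer releases

with Pipeline(name="aidxn-synth") as pipe:
    # Seed prompts; TextGeneration reads the "instruction" column.
    load = LoadDataFromHub(repo_id="argilla/seed-prompts", split="train")

    # Big model #1 writes two candidate answers per prompt, grouped into one list.
    gen = TextGeneration(
        llm=InferenceEndpointsLLM(
            model_id="meta-llama/Llama-4-Scout-17B-16E-Instruct",
            generation_kwargs={"max_new_tokens": 1024, "temperature": 0.7},
        ),
        num_generations=2,
        group_generations=True,
        input_batch_size=8,
    )

    # Big model #2 grades them. UltraFeedback takes a single aspect;
    # "overall-rating" rolls several criteria into one score per candidate.
    judge = UltraFeedback(
        llm=InferenceEndpointsLLM(
            model_id="meta-llama/Llama-4-Maverick-17B-128E-Instruct",
            generation_kwargs={"max_new_tokens": 1024},
        ),
        aspect="overall-rating",
        input_mappings={"generations": "generation"},
    )

    keep = KeepColumns(columns=["instruction", "generation", "ratings"])
    load >> gen >> judge >> keep

if __name__ == "__main__":
    distiset = pipe.run(use_cache=True)
    distiset.push_to_hub("aidxn/support-synth-v1", private=True)`}
The Catch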
Garbage in, garbage out, with a twist: garbage out of a big model looks confident. If your seed prompts are narrow, your synthetic data is narrow. If your judge model has a bias, your dataset bakes it in. Run real evals on the resulting fine-tune before you trust the judge's scores: auto-grading your own generations is a trap.
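Both failure modes are cheap to smoke-test before you burn GPU hours. A minimal sketch with two hypothetical helpers: `distinct_n` reports the fraction of word n-grams across your prompts (or generations) that are unique, so a low ratio flags a narrow dataset; `mean_rating_by_slot` averages the judge's ratings per candidate position. The second check assumes each row's candidates were sampled independently from the same model, so every slot should average about the same, and a consistent gap hints at position bias in the judge. Neither replaces a real eval on the fine-tune.

```python
from collections import defaultdict


def distinct_n(texts: list[str], n: int = 2) -> float:
    """Fraction of word n-grams across all texts that are unique (1.0 means no repeats)."""
    total = 0
    unique = set()
    for text in texts:
        words = text.lower().split()
        grams = [tuple(words[i : i + n]) for i in range(len(words) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def mean_rating_by_slot(rows: list[dict]) -> dict[int, float]:
    """Average judge rating per candidate position, skipping unparsed (None) ratings."""
    sums: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for row in rows:
        for slot, rating in enumerate(row["ratings"]):
            if rating is not None:
                sums[slot] += rating
                counts[slot] += 1
    return {slot: sums[slot] / counts[slot] for slot in sorted(counts)}


seeds = ["how do i reset my password", "how do i reset my email", "how do i reset my phone"]
print(round(distinct_n(seeds), 2))
# → 0.47

judged = [{"ratings": [5, 3]}, {"ratings": [4, None]}, {"ratings": [5, 4]}]
print({slot: round(avg, 2) for slot, avg in mean_rating_by_slot(judged).items()})
# → {0: 4.67, 1: 3.5}
```

Three rows prove nothing; run the slot check over the whole judged set, and if slot 0 keeps winning, shuffle candidate order before judging.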
The Verdict
Synthetic data isn't a hack anymore; it's a default. Llama 4 plus Distilabel plus a tight judging step gets you a usable instruction set in an afternoon. Spend your time on the seed prompts and the evals, because that's where the moat is. Then fine-tune your small model and watch it punch above its weight.