Big model writes the textbook. Small model learns it.
Plot twist: the cheapest way to make a great small model is to spend an afternoon making a great dataset. Llama 4 is happy to write that dataset for you. Distilabel is happy to orchestrate the pipeline.
The Setup
Distilabel, from Argilla, is a declarative pipeline runner: you wire Steps together, point each one at an LLM, and hit run. It handles batching, retries, and deduping, and pushes the result straight to a Hugging Face dataset. Way cleaner than a folder of one-off scripts.
pip install "distilabel[hf-inference-endpoints,openai]"
# .env
HF_TOKEN=hf_xxx
OPENAI_API_KEY=sk-xxx # or any compatible endpoint
The Money Pattern
The shape I keep reusing: seed prompts → generate with a big model → critique with a different big model → keep only the high-scoring pairs. That last step is the difference between training data and noise.
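That last step, keeping only the high-scoring pairs, is easy to sketch in plain Python. This is a minimal illustration, not Distilabel's own filtering API: the function name `keep_high_scoring`, the threshold of 4, and the sample rows are all assumptions, and it presumes each row carries a `ratings` list on a 1 to 5 scale where a failed judge call may leave `None`.

```python
from typing import Any


def keep_high_scoring(
    rows: list[dict[str, Any]], min_rating: float = 4.0
) -> list[dict[str, Any]]:
    """Keep rows whose best judge rating is at least `min_rating`."""
    kept = []
    for row in rows:
        # Drop missing scores: the judge output may not always parse.
        ratings = [r for r in (row.get("ratings") or []) if r is not None]
        if ratings and max(ratings) >= min_rating:
            kept.append(row)
    return kept


# Hypothetical rows shaped like the judged output of a pipeline.
rows = [
    {"instruction": "Reset my password", "generation": "Go to Settings.", "ratings": [5]},
    {"instruction": "Cancel my plan", "generation": "I cannot help.", "ratings": [2]},
    {"instruction": "Refund status", "generation": "Unknown.", "ratings": [None]},
]

print(len(keep_high_scoring(rows)))  # 1
```

Rows with no usable rating are dropped rather than kept by default, on the theory that an unjudged pair is noise until proven otherwise. Tune the threshold against a sample you have read by hand.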
from distilabel.pipeline import Pipeline
from distilabel.steps import GroupColumns, KeepColumns, LoadDataFromHub
from distilabel.steps.tasks import TextGeneration, UltraFeedback
from distilabel.llms import InferenceEndpointsLLM  # newer releases: distilabel.models

with Pipeline(name="aidxn-synth") as pipe:
    # The seed dataset needs an `instruction` column for TextGeneration.
    load = LoadDataFromHub(repo_id="argilla/seed-prompts", split="train")
    gen = TextGeneration(
        llm=InferenceEndpointsLLM(model_id="meta-llama/Llama-4-Scout-17B-16E"),
        input_batch_size=8,
    )
    # UltraFeedback rates a list of responses, so wrap `generation` in `generations`.
    group = GroupColumns(columns=["generation"], output_columns=["generations"])
    judge = UltraFeedback(
        llm=InferenceEndpointsLLM(model_id="meta-llama/Llama-4-Maverick-17B-128E"),
        aspect="overall-rating",  # the task takes one aspect per judge step
    )
    keep = KeepColumns(columns=["instruction", "generations", "ratings"])
    load >> gen >> group >> judge >> keep

if __name__ == "__main__":
    ds = pipe.run(use_cache=True)
    ds.push_to_hub("aidxn/support-synth-v1", private=True)
The Catch
Garbage in, garbage out, with a twist: garbage out of a big model looks confident. If your seed prompts are narrow, your synthetic data is narrow. If your judge model has a bias, your dataset bakes it in. Run actual evals on the resulting fine-tune before you trust the numbers — auto-grading your own generations is a trap.
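One cheap sanity check on the judge, before the real evals, is to hand-rate a small sample yourself and compare. The sketch below assumes you have two aligned lists of 1 to 5 ratings for the same items. The function names and the numbers are made up for illustration, and this is not a substitute for evaluating the fine-tuned model.

```python
def judge_agreement(judge: list[int], human: list[int], tolerance: int = 1) -> float:
    """Fraction of items where judge and human ratings differ by at most `tolerance`."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty rating lists of equal length")
    hits = sum(abs(j - h) <= tolerance for j, h in zip(judge, human))
    return hits / len(judge)


def mean_bias(judge: list[int], human: list[int]) -> float:
    """Average of (judge - human); positive means the judge is more generous."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(j - h for j, h in zip(judge, human)) / len(judge)


# Illustrative numbers only: five items rated by the judge model and by hand.
judge_ratings = [5, 4, 5, 3, 5]
human_ratings = [4, 4, 2, 3, 3]

print(judge_agreement(judge_ratings, human_ratings))  # 0.6
print(mean_bias(judge_ratings, human_ratings))        # 1.2
```

Low agreement or a consistently positive bias suggests the judge is flattering the generator, which is exactly the kind of baked-in bias described above.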
The Verdict
Synthetic data isn't a hack anymore, it's a default. Llama 4 plus Distilabel plus a tight judging step gets you a usable instruction set in an afternoon. Spend the time on the seed prompts and the evals — that's where the moat is. Then fine-tune your small model and watch it punch up.