Unsloth Is 2x Faster Than Last Year

Hand-written Triton kernels and 70% less VRAM, for free

Spoiler: the Unsloth team rewrote the autograd path in Triton again and your fine-tunes just got cheaper. Same dataset, same LoRA, half the wall time, a third of the VRAM. Free.

The Setup

Unsloth is a drop-in replacement for the model-loading half of a Hugging Face fine-tune, not for the training loop itself. You import its loader, FastLanguageModel, instead of AutoModelForCausalLM, attach LoRA, hand the model to TRL's SFTTrainer, and the rest of your code is identical. The kernels do the heavy lifting.

{`pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes`}

The Money Pattern

Here's the whole script. An 8B QLoRA fine-tune on an A100 used to take six hours; this finishes in two. On a 4090 at home it's the difference between "leave it overnight" and "have a coffee".

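{`from unsloth import FastLanguageModel  # import unsloth before trl/transformers so its patches apply
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

MAX_SEQ_LEN = 4096

# Pre-quantized 4-bit Llama 3 8B checkpoint
model, tok = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=MAX_SEQ_LEN,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention projections
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32,
    target_modules=["q_proj","k_proj","v_proj","o_proj"],
)

# Placeholder dataset: swap in any JSONL file with a "text" field per row
ds = load_dataset("json", data_files="train.jsonl", split="train")

# Keyword names follow older TRL releases; newer ones rename tokenizer=
# to processing_class= and move dataset_text_field into SFTConfig.
trainer = SFTTrainer(
    model=model, tokenizer=tok,
    train_dataset=ds, dataset_text_field="text",
    max_seq_length=MAX_SEQ_LEN,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10, max_steps=200,
        learning_rate=2e-4, bf16=True,
        output_dir="out",
    ),
)
trainer.train()`}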
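
For a sense of scale, those TrainingArguments pin down a fixed training budget. Here's a rough worked calculation, assuming a single GPU and, for the token count, that every sequence fills the whole 4096-token window (an upper bound, not something the script guarantees):

```python
# Values copied from the script above; the single-GPU count is an assumption.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 200
max_seq_length = 4096
num_gpus = 1  # assumption: one A100 or one 4090

# Sequences contributing to each optimizer step
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Total sequences the run consumes before max_steps stops it
sequences_seen = effective_batch * max_steps

# Upper bound on tokens, reached only if every sequence fills the context window
max_tokens_seen = sequences_seen * max_seq_length

print(effective_batch, sequences_seen, max_tokens_seen)
```

So 200 steps touches at most 1,600 examples. If your dataset is bigger than that, raise max_steps or set num_train_epochs instead.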

The Catch

NVIDIA-only. The Triton kernels assume CUDA, so AMD ROCm and Apple silicon are out. If you train on an M-series Mac, you're still stuck with MLX (which has its own LoRA story; different post). For NVIDIA cloud GPUs, Unsloth is a no-brainer.
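
If you'd rather find that out before the model download than after, a small preflight check helps. This is a minimal sketch, not part of Unsloth: the function names and the "unsloth" / "mlx" / "unsupported" labels are made up for illustration, and it only encodes the rule from this section (CUDA build of PyTorch plus a visible GPU, otherwise fall back).

```python
import importlib.util
import platform


def nvidia_cuda_available() -> bool:
    """True only if PyTorch is installed, is not a ROCm build, and sees a GPU."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    # ROCm builds also answer True to torch.cuda.is_available(), so rule them out
    is_rocm_build = getattr(torch.version, "hip", None) is not None
    return torch.cuda.is_available() and not is_rocm_build


def pick_training_stack(has_nvidia_cuda: bool, system: str, machine: str) -> str:
    """Map a host to the stack this article suggests (labels are hypothetical)."""
    if has_nvidia_cuda:
        return "unsloth"
    if system == "Darwin" and machine == "arm64":
        return "mlx"  # Apple silicon
    return "unsupported"


if __name__ == "__main__":
    stack = pick_training_stack(
        nvidia_cuda_available(), platform.system(), platform.machine()
    )
    print(f"Suggested training stack: {stack}")
```

Run it at the top of a training script and bail out early on anything other than "unsloth".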

The Verdict

If you're fine-tuning anything on NVIDIA and not using Unsloth, you're lighting cloud credits on fire. Swap your loader, keep your dataset, watch the wall clock collapse. Then ship the adapter and pretend it was hard.
