LoRA vs QLoRA vs DoRA: The Real Differences

In case you've been living under a rock: half the fine-tuning posts on X use these three acronyms interchangeably. They are not the same. Do not @ me. This matters when the GPU bill hits.

The Setup

LoRA freezes the base weights and trains two tiny matrices whose product is added to the original. Cheap to train, tiny adapters, almost-full-FT quality. QLoRA is LoRA with the base model quantised to 4-bit so it fits on consumer cards. DoRA decomposes each weight into magnitude and direction, trains the magnitude directly, and applies the LoRA update only to the direction. Better quality, slower.

from peft import LoraConfig, get_peft_model

# Plain LoRA
lora = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj","k_proj","v_proj","o_proj"],
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)
# base_model is assumed to be a causal LM you have already loaded
model = get_peft_model(base_model, lora)
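
If the one-paragraph descriptions above feel abstract, here is a rough NumPy sketch of the arithmetic. It is a toy, not what PEFT runs internally: the sizes, the init scale, and the per-column norm convention for DoRA are all my assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 512, 512, 16, 32   # toy sizes, not taken from a real model

W = rng.normal(size=(d_out, d_in))         # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01      # trainable, small random init
B = np.zeros((d_out, r))                   # trainable, zero init so the update starts at 0

# LoRA: add the scaled low-rank product to the frozen weight.
delta = (alpha / r) * (B @ A)
W_lora = W + delta

# DoRA: split the weight into a magnitude and a unit-norm direction.
# The low-rank update goes into the direction; the magnitude is trained directly.
# Per-column norms are an assumed convention for this sketch.
m = np.linalg.norm(W, axis=0, keepdims=True)   # trainable magnitude, shape (1, d_in)
V = W + delta                                  # direction before normalisation
W_dora = m * V / np.linalg.norm(V, axis=0, keepdims=True)

lora_params = A.size + B.size   # what LoRA actually trains
full_params = W.size            # what full fine-tuning would train
print(f"LoRA trains {lora_params} of {full_params} weights in this layer")
```

Because B starts at zero, both W_lora and W_dora equal W before the first step, so training starts from the base model's behaviour.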

The Money Pattern

DoRA is a one-flag upgrade. Same LoraConfig, flip use_dora=True. On small datasets — think 2-5k examples for a customer support adapter — DoRA consistently beats vanilla LoRA at the same rank. I've measured it on the Rebuild Relief ticket corpus and it's not subtle.

from peft import LoraConfig

# DoRA — one flag, better quality at the same rank
dora = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj","k_proj","v_proj","o_proj"],
    use_dora=True,         # the magic
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)

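# QLoRA: load the base in 4-bit, then attach LoRA on top
from transformers import BitsAndBytesConfig
bnb = BitsAndBytesConfig(
    load_in_4bit=True,              # quantise the base weights to 4-bit
    bnb_4bit_quant_type="nf4",      # NormalFloat4 quantisation
    bnb_4bit_compute_dtype="bfloat16",  # matmuls still run in bf16
)
# Pass bnb as quantization_config= when loading the base model with
# from_pretrained, then wrap the result with get_peft_model as before.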
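
To see why 4-bit is what gets a model onto a consumer card, here is a back-of-envelope estimate. It counts weight storage only; quantisation constants, activations, optimizer state and the adapter itself are ignored, and the 7B parameter count is a hypothetical example, not a model from this post.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 7e9  # hypothetical 7B-parameter base model
fp16_gb = weight_memory_gb(n_params, 16)  # base kept in 16-bit (plain LoRA / DoRA)
nf4_gb = weight_memory_gb(n_params, 4)    # base quantised to 4-bit (QLoRA)
print(f"16-bit base: {fp16_gb:.1f} GB, 4-bit base: {nf4_gb:.1f} GB")
```

Real usage will be higher than either number, but the 4x gap on the base weights is the part QLoRA buys you.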

The Catch

DoRA is slower to train — roughly 1.3-1.5x the step time of plain LoRA because of the extra magnitude path. For tiny datasets that's fine. For 100k+ examples the wall time adds up and you might prefer LoRA at higher rank. QLoRA gives you the smallest VRAM, but the 4-bit base costs you a bit of ceiling on quality.
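
A quick way to feel the slowdown is to plug numbers into a step-count estimate. The sketch below assumes a made-up 1.0 s LoRA step, batch size 16 and 3 epochs, and takes 1.4x as a midpoint of the range above; swap in your own measurements.

```python
import math


def train_hours(n_examples: int, epochs: int, batch_size: int, sec_per_step: float) -> float:
    """Wall time in hours, assuming a constant time per optimizer step."""
    steps = math.ceil(n_examples / batch_size) * epochs
    return steps * sec_per_step / 3600


LORA_STEP_S = 1.0      # hypothetical LoRA step time
DORA_SLOWDOWN = 1.4    # assumed midpoint of the 1.3-1.5x range

for n in (3_000, 100_000):
    lora_h = train_hours(n, epochs=3, batch_size=16, sec_per_step=LORA_STEP_S)
    dora_h = train_hours(n, epochs=3, batch_size=16, sec_per_step=LORA_STEP_S * DORA_SLOWDOWN)
    print(f"{n:>7} examples: LoRA {lora_h:.2f} h, DoRA {dora_h:.2f} h")
```

The ratio is the same at every dataset size; what changes is whether the absolute difference is minutes or hours.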

The Verdict

Default to DoRA at r=16 for any fine-tune under 50k examples. Reach for QLoRA when the model doesn't fit. Stick with plain LoRA when you need every minute of training speed back. Three letters, three jobs, pick on purpose.
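
If you want that verdict as something you can paste into a training script, here is one way to write it down. The function name, the argument names, and the order of the checks are mine; sending everything at 50k examples and above to plain LoRA is my reading of the rule of thumb, not a hard threshold.

```python
def pick_adapter(n_examples: int, fits_in_vram: bool, speed_critical: bool = False) -> str:
    """Hypothetical helper encoding the rule of thumb above."""
    if not fits_in_vram:
        return "qlora"   # the model doesn't fit: quantise the base to 4-bit
    if speed_critical or n_examples >= 50_000:
        return "lora"    # training time matters more than the quality bump
    return "dora"        # default for small fine-tunes, at r=16


print(pick_adapter(3_000, fits_in_vram=True))
print(pick_adapter(3_000, fits_in_vram=False))
print(pick_adapter(200_000, fits_in_vram=True))
```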

