Smaug 2 Climbed The Leaderboard Overnight

Plot twist: a 72B model nobody was watching jumped to the top of the Hugging Face Open LLM Leaderboard overnight. Abacus AI dropped Smaug 2 with a custom DPO recipe and the benchmark gods smiled.

The Setup

Smaug 2 72B is built on the Qwen 72B base model, then aggressively fine-tuned with DPOP (DPO-Positive), a tweaked Direct Preference Optimization that adds a penalty whenever the model's log-likelihood of the preferred response drops below the reference model's. The result: massive MMLU gains in days, not weeks.

{`huggingface-cli download abacusai/Smaug-2-72B --local-dir ./smaug-2

# or yank the GGUF for llama.cpp
huggingface-cli download \
  TheBloke/Smaug-2-72B-GGUF \
  smaug-2-72b.Q4_K_M.gguf --local-dir ./models`}

The Money Pattern

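The DPOP trick is the actual contribution. Vanilla DPO only rewards the gap between preferred and rejected responses, so it can lower its loss while the preferred response's log-prob goes down too, as long as the rejected one falls faster. DPOP adds a clamped penalty that kicks in only when the preferred log-prob dips below the reference model's. Steal this for your own fine-tunes: nothing in the loss is specific to the base model.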
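For reference, here is the objective written out. This is my reading of the DPO-Positive formulation, with the penalty inside the sigmoid, so check it against the paper before relying on the exact form. The symbols are my own labels: y_w is the preferred response, y_l the rejected one, and pi_ref the frozen reference model.

```latex
\mathcal{L}_{\text{DPOP}}
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}
    \left[
      \log \sigma\!\left(
        \beta \left(
          \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
          - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
          - \lambda \max\!\left(0,\ \log\frac{\pi_{\text{ref}}(y_w \mid x)}{\pi_\theta(y_w \mid x)}\right)
        \right)
      \right)
    \right]
```

Set lambda to zero and you are back to vanilla DPO. The max(0, ...) is the clamp: it contributes nothing while the policy still rates the preferred response at least as highly as the reference does.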

{`# DPOP loss sketch, meant to pair with trl's DPOTrainer
import torch
import torch.nn.functional as F

def dpop_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps, beta=0.1, lam=50.0):
    # chosen-vs-rejected log-prob margins for the policy and the frozen reference
    pi_ratio = policy_chosen_logps - policy_rejected_logps
    ref_ratio = ref_chosen_logps - ref_rejected_logps
    # the DPOP penalty: nonzero only when chosen logp drops below the reference
    penalty = lam * torch.clamp(ref_chosen_logps - policy_chosen_logps, min=0)
    # the penalty sits inside the sigmoid, scaled by beta along with the margin
    logits = beta * (pi_ratio - ref_ratio - penalty)
    return -F.logsigmoid(logits).mean()`}
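To see why the clamp matters, here is a tiny plain-Python check. It is a sketch with made-up log-probabilities, not numbers from any real training run. The helper names `dpo_loss` and `dpop_clamp` are mine, and it looks at the vanilla DPO loss and the clamp term separately, for a single preference pair.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Vanilla DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))

def dpop_clamp(policy_chosen, ref_chosen, lam=50.0):
    """Clamp term: zero unless the policy's chosen log-prob fell below the reference."""
    return lam * max(0.0, ref_chosen - policy_chosen)

# Illustrative sequence log-probs under the frozen reference model (invented values).
ref_chosen, ref_rejected = -10.0, -12.0

# Start of training: the policy is identical to the reference.
loss_start = dpo_loss(-10.0, -12.0, ref_chosen, ref_rejected)
clamp_start = dpop_clamp(-10.0, ref_chosen)

# Later: BOTH log-probs have dropped, but the rejected one dropped faster.
loss_later = dpo_loss(-14.0, -30.0, ref_chosen, ref_rejected)
clamp_later = dpop_clamp(-14.0, ref_chosen)

print(f"vanilla DPO loss: {loss_start:.3f} -> {loss_later:.3f}")
print(f"clamp term:       {clamp_start:.1f} -> {clamp_later:.1f}")
```

The vanilla loss goes down even though the preferred response became less likely, which is exactly the failure mode. The clamp term is zero at the start and large afterwards, so a loss that includes it stops rewarding that direction.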

The Catch

Real talk: leaderboard gains and chat vibes don't always match. Anecdotal reports say Smaug 2 is slightly stiffer than Qwen base on creative writing. There's a smell of benchmark optimisation. Run your own eval before you bet a product on it.
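If "run your own eval" sounds like a project, it doesn't have to be. Below is a minimal sketch of a pairwise comparison over your own prompts. Everything here is hypothetical scaffolding: `compare_models`, the two `generate_*` callables, and `judge` are placeholders you would wire up to your own inference code and to a human or automated rater.

```python
from collections import Counter
from typing import Callable, Iterable

def compare_models(
    prompts: Iterable[str],
    generate_a: Callable[[str], str],        # hypothetical wrapper around model A
    generate_b: Callable[[str], str],        # hypothetical wrapper around model B
    judge: Callable[[str, str, str], str],   # returns "a", "b", or "tie"
) -> Counter:
    """Tally which model's answer the judge prefers on each prompt."""
    tally = Counter()
    for prompt in prompts:
        verdict = judge(prompt, generate_a(prompt), generate_b(prompt))
        if verdict not in {"a", "b", "tie"}:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        tally[verdict] += 1
    return tally

# Stand-ins so the sketch runs end to end; swap in real models and a real rater.
demo_prompts = ["Write a haiku about rain.", "Summarise TCP in one line."]
demo_tally = compare_models(
    demo_prompts,
    generate_a=lambda p: p.upper(),
    generate_b=lambda p: p.lower(),
    judge=lambda p, a, b: "a" if "haiku" in p else "b",
)
print(dict(demo_tally))
```

Use prompts from your actual product, and hide which model wrote which answer from whoever is judging, so the leaderboard story doesn't leak into the verdicts.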

The Verdict

Smaug 2 won the week, but the actual gift is the DPOP recipe. If you're fine-tuning anything at all, swap your DPO loss for this and see what happens. The model itself is a strong Qwen variant — just don't read too much into the #1 spot.
