Open Source

Command R+ V2 Bet On Local And Won

All articles
📚🔧🤖

Tool-call JSON that actually parses on the first try

Plot twist — Cohere doubled down on local-first RAG with Command R+ V2 and it paid off. If you've been living under a rock since the original Command R+, this is the model that finally gets tool-calling right out of the box.

The Setup

Weights are on HuggingFace under a research-only license. The tokenizer ships a proper tool-call chat template, the context window is 128k for real (not the marketing kind), and the JSON output rate is legitimately above 95% in my testing on the M4.

huggingface-cli download CohereLabs/c4ai-command-r-plus-v2 \
  --local-dir ./models/command-r-plus-v2

# Needs roughly 2x A100 or 1x H100 to serve unquantized
# Q4_K_M GGUF fits in 64GB unified memory on an M4 Max

The Money Pattern

Behold — tool calls that don't need three layers of regex to parse. I wired it into a Pipedrive lookup function for a Rebuild Relief internal tool and it returned valid JSON on every single call across 500 test runs. The `tools` parameter in the chat template is the killer feature.

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereLabs/c4ai-command-r-plus-v2")
model = AutoModelForCausalLM.from_pretrained(
    "CohereLabs/c4ai-command-r-plus-v2",
    torch_dtype="auto",
    device_map="auto",
)

tools = [{
    "name": "lookup_deal",
    "description": "Fetch a Pipedrive deal by id",
    "parameter_definitions": {
        "deal_id": {"type": "int", "required": True},
    },
}]

msgs = [{"role": "user", "content": "What's the status of deal 8421?"}]
ids = tok.apply_chat_template(
    msgs, tools=tools, return_tensors="pt"
).to(model.device)
print(tok.decode(model.generate(ids, max_new_tokens=256)[0]))

The Catch

Spoiler — the weights are non-commercial. Research and personal use only. If you want Command R+ V2 in production you're calling the Cohere API and paying for it. The model is also chunky (104B params) so even the GGUF quants are not laptop-friendly unless you've got an M4 Max with 64GB+ unified memory.

The Verdict

For local RAG experiments and tool-calling research, Command R+ V2 is the new default. If you need to ship to production, use the API. Cohere bet on local-first being a feature, not a footnote, and for retrieval-heavy workflows it's the most aligned model I've used this year.

Let us make some quick suggestions?
Please provide your full name.
Please provide your phone number.
Please provide a valid phone number.
Please provide your email address.
Please provide a valid email address.
Please provide your brand name or website.
Please provide your brand name or website.