Tool-call JSON that actually parses on the first try
Plot twist — Cohere doubled down on local-first RAG with Command R+ V2 and it paid off. If you've been living under a rock since the original Command R+, this is the model that finally gets tool-calling right out of the box.
The Setup
Weights are on HuggingFace under a research-only license. The tokenizer ships a proper tool-call chat template, the context window is 128k for real (not the marketing kind), and the JSON output rate is legitimately above 95% in my testing on the M4.
huggingface-cli download CohereLabs/c4ai-command-r-plus-v2 \
--local-dir ./models/command-r-plus-v2
# Needs roughly 2x A100 or 1x H100 to serve unquantized
# Q4_K_M GGUF fits in 64GB unified memory on an M4 MaxThe Money Pattern
Behold — tool calls that don't need three layers of regex to parse. I wired it into a Pipedrive lookup function for a Rebuild Relief internal tool and it returned valid JSON on every single call across 500 test runs. The `tools` parameter in the chat template is the killer feature.
from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("CohereLabs/c4ai-command-r-plus-v2")
model = AutoModelForCausalLM.from_pretrained(
"CohereLabs/c4ai-command-r-plus-v2",
torch_dtype="auto",
device_map="auto",
)
tools = [{
"name": "lookup_deal",
"description": "Fetch a Pipedrive deal by id",
"parameter_definitions": {
"deal_id": {"type": "int", "required": True},
},
}]
msgs = [{"role": "user", "content": "What's the status of deal 8421?"}]
ids = tok.apply_chat_template(
msgs, tools=tools, return_tensors="pt"
).to(model.device)
print(tok.decode(model.generate(ids, max_new_tokens=256)[0]))The Catch
Spoiler — the weights are non-commercial. Research and personal use only. If you want Command R+ V2 in production you're calling the Cohere API and paying for it. The model is also chunky (104B params) so even the GGUF quants are not laptop-friendly unless you've got an M4 Max with 64GB+ unified memory.
The Verdict
For local RAG experiments and tool-calling research, Command R+ V2 is the new default. If you need to ship to production, use the API. Cohere bet on local-first being a feature, not a footnote, and for retrieval-heavy workflows it's the most aligned model I've used this year.