Local inference as a stdlib import. Wild times.
If you've been living under a rock, Bun has been quietly eating Node's lunch on DX for a year. Plot twist: Bun 1.2 just shipped a stdlib LLM module that loads GGUF weights directly. No FFI bindings to write, no package to install, just an import.
The Setup
The new built-in lives at `bun:llm` and wraps llama.cpp under the hood. Drop a GGUF file on disk, point at it, and you're streaming tokens in twelve lines.
import { llm } from "bun:llm";
const model = await llm.load("./models/qwen2.5-7b-instruct-q4_k_m.gguf", {
  contextSize: 8192,
  gpuLayers: 35, // M4 Metal offload
});
const stream = model.stream({
  prompt: "write me a zod schema for an invoice",
  maxTokens: 512,
});
for await (const token of stream) {
  process.stdout.write(token);
}

The Money Pattern
The killer use case is local-first scripts. I've got a CSV cleanup pipeline that runs against Pipedrive exports, and it now lives as a single `bun run clean.ts` with the model loaded inline. No Ollama daemon, no HTTP hop: the model runs in-process on the M4.
// bun run clean.ts
import { llm } from "bun:llm";
const model = await llm.load("./models/llama-3.2-3b.gguf");
// Bun.file() returns a lazy file handle; .text() reads it as a string.
const csv = await Bun.file("./leads.csv").text();
const cleaned = await model.complete({
  prompt: `normalise these company names to title case:\n${csv}`,
  maxTokens: 2048,
});
await Bun.write("./leads-clean.csv", cleaned);

The Catch
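One catch is already visible in the script above: the whole CSV goes into a single prompt, so a large export can overflow the context window or run into the `maxTokens` cap partway through the file. A minimal sketch of one workaround is to send the rows in batches that each repeat the header. Everything here is hypothetical helper code, not part of Bun: `chunkCsv` and `cleanInBatches` are made-up names, and `complete` is passed in as a plain function so the sketch does not depend on the experimental module's exact signature.

```typescript
// Hypothetical helper (not a Bun API): split a CSV into batches that each
// repeat the header row, so every prompt is self-describing.
function chunkCsv(csv: string, rowsPerChunk: number): string[] {
  if (!Number.isInteger(rowsPerChunk) || rowsPerChunk < 1) {
    throw new RangeError("rowsPerChunk must be a positive integer");
  }
  // Naive line split: assumes no quoted field contains a newline.
  const lines = csv.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const [header, ...rows] = lines;
  const chunks: string[] = [];
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push([header, ...rows.slice(i, i + rowsPerChunk)].join("\n"));
  }
  return chunks;
}

// Hypothetical driver: runs one completion per batch, in order.
// `complete` is an assumption: any function that maps a prompt to text.
async function cleanInBatches(
  csv: string,
  rowsPerChunk: number,
  complete: (prompt: string) => Promise<string>,
): Promise<string[]> {
  const results: string[] = [];
  for (const chunk of chunkCsv(csv, rowsPerChunk)) {
    const prompt = `normalise these company names to title case:\n${chunk}`;
    results.push(await complete(prompt));
  }
  return results;
}
```

The batch size is something to tune against the context size you loaded the model with, and how you stitch the per-batch outputs back together depends on what the model actually returns.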
The module is flagged experimental, the API is going to churn, and it only takes ggml-compatible weights (no safetensors, no MLX). Production deploys should still go through a real inference server. This is a scripting hammer, not a serving layer.
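Since the API is expected to churn, one way to limit the damage is to keep the rest of a script behind a small interface of your own, so a breaking change only touches one adapter. This is a minimal sketch, assuming `model.complete` takes `{ prompt, maxTokens }` and resolves to a string, as the script earlier in this post suggests. `TextCompleter`, `ModelLike`, `fromModel` and `titleCaseNames` are hypothetical names for illustration, not part of Bun.

```typescript
// The only surface the rest of a script is allowed to touch.
interface TextCompleter {
  complete(prompt: string, maxTokens: number): Promise<string>;
}

// Assumed shape of the loaded model, inferred from how this post calls it.
// If the experimental API changes, only this type and fromModel() change.
interface ModelLike {
  complete(options: { prompt: string; maxTokens: number }): Promise<string>;
}

// Adapter from the assumed model shape to the script-facing interface.
function fromModel(model: ModelLike): TextCompleter {
  return {
    complete: (prompt, maxTokens) => model.complete({ prompt, maxTokens }),
  };
}

// Script logic depends on TextCompleter only, so it can also be exercised
// with a fake completer and no model file on disk.
async function titleCaseNames(
  completer: TextCompleter,
  names: string[],
): Promise<string> {
  const prompt = `normalise these company names to title case:\n${names.join("\n")}`;
  return completer.complete(prompt, 2048);
}
```

In a real script you would call `fromModel` once on the loaded model and pass the result around. A side benefit is that swapping in an inference server later means writing a second adapter, not rewriting the script.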
The Verdict
For local dev scripts and one-shot data jobs this is a game changer. Bun keeps shipping features that should have been in Node a decade ago. If you're not at least kicking the tyres on Bun for side projects in 2026, you're working harder than you need to.