Dev Tools

Fly.io Just Added GPU Volumes

🪂🎮💾

Cold-start an A100, swap weights, kill it

Spoiler: Fly.io shipped GPU-attached volumes for Fly Machines, and the inference crowd lost it. You can now mount a persistent NVMe volume on an A100 machine, hot-swap weights, and pay per second. Do not @ me: this is genuinely good.

The Setup

Fly Machines are Firecracker microVMs that boot in milliseconds. Pair one with a GPU and a persistent volume and you've got the cheapest serverless inference rig I've seen in 2026. I ran it from my M4 Mac with the fly CLI and a tiny Astro 5 dashboard on top.

fly volumes create model_weights --region ord --size 100 --gpu

fly machine run \
  --region ord \
  --vm-gpu-kind a100-40gb \
  --vm-memory 32gb \
  --volume model_weights:/weights \
  --env MODEL_PATH=/weights/llama-3-70b-q4 \
  ghcr.io/aidxn/llm-server:latest

The Money Pattern

Here's the flex: preload the weights onto the volume once, then cold-boot a machine in seconds without re-downloading 40 GB. The volume persists after its machine is destroyed. That makes per-tenant inference for multi-tenant SaaS economically reasonable for the first time.

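// API route: Astro 5, served from Netlify
// "@fly/machines" is assumed here to be a thin wrapper around the Fly Machines API.
import { createMachine } from "@fly/machines";

export const POST = async ({ request }) => {
  const { tenantId, prompt } = await request.json();
  // Reject anything that can't safely become part of a volume name.
  if (typeof tenantId !== "string" || !/^[a-z0-9_]+$/.test(tenantId) || typeof prompt !== "string") {
    return Response.json({ error: "invalid tenantId or prompt" }, { status: 400 });
  }
  // One GPU machine per request, mounted on the tenant's pre-created volume.
  const m = await createMachine({
    image: "ghcr.io/aidxn/llm-server:latest",
    gpu: "a100-40gb",
    volume: `weights_${tenantId}`,
    autoStop: true,     // stop the machine once it goes idle
    idleTimeoutSec: 30, // idle window before auto-stop
  });
  return Response.json(await m.invoke({ prompt }));
};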
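
One thing the route above hides is what a single isolated request actually gets billed for. Here's a rough sketch, assuming the machine is billed for every second it runs, including the 30-second idle window before auto-stop. The names `RequestProfile`, `billedSeconds`, and `requestCost` are mine, and the boot time, inference time, and hourly rate are placeholder numbers, not measurements or quoted prices.

```typescript
// Hypothetical helper types and functions; none of these come from Fly.io.
interface RequestProfile {
  bootSec: number;        // cold boot until the server accepts traffic (assumed)
  inferenceSec: number;   // time spent generating the response (assumed)
  idleTimeoutSec: number; // idle window before auto-stop, 30 in the route above
}

// Seconds the machine runs for one isolated request:
// boot + inference + the idle tail before it stops.
function billedSeconds(p: RequestProfile): number {
  return p.bootSec + p.inferenceSec + p.idleTimeoutSec;
}

// Dollar cost of that request at a given hourly GPU rate (placeholder rate).
function requestCost(p: RequestProfile, hourlyRateUsd: number): number {
  return (billedSeconds(p) / 3600) * hourlyRateUsd;
}

const sampleProfile: RequestProfile = {
  bootSec: 10,
  inferenceSec: 20,
  idleTimeoutSec: 30,
};

console.log(billedSeconds(sampleProfile));              // 60
console.log(requestCost(sampleProfile, 3).toFixed(2));  // "0.05"
```

With these made-up numbers the idle tail is half the bill, which is why the idle timeout is worth tuning against your real traffic pattern.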

The Catch

It's still expensive. A100 time adds up fast even with per-second billing, and the scaling limits are real: you can't horizontally fan out a GPU machine like an L40S the way you can a CPU machine. Volume snapshots between regions are flaky. And if your traffic is steady, a dedicated RunPod box will cost you less.
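
To put a number on "steady traffic", here's a back-of-the-envelope break-even sketch. It compares a per-second rate (expressed per hour) against a flat monthly price for a dedicated box. The function names and both prices are illustrative placeholders I made up, so plug in real quotes before trusting the answer.

```typescript
// Hypothetical helpers; the prices used below are placeholders, not real quotes.

// Active GPU hours per month at which per-second billing costs the same
// as a flat-rate dedicated box.
function breakEvenHoursPerMonth(
  perSecondHourlyUsd: number,
  dedicatedMonthlyUsd: number,
): number {
  if (perSecondHourlyUsd <= 0) {
    throw new RangeError("hourly rate must be positive");
  }
  return dedicatedMonthlyUsd / perSecondHourlyUsd;
}

// Which option is cheaper for a given number of active hours per month.
function cheaperOption(
  activeHoursPerMonth: number,
  perSecondHourlyUsd: number,
  dedicatedMonthlyUsd: number,
): "per-second" | "dedicated" {
  return activeHoursPerMonth * perSecondHourlyUsd < dedicatedMonthlyUsd
    ? "per-second"
    : "dedicated";
}

const assumedHourlyUsd = 3;    // placeholder per-second rate, per hour
const assumedMonthlyUsd = 900; // placeholder dedicated box price

console.log(breakEvenHoursPerMonth(assumedHourlyUsd, assumedMonthlyUsd)); // 300
console.log(cheaperOption(50, assumedHourlyUsd, assumedMonthlyUsd));      // "per-second"
console.log(cheaperOption(720, assumedHourlyUsd, assumedMonthlyUsd));     // "dedicated"
```

Under these assumptions, anything past roughly 300 active hours a month tips toward the dedicated box, and a machine running around the clock (720 hours) is well past that line.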

The Verdict

For spiky, per-tenant inference workloads, Fly.io GPU volumes are the new default. Pair them with Supabase auth, an Astro 5 dashboard, and per-second billing and you've got a real serverless GPU story. Just don't run it 24/7 unless you like surprise invoices.
