Modal Just Killed Replicate

If you've been living under a rock: Modal cut its serverless GPU pricing again, cold starts now sit under two seconds, and Python developers can deploy an A100 by adding a decorator. Replicate is going to need a really good Q1.

The Setup

Modal is what you wished AWS Lambda was. You write a Python function, slap a decorator on it, run a deploy command, and a GPU function with autoscaling is in production. No Dockerfiles, no Kubernetes, no YAML at all. The container image layer is built and cached automatically.

pip install modal
modal token new

# Deploy with one command — that's it
modal deploy app.py

The Money Pattern

The decorator-driven deploy story is wildly productive. You declare GPU type, memory, timeout, secrets, and image, all in Python. The same code runs locally for debugging and in the cloud for production. Idle containers scale to zero, and you pay per second a container is up, which includes the short idle window before it shuts down.
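To put rough numbers on per-second billing, here is a back-of-the-envelope sketch. The price is a made-up placeholder, not Modal's actual rate, and the function names are hypothetical. It assumes the worst case for serverless: every request arrives alone and is followed by a full idle window before the container shuts down.

```python
def monthly_gpu_cost(
    requests_per_day: int,
    busy_seconds_per_request: float,
    idle_window_seconds: float,
    price_per_gpu_second: float,
    days: int = 30,
) -> float:
    """Worst-case serverless bill: each request pays for its own idle window."""
    billed_seconds = (
        requests_per_day * (busy_seconds_per_request + idle_window_seconds) * days
    )
    return billed_seconds * price_per_gpu_second


def always_on_cost(price_per_gpu_second: float, days: int = 30) -> float:
    """Bill for a GPU that never scales down."""
    return days * 24 * 60 * 60 * price_per_gpu_second


# Placeholder price, purely illustrative. Check the provider's pricing page.
PRICE_PER_GPU_SECOND = 0.001

serverless = monthly_gpu_cost(200, 5.0, 60.0, PRICE_PER_GPU_SECOND)
always_on = always_on_cost(PRICE_PER_GPU_SECOND)
print(f"serverless: ${serverless:,.2f}  always-on: ${always_on:,.2f}")
```

With these placeholder inputs, the idle window dominates the serverless bill (60 of every 65 billed seconds), so a shorter window or batched traffic changes the picture quickly.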

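import modal

app = modal.App("llama-server")
# Modal's web endpoints expect the "fastapi[standard]" extra in the image.
image = modal.Image.debian_slim().pip_install("vllm", "fastapi[standard]")

# A class lets the model load once per container instead of once per request.
@app.cls(gpu="A100", image=image, scaledown_window=60)
class LlamaServer:
    @modal.enter()
    def load(self):
        from vllm import LLM
        # Gated model: the container also needs a Hugging Face token (e.g. a Modal secret).
        self.llm = LLM("meta-llama/Meta-Llama-3-8B-Instruct")

    # Assumes a recent Modal release, where web_endpoint is named fastapi_endpoint.
    @modal.fastapi_endpoint(method="POST")
    def generate(self, prompt: str):
        return self.llm.generate(prompt)[0].outputs[0].text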
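Calling the deployed endpoint takes only the standard library. The sketch below is an assumption-heavy example: the URL is a hypothetical placeholder (Modal prints the real one at deploy time), and the helper names are made up. One detail worth knowing is that FastAPI reads a bare `prompt: str` parameter from the query string, even on POST, so the prompt goes in the URL and not in a JSON body.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical URL; use the one printed by `modal deploy`.
ENDPOINT = "https://example--llama-server-generate.modal.run"


def build_request(prompt: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a POST request with the prompt encoded as a query parameter."""
    query = urllib.parse.urlencode({"prompt": prompt})
    return urllib.request.Request(f"{endpoint}?{query}", data=b"", method="POST")


def generate(prompt: str, timeout_s: float = 300.0) -> str:
    """Send the prompt and decode the JSON-encoded string in the response."""
    # Generous timeout: the first call may include a cold start and a model load.
    with urllib.request.urlopen(build_request(prompt), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (needs a live deployment):
#   print(generate("Write a haiku about cold starts."))
```

If you would rather send a JSON body, the endpoint's parameter would need to be a dict or a Pydantic model instead of a plain string.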

The Catch

Vendor lock-in is real. Modal-specific decorators, Modal-specific image builder, Modal-specific volumes. Move off and you're rewriting a lot of glue code. Debugging cold starts is also annoying — the logs are good but you can't ssh in, and reproducing prod-only bugs locally is a vibes-based exercise.
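One way to soften both problems is to keep the request logic in plain Python and confine the Modal decorators to a thin wrapper file. The sketch below shows the idea only; every name in it (`make_handler`, `echo_backend`, the response shape) is hypothetical and not part of Modal or vLLM.

```python
from typing import Callable

# A backend is any callable that turns a prompt into text.
# In production it could wrap a vLLM call; in tests it is a plain function.
Backend = Callable[[str], str]


def make_handler(backend: Backend, max_prompt_chars: int = 4000) -> Callable[[str], dict]:
    """Build a platform-agnostic request handler around a generation backend."""

    def handle(prompt: str) -> dict:
        prompt = prompt.strip()
        if not prompt:
            return {"ok": False, "error": "empty prompt"}
        if len(prompt) > max_prompt_chars:
            return {"ok": False, "error": "prompt too long"}
        return {"ok": True, "text": backend(prompt)}

    return handle


def echo_backend(prompt: str) -> str:
    """Local stand-in backend: no GPU and no cloud account required."""
    return f"echo: {prompt}"


handler = make_handler(echo_backend)
print(handler("  hi  "))
```

The validation and response-shaping code can then be unit-tested on a laptop, and moving off Modal means rewriting the wrapper file instead of the whole app. It will not reproduce GPU-specific or cold-start bugs, but it narrows where those can hide.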

The Verdict

For solo developers and small teams who want to ship GPU workloads without becoming part-time DevOps engineers, Modal is the move. Replicate still wins for the "model marketplace" use case where you want someone else to host the popular models. But for custom code? Modal is now the obvious answer. The ergonomics gap closed and pricing did the rest.
