Aphrodite Engine Forked vLLM and Won

If you've been living under a rock, Aphrodite Engine is the spicy fork of vLLM that the local AI crowd quietly switched to about six months ago. Spoiler: it's faster on consumer GPUs and supports a much wider spread of quant formats than upstream.

The Setup

vLLM is the gold standard for serving LLMs at scale, but it's tuned for A100s and corporate workloads. Aphrodite picked up the codebase and leaned hard into the things consumer cards need: AWQ, GPTQ, and EXL2 quantization, plus aggressive KV cache quantization. The pitch is simple: the same quantized model on the same 4090 goes faster.

# Fine for a first run; swap :latest for a pinned release tag once it works (see The Catch).
docker run --gpus all -p 2242:2242 \
  -v ~/models:/models \
  alpindale/aphrodite-engine:latest \
  --model /models/llama-3-70b-awq \
  --quantization awq \
  --kv-cache-dtype fp8 \
  --max-model-len 8192
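
Before you pull a 70B, it's worth doing the napkin math on what `--quantization awq` and `--kv-cache-dtype fp8` actually buy you. Here's a rough sketch, not a profiler: the layer and head counts are the published Llama 3 70B config (80 layers, 8 KV heads, head dim 128), the parameter count is rounded to 70B, and the helper names are made up for this post. It ignores quantization scales, activations, and framework overhead, so real usage will be higher.

```python
# Back-of-envelope VRAM estimate. Helper names are hypothetical; the
# architecture numbers are the published Llama 3 70B config.

GIB = 1024 ** 3


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage only (no quantization scales, no activations)."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_value: int) -> int:
    """K and V tensors for every layer, for n_tokens of cached context."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_value


weights_4bit = weight_bytes(70e9, 4)           # parameter count rounded to 70B
kv_fp16 = kv_cache_bytes(80, 8, 128, 8192, 2)  # 2 bytes per value
kv_fp8 = kv_cache_bytes(80, 8, 128, 8192, 1)   # 1 byte per value

print(f"4-bit weights:                {weights_4bit / GIB:.1f} GiB")
print(f"KV cache @ 8192 tokens, fp16: {kv_fp16 / GIB:.2f} GiB")
print(f"KV cache @ 8192 tokens, fp8:  {kv_fp8 / GIB:.2f} GiB")
```

Two takeaways. The fp8 KV cache halves the per-sequence cache cost (2.5 GiB down to 1.25 GiB at 8192 tokens), which is where the extra concurrency comes from. And 4-bit 70B weights alone land around 32.6 GiB, so on 24 GB cards you're looking at a lower-bit quant or a second GPU.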

The Money Pattern

It speaks OpenAI's API out of the box. Point the official OpenAI SDK (or any OpenAI-compatible client) at it and you have a drop-in replacement for the hosted API that runs on your own metal.

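from openai import OpenAI

# Same SDK, different base_url: Aphrodite listens on port 2242 in the docker command above.
client = OpenAI(
    base_url="http://localhost:2242/v1",
    # The SDK insists on a value; the server ignores it unless you configured a key.
    api_key="sk-aphrodite-doesnt-care",
)

resp = client.chat.completions.create(
    # Must match the name the server registered. Unless you set a served-name
    # override, that is usually the --model value, path and all.
    model="/models/llama-3-70b-awq",
    messages=[{"role": "user", "content": "draft a pipedrive webhook handler"}],
    max_tokens=512,
)
print(resp.choices[0].message.content)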
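
The classic first stumble with any OpenAI-compatible server is a model-not-found error, because the `model` string in your request doesn't match what the server registered. Here's a small stdlib-only sketch for looking the name up instead of guessing. It assumes Aphrodite exposes the standard `/v1/models` listing like other OpenAI-compatible servers; `fetch_models` and `pick_model_id` are names I made up, and the sample payload is trimmed to the fields the code reads.

```python
import json
import urllib.request


def fetch_models(base_url: str, timeout: float = 5.0) -> dict:
    """GET {base_url}/models and return the decoded JSON payload."""
    url = f"{base_url.rstrip('/')}/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def pick_model_id(payload: dict, hint: str = "") -> str:
    """Return the first served model id containing `hint` (any id if hint is empty)."""
    ids = [m["id"] for m in payload.get("data", [])]
    for model_id in ids:
        if hint in model_id:
            return model_id
    raise LookupError(f"no served model matches {hint!r}; server lists {ids}")


# Illustrative payload in the OpenAI list-models shape (not captured from a real server).
sample = {"object": "list", "data": [{"id": "/models/llama-3-70b-awq", "object": "model"}]}
print(pick_model_id(sample, hint="llama-3"))

# Against a live server (assumes the container from The Setup is running):
# model_id = pick_model_id(fetch_models("http://localhost:2242/v1"), hint="llama-3")
```

Pass whatever `pick_model_id` returns as `model=` in your chat completion call and the name mismatch goes away.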

The Catch

No enterprise adoption means no enterprise support. Documentation is Discord-flavored. The release cadence is fast enough that pinning a Docker tag is mandatory if you don't want surprises on Monday morning. And the project's RP-adjacent reputation means your CTO will side-eye the name on a slide deck.

The Verdict

For solo devs, home labs, and anyone squeezing a heavily quantized Llama 3 70B onto consumer cards like the 4090, Aphrodite is genuinely the better engine. I'm running it on a Gold Coast workstation for an Aidxn Design side project and it's eating Q4 quants for breakfast. If you've never tried it, swap your `docker run` tonight.
