Real text. 4K plates. No upscaler tax.
Behold: Stability finally shipped SD4, and the demos are not lying. Text inside images is legible. Hands have five fingers. The 1024 → 4K upscale pipeline is gone because the model just renders at 4K natively.
The Setup
SD4 is a hybrid DiT with a beefier text encoder — Stability swapped in a custom T5-XXL variant for typography control. The base model handles up to 4096x4096 in a single pass. It also ships with a permissive non-commercial-for-research-but-fine-for-most-people license that the community is currently arguing about on X.
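If you want to know how big you can go for a given aspect ratio under that 4096x4096 ceiling, here is a minimal sketch. `fit_to_cap` is a hypothetical helper of mine, not part of diffusers or SD4. The snap-to-64 step is an assumption carried over from earlier latent diffusion models, so check what SD4 actually requires before relying on it.

```python
def fit_to_cap(aspect_w: int, aspect_h: int, cap: int = 4096, multiple: int = 64) -> tuple[int, int]:
    """Largest (width, height) near the given aspect ratio that fits in cap x cap.

    `cap` comes from the 4096x4096 single-pass limit quoted above.
    `multiple` is an assumption: many latent diffusion models want
    dimensions divisible by 8 or 64.
    """
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    if aspect_w >= aspect_h:
        # Landscape or square: width hits the cap, height scales down.
        width = cap
        height = cap * aspect_h // aspect_w
    else:
        # Portrait: height hits the cap, width scales down.
        height = cap
        width = cap * aspect_w // aspect_h
    # Snap both sides down so they stay inside the cap.
    width -= width % multiple
    height -= height % multiple
    return width, height


if __name__ == "__main__":
    for ratio in [(1, 1), (16, 9), (2, 3)]:
        print(ratio, fit_to_cap(*ratio))
```

Snapping down means the result can drift slightly from the exact ratio, which is usually fine for a plate you are going to crop anyway.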
pip install -U diffusers
huggingface-cli download stabilityai/stable-diffusion-4-base
huggingface-cli download stabilityai/stable-diffusion-4-refiner  # optional
The Money Pattern
For poster work and motion plates that get composited in After Effects, native 4K means I skip the ESRGAN step entirely. Generate, drop into Affinity Designer, mask, ship.
import torch
from diffusers import StableDiffusion4Pipeline

# Load the base model in bfloat16 and move it to the GPU.
pipe = StableDiffusion4Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-4-base",
    torch_dtype=torch.bfloat16,
).to("cuda")

# 2:3 portrait cover. The base model's stated ceiling is 4096x4096,
# so raise these if you have the VRAM for it.
img = pipe(
    prompt='retro magazine cover, bold headline reading "VELOCITY 8.5", grainy print',
    negative_prompt="blurry, low contrast",
    height=3072, width=2048,
    num_inference_steps=40,
    guidance_scale=5.0,
).images[0]
img.save("cover_4k.png")
The Catch
The LoRA ecosystem is about to fragment. SD1.5 LoRAs, SDXL LoRAs, SD3 LoRAs, and now SD4 LoRAs — none of them interchangeable. Expect six months of CivitAI confusion while everyone re-trains. Also 4K means VRAM. 32GB or you're tiling.
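If you do end up tiling, the planning part is plain arithmetic. Below is a rough sketch of how overlapping tile boxes could be laid out over a 4K canvas. `plan_tiles` is a hypothetical helper, and the 1024 tile size and 128 overlap are my own placeholder values, not SD4 or diffusers settings. It only computes coordinates; generating and blending the tiles is up to whatever tool you use.

```python
def plan_tiles(width: int, height: int, tile: int = 1024, overlap: int = 128):
    """Return (left, top, right, bottom) boxes that cover a width x height canvas.

    Neighbouring tiles share at least `overlap` pixels so seams can be blended.
    Tile size and overlap are illustrative defaults, not model requirements.
    """
    if overlap < 0 or overlap >= tile:
        raise ValueError("overlap must be in [0, tile)")

    def starts(length: int) -> list[int]:
        # A canvas side no longer than one tile needs a single tile.
        if length <= tile:
            return [0]
        stride = tile - overlap
        positions = list(range(0, length - tile, stride))
        # Pin the last tile flush against the far edge.
        positions.append(length - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    boxes = plan_tiles(4096, 4096)
    print(len(boxes), "tiles; first:", boxes[0], "last:", boxes[-1])
```

The last tile in each row and column is pinned to the edge, so it may overlap its neighbour by more than the requested amount rather than leaving a thin strip uncovered.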
The Verdict
This is the first SD release since 1.5 that feels genuinely worth migrating to. The text rendering alone kills my "generate then Photoshop in the headline" workflow. If you do any kind of editorial or poster design, SD4 is your new base model. Pull tonight.