Top of MTEB, 100+ languages, MIT licensed
Spoiler: the best general-purpose embedding model in the world right now is open weights, multilingual, and free. BGE-M3 from BAAI took the top of the MTEB leaderboard and quietly made `text-embedding-ada-002` look like a museum piece.
The Setup
One pip install, one model load, and you're embedding 1024-dim vectors locally on an M4 Mac. The "M3" stands for multi-linguality, multi-functionality, multi-granularity: one model produces dense, sparse, and ColBERT-style multi-vector representations from the same forward pass. The sentence-transformers quick start uses just the dense output.
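To make the three retrieval modes concrete, here is a toy sketch of how each one typically turns representations into a relevance score. It does not call the model; every vector and token weight is made up for illustration, and the function names are hypothetical. The multi-vector score is averaged over query tokens here, which is one common convention.

```python
# Toy illustration only: all numbers and token weights below are invented,
# not real BGE-M3 output. Function names are hypothetical.

def dense_score(q, d):
    """Dot product of two single vectors (cosine similarity if both are unit length)."""
    return sum(a * b for a, b in zip(q, d))

def sparse_score(q_weights, d_weights):
    """Lexical match: sum of weight products over tokens present in both texts."""
    shared = q_weights.keys() & d_weights.keys()
    return sum(q_weights[t] * d_weights[t] for t in shared)

def multi_vector_score(q_vecs, d_vecs):
    """ColBERT-style late interaction: each query token vector takes its
    best-matching document token vector, then the maxima are averaged."""
    best = [max(dense_score(q, d) for d in d_vecs) for q in q_vecs]
    return sum(best) / len(best)

# Hypothetical inputs for one query and one document
q_dense, d_dense = [0.6, 0.8], [0.8, 0.6]
q_sparse = {"hail": 0.5, "roof": 0.4}
d_sparse = {"hail": 0.6, "damage": 0.3}
q_tokens = [[1.0, 0.0], [0.0, 1.0]]
d_tokens = [[1.0, 0.0], [0.6, 0.8]]

print(f"dense:        {dense_score(q_dense, d_dense):.2f}")
print(f"sparse:       {sparse_score(q_sparse, d_sparse):.2f}")
print(f"multi-vector: {multi_vector_score(q_tokens, d_tokens):.2f}")
```

The dense path compares one vector per text, the sparse path only rewards tokens that both texts share, and the multi-vector path compares token by token, which is why it costs more to store and score.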
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("BAAI/bge-m3")
docs = [
"hail damage on a colorbond roof in Brisbane",
"dégâts de grêle sur une toiture",
"屋根のひょう害",
]
embeddings = model.encode(docs, normalize_embeddings=True)
print(embeddings.shape)  # (3, 1024)
The Money Pattern
Multilingual is the real flex. Same model, same vector space, cross-language retrieval that actually works. Query in English, hit documents in French, Japanese, Mandarin — no per-language pipelines, no translation step.
import numpy as np
query = model.encode("roof leak after hailstorm", normalize_embeddings=True)
# keep the raw texts so results can be printed alongside their scores
corpus_texts = [
    "claim 4421: hail damage to tiled roof",
    "fuite de toit après tempête de grêle",
    "暴風雨後の屋根からの水漏れ",
    "completely unrelated invoice text",
]
corpus = model.encode(corpus_texts, normalize_embeddings=True)
# cosine similarity = dot product when normalized
scores = corpus @ query
# indices of the top 3 documents, highest score first
for i in np.argsort(-scores)[:3]:
    print(f"{scores[i]:.3f}  {corpus_texts[i]}")
The Catch
1024 dimensions is smaller than ada-002's 1536, so you're already ahead there. But it's still chunkier than OpenAI's `text-embedding-3-small` once you shorten that model to 512 dims with its `dimensions` parameter (its default is 1536). Storage and ANN index size scale with dimension count. If you're cramming 100M vectors into pgvector, that math matters.
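Here's that math as a back-of-envelope sketch. It assumes 4-byte float32 values and counts only the raw vector payload, so per-row overhead and the ANN index itself come on top. The helper name `raw_vector_bytes` is made up for this example.

```python
def raw_vector_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw payload only: assumes float32 by default, ignores index and row overhead."""
    return num_vectors * dims * bytes_per_value

N = 100_000_000  # the 100M-vector scenario from the text
for dims in (512, 1024, 1536):
    gb = raw_vector_bytes(N, dims) / 1e9
    print(f"{dims:>4} dims: {gb:,.1f} GB")
```

At 100M vectors that works out to roughly 205 GB at 512 dims, 410 GB at 1024, and 614 GB at 1536, before any index is built.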
The Verdict
If you're paying OpenAI for embeddings in 2026 and you're not on the absolute frontier, you're lighting money on fire. BGE-M3 runs locally, beats the closed-source incumbents on most retrieval benchmarks, and costs nothing per token. Swap it in tonight — your CFO will thank you.