Vector Embeddings: How AI Understands Meaning
By Dorian Laurenceau
📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
When you search "running shoes" and find results for "jogging sneakers," that's not keyword matching; it's semantic understanding powered by vector embeddings. Here's why this technology matters.
What Reddit stopped believing about embeddings
The embeddings space has quietly become one of the most over-engineered corners of AI infrastructure. If you read r/MachineLearning or r/LocalLLaMA in 2023, every other post was about picking the "best" embedding model from the MTEB leaderboard. Three years later, the tone has shifted: the benchmark is gamed, the top-of-leaderboard model rarely wins on your data, and a 1536-dimensional OpenAI embedding isn't measurably better than a 384-dim all-MiniLM-L6-v2 for most retrieval tasks.
The MTEB paper is worth skimming to understand what the benchmark actually measures — and what it doesn't (your specific domain vocabulary, your query distribution, your chunk sizes). The MTEB leaderboard on Hugging Face is still a useful starting point, but treat it as a shortlist generator, not a verdict.
What matters in practice:
- Match the model to your query length. If your users type 3-word queries, a model trained on long-document retrieval will underperform a model trained on short-to-long asymmetric retrieval.
- Normalize your embeddings. Scoring non-normalized vectors with a raw dot product (or an inner-product index) as if it were cosine similarity is a common bug that quietly degrades retrieval quality. Many libraries normalize for you; check yours.
- Reranking beats bigger embeddings. A small embedding + a cross-encoder reranker (Cohere Rerank, bge-reranker) routinely outperforms switching to a larger embedding model, at a fraction of the compute.
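To make the normalization point concrete, here is a minimal sketch in plain Python. No embedding library is assumed; the toy vectors and the helper names `l2_normalize` and `dot` are made up for illustration. It shows a raw dot product rewarding a long vector, while the dot product of unit-length vectors behaves like cosine similarity.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def dot(a, b):
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy vectors, not output from a real model.
query = [1.0, 0.0]
doc_close = [0.9, 0.1]   # nearly the same direction as the query
doc_long = [5.0, 5.0]    # 45 degrees off, but much larger magnitude

# Raw dot product: magnitude dominates, so the long vector scores higher.
raw_scores = {"close": dot(query, doc_close), "long": dot(query, doc_long)}

# After normalization the dot product equals cosine similarity,
# and the vector that actually points the same way scores higher.
q = l2_normalize(query)
norm_scores = {
    "close": dot(q, l2_normalize(doc_close)),
    "long": dot(q, l2_normalize(doc_long)),
}
```

If your vector store is configured for inner-product search, this is the check worth running once on a handful of real vectors.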
If you're evaluating, run a 100-query labeled eval on your own data before picking a model. It takes an afternoon and saves months of debugging.
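One simple way to score that kind of labeled eval is recall@k. The sketch below assumes you already have, for each query, the ranked document IDs your retriever returned and the set of IDs a human marked as relevant. The function names and the sample data are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant document")
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(results, k):
    """Average recall@k over a list of (ranked_ids, relevant_ids) pairs."""
    return sum(recall_at_k(ranked, rel, k) for ranked, rel in results) / len(results)

# Hypothetical eval set with two labeled queries.
eval_results = [
    (["d3", "d7", "d1"], {"d3"}),        # the relevant doc is ranked first
    (["d2", "d9", "d4"], {"d4", "d8"}),  # one of two relevant docs is in the top 3
]

score = mean_recall_at_k(eval_results, k=3)
```

Run the same eval set against each candidate model and compare the scores; the absolute number matters less than the ranking between models on your data.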
What Are Vector Embeddings?
A vector embedding is a list of numbers that represents the meaning of text (or images, audio, etc.) in a way computers can process.
Text to Numbers
"I love pizza" → [0.23, -0.45, 0.87, 0.12, ..., -0.33]
(typically 384-1536 numbers)
These numbers capture semantic meaning, not just characters.
Similar Meanings = Similar Numbers
"I love pizza" → [0.23, -0.45, 0.87, ...]
"Pizza is great" → [0.25, -0.42, 0.85, ...] ← Very similar!
"I hate broccoli" → [-0.18, 0.32, -0.22, ...] ← Very different
Why Embeddings Matter
Traditional Search (Keyword Matching)
Search: "automobile"
Documents with "automobile" ✓
Documents with "car" ✗ (different word!)
Semantic Search (Embeddings)
Search: "automobile"
"automobile" ✓ (same word)
"car" ✓ (similar meaning)
"vehicle" ✓ (related concept)
"Tesla Model 3" ✓ (it's a car!)
Embeddings enable search by meaning, not just keywords.
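The contrast can be sketched in a few lines of Python. The 2-D vectors below are hand-made stand-ins for illustration only; a real embedding model would produce them, and the 0.8 threshold is an arbitrary assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hand-made toy "embeddings" (hypothetical values, not from a real model).
toy_vectors = {
    "automobile": [0.95, 0.10],
    "car": [0.90, 0.15],
    "vehicle": [0.80, 0.30],
    "banana": [0.05, 0.98],
}

def keyword_search(query, docs):
    """Exact word match: only documents containing the query word."""
    return [d for d in docs if query in d.split()]

def semantic_search(query, docs, threshold=0.8):
    """Match by vector closeness instead of shared characters."""
    q = toy_vectors[query]
    return [d for d in docs if cosine(q, toy_vectors[d]) >= threshold]

docs = ["automobile", "car", "vehicle", "banana"]
keyword_hits = keyword_search("automobile", docs)
semantic_hits = semantic_search("automobile", docs)
```

With these toy values, the keyword search finds only the exact word, while the vector search also returns "car" and "vehicle" and leaves "banana" out.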
How Embeddings Work (Simplified)
The Training Process
Embedding models learn from billions of text examples:
1. "The cat sat on the mat"
2. "Dogs are loyal pets"
3. "Machine learning uses algorithms"
... billions more
The model learns:
- "cat" and "dog" are somewhat related (both pets)
- "mat" and "rug" are very related
- "cat" and "algorithm" are unrelated
The Result: A Semantic Map
Imagine a vast space where every concept has a position:
Animals Cluster:
- cat → kitten, feline
- dog → puppy
Furniture Cluster:
- mat → rug → carpet
Words with similar meanings cluster together.
Dimensions: What the Numbers Mean
Each number in an embedding captures some aspect of meaning:
Dimension 1: Maybe "living thing" vs "object"
Dimension 42: Maybe "positive" vs "negative" sentiment
Dimension 256: Maybe "formal" vs "casual" language
...
In practice, no single dimension has a clear, human-readable meaning; it's the combination that matters.
Why So Many Dimensions?
256 dimensions: Basic understanding
768 dimensions: Good for most tasks
1536 dimensions: Rich semantic capture
More dimensions = more nuanced understanding, but higher storage/compute cost.
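The storage side of that trade-off is simple arithmetic. The sketch below assumes each value is stored as a 4-byte float32 and counts only the raw vectors, not index overhead or metadata; the function name `index_size_bytes` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32

def index_size_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size of the vectors alone, excluding index overhead and metadata."""
    return num_vectors * dimensions * bytes_per_value

for dims in (384, 768, 1536):
    size_gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dims: {size_gb:.2f} GB per million vectors")
```

Under those assumptions, a million vectors take roughly 1.5 GB at 384 dimensions and roughly 6.1 GB at 1536, so quadrupling the dimensions quadruples the raw storage.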
Similarity: Measuring Closeness
Cosine Similarity
The standard way to compare embeddings:
Similarity("car", "automobile") = 0.94 (very similar)
Similarity("car", "banana") = 0.12 (unrelated)
Similarity("car", "vehicle") = 0.87 (related)
Scale: -1 (opposite) through 0 (unrelated) to 1 (identical meaning)
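For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. Here is a minimal plain-Python sketch; the input vectors are arbitrary toy values, not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

same = cosine_similarity([1.0, 2.0], [2.0, 4.0])        # same direction, close to 1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # perpendicular, exactly 0
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])  # opposite direction, close to -1
```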
Why Cosine Works
It measures the angle between vectors, ignoring magnitude:
"I really really love cars" and "I love cars"
→ Same direction, different length
→ Cosine sees them as similar
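The geometric property is easy to check: scaling a vector changes its length but not its direction, so cosine similarity stays the same while Euclidean distance does not. The sketch below uses arbitrary toy vectors. Whether a real model maps those two example sentences to the same direction is not something this code demonstrates.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

base = [0.2, 0.5, 0.1]
scaled = [3 * x for x in base]   # same direction, three times the length
other = [0.3, 0.4, 0.2]

cos_base = cosine_similarity(base, other)
cos_scaled = cosine_similarity(scaled, other)    # unchanged by the scaling
dist_base = euclidean_distance(base, other)
dist_scaled = euclidean_distance(scaled, other)  # grows with the scaling
```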
Embeddings in Action: RAG Systems
RAG (Retrieval-Augmented Generation) uses embeddings at its core:
Step 1: Embed Your Documents
Document 1: "Our return policy allows 30-day returns..."
→ [0.12, -0.34, 0.56, ...]
Document 2: "Shipping takes 3-5 business days..."
→ [-0.23, 0.45, 0.11, ...]
... store all embeddings
Step 2: Embed the User Question
User: "How long do I have to return an item?"
→ [0.14, -0.31, 0.52, ...] ← Similar to Document 1!
Step 3: Find Most Similar
Compare question embedding to all document embeddings:
- Document 1: 0.94 similarity ← Winner!
- Document 2: 0.23 similarity
- Document 3: 0.18 similarity
Return Document 1 to the LLM for answering.
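The three steps can be sketched as a tiny top-k search. This version reuses only the first three numbers shown for each vector above, so the scores will not match the illustrative 0.94 and 0.23. The embedding call and the LLM call are left out, and the names `top_k`, `doc_vecs`, and `question_vec` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, doc_vecs, k=1):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Step 1: stored document embeddings (truncated toy vectors).
doc_vecs = {
    "returns": [0.12, -0.34, 0.56],
    "shipping": [-0.23, 0.45, 0.11],
}

# Step 2: the embedded user question (truncated toy vector).
question_vec = [0.14, -0.31, 0.52]

# Step 3: pick the closest document to pass to the LLM as context.
best = top_k(question_vec, doc_vecs, k=1)
context_id = best[0][0]
```

A production system would swap the linear scan for a vector database or an approximate nearest-neighbor index, but the ranking logic is the same idea.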
Popular Embedding Models (2025)
| Model | Dimensions | Best For |
|---|---|---|
| OpenAI text-embedding-3-small | 1536 | General purpose, affordable |
| OpenAI text-embedding-3-large | 3072 | Highest quality |
| Cohere embed-v3 | 1024 | Multilingual |
| Google text-embedding-004 | 768 | Google ecosystem |
| Open source (BGE, E5) | 384-1024 | Self-hosted, free |
Limitations of Embeddings
1. Fixed at Creation Time
An embedding model trained in 2023 doesn't know about events or vocabulary from 2024.
To pick those up, you need to re-embed your documents with a newer model.
2. Context Window Limits
Most embedding models accept at most roughly 512 to 8,000 tokens per input.
Long documents need chunking.
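A common way to chunk is a sliding window with some overlap, so a sentence cut at a boundary still appears whole in a neighboring chunk. The sketch below is one minimal version. Whitespace splitting stands in for a real tokenizer, and the function name and parameters are assumptions, not a library API.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks

# Whitespace split is a stand-in for the embedding model's own tokenizer.
words = "one two three four five six seven eight nine ten".split()
chunks = chunk_tokens(words, chunk_size=4, overlap=1)
```

Here the ten words become three chunks of four, each starting with the last word of the previous chunk. Real chunk sizes should be counted in the model's own tokens, not words.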
3. Same Words, Different Meanings
"Bank" (financial) vs "bank" (river)
Embeddings try to capture context, but it's imperfect.
4. Language/Cultural Bias
Models trained mainly on English perform worse on other languages.
Cultural concepts may not embed well.
Quick Summary
- Embeddings convert text to numbers representing meaning
- Similar meanings → similar number patterns
- Enable semantic search beyond keyword matching
- Foundation of RAG systems and AI search
- Trade-offs: dimensions, speed, quality, cost
Ready to Build with Embeddings?
This article covered the what and why of vector embeddings. But building production RAG systems requires understanding chunking, retrieval strategies, and integration patterns.
In our Module 5, RAG & Context Engineering, you'll learn:
- Choosing the right embedding model
- Document chunking strategies
- Hybrid search (embeddings + keywords)
- Vector database selection
- Production RAG architecture
Dorian Laurenceau
Full-Stack Developer & Learning Designer
Full-stack web developer and learning designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.
FAQ
What are vector embeddings?
Vector embeddings convert text into numerical arrays that capture meaning. Similar concepts have similar vectors. This lets AI understand that 'running shoes' and 'jogging sneakers' mean similar things.
How do embeddings enable semantic search?
Embeddings map text to a mathematical space where distance reflects similarity: the closer two vectors are, the more similar their meaning. Searching finds vectors close to your query, returning semantically similar results even without matching keywords.
What embedding models should I use?
Popular choices: OpenAI text-embedding-3, Cohere embed-v3, open-source models like BGE or E5. Choice depends on accuracy needs, cost, and whether you need multilingual support.
How do embeddings work with RAG?
In RAG, documents are split into chunks and embedded. When you ask a question, your query is embedded and compared to document vectors. The closest chunks are retrieved and sent to the LLM.