Vector Embeddings: How AI Understands Meaning

By Dorian Laurenceau

📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.

When you search "running shoes" and find results for "jogging sneakers," that's not keyword matching; it's semantic understanding powered by vector embeddings. Here's why this technology matters.


What Reddit stopped believing about embeddings

The embeddings space has quietly become one of the most over-engineered corners of AI infrastructure. If you read r/MachineLearning or r/LocalLLaMA in 2023, every other post was about picking the "best" embedding model from the MTEB leaderboard. Three years later, the tone has shifted: the benchmark is widely seen as gamed, the top-of-leaderboard model rarely wins on your data, and a 1536-dimensional OpenAI embedding often isn't measurably better than the 384-dimensional all-MiniLM-L6-v2 on many retrieval tasks.

The MTEB paper is worth skimming to understand what the benchmark actually measures — and what it doesn't (your specific domain vocabulary, your query distribution, your chunk sizes). The MTEB leaderboard on Hugging Face is still a useful starting point, but treat it as a shortlist generator, not a verdict.

What matters in practice:

  • Match the model to your query length. If your users type 3-word queries, a model trained on long-document retrieval will underperform a model trained on short-to-long asymmetric retrieval.
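  • Normalize your embeddings. Cosine similarity normalizes by definition, but a raw dot product (or an inner-product index) does not. Treating dot-product scores on non-normalized vectors as if they were cosine similarity is a common bug that can quietly degrade retrieval quality. Many libraries normalize for you; check yours.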
  • Reranking beats bigger embeddings. A small embedding + a cross-encoder reranker (Cohere Rerank, bge-reranker) routinely outperforms switching to a larger embedding model, at a fraction of the compute.
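
To make the normalization point concrete, here is a minimal sketch using made-up 2-D vectors rather than real model output. The helper names `dot`, `l2_normalize`, and `rank_by_dot` are hypothetical. It shows how ranking by raw dot product can favor a long vector over a better-aligned one, and how L2-normalizing first makes the dot product behave like cosine similarity.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        return list(v)
    return [x / norm for x in v]

def rank_by_dot(query, docs):
    """Return doc names sorted by descending dot product with the query."""
    return sorted(docs, key=lambda name: dot(query, docs[name]), reverse=True)

# Made-up vectors: one short but well aligned, one long but pointing elsewhere.
query = [1.0, 0.0]
docs = {
    "aligned_short": [0.9, 0.1],
    "off_topic_long": [3.0, 4.0],
}

# Raw dot product rewards vector length, so the long vector wins.
raw_ranking = rank_by_dot(query, docs)

# After normalization, only direction matters and the aligned vector wins.
normalized_docs = {name: l2_normalize(v) for name, v in docs.items()}
normalized_ranking = rank_by_dot(l2_normalize(query), normalized_docs)

print(raw_ranking)
print(normalized_ranking)
```

If your vector store is configured for inner-product search, normalizing at indexing time and at query time is what makes its scores comparable to cosine similarity.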

If you're evaluating, run a 100-query labeled eval on your own data before picking a model. It takes an afternoon and saves months of debugging.
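
As a rough sketch of what such an eval could compute, the snippet below scores one candidate model with a hit rate at k: the share of queries whose top k results include at least one document a human labeled as relevant. The queries, document IDs, and the function name `hit_rate_at_k` are all hypothetical; in practice the ranked lists would come from your own retriever.

```python
def hit_rate_at_k(results, relevant, k):
    """Fraction of queries whose top-k results contain at least one relevant doc."""
    hits = 0
    for query, ranked_ids in results.items():
        if set(ranked_ids[:k]) & relevant[query]:
            hits += 1
    return hits / len(results)

# Hypothetical labels: for each query, the doc IDs a human marked as relevant.
relevant = {
    "return window": {"doc_returns"},
    "delivery time": {"doc_shipping"},
    "refund method": {"doc_returns", "doc_payments"},
    "gift cards": {"doc_giftcards"},
}

# Hypothetical ranked output of one candidate embedding model.
results_model_a = {
    "return window": ["doc_returns", "doc_shipping", "doc_faq"],
    "delivery time": ["doc_faq", "doc_shipping", "doc_returns"],
    "refund method": ["doc_faq", "doc_shipping", "doc_payments"],
    "gift cards": ["doc_faq", "doc_returns", "doc_shipping"],
}

score_at_1 = hit_rate_at_k(results_model_a, relevant, k=1)
score_at_3 = hit_rate_at_k(results_model_a, relevant, k=3)
print(score_at_1, score_at_3)
```

Running the same labeled queries through each shortlisted model gives you numbers that reflect your data rather than the leaderboard's.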


What Are Vector Embeddings?

A vector embedding is a list of numbers that represents the meaning of text (or images, audio, etc.) in a way computers can process.

Text to Numbers

"I love pizza" → [0.23, -0.45, 0.87, 0.12, ..., -0.33]
                  (typically a few hundred to a few thousand numbers)

These numbers capture semantic meaning, not just characters.

Similar Meanings = Similar Numbers

"I love pizza"     → [0.23, -0.45, 0.87, ...]
"Pizza is great"   → [0.25, -0.42, 0.85, ...]  ← Very similar!
"I hate broccoli"  → [-0.18, 0.32, -0.22, ...]  ← Very different

Why Embeddings Matter

Traditional Search (Keyword Matching)

Search: "automobile"
Documents with "automobile" ✓
Documents with "car" ✗ (different word!)

Semantic Search (Embeddings)

Search: "automobile"
"automobile" ✓ (same word)
"car" ✓ (similar meaning)
"vehicle" ✓ (related concept)
"Tesla Model 3" ✓ (it's a car!)

Embeddings enable search by meaning, not just keywords.


How Embeddings Work (Simplified)

The Training Process

Embedding models learn from billions of text examples:

1. "The cat sat on the mat"
2. "Dogs are loyal pets"
3. "Machine learning uses algorithms"
... billions more

The model learns:
- "cat" and "dog" are somewhat related (both pets)
- "mat" and "rug" are very related
- "cat" and "algorithm" are unrelated

The Result: A Semantic Map

Imagine a vast space where every concept has a position:

Animals Cluster:

  • cat → kitten, feline
  • dog → puppy

Furniture Cluster:

  • mat → rug → carpet

Words with similar meanings cluster together.


Dimensions: What the Numbers Mean

Each number in an embedding contributes to some aspect of meaning. As a loose mental picture:

Dimension 1: Maybe "living thing" vs "object"
Dimension 42: Maybe "positive" vs "negative" sentiment  
Dimension 256: Maybe "formal" vs "casual" language
...

In real models, no single dimension has a clean, human-readable meaning; it's the combination that matters.

Why So Many Dimensions?

256 dimensions: Basic understanding
768 dimensions: Good for most tasks
1536 dimensions: Rich semantic capture

More dimensions can capture more nuance, but they raise storage and compute costs, and the gain on your own data is not guaranteed.
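
A quick back-of-the-envelope calculation shows the storage side of this trade-off. It assumes vectors are stored as 32-bit floats (4 bytes per dimension) with no compression and no index overhead; the corpus size of one million chunks is an arbitrary example, and `raw_storage_mb` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4

def raw_storage_mb(num_vectors, dimensions):
    """Raw size of float32 vectors in megabytes (10**6 bytes), excluding index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32 / 1_000_000

num_chunks = 1_000_000  # arbitrary example corpus size
sizes = {dims: raw_storage_mb(num_chunks, dims) for dims in (384, 768, 1536, 3072)}

for dims, mb in sizes.items():
    print(f"{dims:>5} dims: {mb:,.0f} MB")
```

Storage grows linearly with dimension count, so going from 384 to 3072 dimensions multiplies the raw footprint by eight before any index structures are added.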


Similarity: Measuring Closeness

Cosine Similarity

The standard way to compare embeddings:

Similarity("car", "automobile") = 0.94  (very similar)
Similarity("car", "banana") = 0.12     (unrelated)
Similarity("car", "vehicle") = 0.87    (related)

Scale: -1 (opposite direction) to 1 (same direction, i.e. near-identical meaning)

Why Cosine Works

It measures the angle between vectors, ignoring magnitude:

"I really really love cars" and "I love cars"
→ Nearly the same direction, even if the vectors differ in length
→ Cosine sees them as similar
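
For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths: cos(a, b) = (a · b) / (|a| |b|). The sketch below implements that formula in plain Python and checks the properties described above on made-up vectors (not real embeddings): scaling a vector leaves the score unchanged, and flipping it gives -1.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

v = [0.2, -0.5, 0.8]             # made-up vector
scaled = [3 * x for x in v]      # same direction, three times the length
flipped = [-x for x in v]        # opposite direction

sim_scaled = cosine_similarity(v, scaled)
sim_flipped = cosine_similarity(v, flipped)
sim_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(round(sim_scaled, 6), round(sim_flipped, 6), sim_orthogonal)
```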

Embeddings in Action: RAG Systems

RAG (Retrieval-Augmented Generation) uses embeddings at its core:

Step 1: Embed Your Documents

Document 1: "Our return policy allows 30-day returns..."
→ [0.12, -0.34, 0.56, ...]

Document 2: "Shipping takes 3-5 business days..."
→ [-0.23, 0.45, 0.11, ...]

... store all embeddings

Step 2: Embed the User Question

User: "How long do I have to return an item?"
→ [0.14, -0.31, 0.52, ...]  ← Similar to Document 1!

Step 3: Find Most Similar

Compare question embedding to all document embeddings:
- Document 1: 0.94 similarity ← Winner!
- Document 2: 0.23 similarity
- Document 3: 0.18 similarity

Return Document 1 to the LLM for answering.
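
The three steps above can be sketched in a few lines. This is a toy version under stated assumptions: the 3-number vectors are the truncated illustrative values from the steps, not real embeddings, and `top_k` is a hypothetical helper that does a brute-force comparison. A real system would get its vectors from an embedding model and usually query a vector database instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=1):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    return sorted(scored, reverse=True)[:k]

# Step 1: documents stored alongside their (made-up) embeddings.
index = [
    ("Our return policy allows 30-day returns...", [0.12, -0.34, 0.56]),
    ("Shipping takes 3-5 business days...", [-0.23, 0.45, 0.11]),
]

# Step 2: the (made-up) embedding of "How long do I have to return an item?"
question_vec = [0.14, -0.31, 0.52]

# Step 3: retrieve the closest document to pass to the LLM as context.
best_score, best_text = top_k(question_vec, index, k=1)[0]
print(best_text)
```

Brute-force comparison like this is fine for a few thousand chunks; larger collections typically rely on an approximate nearest-neighbor index.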

Model                           Dimensions   Best For
OpenAI text-embedding-3-small   1536         General purpose, affordable
OpenAI text-embedding-3-large   3072         Highest quality
Cohere embed-v3                 1024         Multilingual
Google text-embedding-004       768          Google ecosystem
Open source (BGE, E5)           384-1024     Self-hosted, free

Limitations of Embeddings

1. Fixed at Creation Time

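An embedding model trained in 2023 doesn't know about terms or events from 2024.
Switching to a newer model means re-embedding your whole corpus, because vectors from different models aren't comparable.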

2. Context Window Limits

Most embedding models accept at most roughly 512 to 8,000 tokens of input.
Longer documents need to be split into chunks.
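
As one simple illustration of chunking, the sketch below splits text into overlapping windows. It counts whitespace-separated words as a stand-in for tokens, which is an assumption: real tokenizers count differently, so leave headroom below your model's limit. The function name `chunk_words` and its default sizes are hypothetical.

```python
def chunk_words(text, max_words=200, overlap=20):
    """Split text into chunks of at most max_words words.

    Neighboring chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Tiny example: ten placeholder words, windows of four with one word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, max_words=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Fixed-size windows are only a baseline; splitting on paragraphs or headings often keeps related sentences together better.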

3. Same Words, Different Meanings

"Bank" (financial) vs "bank" (river)
Embeddings try to capture context, but it's imperfect.

4. Language/Cultural Bias

Models trained mainly on English perform worse on other languages.
Cultural concepts may not embed well.

Quick Summary

  1. Embeddings convert text to numbers representing meaning
  2. Similar meanings → similar number patterns
  3. Enable semantic search beyond keyword matching
  4. Foundation of RAG systems and AI search
  5. Trade-offs: dimensions, speed, quality, cost

Ready to Build with Embeddings?

This article covered the what and why of vector embeddings. But building production RAG systems requires understanding chunking, retrieval strategies, and integration patterns.

In our Module 5, RAG & Context Engineering, you'll learn:

  • Choosing the right embedding model
  • Document chunking strategies
  • Hybrid search (embeddings + keywords)
  • Vector database selection
  • Production RAG architecture

Explore Module 5: RAG & Context Engineering

Dorian Laurenceau

Full-Stack Developer & Learning Designer

Full-stack web developer and learning designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.

Prompt Engineering, LLMs, Full-Stack Development, Learning Design, React
Published: January 30, 2026
Updated: April 24, 2026

FAQ

What are vector embeddings?

Vector embeddings convert text into numerical arrays that capture meaning. Similar concepts have similar vectors. This lets AI understand that 'running shoes' and 'jogging sneakers' mean similar things.

How do embeddings enable semantic search?

Embeddings map text to a mathematical space where closeness corresponds to similarity in meaning. A search finds the vectors nearest to your query, returning semantically similar results even without matching keywords.

What embedding models should I use?

Popular choices: OpenAI text-embedding-3, Cohere embed-v3, open-source models like BGE or E5. Choice depends on accuracy needs, cost, and whether you need multilingual support.

How do embeddings work with RAG?

In RAG, documents are split into chunks and embedded. When you ask a question, your query is embedded and compared to document vectors. The closest chunks are retrieved and sent to the LLM.