Chunking Strategies for RAG: Size Matters
By Dorian Laurenceau
📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
You've built a RAG system, but the AI keeps returning irrelevant answers. The problem might not be your model; it might be how you're chunking your documents.
Chunking in RAG: why it sets your retrieval quality ceiling
Chunking is the unglamorous part of RAG that determines whether your system works. The threads on r/LangChain, r/LlamaIndex, and r/MachineLearning repeatedly surface the same pattern: teams blame the embedding model or the LLM, when the real issue is how they split documents.
What chunking strategies actually exist and when to use each:
- Fixed-size chunking (e.g., 512 or 1024 tokens). Simple, predictable, and often the wrong choice. Works when content is uniform and self-contained; breaks for anything with structure (headers, code blocks, tables).
- Recursive character splitting. The LangChain default. Falls back from paragraph breaks to newlines to words. A sensible baseline for prose (a sketch follows this list).
- Semantic chunking. Splits at semantically meaningful boundaries using embedding similarity. The LlamaIndex SemanticSplitter documentation describes the approach. Better for heterogeneous content; more expensive at ingestion.
- Structure-aware chunking. Uses document structure (markdown headers, HTML sections, code AST) to split at natural boundaries. The gold standard for structured content.
- Sliding-window with overlap. Each chunk includes the last N tokens of the previous chunk. Reduces boundary loss at the cost of extra storage and some retrieval noise.
- Proposition-level chunking. Rewrite text as atomic claims and embed each claim. Expensive, but produces measurably better retrieval on factual-QA benchmarks.
- Anthropic's contextual retrieval. Prepend to each chunk a short, chunk-specific summary generated by an LLM. The Anthropic contextual retrieval post documents the technique and its measured gains.
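To make the recursive baseline concrete, here is a minimal sketch using LangChain's RecursiveCharacterTextSplitter (assuming the langchain-text-splitters package is installed; note that chunk_size counts characters by default, not tokens):

```python
# Minimal sketch: LangChain's recursive splitter with overlap.
# chunk_size/chunk_overlap are measured in characters by default.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # target size per chunk
    chunk_overlap=50,  # repeated tail to soften boundary loss
)

text = "First paragraph of the report.\n\nSecond paragraph.\n\n" * 40
chunks = splitter.split_text(text)
print(len(chunks), "chunks; first:", chunks[0][:60])
```

LangChain also provides a from_tiktoken_encoder constructor if you need token-based rather than character-based sizing.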
What practitioners have settled on:
- Chunk size matters less than chunk boundaries. A 512-token chunk that ends mid-sentence retrieves worse than a 300- or 800-token chunk that ends at a paragraph break.
- Metadata matters. Include the document title, section, and source URL in the chunk. Retrieval systems rank better when metadata is searchable.
- Hybrid retrieval beats pure vector. BM25 + dense retrieval + reranking (Cohere Rerank, Voyage rerank, Jina reranker) outperforms any single approach (a fusion sketch follows this list).
- Evaluate retrieval separately from generation. Many "RAG is bad" complaints are really "my retriever returned garbage" complaints. Measure retrieval quality (MRR, recall@k) before touching the LLM.
- Re-chunk when models change. Different embedding models have different optimal chunk sizes. What worked for OpenAI ada-002 may not be optimal for Voyage or Cohere.
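As an illustration of the hybrid point above, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a BM25 ranking with a dense-vector ranking before handing results to a reranker; the chunk IDs and input rankings are hypothetical:

```python
# Minimal sketch: reciprocal rank fusion over several ranked lists of chunk IDs.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            # 1 / (k + rank): top positions dominate, k damps outliers
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["c3", "c1", "c7"]    # hypothetical BM25 ranking
dense_hits = ["c1", "c9", "c3"]   # hypothetical vector ranking
print(rrf_fuse([bm25_hits, dense_hits]))  # c1 and c3 rise to the top
```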
What's still painful:
- Code chunks. AST-aware splitting helps but isn't universal. Cross-file dependencies break retrieval that treats each file as independent.
- Tables and figures. The hardest content class; tables serialised to text retrieve poorly. Vision-language models handle some of this, but at a cost.
- Evolving documents. Re-chunking and re-embedding on every update is expensive; delta chunking is underdeveloped.
The honest framing: chunking is a system-design decision, not a hyperparameter. Teams that treat chunking as a first-class concern — with evaluation, iteration, and domain-specific tuning — build RAG systems that work. Teams that use the default 512-token recursive splitter and blame the LLM are unaware of where their real quality ceiling is.
What Is Chunking?
Chunking is the process of breaking large documents into smaller pieces for storage and retrieval in a RAG system.
Why We Chunk
Problem:
- Your document: 50,000 tokens
- Context window: 8,000 tokens
- Embedding models: input often capped (e.g., 512 tokens)
Solution:
- Split into ~100 chunks of 500 tokens each
- Embed and store each chunk
- Retrieve only relevant chunks
You can't feed entire documents to most AI systems; chunking makes them manageable.
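As a quick sanity check on that arithmetic, here is a sketch of the chunk-count calculation; note that overlap raises the count, because each window advances by less than its full size:

```python
# Minimal sketch: how many chunks a document yields at a given size/overlap.
import math

def num_chunks(doc_tokens: int, chunk_size: int, overlap: int = 0) -> int:
    stride = chunk_size - overlap  # how far the window advances each step
    return math.ceil(max(doc_tokens - overlap, 1) / stride)

print(num_chunks(50_000, 500))              # 100 chunks, no overlap
print(num_chunks(50_000, 500, overlap=50))  # 111 chunks with 50-token overlap
```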
Why Chunking Strategy Matters
Bad Chunking
Chunk 1: "...increased by 15%. The new policy"
Chunk 2: "requires all employees to submit forms"
Chunk 3: "by Friday. Safety regulations mandate"
Chunks split mid-sentence. Context lost. Retrieval fails.
Good Chunking
Chunk 1: "Revenue increased by 15% in Q3 2024."
Chunk 2: "The new expense policy requires all employees
to submit reimbursement forms by Friday each week."
Chunk 3: "Safety regulations mandate quarterly equipment
inspections for all manufacturing facilities."
Complete thoughts. Clear context. Effective retrieval.
The Chunking Trade-Off
Small Chunks (100-200 tokens)
✅ Precise retrieval
✅ Less noise in results
❌ May lose context
❌ More chunks to search
Large Chunks (1000+ tokens)
✅ More context preserved
✅ Fewer chunks to manage
❌ More noise in results
❌ May exceed model limits
The Sweet Spot
For most use cases: 300-500 tokens per chunk with 50-100 token overlap
5 Chunking Strategies
1. Fixed-Size Chunking
Split by character/token count:
Every 500 tokens → new chunk
Overlap: 50 tokens between chunks
Simple but blunt. May cut mid-sentence.
Best for: Quick prototypes, uniform documents
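A minimal sketch of this approach, assuming the tiktoken package for tokenization:

```python
# Minimal sketch: fixed-size token windows with overlap.
import tiktoken

def fixed_size_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    stride = size - overlap  # each window starts `stride` tokens after the last
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), stride)]

chunks = fixed_size_chunks("Some long document text. " * 400)
```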
2. Sentence-Based Chunking
Split on sentence boundaries:
Chunk until reaching ~500 tokens
Always end on a complete sentence
Respects natural language boundaries.
Best for: General text, articles, documentation
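A minimal sketch; the regex split here is a crude stand-in for a proper sentence tokenizer (e.g. nltk or spaCy), and word count serves as a rough token proxy:

```python
# Minimal sketch: pack whole sentences until the budget would overflow.
import re

def sentence_chunks(text: str, max_tokens: int = 500) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sent in sentences:
        sent_len = len(sent.split())  # crude word-count proxy for tokens
        if current and current_len + sent_len > max_tokens:
            chunks.append(" ".join(current))  # flush: next sentence won't fit
            current, current_len = [], 0
        current.append(sent)
        current_len += sent_len
    if current:
        chunks.append(" ".join(current))
    return chunks
```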
3. Paragraph-Based Chunking
Keep paragraphs together:
Each paragraph = one chunk (if reasonable size)
Combine small paragraphs
Split very large paragraphs
Preserves topical coherence.
Best for: Well-structured documents, reports
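A minimal, self-contained sketch: short paragraphs are merged into a buffer, and an oversized paragraph is hard-split by word count as a fallback:

```python
# Minimal sketch: paragraph-based chunking with merge and split fallbacks.
def paragraph_chunks(text: str, max_tokens: int = 500) -> list[str]:
    chunks: list[str] = []
    buffer = ""
    for para in text.split("\n\n"):
        words = para.strip().split()
        if not words:
            continue
        if len(words) > max_tokens:
            if buffer:                 # flush pending small paragraphs first
                chunks.append(buffer)
                buffer = ""
            chunks += [" ".join(words[i:i + max_tokens])  # hard-split giant para
                       for i in range(0, len(words), max_tokens)]
        elif buffer and len(buffer.split()) + len(words) > max_tokens:
            chunks.append(buffer)      # buffer full: flush and start fresh
            buffer = para.strip()
        else:                          # merge small paragraph into the buffer
            buffer = f"{buffer}\n\n{para.strip()}" if buffer else para.strip()
    if buffer:
        chunks.append(buffer)
    return chunks
```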
4. Semantic Chunking
Split based on meaning changes:
Use AI to detect topic shifts
Start new chunk when topic changes
Most accurate but slower/costlier.
Best for: Complex documents, mixed content
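A minimal sketch of the idea: compare embeddings of adjacent sentences and start a new chunk when similarity drops. It assumes you supply one precomputed embedding per sentence (vecs) from whatever model you use:

```python
# Minimal sketch: split where cosine similarity between neighbours drops.
import numpy as np

def semantic_chunks(sentences: list[str], vecs: np.ndarray,
                    threshold: float = 0.75) -> list[str]:
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        cosine = float(vecs[i - 1] @ vecs[i])  # similarity of adjacent pair
        if cosine < threshold:                 # topic shift: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

The threshold is content-dependent; in practice you tune it, or derive it from a percentile of observed similarities as LlamaIndex's SemanticSplitter does.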
5. Document Structure Chunking
Follow document hierarchy:
Respect headers, sections, lists
Each H2 section = logical chunk
Tables kept intact
Leverages author's organization.
Best for: Technical docs, manuals, structured content
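A minimal sketch for markdown, splitting before each H2 heading so the heading stays attached to its body (libraries such as LangChain also ship ready-made header-aware splitters):

```python
# Minimal sketch: one chunk per H2 section of a markdown document.
import re

def markdown_section_chunks(md: str) -> list[str]:
    # zero-width split at the start of every "## " line keeps the heading
    parts = re.split(r"(?m)^(?=## )", md)
    return [part.strip() for part in parts if part.strip()]

doc = "# Manual\nIntro.\n\n## Setup\nSteps...\n\n## Usage\nExamples..."
print(markdown_section_chunks(doc))
```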
The Overlap Question
Why Overlap?
Without overlap:
Chunk 1: "...the company achieved record sales."
Chunk 2: "This was primarily due to the new product line."
The connection between "record sales" and "new product line" is lost.
With overlap (last 2 sentences repeated):
Chunk 1: "...the company achieved record sales."
Chunk 2: "...achieved record sales. This was primarily
due to the new product line."
Context preserved across chunk boundaries.
How Much Overlap?
10-15% of chunk size is typical
Example: 500 token chunks, 50-75 token overlap
Too little: Context breaks
Too much: Wasted storage, duplicate results
Chunk Size by Use Case
| Use Case | Recommended Size | Why |
|---|---|---|
| Q&A / Factoid | 200-300 tokens | Precise answers |
| General chat | 400-500 tokens | Balanced context |
| Summarization | 800-1000 tokens | More source material |
| Legal/Technical | 300-400 tokens | Specific clauses |
| Creative content | 500-800 tokens | Flow and context |
Common Chunking Mistakes
1. One Size Fits All
Using the same chunk size for FAQs and legal contracts ❌
Different content types need different strategies
2. Ignoring Structure
Splitting a table across chunks ❌
Separating a heading from its content ❌
Breaking up a code block ❌
3. No Metadata
Chunk without knowing its source document ❌
No idea which section it came from ❌
Always preserve: source, page, section, date
4. Never Testing
Set chunk size once, never evaluate ❌
Retrieval quality varies—test and iterate
Metadata: The Secret Weapon
Good chunks include context:
```json
{
  "text": "The return policy allows 30 days...",
  "metadata": {
    "source": "customer-policies.pdf",
    "section": "Returns & Refunds",
    "page": 12,
    "last_updated": "2024-06-15"
  }
}
```
This enables:
- Filtering by source (sketched below)
- Citing specific pages
- Showing freshness
- Debugging retrieval issues
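A minimal sketch of the filtering case, over plain dicts shaped like the JSON above; real vector stores expose equivalent filter parameters on their query APIs:

```python
# Minimal sketch: metadata filtering over chunks before (or during) retrieval.
def filter_chunks(chunks: list[dict], source: str | None = None,
                  section: str | None = None) -> list[dict]:
    def keep(chunk: dict) -> bool:
        meta = chunk["metadata"]
        return ((source is None or meta["source"] == source)
                and (section is None or meta["section"] == section))
    return [c for c in chunks if keep(c)]

chunks = [
    {"text": "The return policy allows 30 days...",
     "metadata": {"source": "customer-policies.pdf", "section": "Returns & Refunds"}},
    {"text": "Shipping takes 3-5 business days...",
     "metadata": {"source": "shipping-faq.pdf", "section": "Delivery"}},
]
print(filter_chunks(chunks, source="customer-policies.pdf"))
```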
Evaluation: Is Your Chunking Working?
Test with Real Queries
Question: "What's the vacation policy for new employees?"
Check:
1. Is the right chunk retrieved?
2. Does it contain the complete answer?
3. Is there too much irrelevant content?
Metrics to Track
Retrieval precision: % of retrieved chunks that are relevant
Retrieval recall: % of relevant chunks that are retrieved
Answer quality: Does the LLM produce correct answers?
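As a sketch, here is how the first two metrics (plus reciprocal rank, whose mean over a query set is MRR) can be computed for a single query; the chunk IDs are hypothetical:

```python
# Minimal sketch: precision@k, recall@k, and reciprocal rank for one query.
def retrieval_metrics(retrieved: list[str], relevant: set[str], k: int = 5) -> dict:
    top_k = retrieved[:k]
    hits = [cid for cid in top_k if cid in relevant]
    precision = len(hits) / len(top_k) if top_k else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    # reciprocal rank: 1 / position of the first relevant result (0 if none)
    rr = next((1.0 / (i + 1) for i, cid in enumerate(retrieved)
               if cid in relevant), 0.0)
    return {"precision@k": precision, "recall@k": recall, "rr": rr}

print(retrieval_metrics(["c3", "c8", "c1"], relevant={"c1", "c4"}, k=3))
```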
Key Takeaways
- Chunking = splitting documents for RAG retrieval
- Size matters: 300-500 tokens typical, adjust per use case
- Strategy matters: fixed, sentence, paragraph, semantic, structural
- Overlap preserves context across boundaries (10-15%)
- Metadata makes chunks traceable and filterable
Ready to Master RAG?
This article covered the what and why of chunking strategies. But production RAG systems require end-to-end design including embedding selection, retrieval tuning, and integration.
In our Module 5, RAG & Context Engineering, you'll learn:
- Complete RAG architecture design
- Advanced chunking implementations
- Hybrid search strategies
- Retrieval evaluation and optimization
- Production deployment patterns
Dorian Laurenceau
Full-Stack Developer & Learning Designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.
FAQ
What is the optimal chunk size for RAG?
Typically 200-500 tokens works best. Too small loses context; too large dilutes relevance. The ideal size depends on your content type, embedding model, and query patterns. Test to find the optimum for your data.
What chunking strategies exist?
Fixed-size (every N tokens), semantic (by meaning boundaries), sentence-based, paragraph-based, recursive (split hierarchically), and document-specific (respect headers/sections).
Should chunks overlap?
Yes, usually 10-20% overlap. Overlap ensures ideas split across chunk boundaries are still captured. Without overlap, you might miss relevant content that falls at chunk edges.
How does chunking affect RAG accuracy?
Chunking is often the biggest factor in RAG quality. Poor chunking means irrelevant or incomplete retrieval. Good chunking ensures the AI gets the right context to answer accurately.