Chunking Strategies for RAG: Size Matters
By Dorian Laurenceau
📅 Last reviewed: April 24, 2026. Updated with April 2026 findings and community feedback.
You've built a RAG system, but the AI keeps returning irrelevant answers. The problem might not be your model; it might be how you're chunking your documents.
Chunking in RAG: why it sets your retrieval quality ceiling
Chunking is the unglamorous part of RAG that determines whether your system works. The threads on r/LangChain, r/LlamaIndex, and r/MachineLearning repeatedly surface the same pattern: teams blame the embedding model or the LLM, when the real issue is how they split documents.
What chunking strategies actually exist and when to use each:
- Fixed-size chunking (e.g., 512 or 1024 tokens). Simple, predictable, and often the wrong choice. Works when content is uniform and self-contained; breaks for anything with structure (headers, code blocks, tables).
- Recursive character splitting. The LangChain default. Falls back from paragraph breaks to newlines to words. A sensible baseline for prose (a sketch follows this list).
- Semantic chunking. Splits at semantically meaningful boundaries using embedding similarity. The LlamaIndex SemanticSplitter documentation describes the approach. Better for heterogeneous content; more expensive at ingestion.
- Structure-aware chunking. Uses document structure (markdown headers, HTML sections, code AST) to split at natural boundaries. The gold standard for structured content.
- Sliding-window with overlap. Each chunk includes the last N tokens of the previous chunk. Reduces boundary loss at the cost of extra storage and some retrieval noise.
- Proposition-level chunking. Rewrite text as atomic claims and embed each claim. Expensive, but produces measurably better retrieval on factual-QA benchmarks.
- Anthropic's contextual retrieval. Prepend to each chunk a short, chunk-specific summary generated by an LLM. The Anthropic contextual retrieval post documents the technique and its measured gains.
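To make the recursive baseline concrete, here is a minimal sketch using LangChain's RecursiveCharacterTextSplitter (assuming the langchain-text-splitters package is installed; note that chunk_size counts characters by default, not tokens):

```python
# Minimal sketch: LangChain's recursive splitter with overlap.
# chunk_size/chunk_overlap are measured in characters by default.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # target size per chunk
    chunk_overlap=50,  # repeated tail to soften boundary loss
)

text = "First paragraph of the report.\n\nSecond paragraph.\n\n" * 40
chunks = splitter.split_text(text)
print(len(chunks), "chunks; first:", chunks[0][:60])
```

LangChain also provides a from_tiktoken_encoder constructor if you need token-based rather than character-based sizing.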
What practitioners have settled on:
- Chunk size matters less than chunk boundaries. A 512-token chunk that ends mid-sentence retrieves worse than a 300- or 800-token chunk that ends at a paragraph break.
- Metadata matters. Include the document title, section, and source URL in the chunk. Retrieval systems rank better when metadata is searchable.
- Hybrid retrieval beats pure vector. BM25 + dense retrieval + reranking (Cohere Rerank, Voyage rerank, Jina reranker) outperforms any single approach (a fusion sketch follows this list).
- Evaluate retrieval separately from generation. Many "RAG is bad" complaints are really "my retriever returned garbage" complaints. Measure retrieval quality (MRR, recall@k) before touching the LLM.
- Re-chunk when models change. Different embedding models have different optimal chunk sizes. What worked for OpenAI ada-002 may not be optimal for Voyage or Cohere.
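As an illustration of the hybrid point above, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a BM25 ranking with a dense-vector ranking before handing results to a reranker; the chunk IDs and input rankings are hypothetical:

```python
# Minimal sketch: reciprocal rank fusion over several ranked lists of chunk IDs.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking):
            # 1 / (k + rank): top positions dominate, k damps outliers
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["c3", "c1", "c7"]    # hypothetical BM25 ranking
dense_hits = ["c1", "c9", "c3"]   # hypothetical vector ranking
print(rrf_fuse([bm25_hits, dense_hits]))  # c1 and c3 rise to the top
```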
What's still painful:
- Code chunks. AST-aware splitting helps but isn't universal. Cross-file dependencies break retrieval that treats each file as independent.
- Tables and figures. The hardest content class; tables serialised to text retrieve poorly. Vision-language models handle some of this, but at a cost.
- Evolving documents. Re-chunking and re-embedding on every update is expensive; delta chunking is underdeveloped.
The honest framing: chunking is a system-design decision, not a hyperparameter. Teams that treat chunking as a first-class concern — with evaluation, iteration, and domain-specific tuning — build RAG systems that work. Teams that use the default 512-token recursive splitter and blame the LLM are unaware of where their real quality ceiling is.
What Is Chunking?
Chunking is the process of breaking large documents into smaller pieces for storage and retrieval in a RAG system.
Why We Chunk
Problem:
- Your document: 50,000 tokens
- Context window: 8,000 tokens
- Embedding models: input often capped (e.g., 512 tokens)
Solution:
- Split into ~100 chunks of 500 tokens each
- Embed and store each chunk
- Retrieve only relevant chunks
You can't feed entire documents to most AI systems; chunking makes them manageable.
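As a quick sanity check on that arithmetic, here is a sketch of the chunk-count calculation; note that overlap raises the count, because each window advances by less than its full size:

```python
# Minimal sketch: how many chunks a document yields at a given size/overlap.
import math

def num_chunks(doc_tokens: int, chunk_size: int, overlap: int = 0) -> int:
    stride = chunk_size - overlap  # how far the window advances each step
    return math.ceil(max(doc_tokens - overlap, 1) / stride)

print(num_chunks(50_000, 500))              # 100 chunks, no overlap
print(num_chunks(50_000, 500, overlap=50))  # 111 chunks with 50-token overlap
```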
Why Chunking Strategy Matters
Bad Chunking
Chunk 1: "...increased by 15%. The new policy"
Chunk 2: "requires all employees to submit forms"
Chunk 3: "by Friday. Safety regulations mandate"
Chunks split mid-sentence. Context lost. Retrieval fails.
Good Chunking
Chunk 1: "Revenue increased by 15% in Q3 2024."
Chunk 2: "The new expense policy requires all employees
to submit reimbursement forms by Friday each week."
Chunk 3: "Safety regulations mandate quarterly equipment
inspections for all manufacturing facilities."
Complete thoughts. Clear context. Effective retrieval.
The Chunking Trade-Off
Small Chunks (100-200 tokens)
✅ Precise retrieval
✅ Less noise in results
❌ May lose context
❌ More chunks to search
Large Chunks (1000+ tokens)
✅ More context preserved
✅ Fewer chunks to manage
❌ More noise in results
❌ May exceed model limits
The Sweet Spot
For most use cases: 300-500 tokens per chunk with 50-100 token overlap
5 Chunking Strategies
1. Fixed-Size Chunking
Split by character/token count:
Every 500 tokens → new chunk
Overlap: 50 tokens between chunks
Simple but blunt. May cut mid-sentence.
Best for: Quick prototypes, uniform documents
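A minimal sketch of this approach, assuming the tiktoken package for tokenization:

```python
# Minimal sketch: fixed-size token windows with overlap.
import tiktoken

def fixed_size_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    stride = size - overlap  # each window starts `stride` tokens after the last
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), stride)]

chunks = fixed_size_chunks("Some long document text. " * 400)
```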
2. Sentence-Based Chunking
Split on sentence boundaries:
Chunk until reaching ~500 tokens
Always end on a complete sentence
Respects natural language boundaries.
Best for: General text, articles, documentation
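A minimal sketch; the regex split here is a crude stand-in for a proper sentence tokenizer (e.g. nltk or spaCy), and word count serves as a rough token proxy:

```python
# Minimal sketch: pack whole sentences until the budget would overflow.
import re

def sentence_chunks(text: str, max_tokens: int = 500) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for sent in sentences:
        sent_len = len(sent.split())  # crude word-count proxy for tokens
        if current and current_len + sent_len > max_tokens:
            chunks.append(" ".join(current))  # flush: next sentence won't fit
            current, current_len = [], 0
        current.append(sent)
        current_len += sent_len
    if current:
        chunks.append(" ".join(current))
    return chunks
```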
3. Paragraph-Based Chunking
Keep paragraphs together:
Each paragraph = one chunk (if reasonable size)
Combine small paragraphs
Split very large paragraphs
Preserves topical coherence.
Best for: Well-structured documents, reports
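A minimal, self-contained sketch: short paragraphs are merged into a buffer, and an oversized paragraph is hard-split by word count as a fallback:

```python
# Minimal sketch: paragraph-based chunking with merge and split fallbacks.
def paragraph_chunks(text: str, max_tokens: int = 500) -> list[str]:
    chunks: list[str] = []
    buffer = ""
    for para in text.split("\n\n"):
        words = para.strip().split()
        if not words:
            continue
        if len(words) > max_tokens:
            if buffer:                 # flush pending small paragraphs first
                chunks.append(buffer)
                buffer = ""
            chunks += [" ".join(words[i:i + max_tokens])  # hard-split giant para
                       for i in range(0, len(words), max_tokens)]
        elif buffer and len(buffer.split()) + len(words) > max_tokens:
            chunks.append(buffer)      # buffer full: flush and start fresh
            buffer = para.strip()
        else:                          # merge small paragraph into the buffer
            buffer = f"{buffer}\n\n{para.strip()}" if buffer else para.strip()
    if buffer:
        chunks.append(buffer)
    return chunks
```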
4. Semantic Chunking
Split based on meaning changes:
Use AI to detect topic shifts
Start new chunk when topic changes
Most accurate but slower/costlier.
Best for: Complex documents, mixed content
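A minimal sketch of the idea: compare embeddings of adjacent sentences and start a new chunk when similarity drops. It assumes you supply one precomputed embedding per sentence (vecs) from whatever model you use:

```python
# Minimal sketch: split where cosine similarity between neighbours drops.
import numpy as np

def semantic_chunks(sentences: list[str], vecs: np.ndarray,
                    threshold: float = 0.75) -> list[str]:
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        cosine = float(vecs[i - 1] @ vecs[i])  # similarity of adjacent pair
        if cosine < threshold:                 # topic shift: start a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

The threshold is content-dependent; in practice you tune it, or derive it from a percentile of observed similarities as LlamaIndex's SemanticSplitter does.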
5. Document Structure Chunking
Follow document hierarchy:
Respect headers, sections, lists
Each H2 section = logical chunk
Tables kept intact
Leverages author's organization.
Best for: Technical docs, manuals, structured content
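A minimal sketch for markdown, splitting before each H2 heading so the heading stays attached to its body (libraries such as LangChain also ship ready-made header-aware splitters):

```python
# Minimal sketch: one chunk per H2 section of a markdown document.
import re

def markdown_section_chunks(md: str) -> list[str]:
    # zero-width split at the start of every "## " line keeps the heading
    parts = re.split(r"(?m)^(?=## )", md)
    return [part.strip() for part in parts if part.strip()]

doc = "# Manual\nIntro.\n\n## Setup\nSteps...\n\n## Usage\nExamples..."
print(markdown_section_chunks(doc))
```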
The Overlap Question
Why Overlap?
Without overlap:
Chunk 1: "...the company achieved record sales."
Chunk 2: "This was primarily due to the new product line."
The connection between "record sales" and "new product line" is lost.
With overlap (last 2 sentences repeated):
Chunk 1: "...the company achieved record sales."
Chunk 2: "...achieved record sales. This was primarily
due to the new product line."
Context preserved across chunk boundaries.
How Much Overlap?
10-15% of chunk size is typical
Example: 500 token chunks, 50-75 token overlap
Too little: Context breaks
Too much: Wasted storage, duplicate results
Chunk Size by Use Case
| Use Case | Recommended Size | Why |
|---|---|---|
| Q&A / Factoid | 200-300 tokens | Precise answers |
| General chat | 400-500 tokens | Balanced context |
| Summarization | 800-1000 tokens | More source material |
| Legal/Technical | 300-400 tokens | Specific clauses |
| Creative content | 500-800 tokens | Flow and context |
Common Chunking Mistakes
1. One Size Fits All
Using the same chunk size for FAQs and legal contracts ❌
Different content types need different strategies
2. Ignoring Structure
Splitting a table across chunks ❌
Separating a heading from its content ❌
Breaking up a code block ❌
3. No Metadata
Chunk without knowing its source document ❌
No idea which section it came from ❌
Always preserve: source, page, section, date
4. Never Testing
Set chunk size once, never evaluate ❌
Retrieval quality varies—test and iterate
Metadata: The Secret Weapon
Good chunks include context:
```json
{
  "text": "The return policy allows 30 days...",
  "metadata": {
    "source": "customer-policies.pdf",
    "section": "Returns & Refunds",
    "page": 12,
    "last_updated": "2024-06-15"
  }
}
```
This enables:
- Filtering by source (sketched below)
- Citing specific pages
- Showing freshness
- Debugging retrieval issues
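A minimal sketch of the filtering case, over plain dicts shaped like the JSON above; real vector stores expose equivalent filter parameters on their query APIs:

```python
# Minimal sketch: metadata filtering over chunks before (or during) retrieval.
def filter_chunks(chunks: list[dict], source: str | None = None,
                  section: str | None = None) -> list[dict]:
    def keep(chunk: dict) -> bool:
        meta = chunk["metadata"]
        return ((source is None or meta["source"] == source)
                and (section is None or meta["section"] == section))
    return [c for c in chunks if keep(c)]

chunks = [
    {"text": "The return policy allows 30 days...",
     "metadata": {"source": "customer-policies.pdf", "section": "Returns & Refunds"}},
    {"text": "Shipping takes 3-5 business days...",
     "metadata": {"source": "shipping-faq.pdf", "section": "Delivery"}},
]
print(filter_chunks(chunks, source="customer-policies.pdf"))
```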
Evaluation: Is Your Chunking Working?
Test with Real Queries
Question: "What's the vacation policy for new employees?"
Check:
1. Is the right chunk retrieved?
2. Does it contain the complete answer?
3. Is there too much irrelevant content?
Metrics to Track
Retrieval precision: % of retrieved chunks that are relevant
Retrieval recall: % of relevant chunks that are retrieved
Answer quality: Does the LLM produce correct answers?
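As a sketch, here is how the first two metrics (plus reciprocal rank, whose mean over a query set is MRR) can be computed for a single query; the chunk IDs are hypothetical:

```python
# Minimal sketch: precision@k, recall@k, and reciprocal rank for one query.
def retrieval_metrics(retrieved: list[str], relevant: set[str], k: int = 5) -> dict:
    top_k = retrieved[:k]
    hits = [cid for cid in top_k if cid in relevant]
    precision = len(hits) / len(top_k) if top_k else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    # reciprocal rank: 1 / position of the first relevant result (0 if none)
    rr = next((1.0 / (i + 1) for i, cid in enumerate(retrieved)
               if cid in relevant), 0.0)
    return {"precision@k": precision, "recall@k": recall, "rr": rr}

print(retrieval_metrics(["c3", "c8", "c1"], relevant={"c1", "c4"}, k=3))
```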
Key Takeaways
- Chunking = splitting documents for RAG retrieval
- Size matters: 300-500 tokens typical, adjust per use case
- Strategy matters: fixed, sentence, paragraph, semantic, structural
- Overlap preserves context across boundaries (10-15%)
- Metadata makes chunks traceable and filterable
Ready to Master RAG?
This article covered the what and why of chunking strategies. But production RAG systems require end-to-end design including embedding selection, retrieval tuning, and integration.
In our Module 5, RAG & Context Engineering, you'll learn:
- Complete RAG architecture design
- Advanced chunking implementations
- Hybrid search strategies
- Retrieval evaluation and optimization
- Production deployment patterns
Dorian Laurenceau
Full-Stack Developer & Learning Designer. I spent 4 years as a freelance full-stack developer and 4 years teaching React, JavaScript, HTML/CSS and WordPress to adult learners. Today I design learning paths in web development and AI, grounded in learning science. I founded learn-prompting.fr to make AI practical and accessible, and built the Bluff app to gamify political transparency.
FAQ
What is the optimal chunk size for RAG?
Typically 200-500 tokens works best. Too small loses context; too large dilutes relevance. The ideal size depends on your content type, embedding model, and query patterns. Test to find the optimum for your data.
What chunking strategies exist?
Fixed-size (every N tokens), semantic (by meaning boundaries), sentence-based, paragraph-based, recursive (split hierarchically), and document-specific (respect headers/sections).
Should chunks overlap?
Yes, usually 10-20% overlap. Overlap ensures ideas split across chunk boundaries are still captured. Without overlap, you might miss relevant content that falls at chunk edges.
How does chunking affect RAG accuracy?
Chunking is often the biggest factor in RAG quality. Poor chunking means irrelevant or incomplete retrieval. Good chunking ensures the AI gets the right context to answer accurately.