Lately I have been exploring RAG systems in more detail, and while creating some local RAGs for experimentation, several things became clear. One of them is that the quality of chunking has an immense effect on the quality of the entire RAG system.
Looking at the entire pipeline, it's easy to focus on the LLM, the vector database, or the embedding model. Those are important pieces, of course. But before any of that can work well, we need to decide how the source material is split into smaller pieces.
As someone who has worked many years with data and application integration, there is one mantra that still holds true: garbage in, garbage out.
In a RAG pipeline, the LLM does not receive the full knowledge base. It receives a small set of retrieved chunks. If those chunks are of poor quality (content-wise), the model will produce an output of equally poor quality.
So in this post I want to focus on chunking: what it is and which strategies are available.
What Is Chunking?
Chunking is the process of splitting data source content into smaller pieces before storing it in the vector database.

Instead of embedding and storing the whole document as one giant block, we split it into chunks. Each chunk is embedded and stored in a vector database, usually with metadata that provides more information about that chunk.
At query time, the user question is also embedded. The vector database then finds chunks with similar meaning and returns them as context for the LLM.
Retrieval strategies are equally important, and I already looked into some of those here. But without proper chunking, no retrieval strategy can succeed.
Why Chunking Matters So Much
The vector database does not really understand the document structure; it sees chunks and vectors.
- If chunks are too large, they may contain too many topics at once.
- If chunks are too small, they may lose the context needed to answer the question.
- If chunks are split in the wrong place, important information may be separated.
- If metadata is missing, retrieved chunks may be hard to trace or filter.
On top of this, if the data itself has poor quality, all of the above issues will be exacerbated. This is why chunking is not just a preprocessing detail. It is an architectural choice.
In earlier RAG experiments, including the Star Trek RAG example, I noticed that retrieval quality was often less about the final generated answer and more about what data was retrieved in the first place.
When the retrieved chunks were good, the answer usually had a fighting chance. When the retrieved chunks were weak, the LLM could still produce a confident answer, but the answer was not grounded in the right context.
I did some research (not only ChatGPT based 🙂) to compile the following information, drawing mainly on:
https://qdrant.tech/course/essentials/day-1/chunking-strategies
https://learning.oreilly.com/library/view/vector-databases/9781098177584
https://www.pinecone.io/learn/chunking-strategies
Strategy 1: Fixed-Size Chunking
Fixed-size chunking is the simplest approach. You split text every N characters or tokens. For example:
- 500 tokens per chunk
- 100 tokens overlap
The overlap helps preserve context across boundaries. Without overlap, an important explanation can be split between two chunks, and neither chunk is complete enough on its own.
This strategy is easy to implement and is often good enough for a first proof of concept.
The downside is that fixed-size chunking ignores document structure. It can split headers from their content, break a paragraph in half, or separate a question from its answer.
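A minimal sketch of the idea, using whitespace-separated words as a stand-in for real tokens (a production version would use the embedding model's tokenizer instead):

```python
def fixed_size_chunks(tokens, chunk_size=500, overlap=100):
    """Split a token list into fixed-size chunks with overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

# Toy example: words stand in for tokens.
words = ("the quick brown fox " * 50).split()
chunks = fixed_size_chunks(words, chunk_size=60, overlap=10)
```

Because of the overlap, the last 10 tokens of each chunk reappear at the start of the next one, so an explanation that straddles a boundary is still complete in at least one chunk.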
Strategy 2: Paragraph or Section Based Chunking
A more natural approach is to split content by structure:
- headings
- sections
- paragraphs
- FAQ entries
- documentation pages
- support ticket comments
This should produce chunks that are easier to understand because they follow the way the author organized the content.
Most of these strategies are specific to the type of data. My Star Trek data set contains a lot of episode plots and dialogues that follow no specific structure, so it would not be a good fit for section-based chunking.
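A minimal sketch of this approach, assuming paragraphs are separated by blank lines and that short adjacent paragraphs can be merged up to a size limit:

```python
import re

def split_by_paragraphs(text, max_len=800):
    """Split on blank lines, then merge short paragraphs up to max_len."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = current + "\n\n" + para if current else para
        if len(candidate) <= max_len:
            current = candidate  # keep merging while it still fits
        else:
            if current:
                chunks.append(current)
            current = para  # note: an oversized paragraph is kept whole here
    if current:
        chunks.append(current)
    return chunks

doc = "First paragraph.\n\nSecond paragraph.\n\nA much longer third paragraph " + "x" * 200
chunks = split_by_paragraphs(doc, max_len=100)
```

An oversized paragraph still comes out as a single large chunk; that is exactly the gap the recursive strategy below is designed to close.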
Strategy 3: Recursive Chunking
In this approach the splitter tries to preserve structure first, then becomes more aggressive only when needed.
- Split by major headings.
- If a section is too large, split by subheadings.
- If it is still too large, split by paragraphs.
- If it is still too large, split by sentences or token count.
This approach respects the document where possible, while still keeping chunks within a useful size.
Strategy 4: Semantic Chunking
Semantic chunking tries to split content based on meaning. Instead of only looking at token count or headings, the system tries to detect where the topic changes. On paper this sounds great, but it comes at the cost of increased complexity.
For example, my Star Trek data set contains episode scripts. This strategy would be able to split an episode into meaningful logical pieces, instead of a blind split based on size or paragraphs.
Semantic chunking tries to keep each topic together, even if the formatting is messy.
This can improve retrieval quality, especially for long or badly structured documents. But it also adds complexity. It may require additional model calls, embedding comparisons, or more expensive ingestion logic.
For a production RAG system, semantic chunking can be worth evaluating. For a first implementation, it is most likely overkill.
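To keep a sketch self-contained, the example below uses a bag-of-words counter and cosine similarity as a cheap stand-in for a real embedding model; a production version would compare actual sentence embeddings instead, but the splitting logic (start a new chunk where similarity drops) is the same:

```python
import math
import re
from collections import Counter

def cheap_embedding(sentence):
    """Bag-of-words counter as a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk when similarity to the previous sentence drops."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for prev, sent in zip(sentences, sentences[1:]):
        if cosine(cheap_embedding(prev), cheap_embedding(sent)) < threshold:
            chunks.append(" ".join(current))  # topic changed: close the chunk
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "The warp core is overheating.",
    "The warp core needs coolant.",
    "Dinner is served in Ten Forward.",
    "Dinner in Ten Forward includes Klingon dishes.",
]
chunks = semantic_chunks(sentences, threshold=0.2)
```

The two warp core sentences end up in one chunk and the two dinner sentences in another, even though nothing in the formatting separates them.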
Strategy 5: Parent Child Chunking
Parent child chunking is about creating chunks that are hierarchically connected.
- create small child chunks for embedding and similarity search
- keep a larger parent section connected to those child chunks
- retrieve the matching child chunk
- send the larger parent context to the LLM
This seems a very nice approach, as it retains the connections between fragmented pieces (chunks).
The small child chunks help the vector database find the right area. The larger parent chunk gives the LLM enough surrounding context to answer properly.
I can imagine this being very useful for open-ended questions that require a bit more data for the LLM to do its work.
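A rough sketch of the mechanics, using a plain substring match as a stand-in for the vector similarity search that would happen against the child chunks in practice:

```python
def build_parent_child_index(sections, child_size=200):
    """Create small child chunks, each pointing back to its parent section."""
    children = []  # list of (child_text, parent_index) pairs
    for parent_idx, section in enumerate(sections):
        for start in range(0, len(section), child_size):
            children.append((section[start:start + child_size], parent_idx))
    return children

def retrieve_parent(children, sections, query_term):
    """Match a child chunk, but return its full parent section as context.

    The substring match here stands in for vector similarity search."""
    for child_text, parent_idx in children:
        if query_term in child_text:
            return sections[parent_idx]
    return None

sections = [
    "Long plot summary about tribbles. " * 20,
    "Long plot summary about the Borg. " * 20,
]
children = build_parent_child_index(sections, child_size=120)
context = retrieve_parent(children, sections, "Borg")
```

The small children keep the similarity search precise, while the parent lookup hands the LLM the full surrounding section instead of a 120-character fragment.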
Metadata Is Part of Chunking
Metadata is as crucial as the payload itself. It gives extra information about the chunk, such as:
- title or category
- page or section
- heading
- chunk number
- created or updated date
- document type
- product or domain
- language
- access permissions
This is also important for making connections between chunks, and equally for enabling filtering. For example, if we know that a Star Trek question concerns Deep Space Nine, we could filter on that title only.
This is the part where my Star Trek dataset falls short: it only provides a title and text. In a real-life use case the metadata would have to be much richer.
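As an illustration, here is a chunk payload with hypothetical metadata values and a simple in-memory filter; a vector database like Qdrant would apply such metadata filters natively during search, but the principle is the same:

```python
chunk_payload = {
    "text": "Odo investigates a smuggling ring on the Promenade.",
    "metadata": {  # hypothetical values for illustration
        "title": "Deep Space Nine",
        "season": 2,
        "chunk_number": 7,
        "document_type": "episode_plot",
        "language": "en",
    },
}

def filter_chunks(chunks, **conditions):
    """Keep only chunks whose metadata matches every given condition."""
    return [
        c for c in chunks
        if all(c["metadata"].get(key) == value
               for key, value in conditions.items())
    ]

matches = filter_chunks([chunk_payload], title="Deep Space Nine", season=2)
```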
Qdrant Implementation (Chunking)
Given I now have a Qdrant vector database at hand, I will use it to test what seems to me the best option for my data set: recursive chunking. My data is already split into sections and logical topics, but the length of each piece of content varies a lot.
SEPARATORS = ["\n\n", "\n", ". ", " ", ""]

def split_text_recursive(text, chunk_size, separators):
    text = text.strip()
    if len(text) <= chunk_size:
        return [text] if text else []

    # Pick the coarsest separator that actually occurs in the text.
    separator = separators[-1]
    next_separators = []
    for index, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            separator = candidate
            next_separators = separators[index + 1:]
            break

    # Last resort: hard split every chunk_size characters.
    if separator == "":
        return [
            text[start:start + chunk_size]
            for start in range(0, len(text), chunk_size)
        ]

    parts = text.split(separator)
    chunks = []
    current = ""
    for part in parts:
        piece = part if not current else separator + part
        if len(current) + len(piece) <= chunk_size:
            current += piece
        else:
            if current:
                chunks.append(current.strip())
            if len(part) > chunk_size:
                # This part alone is too big: recurse with finer separators.
                chunks.extend(
                    split_text_recursive(part, chunk_size, next_separators)
                )
                current = ""  # reset, or the flushed text would be appended twice
            else:
                current = part
    if current:
        chunks.append(current.strip())
    return chunks
Basically, the idea is:
- If the text has paragraphs, split by paragraphs.
- If not, try new lines.
- If not, try ". ".
- If not, try spaces.
- If not, split by characters.
Main Takeaway
Chunking is one of the decisions that determines whether a RAG system feels useful or unreliable.
The LLM may produce the final answer, and the vector database may handle the search, but the chunk defines what can be found in the first place.