Chunking Strategies for Better RAG Retrieval With Qdrant

Last updated: May 11, 2026

Lately I have been exploring RAG systems in more detail, and while creating some local RAGs for experimentation, several things became clear. One of them is that the quality of chunking immensely affects the quality of the entire RAG system.

Looking at the entire pipeline, it's easy to focus on the LLM, the vector database, or the embedding model. Those are important pieces, of course. But before any of that can work well, we need to decide how the source material is split into smaller pieces.

As someone who has worked many years with data and application integration, there is one mantra that still holds true: garbage in, garbage out.

In a RAG pipeline, the LLM does not receive the full knowledge base. It receives a small set of retrieved chunks. If those chunks are of poor quality (content-wise), then the model will produce an output of equally poor quality.

So in this post I want to focus on chunking: what it is and which strategies are available.

What Is Chunking?

Chunking is the process of splitting source content into smaller pieces before storing it in the vector database.

Instead of embedding and storing the whole document as one giant block, we split it into chunks. Each chunk is embedded and stored in a vector database, usually with metadata that provides more information about that chunk.

At query time, the user question is also embedded. The vector database then finds chunks with similar meaning and returns them as context for the LLM.
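To make that flow concrete, here is a minimal sketch using the qdrant-client package. The embed() helper, the collection name, and the chunks/title/category variables are placeholders for whatever embedding model and data set you actually use:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")

# vector size must match whatever embedding model embed() wraps
client.create_collection(
    collection_name="episodes",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# ingestion: embed every chunk and store it together with its metadata
client.upsert(
    collection_name="episodes",
    points=[
        PointStruct(
            id=i,
            vector=embed(chunk),  # embed() is a placeholder for your embedding model
            payload={"text": chunk, "title": title, "category": category},
        )
        for i, chunk in enumerate(chunks)
    ],
)

# query time: embed the question and retrieve the most similar chunks
hits = client.search(
    collection_name="episodes",
    query_vector=embed("Who commanded Deep Space 9?"),
    limit=3,
)
context = "\n\n".join(hit.payload["text"] for hit in hits)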

Retrieval strategies are equally important, and I already looked into some of those here, but without proper chunking no retrieval strategy can be successful.

Why Chunking Matters So Much

The vector database does not really understand the document structure; it sees chunks and vectors.

  • If chunks are too large, they may contain too many topics at once.
  • If chunks are too small, they may lose the context needed to answer the question.
  • If chunks are split in the wrong place, important information may be separated.
  • If metadata is missing, retrieved chunks may be hard to trace or filter.

On top of this, if the data has poor quality, all of the above potential issues will be exacerbated. This is why chunking is not just a preprocessing detail. It is an architectural choice.

In earlier RAG experiments, including the Star Trek RAG example, I noticed that retrieval quality was often less about the final generated answer and more about what data was retrieved in the first place.

When the retrieved chunks were good, the answer usually had a fighting chance. When the retrieved chunks were weak, the LLM could still produce a confident answer, but the answer was not grounded in the right context.

I did some research (not only ChatGPT based 🙂 ) in order to compile the following information. My main sources were:

https://qdrant.tech/course/essentials/day-1/chunking-strategies

https://learning.oreilly.com/library/view/vector-databases/9781098177584

https://www.pinecone.io/learn/chunking-strategies

Strategy 1: Fixed-Size Chunking

Fixed-size chunking is the simplest approach. You split text every N characters or tokens. For example:

  • 500 tokens per chunk
  • 100 tokens overlap

The overlap helps preserve context across boundaries. Without overlap, an important explanation can be split between two chunks, and neither chunk is complete enough on its own.

This strategy is easy to implement and is often good enough for a first proof of concept.

The downside is that fixed-size chunking ignores document structure. It can split headers from their content, break a paragraph in half, or separate a question from its answer.
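As a sketch, a character-based version could look like this (a token-based version would work the same way, just counting tokens from a tokenizer instead of characters):

def fixed_size_chunks(text, chunk_size=500, overlap=100):
    # advance by chunk_size minus overlap so consecutive chunks share context
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]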

Strategy 2: Paragraph or Section Based Chunking

A more natural approach is to split content by structure:

  • headings
  • sections
  • paragraphs
  • FAQ entries
  • documentation pages
  • support ticket comments

This should produce chunks that are easier to understand because they follow the way the author organized the content.

Most of these strategies are specific to the type of data. My Star Trek data set contains a lot of episode plots and dialogues that follow no specific structure, so it would not be a good fit for section-based chunking.
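For content that does have structure, the splitter can be as simple as a regular expression. A minimal sketch for markdown-style documents, where each heading starts a new chunk:

import re

def split_by_sections(markdown_text):
    # split right before every markdown heading, keeping the heading
    # together with the content that follows it
    sections = re.split(r"\n(?=#{1,6} )", markdown_text)
    return [section.strip() for section in sections if section.strip()]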

Strategy 3: Recursive Chunking

In this approach the splitter tries to preserve structure first, then becomes more aggressive only when needed.

  1. Split by major headings.
  2. If a section is too large, split by subheadings.
  3. If it is still too large, split by paragraphs.
  4. If it is still too large, split by sentences or token count.

This approach respects the document where possible, while still keeping chunks within a useful size.

Strategy 4: Semantic Chunking

Semantic chunking tries to split content based on meaning. Instead of only looking at token count or headings, the system tries to detect where the topic changes. On paper this sounds great, but it comes at the cost of increased complexity.

For example, my Star Trek data set contains episode scripts. With this strategy, each episode could be split into meaningful logical pieces, instead of a blind split based on size or paragraphs.

Semantic chunking tries to keep each topic together, even if the formatting is messy.

This can improve retrieval quality, especially for long or badly structured documents. But it also adds complexity. It may require additional model calls, embedding comparisons, or more expensive ingestion logic.
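As an illustration of the embedding-comparison variant, here is a minimal sketch assuming the sentence-transformers package, with a similarity threshold that would need tuning per data set:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_chunks(sentences, threshold=0.5):
    if not sentences:
        return []

    # normalized embeddings, so a plain dot product is the cosine similarity
    embeddings = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [sentences[0]]

    for prev, nxt, sentence in zip(embeddings, embeddings[1:], sentences[1:]):
        # a drop in similarity between neighbouring sentences marks a topic change
        if np.dot(prev, nxt) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)

    chunks.append(" ".join(current))
    return chunks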

For a production RAG system, semantic chunking can be worth evaluating. For a first implementation, it is most likely overkill.

Strategy 5: Parent-Child Chunking

Parent-child chunking is about creating chunks that are hierarchically connected:

  • create small child chunks for embedding and similarity search
  • keep a larger parent section connected to those child chunks
  • retrieve the matching child chunk
  • send the larger parent context to the LLM

This seems a very nice approach, as it retains the connections between fragmented pieces (chunks).

The small child chunks help the vector database find the right area. The larger parent chunk gives the LLM enough surrounding context to answer properly.

I can imagine this being very useful for open-ended questions that require a bit more data for the LLM to do its work.
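A rough sketch of how this could look with Qdrant, reusing the fixed_size_chunks helper from earlier and the same placeholder embed() function (the parent IDs, collection name, and episode_plot/question variables are mine, not a Qdrant convention):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="children",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

# parents stay in a plain lookup; only the small children get embedded
parents = {"ep_001": episode_plot}  # parent_id -> full section text

client.upsert(
    collection_name="children",
    points=[
        PointStruct(
            id=i,
            vector=embed(child),
            payload={"text": child, "parent_id": "ep_001"},
        )
        for i, child in enumerate(fixed_size_chunks(episode_plot, 200, 20))
    ],
)

# search on the precise children, then hand the wider parents to the LLM
hits = client.search(collection_name="children", query_vector=embed(question), limit=3)
parent_ids = dict.fromkeys(hit.payload["parent_id"] for hit in hits)  # dedupe, keep order
context = "\n\n".join(parents[pid] for pid in parent_ids)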

Metadata Is Part of Chunking

Metadata is as crucial as the chunk content itself. It gives extra information about the chunk, such as:

  • title or category
  • page or section
  • heading
  • chunk number
  • created or updated date
  • document type
  • product or domain
  • language
  • access permissions

This is also important for making connections between chunks and, equally, for filtering. For example, if we know that the relevant category for a Star Trek question is Deep Space Nine, we can restrict the search to that category only.
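Qdrant supports this directly through payload filters. A sketch of the Deep Space Nine example, assuming a category field exists in the payload and reusing the client and embed() placeholder from before:

from qdrant_client.models import Filter, FieldCondition, MatchValue

hits = client.search(
    collection_name="episodes",
    query_vector=embed("Who runs the station's bar?"),
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="Deep Space Nine"))]
    ),
    limit=3,
)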

This is the part where my Star Trek dataset falls short: it only provides a title and text. In a real-life use case the metadata would have to be much richer.

Qdrant Implementation (Chunking)

Given I now have the Qdrant vector database at hand, I will use it to test what seems to me the best option for my data set: recursive chunking. My data is already split into sections and logical topics, but the length of each piece of content varies a lot.

SEPARATORS = ["\n\n", "\n", ". ", " ", ""]


def split_text_recursive(text, chunk_size, separators):
    text = text.strip()

    # base case: the text already fits in a single chunk
    if len(text) <= chunk_size:
        return [text] if text else []

    # pick the first separator that actually appears in the text;
    # "" is the last resort and always matches
    separator = separators[-1]
    next_separators = []

    for index, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            separator = candidate
            next_separators = separators[index + 1:]
            break

    # last resort: blind split every chunk_size characters
    if separator == "":
        return [
            text[index:index + chunk_size]
            for index in range(0, len(text), chunk_size)
        ]

    parts = text.split(separator)
    chunks = []
    current = ""

    for part in parts:
        # re-attach the separator we split on, except at the start of a chunk
        piece = part if not current else separator + part

        if len(current) + len(piece) <= chunk_size:
            current += piece
        else:
            # the running chunk is full: flush it and start over
            if current:
                chunks.append(current.strip())
                current = ""

            if len(part) > chunk_size:
                # this single part is still too big: recurse with
                # the finer-grained separators
                chunks.extend(
                    split_text_recursive(part, chunk_size, next_separators)
                )
            else:
                current = part

    if current:
        chunks.append(current.strip())

    return chunks

Basically the idea is:

If text has paragraphs, split by paragraphs.
If not, try new lines.
If not, try “. “.
If not, try spaces.
If not, split by characters.
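Using it is then a one-liner; the file path here is just a hypothetical stand-in for one document from my data set:

episode_text = open("data/emissary.txt").read()  # hypothetical sample file
chunks = split_text_recursive(episode_text, chunk_size=500, separators=SEPARATORS)

for number, chunk in enumerate(chunks):
    print(f"chunk {number}: {len(chunk)} chars")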

Main Takeaway

Chunking is one of the decisions that determine whether a RAG system feels useful or unreliable.

The LLM may produce the final answer, and the vector database may handle the search, but the chunk defines what can be found in the first place.
