Building a Practical RAG Pipeline with LangChain

Last updated: May 17, 2026

RAG is one of the most useful patterns to come out of the proliferation of LLMs (the next one being agents). It's simple, and yet very powerful.

The basic idea is clear enough: we retrieve relevant data, give it to an LLM, and ask the model to answer from it. But the quality of the final answer depends a lot on the decisions made across the entire end-to-end flow (or pipeline):

  • How did we load the source material?
  • How did we split it?
  • What metadata did we keep?
  • Which embedding model did we use?
  • How do we retrieve chunks?
  • Is the retrieved context enough?

I have experimented with several of these steps before, but always in isolation, whether from the angle of the vector database or from the angle of the available strategies for chunking and retrieval.

This time I will focus on doing it all with LangChain, which is useful here because it gives us a common structure for these moving parts. It does not magically make a RAG system good, but it does make it easier to assemble, observe and improve the pieces.

What Is The Goal

A typical RAG application has two separate parts.

The first one is indexing. This usually happens before the user asks a question.

  1. Load source documents.
  2. Split them into chunks.
  3. Create embeddings.
  4. Store those embeddings in a vector database.

The second part is retrieval and generation. This happens at runtime.

  1. Receive a user question.
  2. Search for relevant chunks.
  3. Add those chunks to the prompt.
  4. Ask the LLM to answer from the provided context.
  5. Return the answer.
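
To make the two phases concrete, here is a minimal sketch in plain Python, with no LangChain involved. Everything in it is a toy stand-in: `words` replaces a real embedding model with a set of lowercase words, `build_index` uses naive fixed-size chunks, and the "LLM call" is reduced to building the prompt. The function names and sample sentences are made up for illustration.

```python
import re


def words(text: str) -> set[str]:
    """Toy stand-in for an embedding: the set of lowercase words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents: list[str], chunk_size: int = 80) -> list[tuple[str, set[str]]]:
    """Indexing phase: split into fixed-size chunks, 'embed' each one, store both."""
    index = []
    for doc in documents:
        for start in range(0, len(doc), chunk_size):
            chunk = doc[start:start + chunk_size]
            index.append((chunk, words(chunk)))
    return index


def build_prompt(index: list[tuple[str, set[str]]], question: str, k: int = 2) -> str:
    """Runtime phase: rank chunks by shared words, keep the top k, add them to the prompt."""
    q = words(question)
    ranked = sorted(index, key=lambda item: len(q & item[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {question}"


# Made-up sample data, only to exercise the two phases.
documents = [
    "Spock is the science officer of the Enterprise.",
    "Scotty keeps the warp engines running.",
]
index = build_index(documents)
print(build_prompt(index, "Who is the science officer?", k=1))
```

A real pipeline swaps each toy piece for a proper component, but the shape stays the same: the index is built once, and only the retrieval and prompt steps run per question.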

Where LangChain Fits

LangChain provides building blocks around the main parts of a RAG system:

  • document loaders for bringing data in
  • text splitters for chunking content
  • embedding integrations for turning text into vectors
  • vector store integrations for storing and searching those vectors
  • retrievers for fetching relevant documents
  • agents, tools, and middleware for controlling how retrieval is used
  • LangSmith for tracing and evaluation
  • LangGraph when the workflow needs more explicit control

All the bold parts are integral parts of what LangChain offers, most of them part of the core SDK.

For example, we can start with a local vector store while testing locally (Chroma). Then later, we can move to Qdrant, Pinecone, or another vector database. The RAG design should not depend on the first storage option you tried. This kind of modularity reminds me of the good old SOA, or the more recent microservices architecture. It's all Lego blocks.

The same principle applies to models. We can start with one chat model and one embedding model, then change providers when some more concrete requirements appear.

Step 1: Load the Documents

Every RAG system starts with source material.

That source might be:

  • documentation pages
  • PDFs
  • Confluence pages
  • support tickets
  • product manuals
  • JSON exports
  • internal knowledge base articles
  • database records

LangChain document loaders turn those sources into Document objects. A document normally contains the text plus metadata. Metadata is crucial because it describes where the text came from (for example the source file and page), and it can later be used for filtering.
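
As a rough illustration of why metadata matters, the sketch below filters documents by a metadata field. The `Doc` dataclass is only a stand-in that mirrors the two fields a LangChain Document carries (`page_content` and `metadata`); `filter_by_metadata` is a hypothetical helper, and the sample texts and book names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(docs: list[Doc], **wanted) -> list[Doc]:
    """Keep only documents whose metadata matches every key/value given."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in wanted.items())
    ]


# Made-up sample data for illustration.
docs = [
    Doc("Kirk beams down to the planet.", {"book": "Book A", "page": 3}),
    Doc("Spock raises an eyebrow.", {"book": "Book B", "page": 7}),
    Doc("McCoy objects, loudly.", {"book": "Book A", "page": 12}),
]

book_a = filter_by_metadata(docs, book="Book A")
print([d.metadata["page"] for d in book_a])  # [3, 12]
```

Vector stores usually apply this kind of filter on their side, before or during the similarity search, but the idea is the same: whatever metadata is dropped at load time cannot be filtered on later.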

To continue with the Star Trek theme, I got 13 PDFs with the novelizations of The Original Series episodes. I will use this set as my main data set going forward for this exercise. I got it from here. There are several files, so I will need to iterate and read all of them. On top of that, these are stories, so the relevant information may be scattered across larger portions of text, which is something to take into consideration.

Looking further into the LangChain documentation, we can find PyPDFLoader: https://docs.langchain.com/oss/python/integrations/document_loaders/pypdfloader



from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader

PDF_DIR = Path("<my path>/python_rag/langchain")
pdf_files = sorted(PDF_DIR.glob("Star Trek *.pdf"))

all_documents = []

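for pdf_path in pdf_files:
    # mode="page" yields one Document per PDF page
    loader = PyPDFLoader(str(pdf_path), mode="page")
    docs = loader.load()

    # Tag every page with its book so answers can cite the source
    for doc in docs:
        doc.metadata["book"] = pdf_path.stem

    # Skip pages with no extractable text (e.g. blank or image-only pages)
    all_documents.extend(
        doc for doc in docs if doc.page_content.strip()
    )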

print(f"Loaded {len(all_documents)} text pages from {len(pdf_files)} PDFs")

Step 2: Split Content into Chunks

I covered several chunking strategies in a previous post.

I will start with the recursive strategy, which splits text using a hierarchical list of separators (e.g., paragraphs, sentences, words) until each chunk fits the target size. Later I can play with the target size and see what works best.

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1200,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", "? ", "! ", " ", ""],
)

chunks = text_splitter.split_documents(all_documents)

print(f"Created {len(chunks)} chunks")
print(chunks[0].metadata)
print(chunks[0].page_content[:800])
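
To get a feel for what "recursive" means here, the following is a simplified sketch of the idea in plain Python. It is not LangChain's implementation: it has no overlap, it only uses whitespace separators, and it drops the separator at chunk boundaries. The function name `recursive_split` is made up for this illustration.

```python
def recursive_split(text: str, chunk_size: int, separators: list[str]) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Tries the first separator and merges neighbouring pieces while they fit.
    A piece that is still too long is split again with the remaining
    separators; with none left, it is cut at fixed positions.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep merging
            continue
        if current.strip():
            chunks.append(current)
        current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Too long even on its own: go one level down the separator list.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current.strip():
        chunks.append(current)
    return chunks


sample = "Para one is short.\n\nPara two is a bit longer than the first one.\n\nThree."
for chunk in recursive_split(sample, chunk_size=40, separators=["\n\n", "\n", " "]):
    print(repr(chunk))
```

The first paragraph fits and stays whole, while the second one is too long and only then gets broken on spaces. That is the point of the hierarchy: the splitter reaches for finer separators only when the coarser ones cannot produce a small enough chunk.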

Step 3: Store Embeddings in a Vector Database

Once the documents are split, each chunk is embedded.

For this exercise, and given that I already have Ollama running llama3.1 locally, the logical approach is to use an embedding model served by Ollama as well: mxbai-embed-large. And to keep things simple, I'll also use a local Chroma database.

from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

CHROMA_DIR = "<my path>/python_rag/langchain/chroma_tos_novels"

embeddings = OllamaEmbeddings(model="mxbai-embed-large")

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=CHROMA_DIR,
    collection_name="tos_novelizations",
)

# _collection is a private attribute; fine for a quick sanity check
print("Chunks in vector store:", vectorstore._collection.count())
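
One practical detail: as far as I know, when no IDs are supplied the store generates random ones, so running the indexing script twice against the same persist directory adds every chunk twice. A possible way around it is to derive a stable ID for each chunk and pass the list through the `ids` argument of `Chroma.from_documents`. The helper below is a sketch under that assumption; `chunk_id` is a made-up name, and the usage at the bottom is commented out because it needs the objects from this post.

```python
import hashlib


def chunk_id(book: str, page: int, text: str) -> str:
    """Build a stable ID from where a chunk came from and what it says."""
    digest = hashlib.sha256(f"{book}|{page}|{text}".encode("utf-8")).hexdigest()
    return digest[:32]


# Same input always gives the same ID, so a re-run can overwrite instead of duplicating.
print(chunk_id("Book A", 3, "Kirk beams down to the planet."))

# Hypothetical usage with the objects defined earlier in this post (not executed here):
# ids = [
#     chunk_id(c.metadata["book"], c.metadata.get("page"), c.page_content)
#     for c in chunks
# ]
# vectorstore = Chroma.from_documents(
#     documents=chunks,
#     embedding=embeddings,
#     ids=ids,
#     persist_directory=CHROMA_DIR,
#     collection_name="tos_novelizations",
# )
```

Two chunks with identical text on the same page would collide under this scheme, so it is worth checking that the IDs are unique before relying on it.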

Step 4: Retrieval

As with chunking, there are several strategies for retrieval, but let's keep this simple for now. Only if the answers are not of decent quality will I revisit some of these strategies.

Let's test the retriever first.

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5},
)

question = "Who is the science officer?"

results = retriever.invoke(question)

for i, doc in enumerate(results, start=1):
    print(f"\n--- Result {i} ---")
    print(doc.metadata)
    print(doc.page_content[:700])

So far, the retrieved chunks seem to be picking up relevant content.
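
For intuition, a similarity search with k results boils down to scoring every stored vector against the question vector and keeping the best k. The sketch below does this with cosine similarity on tiny hand-written vectors. Cosine is an assumption here, since the metric actually used depends on how the vector store is configured, and the names `cosine_similarity` and `top_k` as well as the sample vectors are made up for illustration.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int):
    """Return the k (score, text) pairs with the highest similarity to the query."""
    scored = ((cosine_similarity(query_vec, vec), text) for text, vec in stored)
    return heapq.nlargest(k, scored)


# Made-up two-dimensional "embeddings"; real ones have hundreds of dimensions.
stored = [
    ("chunk about Spock", [1.0, 0.0]),
    ("chunk about Kirk", [0.0, 1.0]),
    ("chunk about both", [1.0, 1.0]),
]

for score, text in top_k([1.0, 0.1], stored, k=2):
    print(f"{score:.3f}  {text}")
```

A real vector database avoids scoring every vector by using an approximate index, but the ranking it returns is meant to approximate this one.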

Testing Answer Chain

OK, for this to make sense we need a proper sequence: a prompt that brings together the retrieved chunks (the context), the question, the instructions and some guardrails, followed by the model call.

from langchain_ollama import ChatOllama
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm = ChatOllama(model="llama3.1", temperature=0)

prompt = ChatPromptTemplate.from_template("""
You answer questions about Star Trek TOS novelizations using only the provided context.

Context:
{context}

Question:
{question}

Instructions:
- Answer only from the context.
- If the context does not contain the answer, say you do not have enough information.
- Mention the book/page evidence when useful.

Answer:
""")

def format_docs(docs):
    return "\n\n".join(
        f"[{doc.metadata.get('book')} | page {doc.metadata.get('page')}]\n{doc.page_content}"
        for doc in docs
    )

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

And now the final proof:

rag_chain.invoke("What happens in the episode where Kirk meets alien lizards?")

Yep, that seems about right: one of the most famous Star Trek episodes.

Final Thoughts

LangChain is a strong toolkit for building RAG applications because it gives structure to the messy parts: loading documents, splitting content, storing embeddings, retrieving context, orchestrating model calls, and tracing what happened (with LangSmith, but that is for another post).

But the framework does not define the architecture. A good RAG system still needs careful chunking, useful metadata, proper retrieval, quality prompting, and evaluation of the results.
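
As a starting point for that evaluation, one simple check is a retrieval hit rate: for a handful of questions where the right book is known, how often does it show up in the top k results? The sketch below assumes a hypothetical `retrieve(question, k)` callable that returns the source names of the retrieved chunks; the fake retriever and the labelled questions are made up to show the mechanics.

```python
from typing import Callable


def hit_rate_at_k(
    retrieve: Callable[[str, int], list[str]],
    labelled_questions: list[tuple[str, str]],
    k: int = 5,
) -> float:
    """Fraction of questions whose expected source appears in the top-k results."""
    if not labelled_questions:
        return 0.0
    hits = sum(
        1 for question, expected in labelled_questions
        if expected in retrieve(question, k)
    )
    return hits / len(labelled_questions)


def fake_retrieve(question: str, k: int) -> list[str]:
    """Stand-in retriever that always returns the same two sources."""
    return ["Book A", "Book B"][:k]


labelled = [("question one", "Book A"), ("question two", "Book C")]
print(hit_rate_at_k(fake_retrieve, labelled, k=2))  # 0.5

# Hypothetical adapter for the retriever built in this post (not executed here):
# retrieve = lambda q, k: [d.metadata["book"] for d in retriever.invoke(q)]
```

It says nothing about the quality of the generated answer, but it is cheap to run after every change to chunk size or retrieval settings.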
