This content originally appeared on DEV Community and was authored by James
If you’ve ever wanted to create an interactive chatbot that can dive into the contents of a PDF and answer your questions intelligently, you’re in the right place. In this tutorial, we’ll walk through building a Streamlit-based app that leverages LangChain for conversational AI, Ollama for local model embeddings, and Chroma as a vector database to handle document retrieval. This setup allows you to upload a PDF, process it, and chat with it like it’s your personal knowledge base.
By the end of this post, you’ll have a fully functional app that runs locally (or on a server) and can handle queries about any PDF you throw at it. We’ll break down the provided code step by step, explain the key components, and show you how to get it running. Plus, I’ve embedded a YouTube video tutorial below for a visual walkthrough—check it out if you prefer following along with code demos.
Watch the Tutorial Video
For a hands-on demonstration of building and running this app, watch the accompanying YouTube video tutorial.
Why Build This PDF Chatbot?
Imagine uploading a research paper, a user manual, or a lengthy report, and then asking natural-language questions like “What are the key findings?” or “Explain section 3 in simple terms.” This app makes that possible without needing cloud services (though it supports options like GPT-3.5 if you want). It’s powered by open-source tools, making it cost-effective and privacy-focused. LangChain handles the orchestration, Ollama provides embeddings and models, and Chroma stores vectorized chunks of your PDF for quick retrieval.
This project is great for beginners in AI app development, as it combines web interfaces (via Streamlit), document processing, and retrieval-augmented generation (RAG). If you’re familiar with Python, you can have this up and running in under an hour.
Prerequisites
Before we dive in, make sure you have these set up: Python 3.8+, pip for installing packages, and a basic understanding of virtual environments. You’ll need to install the following libraries—run this in your terminal:
pip install streamlit langchain langchain-ollama langchain-community chromadb python-dotenv pypdf
If you’re using OpenAI models, also run pip install langchain-openai and set up an API key in a .env file. For local models, ensure Ollama is installed and running (download it from ollama.com). We’ll use models like Llama 3.2 or Qwen 2.5, which you can pull via ollama pull <model-name>.
Create a .env file in your project root with any necessary keys, such as OPENAI_API_KEY=your-key-here.
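For example, to pull the chat model and the embedding model used later in this post, run:
ollama pull llama3.2
ollama pull mxbai-embed-large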
Step-by-Step Code Breakdown
The code is structured as a single app.py file for simplicity. Let’s dissect it function by function, explaining what each part does and why it’s there. I’ll include relevant code snippets to make it easier to follow along.
1. Imports and Environment Setup
We start by importing the necessary modules: Streamlit for the UI, LangChain components for AI and retrieval, and helpers like dotenv for loading environment variables. The load_dotenv() call pulls in secrets like API keys.
Temporary files are handled with tempfile and os for processing uploaded PDFs without cluttering your disk.
Here’s the import section:
import streamlit as st
from dotenv import load_dotenv
from langchain.schema import HumanMessage, AIMessage, SystemMessage
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain.chains import RetrievalQA
import os
import tempfile
# Load environment variables from a .env file
load_dotenv()
2. Page Configuration
The configure_page() function sets up the Streamlit app’s look and feel. It defines the title (“Chat with Your PDF using LangChain, Ollama, and Chroma”), adds an icon, and enables a wide layout for better usability. There’s also an expander to peek at the session state—handy for debugging.
def configure_page():
    st.set_page_config(
        page_title="PDF Chat with LangChain, Ollama, and Chroma",
        page_icon="🤖",
        layout="wide",
        initial_sidebar_state="expanded",
    )
    st.title("📄🤖 Chat with Your PDF using LangChain, Ollama, and Chroma")
    with st.expander("Check State"):
        st.write(st.session_state)
3. Sidebar Handling
The sidebar is where users interact with settings. In handle_sidebar(), we let users select a model (e.g., “llama3.2” for local Ollama or “gpt-3.5-turbo” for OpenAI). It stores this choice in Streamlit’s session state for persistence across reruns.
Next, there’s a file uploader for PDFs. When a file is uploaded, the app processes it: loads the PDF, splits it into chunks, generates embeddings with Ollama, and creates a Chroma vector store. This work is cached for efficiency. Buttons to clear the chat or cache ensure a fresh start when needed. A sketch of handle_sidebar() itself appears after the helper list below.
Key helper functions here include:
- get_chat_model(model_name): Returns a chat model instance, cached to avoid recreating it unnecessarily.
@st.cache_resource
def get_chat_model(model_name):
    if model_name == "gpt-3.5-turbo":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(
            api_key=os.getenv("OPENAI_API_KEY"),
            model=model_name,
            streaming=True,
        )
    return ChatOllama(model=model_name, streaming=True)
- get_embeddings(): Provides Ollama embeddings (using “mxbai-embed-large” for high-quality vector representations).
@st.cache_resource
def get_embeddings():
    return OllamaEmbeddings(
        model="mxbai-embed-large"
    )
- load_pdf(uploaded_file): Saves the upload temporarily and loads it with PyPDFLoader.
def load_pdf(uploaded_file):
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp_file:
        tmp_file.write(uploaded_file.read())
        tmp_file_path = tmp_file.name
    loader = PyPDFLoader(tmp_file_path)
    documents = loader.load()
    os.unlink(tmp_file_path)  # Clean up the temporary file
    return documents
- split_text(documents): Breaks the PDF into manageable chunks (1000 characters with a 200-character overlap) using RecursiveCharacterTextSplitter.
def split_text(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    texts = text_splitter.split_documents(documents)
    return texts
- create_vector_store(texts, embeddings): Builds and persists a Chroma database in a temp directory.
def create_vector_store(texts, embeddings):
    # Define the directory where Chroma will store its data
    chroma_persist_directory = os.path.join(tempfile.gettempdir(), "chroma_db")
    # Initialize Chroma vector store
    vector_store = Chroma.from_documents(
        documents=texts,
        embedding=embeddings,
        persist_directory=chroma_persist_directory
    )
    # Persist the vector store to disk
    vector_store.persist()
    return vector_store
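The post doesn’t show handle_sidebar() itself, only the helpers it calls. As a reference, here is a minimal sketch of how it might wire those helpers together, based on the description above; the widget labels, the exact model list, and the cache-clearing details are assumptions rather than the original code:
def handle_sidebar():
    with st.sidebar:
        st.header("Settings")
        # The selectbox key keeps the choice in session state across reruns
        selected_model = st.selectbox(
            "Model", ["llama3.2", "qwen2.5", "gpt-3.5-turbo"], key="selected_model"
        )

        uploaded_file = st.file_uploader("Upload a PDF", type="pdf")
        if uploaded_file is not None and "vector_store" not in st.session_state:
            with st.spinner("Processing PDF..."):
                documents = load_pdf(uploaded_file)
                texts = split_text(documents)
                st.session_state.vector_store = create_vector_store(texts, get_embeddings())
            st.success("PDF processed and indexed!")

        if st.button("Clear Chat"):
            st.session_state.messages = [
                SystemMessage(content="You are a helpful AI assistant.")
            ]
        if st.button("Clear Cache"):
            st.cache_resource.clear()
            st.session_state.pop("vector_store", None)

    return selected_model
The only contract main() relies on is that handle_sidebar() returns the selected model name, which is what this sketch does.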
4. Displaying Chat Messages
display_chat_messages() loops through the session state messages and renders them in chat bubbles—user messages on one side, AI responses on the other. It skips the initial system message (“You are a helpful AI assistant.”), which sets the AI’s behavior.
def display_chat_messages():
    for message in st.session_state.messages[1:]:
        if isinstance(message, HumanMessage):  # Display user messages
            with st.chat_message("user"):
                st.write(message.content)
        elif isinstance(message, AIMessage):  # Display AI responses
            with st.chat_message("assistant"):
                st.write(message.content)
5. Handling User Input
In handle_user_input(chat_model, retriever), we capture user prompts via a chat input box. The handler appends the message to the history, displays it, and then generates a response.
If a vector store exists (i.e., a PDF has been uploaded), the chat_model argument passed in from main() is actually the RetrievalQA chain, so answers are grounded in relevant chunks retrieved from the PDF; otherwise, the handler falls back to a basic chat over the full message history. The models are created with streaming enabled, but this handler renders the response once it is ready, and error handling catches any issues along the way.
def handle_user_input(chat_model, retriever):
    if prompt := st.chat_input("Ask something about your PDF"):
        st.session_state.messages.append(HumanMessage(content=prompt))
        with st.chat_message("user"):
            st.write(prompt)
        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_response = ""
            try:
                if retriever is not None:
                    # A PDF is indexed: chat_model is the RetrievalQA chain from main()
                    full_response = chat_model.invoke({"query": prompt})["result"]
                else:
                    # No PDF yet: fall back to a plain chat over the message history
                    full_response = chat_model.invoke(st.session_state.messages).content
                message_placeholder.markdown(full_response)
                st.session_state.messages.append(AIMessage(content=full_response))
            except Exception as e:
                message_placeholder.markdown("❌ An error occurred while generating the response.")
                st.error(f"Error: {e}")
6. The Main Function
main() ties it all together: it configures the page, handles the sidebar, initializes the chat model and retriever, displays messages, and processes input. It also seeds the session state with a default system message on the first run.
The app checks for a vector store to enable RAG; if present, it creates a RetrievalQA chain with “stuff” type (which stuffs retrieved docs into the prompt).
def main():
    configure_page()
    selected_model = handle_sidebar()
    chat_model = get_chat_model(selected_model)
    # Initialize the chat history in the session state if not already present
    if "messages" not in st.session_state:
        st.session_state.messages = [
            SystemMessage(content="You are a helpful AI assistant.")
        ]
    # If a PDF has been uploaded (handle_sidebar stores the vector store in
    # session state), wrap the chat model in a RetrievalQA chain
    if "vector_store" in st.session_state:
        retriever = st.session_state.vector_store.as_retriever()
        chat_model_with_retrieval = RetrievalQA.from_chain_type(
            llm=chat_model,
            chain_type="stuff",
            retriever=retriever,
            return_source_documents=False,
        )
    else:
        retriever = None
        chat_model_with_retrieval = chat_model
    display_chat_messages()
    handle_user_input(chat_model_with_retrieval, retriever=retriever)

if __name__ == "__main__":
    main()
Running the App
Save the code as app.py. In your terminal, navigate to the project directory and run:
streamlit run app.py
This launches a local web server (usually at http://localhost:8501). Open it in your browser, select a model in the sidebar, upload a PDF, and start chatting! For example, upload a PDF about machine learning and ask, “What is gradient descent?”
If you’re using local models, ensure Ollama is running in the background. For production, deploy to Streamlit Cloud or a server.
Potential Improvements and Tips
- Customization: Add more models or tweak the chunk size and retrieval settings for better accuracy (see the sketch after this list).
- Error Handling: The code already catches exceptions, but you could expand it to retry failed queries.
- Performance: For large PDFs, consider asynchronous processing or a more robust database setup.
- Security: Since this uses temp files, be mindful of sensitive data in production.
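To make the customization point above concrete, here is a minimal sketch (not part of the original app) of how you might parameterize the chunking and retrieval settings; the parameter names and the sidebar slider are assumptions:
# Hypothetical tweak: make chunking configurable instead of hard-coded.
def split_text(documents, chunk_size=1000, chunk_overlap=200):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
    )
    return text_splitter.split_documents(documents)

# In handle_sidebar(), you could expose the value with a slider, e.g.:
#   chunk_size = st.sidebar.slider("Chunk size", 500, 2000, 1000, step=100)
# And in main(), retrieve more (or fewer) chunks per question:
#   retriever = st.session_state.vector_store.as_retriever(search_kwargs={"k": 4})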
This app demonstrates the power of RAG in a simple package—perfect for personal projects or prototyping enterprise tools.