Building A Local PDF RAG Chat System with DeepSeek-R1
A Fast, Secure, and Local AI-Powered PDF Assistant — No Cloud Required
Project Goal
Create a private, local RAG system that lets you chat with your PDFs using DeepSeek-R1 running on Ollama — keeping everything on your machine.
Why This Matters
DeepSeek models offer a compelling alternative to OpenAI and other cloud LLMs. While most RAG implementations rely on OpenAI's API, we can build a capable system with DeepSeek's models running locally, with no API keys and no data leaving your machine.
What We’re Building
A streamlined Streamlit app that:
- Takes any PDF as input
- Processes it locally (no data leaves your machine)
- Lets you ask natural language questions about the content
- Maintains conversation context for follow-up questions
RAG Explained (Retrieval-Augmented Generation)
Instead of asking the LLM to answer from its training data alone (and risk hallucination), RAG supplies relevant context from your documents at query time. The approach:
- Splits your documents into manageable chunks
- Converts those chunks into vector embeddings
- Finds the chunks most relevant to your question when you ask it
- Passes only those chunks, along with your question, to the LLM
- Returns a response grounded in your actual document content
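Stripped to its essentials, that flow fits in a few lines. The sketch below is illustrative only; retrieve_top_chunks and ask_llm are hypothetical stand-ins for the vector search and Ollama calls we build out later in this article.

# Illustrative sketch of the RAG flow; retrieve_top_chunks and ask_llm are
# hypothetical placeholders for the retrieval and generation steps below.
def answer_with_rag(question: str, retrieve_top_chunks, ask_llm) -> str:
    """Retrieve the most relevant chunks, then ask the LLM with that context."""
    chunks = retrieve_top_chunks(question)              # semantic search over the index
    context = "\n\n".join(chunks)                       # assemble retrieved context
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return ask_llm(prompt)                              # generation grounded in the context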
Project Architecture
Dependencies
import os
import streamlit as st
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_community.document_loaders import PDFPlumberLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_core.prompts import ChatPromptTemplate
Configuration
# Constants
PDF_DIRECTORY = "documents/pdfs/"
EMBEDDING_MODEL_NAME = "deepseek-r1:1.5b"
# Ensure directory exists
os.makedirs(PDF_DIRECTORY, exist_ok=True)
Initialize Core Components
# Initialize models
# Note: the same DeepSeek-R1 model is used here for both embeddings and generation.
embeddings = OllamaEmbeddings(model=EMBEDDING_MODEL_NAME)
vector_db = InMemoryVectorStore(embeddings)
llm = OllamaLLM(model=EMBEDDING_MODEL_NAME)
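Before building anything on top of these objects, it can help to confirm that Ollama is reachable and the model responds. This quick check is optional and assumes the Ollama server is already running locally with deepseek-r1:1.5b pulled.

# Optional sanity check: confirm the local Ollama server and model respond.
test_vector = embeddings.embed_query("hello")
print(f"Embedding dimension: {len(test_vector)}")   # embed_query returns a list of floats

test_reply = llm.invoke("Reply with the single word: ready")
print(test_reply)   # DeepSeek-R1 may wrap its reasoning in a <think>...</think> block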
Helper Functions
1. File Handling
def save_uploaded_file(uploaded_file):
    """Saves an uploaded PDF file to the storage directory."""
    file_path = os.path.join(PDF_DIRECTORY, uploaded_file.name)
    with open(file_path, "wb") as f:
        f.write(uploaded_file.getbuffer())
    return file_path
2. Document Processing
def load_and_process_pdf(file_path):
    """Loads and processes a PDF into text chunks."""
    loader = PDFPlumberLoader(file_path)
    docs = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    return splitter.split_documents(docs)
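While tuning chunk_size and chunk_overlap, it's worth inspecting what the splitter actually produces. A quick check might look like this; the file path is just a placeholder for one of your own PDFs.

# Quick look at the chunking output ("documents/pdfs/example.pdf" is a placeholder).
chunks = load_and_process_pdf("documents/pdfs/example.pdf")
print(f"Produced {len(chunks)} chunks")
print(chunks[0].page_content[:300])   # preview the first chunk
print(chunks[0].metadata)             # typically includes the source file and page number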
3. Vector Operations
def index_chunks(chunks):
    """Indexes document chunks into the vector store."""
    vector_db.add_documents(chunks)

def find_related_docs(query):
    """Finds the most relevant document chunks based on a query."""
    return vector_db.similarity_search(query)
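By default, similarity_search returns the top four matches. If answers seem to pull in too much or too little context, you can expose k explicitly; the variant below is a small optional tweak rather than part of the code above.

def find_related_docs(query, k=4):
    """Finds the k most relevant document chunks for a query."""
    return vector_db.similarity_search(query, k=k)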
4. Generation
def generate_answer(user_query, related_docs, chat_history):
    """Generates an answer based on relevant documents and chat history."""
    context = "\n\n".join([d.page_content for d in related_docs])
    chat_prompt = ChatPromptTemplate.from_template(
        """
You are a smart assistant. Use the context below and chat history to answer the user's question.

Context: {context}
Chat History: {chat_history}
Question: {question}
Answer:
"""
    )
    chain = chat_prompt | llm
    return chain.invoke({
        "context": context,
        "chat_history": "\n".join(chat_history),
        "question": user_query
    })
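One DeepSeek-R1 quirk worth knowing: the model often emits its chain-of-thought inside <think>...</think> tags before the final answer. If those tags show up in your chat display, a small post-processing helper like the one below (a suggested addition, not part of the code above) can strip them out.

import re

def strip_reasoning(text: str) -> str:
    """Removes DeepSeek-R1's <think>...</think> reasoning block, if present."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Usage: answer = strip_reasoning(generate_answer(user_query, related_docs, chat_history))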
UI Implementation
# Streamlit UI
st.title("DeepSeek RAG Chat")
st.write("Ask questions about your PDF using DeepSeek-R1 and Ollama!")

# Initialize chat history
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

# File upload
document = st.file_uploader("Upload a PDF", type=["pdf"])

if document:
    # Note: Streamlit reruns this script on every interaction, so the uploaded
    # PDF is saved and re-indexed each time; this is fine for small documents.
    path_to_pdf = save_uploaded_file(document)
    chunks = load_and_process_pdf(path_to_pdf)
    index_chunks(chunks)
    st.success("PDF indexed successfully! Type your questions below.")

    user_query = st.text_input("Ask a question")

    if user_query:
        with st.spinner("Searching and generating answer..."):
            related_docs = find_related_docs(user_query)
            answer = generate_answer(user_query, related_docs, st.session_state.chat_history)
        st.session_state.chat_history.extend([f"User: {user_query}", f"Assistant: {answer}"])

        # Display chat history
        for message in st.session_state.chat_history:
            st.write(message)
else:
    st.info("Please upload a PDF file to get started.")
Full Code
Code Link: Link
Setup Instructions
1. Install Ollama
Download it from ollama.ai
2. Pull the DeepSeek model
ollama pull deepseek-r1:1.5b
3. Install dependencies
pip install streamlit langchain langchain-ollama langchain-community pdfplumber langchain-text-splitters
4. Run the app
streamlit run deepseek_chat.py
Troubleshooting
Missing packages: Verify you’ve run the complete pip install command
Model not found: Ensure you’ve run ollama pull deepseek-r1:1.5b
Poor answer quality: Try reducing the chunk size to 500 or adjusting the prompt template (see the snippet after this list)
Slow initial response: This is normal; the first request loads the model into memory
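For the poor-answer-quality case, the only change is inside load_and_process_pdf; smaller chunks keep retrieval tightly focused at the cost of less surrounding context per chunk. One possible adjustment:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Smaller chunks for tighter retrieval; these values are a starting point, not a rule.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)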
Potential Enhancements
If you want to take the project further, here are some logical next steps:
- Support for multiple documents — Allow comparing information across PDFs
- Persistent vector storage — Save the vector DB between sessions using FAISS or Chroma (see the sketch after this list)
- Improved UI — Add message bubbles, markdown support, and code highlighting
- Image handling — Extract and reference images from PDFs in answers
- Model parameter tuning — Expose temperature and other settings for user tweaking
- TypeScript reimplementation — Port to a TypeScript/Next.js app with Tailwind & shadcn/ui
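For the persistent-storage idea, a minimal sketch with Chroma could look like the following. It assumes the langchain-chroma package is installed (pip install langchain-chroma); the collection name and directory are arbitrary choices, and the rest of the pipeline stays the same.

from langchain_chroma import Chroma
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="deepseek-r1:1.5b")

# Drop-in replacement for InMemoryVectorStore: embedded chunks are written to
# ./chroma_db and survive app restarts.
vector_db = Chroma(
    collection_name="pdf_chat",
    embedding_function=embeddings,
    persist_directory="chroma_db",
)

# index_chunks() and find_related_docs() work unchanged, since Chroma exposes
# the same add_documents() and similarity_search() methods.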
Conclusion
You now have a fully functional PDF chat system that runs entirely on your machine. No data leaves your computer; you’re not dependent on external APIs. Give it a try with different documents to see how it performs.
DeepSeek-R1 provides a strong balance of performance and efficiency for local deployments, making this approach practical even on modest hardware.