This content originally appeared on DEV Community and was authored by Harish Kotra (he/him)
Ever wanted to build an AI assistant that actually knows about your specific data? Today, we’ll walk through creating a powerful RAG (Retrieval-Augmented Generation) system that combines Gaia Node’s decentralized AI with Jina’s state-of-the-art embeddings to build an intelligent companion that can answer questions about your personal knowledge base.
What We’re Building
Our AI companion will:
- Convert your text data into high-quality vector embeddings using Jina AI
- Store these embeddings locally in a Qdrant vector database
- Use natural language to search through your knowledge base
- Generate contextual responses via a Gaia Node
Why This Stack?
- Gaia Node: Decentralized, privacy-focused AI inference
- Jina Embeddings v4: Superior multilingual embeddings with late chunking
- Qdrant: Fast, local vector database
- Complete Privacy: Everything runs locally except embedding generation
Prerequisites
pip install qdrant-client requests openai
You’ll also need:
- A running Qdrant instance (local or Docker)
- Access to a Gaia Node. Run your own by following this tutorial
- Jina AI API key (free tier available). Get one here.
Start Qdrant locally:
docker run -p 6333:6333 qdrant/qdrant
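Before moving on, you can sanity-check the connection with a quick sketch (this assumes Qdrant is on the default localhost:6333):

from qdrant_client import QdrantClient

# On a fresh instance this should print an empty collection list
client = QdrantClient(host="localhost", port=6333)
print(client.get_collections())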
Step 1: Prepare Your Data
First, organize your data in a simple JSON format:
[
  {"text": "Your first piece of knowledge"},
  {"text": "Another important fact"},
  {"text": "More information about your domain"}
]
Save this as your_data.json.
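If you'd rather generate a starter file programmatically, something like this works (the sample texts are placeholders to replace with your own knowledge snippets):

import json

# Hypothetical starter data – swap in your own content
sample_data = [
    {"text": "Your first piece of knowledge"},
    {"text": "Another important fact"},
    {"text": "More information about your domain"},
]

with open("your_data.json", "w", encoding="utf-8") as f:
    json.dump(sample_data, f, ensure_ascii=False, indent=2)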
Step 2: Generate Embeddings with Jina
Here’s the embedding pipeline. It respects Jina’s batch limits and stores each vector alongside its original text:
import json
import requests
import time
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import uuid
from typing import List, Dict, Any
class JinaQdrantEmbedder:
    def __init__(self, jina_api_key: str, qdrant_host: str = "localhost"):
        self.jina_api_key = jina_api_key
        self.jina_url = 'https://api.jina.ai/v1/embeddings'
        self.qdrant_client = QdrantClient(host=qdrant_host, port=6333)
        self.headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {jina_api_key}'
        }

    def load_json_data(self, file_path: str) -> List[Dict[str, str]]:
        """Load data from a JSON file"""
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        print(f"✓ Loaded {len(data)} items from {file_path}")
        return data

    def create_embeddings_batch(self, batch_data: List[Dict[str, str]], batch_num: int):
        """Create embeddings for a batch using the Jina API with late chunking"""
        jina_input = [{"text": item['text']} for item in batch_data]
        data = {
            "model": "jina-embeddings-v4",
            "task": "text-matching",
            "late_chunking": True,  # This is the magic sauce!
            "input": jina_input
        }
        response = requests.post(self.jina_url, headers=self.headers, json=data)
        if response.status_code == 200:
            result = response.json()
            embeddings = result.get('data', [])
            combined_results = []
            for original_item, embedding_data in zip(batch_data, embeddings):
                combined_results.append({
                    'embedding': embedding_data['embedding'],
                    'original_data': original_item
                })
            print(f"  ✓ Generated {len(combined_results)} embeddings for batch {batch_num}")
            return combined_results
        else:
            print(f"  Error {response.status_code}: {response.text}")
            return []

    def store_in_qdrant(self, batch_results: List[Dict], collection_name: str, global_offset: int):
        """Store embeddings with their original text in Qdrant"""
        points = []
        for i, item in enumerate(batch_results):
            payload = {
                'text': item['original_data']['text'],
                'global_index': global_offset + i,
                'type': 'text',
                'source': 'user_data'
            }
            point = PointStruct(
                id=str(uuid.uuid4()),
                vector=item['embedding'],
                payload=payload
            )
            points.append(point)
        self.qdrant_client.upsert(collection_name=collection_name, points=points)
        print(f"  ✓ Stored {len(points)} points in Qdrant")

    def embed_and_store(self, json_file_path: str, collection_name: str = "my_knowledge_base"):
        """Complete pipeline: JSON → Embeddings → Qdrant"""
        print("🚀 Starting embedding pipeline...")
        # Load data
        data = self.load_json_data(json_file_path)
        batch_size = 512  # Jina's per-request limit
        total_batches = (len(data) + batch_size - 1) // batch_size
        # Process the first batch to learn the vector dimensions
        first_batch = data[:min(batch_size, len(data))]
        first_results = self.create_embeddings_batch(first_batch, 1)
        if not first_results:
            print("❌ Failed to process first batch!")
            return
        # (Re)create the Qdrant collection
        vector_size = len(first_results[0]['embedding'])
        try:
            self.qdrant_client.delete_collection(collection_name)
        except Exception:
            pass  # Collection didn't exist yet
        self.qdrant_client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE)
        )
        # Store the first batch
        self.store_in_qdrant(first_results, collection_name, 0)
        processed_items = len(first_batch)
        # Process the remaining batches
        for batch_num in range(2, total_batches + 1):
            start_idx = (batch_num - 1) * batch_size
            end_idx = min(start_idx + batch_size, len(data))
            batch_data = data[start_idx:end_idx]
            print(f"Processing batch {batch_num}/{total_batches}...")
            time.sleep(1)  # Simple rate limiting between requests
            batch_results = self.create_embeddings_batch(batch_data, batch_num)
            if batch_results:
                self.store_in_qdrant(batch_results, collection_name, start_idx)
                processed_items += len(batch_results)
        print(f"🎉 Success! Processed {processed_items} items into '{collection_name}'")
# Usage
embedder = JinaQdrantEmbedder(jina_api_key="your_jina_api_key")
embedder.embed_and_store("your_data.json")
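To confirm everything actually landed in Qdrant, a quick count check helps. This is a small sketch against the same local instance and default collection name:

from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)
# Should match the number of items in your_data.json
info = client.count(collection_name="my_knowledge_base", exact=True)
print(f"Points stored: {info.count}")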
Step 3: Build the RAG System
Now let’s create the retrieval system that connects everything:
import requests
from typing import List, Dict
from qdrant_client import QdrantClient
from openai import OpenAI
class GaiaQdrantRAG:
    def __init__(self, gaia_base_url: str, jina_api_key: str,
                 collection_name: str = "my_knowledge_base"):
        # Initialize the Gaia Node client (OpenAI-compatible API)
        self.gaia_client = OpenAI(
            base_url=gaia_base_url,
            api_key="gaia"  # Most Gaia nodes don't require real API keys
        )
        # Initialize the Qdrant client
        self.qdrant_client = QdrantClient(host="localhost", port=6333)
        self.collection_name = collection_name
        # Jina setup for query embeddings
        self.jina_api_key = jina_api_key
        self.jina_url = 'https://api.jina.ai/v1/embeddings'

    def generate_query_embedding(self, query: str) -> List[float]:
        """Convert the user's question to an embedding using the same Jina model"""
        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.jina_api_key}'
        }
        data = {
            "model": "jina-embeddings-v4",
            "task": "text-matching",
            "input": [{"text": query}]
        }
        response = requests.post(self.jina_url, headers=headers, json=data)
        result = response.json()
        return result['data'][0]['embedding']

    def search_knowledge_base(self, query_embedding: List[float], top_k: int = 3):
        """Find the most relevant content in the knowledge base"""
        search_results = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=query_embedding,
            limit=top_k,
            score_threshold=0.6,
            with_payload=True
        )
        return [
            {
                'text': result.payload['text'],
                'score': result.score
            }
            for result in search_results
        ]

    def generate_response(self, user_query: str, context_results: List[Dict]):
        """Generate a response using the Gaia Node with retrieved context"""
        # Format the context from search results
        context = "\n".join([
            f"[Source {i+1}] {result['text']}"
            for i, result in enumerate(context_results)
        ])
        # Create the prompt for the Gaia Node
        system_prompt = """You are a helpful AI assistant. Use the provided context to answer the user's question accurately. If the context doesn't contain relevant information, say so clearly."""
        user_prompt = f"""Context from knowledge base:
{context}

User Question: {user_query}

Please provide a helpful answer based on the context above."""
        # Query the Gaia Node
        response = self.gaia_client.chat.completions.create(
            model="gpt-3.5-turbo",  # Use whatever model your Gaia node provides
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            max_tokens=500,
            temperature=0.7
        )
        return response.choices[0].message.content

    def ask(self, query: str) -> str:
        """Complete RAG pipeline: question → embedding → search → generate"""
        print(f"🔍 Processing: {query}")
        # Step 1: Convert the question to an embedding
        query_embedding = self.generate_query_embedding(query)
        # Step 2: Search the knowledge base
        relevant_content = self.search_knowledge_base(query_embedding)
        if not relevant_content:
            return "I couldn't find relevant information in the knowledge base."
        # Step 3: Generate a response with the Gaia Node
        response = self.generate_response(query, relevant_content)
        return response
# Usage
rag = GaiaQdrantRAG(
gaia_base_url="https://your-gaia-node-url/v1",
jina_api_key="your_jina_api_key"
)
# Ask questions naturally!
answer = rag.ask("What do you know about machine learning?")
print(answer)
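If answers look off, it usually helps to inspect what retrieval returned before blaming the model. The class above already exposes the pieces, so you can peek at the raw hits and scores:

# Debug retrieval: see what the knowledge base returns for a query
query = "What do you know about machine learning?"
embedding = rag.generate_query_embedding(query)
hits = rag.search_knowledge_base(embedding, top_k=5)
for hit in hits:
    print(f"{hit['score']:.3f}  {hit['text'][:80]}")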
Step 4: Interactive AI Companion
Let’s create a simple chat interface:
def main():
    """Interactive chat with your AI companion"""
    rag = GaiaQdrantRAG(
        gaia_base_url="https://your-gaia-node-url/v1",
        jina_api_key="your_jina_api_key"
    )
    print("🤖 AI Companion Ready! (Type 'quit' to exit)")
    print("Ask me anything about your knowledge base...\n")
    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ['quit', 'exit', 'q']:
            print("Goodbye! 👋")
            break
        if not user_input:
            continue
        try:
            response = rag.ask(user_input)
            print(f"🤖 Assistant: {response}\n")
        except Exception as e:
            print(f"❌ Error: {str(e)}\n")

if __name__ == "__main__":
    main()
Why This Stack Rocks
Jina Embeddings v4 with Late Chunking provides:
- Superior multilingual understanding
- Better semantic search quality
- Efficient processing of long documents
Gaia Node offers:
- Decentralized AI inference
- Privacy-focused processing
- No vendor lock-in
Local Qdrant ensures:
- Fast vector searches
- Complete data privacy
- No external dependencies for retrieval
Example Interaction
You: What are the main benefits of renewable energy?
🤖 Assistant: Based on your knowledge base, renewable energy offers several key benefits:
1. Environmental Impact: Significantly reduces carbon emissions and helps combat climate change
2. Economic Advantages: Creates jobs and reduces long-term energy costs
3. Energy Independence: Reduces reliance on fossil fuel imports
4. Sustainability: Provides an inexhaustible energy source for future generations
The context shows that solar and wind technologies have become increasingly cost-competitive with traditional energy sources.
Performance Tips
- Batch Size: Keep batches at 512 items for Jina API efficiency
- Vector Dimensions: Jina v4 uses 2048 dimensions – very information-rich
- Search Threshold: Start with a 0.6 similarity threshold and adjust based on your data (see the sketch after this list)
- Late Chunking: Always enable this for better semantic understanding
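Picking a threshold is mostly empirical. One way to calibrate it is a small sweep over queries you know should match, reusing the rag object from Step 3 (the test queries here are placeholders):

# Rough threshold sweep: watch where relevant results start dropping out
test_queries = ["renewable energy benefits", "machine learning basics"]
for threshold in (0.4, 0.5, 0.6, 0.7):
    for query in test_queries:
        embedding = rag.generate_query_embedding(query)
        hits = rag.qdrant_client.search(
            collection_name=rag.collection_name,
            query_vector=embedding,
            limit=5,
            score_threshold=threshold,
        )
        print(f"threshold={threshold} query={query!r} hits={len(hits)}")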
Next Steps
[ ] Add document parsing (PDFs, Word docs)
[ ] Implement conversation memory
[ ] Create a web interface with FastAPI
[ ] Add real-time data updates
[ ] Integrate with more Gaia nodes for redundancy
You now have a powerful, privacy-focused AI companion that can understand and reason about your specific knowledge base. The combination of Jina’s advanced embeddings with Gaia’s decentralized inference creates a system that’s both intelligent and respects your data privacy.
The best part? Everything runs locally except for the initial embedding generation, giving you complete control over your AI assistant.
Ready to build your own AI companion?
Start with a small dataset, get the pipeline working, then scale up with your full knowledge base.
The future of personal AI is decentralized, and you just built it!