Building Your Own AI Companion: Combining Gaia Node with Jina Embeddings v4 and Late Chunking



This content originally appeared on DEV Community and was authored by Harish Kotra (he/him)

Ever wanted to build an AI assistant that actually knows about your specific data? Today, we’ll walk through creating a powerful RAG (Retrieval-Augmented Generation) system that combines Gaia Node’s decentralized AI with Jina’s state-of-the-art embeddings to build an intelligent companion that can answer questions about your personal knowledge base.

What We’re Building

Our AI companion will:

  • Convert your text data into high-quality vector embeddings using Jina AI
  • Store these embeddings locally in a Qdrant vector database
  • Use natural language to search through your knowledge base
  • Generate contextual responses via a Gaia Node

Why This Stack?

  • Gaia Node: Decentralized, privacy-focused AI inference
  • Jina Embeddings v4: Superior multilingual embeddings with late chunking
  • Qdrant: Fast, local vector database
  • Complete Privacy: Everything runs locally except embedding generation

Prerequisites

pip install qdrant-client requests openai

You’ll also need:

  • A running Qdrant instance (local or Docker)
  • Access to a Gaia Node. Run your own by following this tutorial
  • Jina AI API key (free tier available). Get one here.

Start Qdrant locally:

docker run -p 6333:6333 qdrant/qdrant
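Before moving on, it's worth confirming Qdrant is actually reachable. A quick optional check in Python (a fresh instance should report an empty collection list):

from qdrant_client import QdrantClient

# Sanity check: a fresh Qdrant instance has no collections yet
client = QdrantClient(host="localhost", port=6333)
print(client.get_collections())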

Step 1: Prepare Your Data

First, organize your data in a simple JSON format:

[
  {"text": "Your first piece of knowledge"},
  {"text": "Another important fact"},
  {"text": "More information about your domain"}
]

Save this as your_data.json.
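If your knowledge currently lives in a plain-text file rather than JSON, a few lines of Python can produce this format. A minimal sketch, assuming a hypothetical notes.txt with one fact per line:

import json

# Hypothetical input file: notes.txt, one fact per line
with open("notes.txt", "r", encoding="utf-8") as f:
    items = [{"text": line.strip()} for line in f if line.strip()]

with open("your_data.json", "w", encoding="utf-8") as f:
    json.dump(items, f, ensure_ascii=False, indent=2)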

Step 2: Generate Embeddings with Jina

Here’s our embedding pipeline that handles Jina’s batch limits and stores everything with the original text:

import json
import requests
import time
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import uuid
from typing import List, Dict

class JinaQdrantEmbedder:
    def __init__(self, jina_api_key: str, qdrant_host: str = "localhost"):
        self.jina_api_key = jina_api_key
        self.jina_url = 'https://api.jina.ai/v1/embeddings'
        self.qdrant_client = QdrantClient(host=qdrant_host, port=6333)

        self.headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {jina_api_key}'
        }

    def load_json_data(self, file_path: str) -> List[Dict[str, str]]:
        """Load data from JSON file"""
        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)
        print(f"✓ Loaded {len(data)} items from {file_path}")
        return data

    def create_embeddings_batch(self, batch_data: List[Dict[str, str]], batch_num: int):
        """Create embeddings for a batch using Jina API with late chunking"""
        jina_input = [{"text": item['text']} for item in batch_data]

        data = {
            "model": "jina-embeddings-v4",
            "task": "text-matching",
            "late_chunking": True,  # This is the magic sauce!
            "input": jina_input
        }

        response = requests.post(self.jina_url, headers=self.headers, json=data)

        if response.status_code == 200:
            result = response.json()
            embeddings = result.get('data', [])

            combined_results = []
            for i, (original_item, embedding_data) in enumerate(zip(batch_data, embeddings)):
                combined_results.append({
                    'embedding': embedding_data['embedding'],
                    'original_data': original_item,
                    'global_index': i  # index within this batch; the global offset is applied at storage time
                })

            print(f"  ✓ Generated {len(combined_results)} embeddings for batch {batch_num}")
            return combined_results
        else:
            print(f"  Error {response.status_code}: {response.text}")
            return []

    def store_in_qdrant(self, batch_results: List[Dict], collection_name: str, global_offset: int):
        """Store embeddings with original text in Qdrant"""
        points = []

        for i, item in enumerate(batch_results):
            payload = {
                'text': item['original_data']['text'],
                'global_index': global_offset + i,
                'type': 'text',
                'source': 'user_data'
            }

            point = PointStruct(
                id=str(uuid.uuid4()),
                vector=item['embedding'],
                payload=payload
            )
            points.append(point)

        self.qdrant_client.upsert(collection_name=collection_name, points=points)
        print(f"  ✓ Stored {len(points)} points in Qdrant")

    def embed_and_store(self, json_file_path: str, collection_name: str = "my_knowledge_base"):
        """Complete pipeline: JSON → Embeddings → Qdrant"""
        print("🚀 Starting embedding pipeline...")

        # Load data
        data = self.load_json_data(json_file_path)
        batch_size = 512  # Jina's limit
        total_batches = (len(data) + batch_size - 1) // batch_size

        # Process first batch to get vector dimensions
        first_batch = data[:batch_size]  # Slicing safely handles datasets smaller than one batch
        first_results = self.create_embeddings_batch(first_batch, 1)

        if not first_results:
            print("❌ Failed to process first batch!")
            return

        # Create Qdrant collection
        vector_size = len(first_results[0]['embedding'])
        try:
            self.qdrant_client.delete_collection(collection_name)
        except Exception:
            pass  # Collection may not exist yet; nothing to delete

        self.qdrant_client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=vector_size, distance=Distance.COSINE)
        )

        # Store first batch
        self.store_in_qdrant(first_results, collection_name, 0)
        processed_items = len(first_batch)

        # Process remaining batches
        for batch_num in range(2, total_batches + 1):
            start_idx = (batch_num - 1) * batch_size
            end_idx = min(start_idx + batch_size, len(data))
            batch_data = data[start_idx:end_idx]

            print(f"Processing batch {batch_num}/{total_batches}...")
            time.sleep(1)  # Rate limiting

            batch_results = self.create_embeddings_batch(batch_data, batch_num)
            if batch_results:
                self.store_in_qdrant(batch_results, collection_name, start_idx)
                processed_items += len(batch_results)

        print(f"🎉 Success! Processed {processed_items} items into '{collection_name}'")

# Usage
embedder = JinaQdrantEmbedder(jina_api_key="your_jina_api_key")
embedder.embed_and_store("your_data.json")
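Once the pipeline finishes, you can optionally confirm the points landed in Qdrant with a couple of lines reusing the same client:

# Optional check: count the points stored in the collection
count = embedder.qdrant_client.count(collection_name="my_knowledge_base", exact=True)
print(f"Points stored: {count.count}")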

Step 3: Build the RAG System

Now let’s create the retrieval system that connects everything:

# Re-imported here so this block runs standalone (Step 2 already pulled most of these in)
import requests
from typing import List, Dict
from openai import OpenAI
from qdrant_client import QdrantClient

class GaiaQdrantRAG:
    def __init__(self, gaia_base_url: str, jina_api_key: str, 
                 collection_name: str = "my_knowledge_base"):

        # Initialize Gaia Node client
        self.gaia_client = OpenAI(
            base_url=gaia_base_url,
            api_key="gaia"  # Most Gaia nodes don't require real API keys
        )

        # Initialize Qdrant client
        self.qdrant_client = QdrantClient(host="localhost", port=6333)
        self.collection_name = collection_name

        # Jina setup for query embeddings
        self.jina_api_key = jina_api_key
        self.jina_url = 'https://api.jina.ai/v1/embeddings'

    def generate_query_embedding(self, query: str) -> List[float]:
        """Convert user question to embedding using same Jina model"""
        headers = {
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.jina_api_key}'
        }

        data = {
            "model": "jina-embeddings-v4",
            "task": "text-matching",
            "input": [{"text": query}]
        }

        response = requests.post(self.jina_url, headers=headers, json=data)
        response.raise_for_status()  # Surface API errors instead of failing on a missing key
        result = response.json()
        return result['data'][0]['embedding']

    def search_knowledge_base(self, query_embedding: List[float], top_k: int = 3):
        """Find most relevant content from knowledge base"""
        search_results = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=query_embedding,
            limit=top_k,
            score_threshold=0.6,
            with_payload=True
        )

        return [
            {
                'text': result.payload['text'],
                'score': result.score
            }
            for result in search_results
        ]

    def generate_response(self, user_query: str, context_results: List[Dict]):
        """Generate response using Gaia Node with retrieved context"""

        # Format context from search results
        context = "\n".join([
            f"[Source {i+1}] {result['text']}" 
            for i, result in enumerate(context_results)
        ])

        # Create prompt for Gaia Node
        system_prompt = """You are a helpful AI assistant. Use the provided context to answer the user's question accurately. If the context doesn't contain relevant information, say so clearly."""

        user_prompt = f"""Context from knowledge base:
{context}

User Question: {user_query}

Please provide a helpful answer based on the context above."""

        # Query Gaia Node
        response = self.gaia_client.chat.completions.create(
            model="gpt-3.5-turbo",  # Use whatever model your Gaia node provides
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            max_tokens=500,
            temperature=0.7
        )

        return response.choices[0].message.content

    def ask(self, query: str) -> str:
        """Complete RAG pipeline: question → embedding → search → generate"""
        print(f"🔍 Processing: {query}")

        # Step 1: Convert question to embedding
        query_embedding = self.generate_query_embedding(query)

        # Step 2: Search knowledge base
        relevant_content = self.search_knowledge_base(query_embedding)

        if not relevant_content:
            return "I couldn't find relevant information in the knowledge base."

        # Step 3: Generate response with Gaia Node
        response = self.generate_response(query, relevant_content)

        return response

# Usage
rag = GaiaQdrantRAG(
    gaia_base_url="https://your-gaia-node-url/v1",
    jina_api_key="your_jina_api_key"
)

# Ask questions naturally!
answer = rag.ask("What do you know about machine learning?")
print(answer)
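If an answer looks off, the quickest diagnostic is usually to inspect what retrieval returned before it ever reached the Gaia Node. A small debugging sketch using the same class:

# Peek at retrieved chunks and their similarity scores
query_embedding = rag.generate_query_embedding("What do you know about machine learning?")
for hit in rag.search_knowledge_base(query_embedding):
    print(f"{hit['score']:.3f}  {hit['text'][:80]}")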

Step 4: Interactive AI Companion

Let’s create a simple chat interface:

def main():
    """Interactive chat with your AI companion"""

    rag = GaiaQdrantRAG(
        gaia_base_url="https://your-gaia-node-url/v1",
        jina_api_key="your_jina_api_key"
    )

    print("🤖 AI Companion Ready! (Type 'quit' to exit)")
    print("Ask me anything about your knowledge base...\n")

    while True:
        user_input = input("You: ").strip()

        if user_input.lower() in ['quit', 'exit', 'q']:
            print("Goodbye! 👋")
            break

        if not user_input:
            continue

        try:
            response = rag.ask(user_input)
            print(f"🤖 Assistant: {response}\n")
        except Exception as e:
            print(f"❌ Error: {str(e)}\n")

if __name__ == "__main__":
    main()

Why This Stack Rocks

Jina Embeddings v4 with Late Chunking provides:

  • Superior multilingual understanding
  • Better semantic search quality
  • Efficient processing of long documents

Gaia Node offers:

  • Decentralized AI inference
  • Privacy-focused processing
  • No vendor lock-in

Local Qdrant ensures:

  • Fast vector searches
  • Complete data privacy
  • No external dependencies for retrieval

Example Interaction

You: What are the main benefits of renewable energy?

🤖 Assistant: Based on your knowledge base, renewable energy offers several key benefits:

1. Environmental Impact: Significantly reduces carbon emissions and helps combat climate change
2. Economic Advantages: Creates jobs and reduces long-term energy costs
3. Energy Independence: Reduces reliance on fossil fuel imports
4. Sustainability: Provides an inexhaustible energy source for future generations

The context shows that solar and wind technologies have become increasingly cost-competitive with traditional energy sources.

Performance Tips

  1. Batch Size: Keep batches at 512 items for Jina API efficiency
  2. Vector Dimensions: Jina v4 produces 2048-dimensional vectors, which capture rich semantics but take more space to store and search
  3. Search Threshold: Start with a 0.6 similarity threshold and adjust based on your data (see the probe sketch after this list)
  4. Late Chunking: Always enable this for better semantic understanding
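One practical way to pick that threshold: probe a representative query at several cutoffs and see how many hits survive. A minimal sketch, reusing the rag object from Step 3:

# Probe how many results each score_threshold lets through
sample_embedding = rag.generate_query_embedding("a representative question from your domain")
for threshold in (0.4, 0.5, 0.6, 0.7):
    hits = rag.qdrant_client.search(
        collection_name=rag.collection_name,
        query_vector=sample_embedding,
        limit=10,
        score_threshold=threshold,
        with_payload=False
    )
    print(f"threshold {threshold}: {len(hits)} hits")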

Next Steps

[ ] Add document parsing (PDFs, Word docs)
[ ] Implement conversation memory
[ ] Create a web interface with FastAPI (a starter sketch follows this list)
[ ] Add real-time data updates
[ ] Integrate with more Gaia nodes for redundancy
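For the FastAPI item, here's a minimal starting point. The /ask route and request shape are my own assumptions, and it presumes the GaiaQdrantRAG class from Step 3 is importable (plus pip install fastapi uvicorn):

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Assumes GaiaQdrantRAG from Step 3 is defined or imported in this module
rag = GaiaQdrantRAG(
    gaia_base_url="https://your-gaia-node-url/v1",
    jina_api_key="your_jina_api_key"
)

class Question(BaseModel):
    query: str

@app.post("/ask")  # hypothetical endpoint name
def ask(question: Question):
    return {"answer": rag.ask(question.query)}

# Run with: uvicorn app:app --reload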

You now have a powerful, privacy-focused AI companion that can understand and reason about your specific knowledge base. The combination of Jina’s advanced embeddings with Gaia’s decentralized inference creates a system that’s both intelligent and respects your data privacy.

The best part? Everything runs locally except for the initial embedding generation, giving you complete control over your AI assistant.

Ready to build your own AI companion?
Start with a small dataset, get the pipeline working, then scale up with your full knowledge base.

The future of personal AI is decentralized, and you just built it! 🚀

