🎓 LLM Zoomcamp Module 2 – Chapter 2: Practical Implementation & Advanced Techniques

This content originally appeared on DEV Community and was authored by Abdelrahman Adnan

Hands-On Learning: Building on Chapter 1’s foundations, this chapter dives deep into practical implementations. You’ll learn to build production-ready vector search systems using Elasticsearch, evaluate performance, and apply advanced optimization techniques.

Hands-On Implementation with Elasticsearch
Evaluating Vector Search Performance
Best Practices and Advanced Techniques
Conclusion and Next Steps

Hands-On Implementation with Elasticsearch

Setting Up Elasticsearch for Vector Search

Step 1: Start Elasticsearch with Docker

docker run -it \
    --rm \
    --name elasticsearch \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.4.3

Step 2: Install Required Libraries

pip install elasticsearch sentence-transformers pandas numpy

Complete Implementation

Step 1: Prepare Your Data

import json
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

# Load your documents
with open('documents.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

documents = []
for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)
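
The evaluation code later in this chapter compares results by `doc['id']`, but the loading loop above never creates that field. Here is a minimal sketch of one way to add a stable ID, assuming every document has `course`, `question`, and `text` fields. The helper name `generate_document_id`, the MD5 fingerprint, and the 8-character prefix are illustrative choices, not part of the original code.

```python
import hashlib

def generate_document_id(doc):
    """Derive a short, deterministic ID from the document's content (hypothetical helper)."""
    # Combine fields that together identify the document
    combined = f"{doc['course']}-{doc['question']}-{doc['text'][:10]}"
    # MD5 is used only as a content fingerprint here, not for security
    return hashlib.md5(combined.encode("utf-8")).hexdigest()[:8]

# Toy documents standing in for the loaded `documents` list
sample_documents = [
    {"course": "data-engineering", "question": "How do I install Python?", "text": "Use the official installer."},
    {"course": "data-engineering", "question": "What is a data pipeline?", "text": "A series of processing steps."},
]

for doc in sample_documents:
    doc["id"] = generate_document_id(doc)
```

Because the ID depends only on content, re-running the script gives the same IDs, so ground truth created earlier stays valid. With a short prefix, collisions are possible in large collections, so check for duplicates on real data.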

Step 2: Generate Embeddings

# Initialize the embedding model
model = SentenceTransformer("all-mpnet-base-v2")

# Generate embeddings for each document
for doc in documents:
    # Create embedding from the text field
    doc["text_vector"] = model.encode(doc["text"]).tolist()

Step 3: Create Elasticsearch Index

# Connect to Elasticsearch
es_client = Elasticsearch('http://localhost:9200')

index_name = "course-questions"

# Define index settings and mappings
index_settings = {
    "settings": {
        "number_of_shards": 1,     # Number of primary shards
        "number_of_replicas": 0    # Number of replica shards
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},        # Main text content
            "section": {"type": "text"},    # Section information
            "question": {"type": "text"},   # Questions
            "course": {"type": "keyword"},  # Course identifier (exact match)
            "text_vector": {                # Vector field
                "type": "dense_vector",
                "dims": 768,                # Must match your model's output
                "index": True,              # Enable indexing
                "similarity": "cosine"      # Similarity metric
            }
        }
    }
}

# Create the index
es_client.indices.delete(index=index_name, ignore_unavailable=True)
es_client.indices.create(index=index_name, body=index_settings)

Step 4: Index Documents

# Add documents to the index
for doc in documents:
    try:
        es_client.index(index=index_name, document=doc)
    except Exception as e:
        print(f"Error indexing document: {e}")

print(f"Indexed {len(documents)} documents")
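
Indexing one document per request, as above, is simple but can get slow for larger collections. The sketch below shows a bulk alternative built on `elasticsearch.helpers.bulk`. The helper names `build_bulk_actions` and `bulk_index` are hypothetical, and the client library is imported lazily so the action builder can be tried without a running cluster.

```python
def build_bulk_actions(documents, index_name):
    """Yield one bulk action per document (hypothetical helper)."""
    for doc in documents:
        yield {"_index": index_name, "_source": doc}

def bulk_index(es_client, documents, index_name):
    """Send all documents through the bulk helper and report failures."""
    from elasticsearch import helpers  # requires the elasticsearch package

    success_count, errors = helpers.bulk(
        es_client,
        build_bulk_actions(documents, index_name),
        raise_on_error=False,  # collect per-document errors instead of raising
    )
    return success_count, errors

# Inspect the actions without contacting Elasticsearch
actions = list(build_bulk_actions([{"text": "hello"}], "course-questions"))
```

With a running cluster, `bulk_index(es_client, documents, index_name)` would take the place of the loop above.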

Step 5: Perform Vector Search

def vector_search(query_text, top_k=5):
    """
    Perform vector search on Elasticsearch
    """
    # Encode the query as a plain list so it serializes cleanly to JSON
    query_vector = model.encode(query_text).tolist()

    # Define k-NN query
    knn_query = {
        "field": "text_vector",
        "query_vector": query_vector,
        "k": top_k,
        "num_candidates": 10000  # Candidates examined per shard; lower values are faster but less accurate
    }

    # Execute search
    response = es_client.search(
        index=index_name,
        knn=knn_query,
        source=["id", "text", "section", "question", "course"]  # "id" is used by the evaluation code, if your documents have one
    )

    return response["hits"]["hits"]

# Example search
query = "How do I install Python packages?"
results = vector_search(query)

for i, result in enumerate(results):
    print(f"Result {i+1}:")
    print(f"Score: {result['_score']:.4f}")
    print(f"Text: {result['_source']['text']}")
    print(f"Course: {result['_source']['course']}")
    print("-" * 50)
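
To interpret the printed `Score` values: for a `dense_vector` field with `"similarity": "cosine"`, the Elasticsearch documentation gives the kNN `_score` as `(1 + cosine) / 2`, which keeps scores between 0 and 1. The sketch below reproduces that mapping in plain Python with toy 2-dimensional vectors. The function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def es_cosine_score(a, b):
    """Map cosine similarity from [-1, 1] to the [0, 1] score range."""
    return (1 + cosine_similarity(a, b)) / 2

same_direction = es_cosine_score([1.0, 0.0], [2.0, 0.0])   # identical direction
orthogonal = es_cosine_score([1.0, 0.0], [0.0, 3.0])       # unrelated
opposite = es_cosine_score([1.0, 0.0], [-1.0, 0.0])        # opposite direction
```

Under this mapping, a score near 1.0 means the vectors point almost the same way, and 0.5 corresponds to orthogonal (unrelated) vectors.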

Step 6: Combine with Keyword Search (Hybrid)

def hybrid_search(query_text, top_k=5):
    """
    Combine vector and keyword search
    """
    query_vector = model.encode(query_text).tolist()  # plain list, matching how document vectors were stored

    search_query = {
        "query": {
            "bool": {
                "should": [
                    # Keyword search component
                    {
                        "multi_match": {
                            "query": query_text,
                            "fields": ["text", "section", "question"],
                            "boost": 1.0
                        }
                    }
                ]
            }
        },
        "knn": {
            "field": "text_vector",
            "query_vector": query_vector,
            "k": top_k,
            "num_candidates": 10000,
            "boost": 1.0
        }
    }

    response = es_client.search(
        index=index_name,
        body=search_query,
        size=top_k
    )

    return response["hits"]["hits"]
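
In the query above, the keyword score and the vector score are added together, even though they sit on different scales. One alternative, which the original code does not use, is reciprocal rank fusion (RRF): it merges ranked lists by position instead of raw score. The client-side sketch below assumes each input is a list of document IDs ordered best-first. The function name is illustrative, and `k=60` is a commonly used constant, not a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several best-first rankings into one using reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes more for documents it ranks higher
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

keyword_ranking = ["doc_a", "doc_b", "doc_c"]   # e.g. IDs from a keyword-only query
vector_ranking = ["doc_b", "doc_d", "doc_a"]    # e.g. IDs from vector_search
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

Documents that rank well in both lists (`doc_b`, `doc_a`) rise to the top, regardless of how large the underlying scores were.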

Evaluating Vector Search Performance

Why Evaluation Matters

When building a search system, you need to measure how well it works. Different embedding models, search parameters, and techniques can dramatically affect results.

Key Metrics

1⃣ Mean Reciprocal Rank (MRR)

What it measures: How high the first relevant result appears on average
Formula: MRR = (1/|Q|) × Σ(1/rank_i)
Range: 0 to 1 (higher is better)

Example:

Query 1: Relevant result at position 1 → 1/1 = 1.0
Query 2: Relevant result at position 3 → 1/3 = 0.33
Query 3: Relevant result at position 2 → 1/2 = 0.5
MRR = (1.0 + 0.33 + 0.5) / 3 = 0.61
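
The same arithmetic as a small sketch, assuming you already know the 1-based rank of the first relevant result for each query (`None` when nothing relevant was returned). The function name is illustrative.

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/rank over all queries; a query with no relevant result contributes 0."""
    reciprocal = [1.0 / r if r is not None else 0.0 for r in first_relevant_ranks]
    return sum(reciprocal) / len(reciprocal)

# Ranks from the three example queries above
mrr = mean_reciprocal_rank([1, 3, 2])
print(f"MRR = {mrr:.2f}")  # prints "MRR = 0.61"
```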

2⃣ Hit Rate @ K (Recall @ K)

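What it measures: Percentage of queries that have at least one relevant result in top K. When each query has exactly one relevant document, as in the ground truth used here, this equals Recall @ K.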
Formula: HR@k = (Number of queries with relevant results in top k) / Total queries
Range: 0 to 1 (higher is better)

Example:

100 queries total
85 queries have relevant results in top 5
Hit Rate @ 5 = 85/100 = 0.85
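
A sketch of the same calculation on a handful of made-up queries, again assuming the 1-based rank of the first relevant result is known for each query (`None` for a miss). The function name and the sample ranks are illustrative.

```python
def hit_rate_at_k(first_relevant_ranks, k):
    """Fraction of queries whose first relevant result appears within the top k."""
    hits = sum(1 for r in first_relevant_ranks if r is not None and r <= k)
    return hits / len(first_relevant_ranks)

ranks = [1, 3, 2, 7, None]               # five toy queries
hr_at_5 = hit_rate_at_k(ranks, k=5)      # 3 of 5 queries hit within the top 5
hr_at_10 = hit_rate_at_k(ranks, k=10)    # 4 of 5 queries hit within the top 10
```

Hit rate can only stay the same or grow as K increases, so always report it together with the K you used.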

Creating Ground Truth Data

To evaluate your system, you need ground truth – known correct answers for test queries.

Method 1: Manual Creation

ground_truth = [
    {
        "question": "How do I install Python?",
        "expected_doc_id": "python_installation_guide",
        "course": "data-engineering"
    },
    {
        "question": "What is a data pipeline?",
        "expected_doc_id": "pipeline_basics",
        "course": "data-engineering"
    }
    # ... more examples
]

Method 2: LLM-Generated Questions

def generate_questions_for_document(doc_text, num_questions=5):
    """
    Use an LLM to generate questions that this document should answer
    """
    prompt = f"""
    Based on the following document, generate {num_questions} questions that this document would answer well. 
    Make the questions natural and varied - don't just copy words from the document.

    Document: {doc_text}

    Questions:
    """

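    # Call your LLM here (OpenAI, Anthropic, etc.).
    # call_llm is a placeholder you need to implement; it should return
    # a list of question strings rather than one raw text response.
    questions = call_llm(prompt)
    return questions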

# Generate ground truth
ground_truth = []
for doc in documents:
    questions = generate_questions_for_document(doc['text'])
    for q in questions:
        ground_truth.append({
            "question": q,
            "expected_doc_id": doc['id'],
            "course": doc['course']
        })
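
The loop above iterates over whatever `generate_questions_for_document` returns, so a raw text completion would be iterated character by character. Below is a minimal sketch of a parser that turns a numbered or bulleted response into a list of questions. It assumes the LLM puts one question per line, and the helper name `parse_questions` is hypothetical.

```python
import re

def parse_questions(raw_response):
    """Split an LLM text response into clean question strings, one per line."""
    questions = []
    for line in raw_response.splitlines():
        # Strip leading numbering or bullets such as "1.", "2)", "-", "*"
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions

raw = "1. How do I install Python?\n2) What is pip?\n\n- Where are packages stored?"
parsed = parse_questions(raw)
```

Real LLM output varies, so spot-check a sample of parsed questions before you use them as ground truth.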

Evaluation Implementation

def evaluate_search_system(search_function, ground_truth_data, top_k=5):
    """
    Evaluate a search system using MRR and Hit Rate
    """
    relevance_scores = []

    for item in ground_truth_data:
        query = item["question"]
        expected_id = item["expected_doc_id"]

        # Get search results
        results = search_function(query, top_k)

        # Check if expected document is in results
        relevance = []
        for i, result in enumerate(results):
            is_relevant = result["_source"].get("id") == expected_id  # the search function must return "id" in _source
            relevance.append(is_relevant)

        relevance_scores.append(relevance)

    # Calculate metrics
    mrr = calculate_mrr(relevance_scores)
    hit_rate = calculate_hit_rate(relevance_scores)

    return {
        "MRR": mrr,
        "Hit_Rate": hit_rate,
        "num_queries": len(ground_truth_data)
    }

def calculate_mrr(relevance_scores):
    """Calculate Mean Reciprocal Rank"""
    total_reciprocal_rank = 0

    for relevance in relevance_scores:
        for i, is_relevant in enumerate(relevance):
            if is_relevant:
                total_reciprocal_rank += 1 / (i + 1)  # +1 because rank starts at 1
                break

    return total_reciprocal_rank / len(relevance_scores)

def calculate_hit_rate(relevance_scores):
    """Calculate Hit Rate (any relevant result in top K)"""
    hits = 0

    for relevance in relevance_scores:
        if any(relevance):  # If any result is relevant
            hits += 1

    return hits / len(relevance_scores)

# Example evaluation
results = evaluate_search_system(vector_search, ground_truth)
print(f"📊 MRR: {results['MRR']:.3f}")
print(f"🎯 Hit Rate: {results['Hit_Rate']:.3f}")

Comparing Different Approaches

# Test different embedding models
models_to_test = [
    "all-mpnet-base-v2",
    "all-MiniLM-L6-v2", 
    "sentence-transformers/all-roberta-large-v1"
]

results = {}
for model_name in models_to_test:
    print(f"🧪 Testing {model_name}...")

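    # Recreate index with new model
    model = SentenceTransformer(model_name)
    # ... recreate the index with "dims" set to this model's output size
    # (768, 384, and 1024 for the three models above), then re-embed
    # and reindex the documents ...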

    # Evaluate
    metrics = evaluate_search_system(vector_search, ground_truth)
    results[model_name] = metrics

# Compare results (use a separate loop variable so the global `model` is not overwritten)
for name, metrics in results.items():
    print(f"🏆 {name}: MRR={metrics['MRR']:.3f}, HR={metrics['Hit_Rate']:.3f}")
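
Each model in the comparison produces vectors of a different length, so the index mapping has to be rebuilt for each one. The sketch below shows a hypothetical helper, `build_index_settings`, that parameterizes the mapping used earlier by vector size. In practice you could get the size from `model.get_sentence_embedding_dimension()`.

```python
def build_index_settings(dims, similarity="cosine"):
    """Return index settings whose vector field matches the model's output size."""
    return {
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "section": {"type": "text"},
                "question": {"type": "text"},
                "course": {"type": "keyword"},
                "text_vector": {
                    "type": "dense_vector",
                    "dims": dims,             # must equal the embedding length
                    "index": True,
                    "similarity": similarity,
                },
            }
        },
    }

# all-MiniLM-L6-v2 produces 384-dimensional vectors
settings_mini = build_index_settings(384)
```

Inside the loop, you would delete the index, recreate it with these settings, and then re-embed and reindex the documents before evaluating.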

Best Practices and Advanced Techniques

1⃣ Choosing the Right Embedding Model

Factors to Consider:

Domain: Use domain-specific models when available (bio, legal, etc.)
Language: Multilingual models for non-English content
Performance: Balance accuracy vs. speed/size requirements
Input Length: Some models handle longer texts better

Popular Models by Use Case:

# 🌟 General purpose (good starting point)
"sentence-transformers/all-mpnet-base-v2"  # 768 dim, high quality

# ⚡ Fast and lightweight
"sentence-transformers/all-MiniLM-L6-v2"   # 384 dim, 5x faster

# 🌍 Multilingual
"sentence-transformers/paraphrase-multilingual-mpnet-base-v2"

# 💻 Code search
"microsoft/codebert-base"

# 📚 Long documents
"sentence-transformers/all-mpnet-base-v2"  # truncates input beyond 384 word pieces by default, so chunk longer texts

2⃣ Optimizing Vector Databases

Index Configuration:

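# FAISS optimization example
import faiss
import numpy as np

dimension = 768  # must match the embedding size
vectors = np.random.rand(10000, dimension).astype("float32")  # placeholder data
faiss.normalize_L2(vectors)  # unit-length vectors, so inner product equals cosine

# For CPU
index = faiss.IndexFlatIP(dimension)  # Inner product (exact brute-force search)
index.add(vectors)

# For GPU (requires the faiss-gpu build; often much faster on large indexes)
# res = faiss.StandardGpuResources()
# index = faiss.index_cpu_to_gpu(res, 0, index)

# Approximate search for large datasets
nlist = 100  # number of clusters
quantizer = faiss.IndexFlatIP(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(vectors)  # learn the cluster centroids
index.add(vectors)
index.nprobe = 10  # clusters searched per query; higher is slower but more accurate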

3⃣ Handling Large Datasets

Chunking Strategy:

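def chunk_document(text, max_chunk_size=500, overlap=50):
    """
    Split long documents into overlapping chunks (sizes counted in words)
    """
    if overlap >= max_chunk_size:
        raise ValueError("overlap must be smaller than max_chunk_size")

    words = text.split()
    chunks = []

    for i in range(0, len(words), max_chunk_size - overlap):
        chunk = ' '.join(words[i:i + max_chunk_size])
        chunks.append(chunk)
        if i + max_chunk_size >= len(words):
            break  # this chunk reached the end; skip a redundant tail chunk

    return chunks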

# Process long documents
processed_docs = []
for doc in documents:
    if len(doc['text'].split()) > 500:  # Long document
        chunks = chunk_document(doc['text'])
        for i, chunk in enumerate(chunks):
            chunk_doc = doc.copy()
            chunk_doc['text'] = chunk
            chunk_doc['chunk_id'] = i
            processed_docs.append(chunk_doc)
    else:
        processed_docs.append(doc)

Batch Processing:

def process_embeddings_in_batches(texts, model, batch_size=32):
    """
    Process embeddings in batches to avoid memory issues
    """
    embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        batch_embeddings = model.encode(batch)
        embeddings.extend(batch_embeddings)

    return embeddings

4⃣ Query Enhancement

Query Expansion:

def expand_query(original_query, expansion_terms=3):
    """
    Add related terms to improve search coverage
    """
    # Use word embeddings to find similar terms
    similar_words = find_similar_words(original_query, n=expansion_terms)
    expanded_query = original_query + " " + " ".join(similar_words)
    return expanded_query
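
`find_similar_words` is left undefined above. The toy sketch below shows one way it could work: rank vocabulary words by cosine similarity to the query's words. The 3-dimensional vectors are made up purely for illustration. A real version would load pretrained word embeddings instead.

```python
import math

# Made-up toy vectors; real word embeddings have hundreds of dimensions
TOY_VECTORS = {
    "install": [0.9, 0.1, 0.0],
    "setup": [0.8, 0.2, 0.1],
    "configure": [0.7, 0.3, 0.1],
    "python": [0.1, 0.9, 0.1],
    "pip": [0.2, 0.8, 0.2],
    "banana": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def find_similar_words(query, n=3, vectors=TOY_VECTORS):
    """Return up to n vocabulary words closest to any known word in the query."""
    query_words = [w for w in query.lower().split() if w in vectors]
    best = {}
    for word in query_words:
        for other, vec in vectors.items():
            if other in query_words:
                continue  # do not suggest words already in the query
            similarity = cosine(vectors[word], vec)
            best[other] = max(best.get(other, -1.0), similarity)
    return sorted(best, key=best.get, reverse=True)[:n]

similar = find_similar_words("install python", n=2)
```

With these toy vectors, the two nearest terms for "install python" are `setup` and `pip`. The unrelated `banana` is never suggested unless `n` covers the whole vocabulary.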

Multi-Vector Search:

def multi_vector_search(query, models, weights=None):
    """
    Combine results from multiple embedding models
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)

    all_results = []
    for model, weight in zip(models, weights):
        results = vector_search_with_model(query, model)
        weighted_results = [(r, r['_score'] * weight) for r in results]
        all_results.extend(weighted_results)

    # Combine and re-rank
    combined_results = combine_and_rerank(all_results)
    return combined_results
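
`combine_and_rerank` is not defined above. One simple possibility is to sum the weighted scores per document and sort. The sketch below assumes the same document carries the same `_id` in every model's index, and that scores from different models are roughly comparable. Both are assumptions you would need to verify.

```python
def combine_and_rerank(all_results, top_k=5):
    """Sum weighted scores per document ID and return the top_k (hit, score) pairs."""
    combined = {}
    for hit, weighted_score in all_results:
        doc_id = hit["_id"]
        if doc_id not in combined:
            combined[doc_id] = {"hit": hit, "score": 0.0}
        combined[doc_id]["score"] += weighted_score

    ranked = sorted(combined.values(), key=lambda entry: entry["score"], reverse=True)
    return [(entry["hit"], entry["score"]) for entry in ranked[:top_k]]

# Toy (hit, weighted_score) pairs as produced inside multi_vector_search
all_results = [
    ({"_id": "a", "_score": 0.9}, 0.45),   # from model 1
    ({"_id": "b", "_score": 0.8}, 0.40),   # from model 1
    ({"_id": "b", "_score": 0.7}, 0.35),   # from model 2
]
reranked = combine_and_rerank(all_results)
```

Document `b` ends up first because both models returned it, even though neither ranked it highest on its own.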

5⃣ Monitoring and Debugging

Search Analytics:

from datetime import datetime

def log_search_analytics(query, results, user_id=None):
    """
    Log search queries and results for analysis
    """
    analytics_data = {
        "timestamp": datetime.now(),
        "query": query,
        "user_id": user_id,
        "num_results": len(results),
        "top_score": results[0]['_score'] if results else 0,
        "result_ids": [r['_source']['id'] for r in results[:5]]
    }

    # Save to analytics database
    save_analytics(analytics_data)

def analyze_search_patterns():
    """
    Analyze common queries and failure patterns
    """
    # Common queries without good results
    low_score_queries = get_queries_with_low_scores()

    # Queries with no clicks
    no_click_queries = get_queries_without_clicks()

    return {
        "improvement_opportunities": low_score_queries,
        "potential_gaps": no_click_queries
    }

6⃣ A/B Testing Search Systems

def ab_test_search_systems(system_a, system_b, test_queries, metric="MRR"):
    """
    Compare two search systems.
    metric must be a key returned by evaluate_search_system: "MRR" or "Hit_Rate"
    """
    results_a = evaluate_search_system(system_a, test_queries)
    results_b = evaluate_search_system(system_b, test_queries)

    improvement = (results_b[metric] - results_a[metric]) / results_a[metric] * 100

    return {
        "system_a": results_a,
        "system_b": results_b,
        "improvement_percent": improvement,
        "winner": "B" if results_b[metric] > results_a[metric] else "A"
    }

Conclusion and Next Steps

What You’ve Learned

In this comprehensive guide, you’ve learned:

Fundamentals: What vector search is and why it’s powerful
Vector Representations: From one-hot encoding to dense embeddings
Search Techniques: Similarity metrics, hybrid search, and ANN algorithms
Vector Databases: How to choose and use specialized databases
Implementation: Hands-on setup with Elasticsearch
Evaluation: How to measure and improve search performance
Best Practices: Optimization techniques and production considerations

Key Takeaways

Vector search enables semantic understanding – finding meaning, not just keywords
Embeddings capture relationships – similar items have similar vectors
Hybrid search combines the best of both worlds – semantic + keyword matching
Evaluation is crucial – always measure performance with proper metrics
Choose the right tools – different databases and models for different needs

Next Steps for Your Journey

Immediate Actions:

Practice with Real Data: Try the code examples with your own dataset
Experiment with Models: Test different embedding models for your use case
Build a Simple Project: Create a search system for a specific domain
Join Communities: Participate in vector search and LLM communities

Advanced Topics to Explore:

Multi-Modal Search: Combining text, image, and audio search
Real-Time Updates: Handling dynamic document collections
Federated Search: Searching across multiple vector databases
Custom Embeddings: Training domain-specific embedding models
Production Deployment: Scaling vector search for millions of users

Recommended Resources:

Papers: “Attention Is All You Need”, “BERT”, “Sentence-BERT”
Courses: Deep Learning Specialization, NLP courses
Tools: Hugging Face, LangChain, Vector database documentation
Communities: Reddit r/MachineLearning, Discord servers, GitHub discussions

Final Thoughts

Vector search is transforming how we find and interact with information. As LLMs and AI applications continue to grow, understanding vector search becomes increasingly valuable. The concepts you’ve learned here form the foundation for building intelligent search systems, recommendation engines, and AI applications.

Remember: Start simple, measure everything, and iterate based on real user needs. The best search system is one that actually helps users find what they’re looking for quickly and accurately.

Happy searching!

Additional Resources

Tools and Libraries

Embedding Models: Sentence Transformers, OpenAI, Cohere
Vector Databases: Pinecone, Weaviate, Milvus, Chroma
Evaluation: BEIR benchmark, custom evaluation frameworks
Production: Kubernetes deployments, monitoring tools