🎓 LLM Zoomcamp Module 2 – Chapter 2: Practical Implementation & Advanced Techniques



This content originally appeared on DEV Community and was authored by Abdelrahman Adnan

🛠 Hands-On Learning: Building on Chapter 1’s foundations, this chapter dives deep into practical implementations. You’ll learn to build production-ready vector search systems using Elasticsearch, evaluate performance, and apply advanced optimization techniques.

📖 Table of Contents

  1. ⚡ Hands-On Implementation with Elasticsearch
  2. 📊 Evaluating Vector Search Performance
  3. 🎯 Best Practices and Advanced Techniques
  4. 🚀 Conclusion and Next Steps

⚡ Hands-On Implementation with Elasticsearch

🐳 Setting Up Elasticsearch for Vector Search

🔧 Step 1: Start Elasticsearch with Docker

docker run -it \
    --rm \
    --name elasticsearch \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.4.3

📦 Step 2: Install Required Libraries

pip install "elasticsearch>=8,<9" sentence-transformers pandas numpy

💻 Complete Implementation

📂 Step 1: Prepare Your Data

import json
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

# Load your documents
with open('documents.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

documents = []
for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)
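
The evaluation code later in this chapter looks up `doc['id']`, but the loading loop above does not create that field. A minimal sketch of one way to add it, assuming a stable ID can be derived by hashing a few fields. The function name `generate_document_id`, the use of MD5, and the 8-character prefix are illustrative assumptions, and `sample_documents` is a stand-in for the real `documents` list:

```python
import hashlib

def generate_document_id(doc):
    """Derive a short, stable ID from a document's content (illustrative scheme)."""
    combined = f"{doc['course']}-{doc['question']}-{doc['text'][:10]}"
    return hashlib.md5(combined.encode("utf-8")).hexdigest()[:8]

# Tiny stand-in for the `documents` list built above
sample_documents = [
    {"course": "data-engineering", "question": "How do I install Python?",
     "text": "Use the official installer."},
    {"course": "data-engineering", "question": "What is a data pipeline?",
     "text": "A pipeline moves data between systems."},
]

for doc in sample_documents:
    doc["id"] = generate_document_id(doc)
```

Because the ID depends only on the content, re-running the script gives the same IDs, which keeps ground truth data valid across reindexing. Short prefixes can collide on large datasets, so it may be worth checking for duplicates.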

🧠 Step 2: Generate Embeddings

# Initialize the embedding model
model = SentenceTransformer("all-mpnet-base-v2")

# Generate embeddings for each document
for doc in documents:
    # Create embedding from the text field
    doc["text_vector"] = model.encode(doc["text"]).tolist()

🗂 Step 3: Create Elasticsearch Index

# Connect to Elasticsearch
es_client = Elasticsearch('http://localhost:9200')

index_name = "course-questions"

# Define index settings and mappings
index_settings = {
    "settings": {
        "number_of_shards": 1,     # Number of primary shards
        "number_of_replicas": 0    # Number of replica shards
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},        # Main text content
            "section": {"type": "text"},    # Section information
            "question": {"type": "text"},   # Questions
            "course": {"type": "keyword"},  # Course identifier (exact match)
            "text_vector": {                # Vector field
                "type": "dense_vector",
                "dims": 768,                # Must match your model's output
                "index": True,              # Enable indexing
                "similarity": "cosine"      # Similarity metric
            }
        }
    }
}

# Create the index
es_client.indices.delete(index=index_name, ignore_unavailable=True)
es_client.indices.create(index=index_name, body=index_settings)
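
The mapping above selects `cosine` as the similarity metric. As a rough illustration of what that means, here is a plain-Python sketch of cosine similarity, plus the rescaling Elasticsearch documents for cosine `_score` values, `(1 + cosine) / 2`. The helper names are hypothetical and the vectors are toy examples, not real embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def es_cosine_score(cosine):
    """Map a cosine in [-1, 1] to the [0, 1] range used for the search score."""
    return (1 + cosine) / 2

same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])   # length is ignored
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # unrelated directions
```

This is why two documents of very different lengths can still score as near-identical: only the direction of the vector matters.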

📥 Step 4: Index Documents

# Add documents to the index
for doc in documents:
    try:
        es_client.index(index=index_name, document=doc)
    except Exception as e:
        print(f"Error indexing document: {e}")

print(f"Indexed {len(documents)} documents")
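
Indexing one document per request, as above, is fine for small datasets. For larger ones, a possible alternative is the client's bulk helper. The sketch below assumes `elasticsearch.helpers.bulk` from the Python client; the function names `build_bulk_actions` and `bulk_index` are hypothetical:

```python
def build_bulk_actions(docs, index_name):
    """Yield one bulk action per document in the shape the bulk helper accepts."""
    for doc in docs:
        yield {"_index": index_name, "_source": doc}

def bulk_index(es_client, docs, index_name):
    """Send all documents in batched requests instead of one request each."""
    # Imported here so build_bulk_actions can be used without the client installed
    from elasticsearch import helpers
    return helpers.bulk(es_client, build_bulk_actions(docs, index_name))

# Example of the actions that would be sent
actions = list(build_bulk_actions([{"text": "hello"}], "course-questions"))
```

With a running cluster, `bulk_index(es_client, documents, index_name)` would replace the loop above.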

🔍 Step 5: Perform Vector Search

def vector_search(query_text, top_k=5):
    """
    Perform vector search on Elasticsearch
    """
    # Encode the query; .tolist() gives a plain list, matching how the documents were indexed
    query_vector = model.encode(query_text).tolist()

    # Define k-NN query
    knn_query = {
        "field": "text_vector",
        "query_vector": query_vector,
        "k": top_k,
        "num_candidates": 10000  # Number of candidates to consider
    }

    # Execute search
    response = es_client.search(
        index=index_name,
        knn=knn_query,
        source=["id", "text", "section", "question", "course"]  # "id" is needed later for evaluation
    )

    return response["hits"]["hits"]

# Example search
query = "How do I install Python packages?"
results = vector_search(query)

for i, result in enumerate(results):
    print(f"Result {i+1}:")
    print(f"Score: {result['_score']:.4f}")
    print(f"Text: {result['_source']['text']}")
    print(f"Course: {result['_source']['course']}")
    print("-" * 50)
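
The search above looks across every course. A hedged sketch of restricting it to a single course, assuming the `filter` option of the kNN search is available in your Elasticsearch version. The helper name `build_knn_query` is hypothetical; it only builds the request dictionary and does not contact the cluster:

```python
def build_knn_query(query_vector, top_k=5, num_candidates=10000, course=None):
    """Build the kNN part of a search request, optionally limited to one course."""
    knn_query = {
        "field": "text_vector",
        "query_vector": list(query_vector),
        "k": top_k,
        "num_candidates": num_candidates,
    }
    if course is not None:
        # "course" is mapped as keyword, so a term filter matches it exactly
        knn_query["filter"] = {"term": {"course": course}}
    return knn_query

filtered = build_knn_query([0.1, 0.2, 0.3], course="data-engineering")
```

The resulting dictionary can be passed as `knn=` to `es_client.search`, in place of `knn_query` in the function above.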

🔄 Step 6: Combine with Keyword Search (Hybrid)

def hybrid_search(query_text, top_k=5):
    """
    Combine vector and keyword search
    """
    query_vector = model.encode(query_text).tolist()  # plain list, as in the indexed documents

    search_query = {
        "query": {
            "bool": {
                "should": [
                    # Keyword search component
                    {
                        "multi_match": {
                            "query": query_text,
                            "fields": ["text", "section", "question"],
                            "boost": 1.0
                        }
                    }
                ]
            }
        },
        "knn": {
            "field": "text_vector",
            "query_vector": query_vector,
            "k": top_k,
            "num_candidates": 10000,
            "boost": 1.0
        }
    }

    response = es_client.search(
        index=index_name,
        body=search_query,
        size=top_k
    )

    return response["hits"]["hits"]
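
In the request above, Elasticsearch adds the keyword score and the kNN score together, even though the two live on different scales (keyword scores are unbounded, cosine scores fall between 0 and 1). One alternative worth knowing is Reciprocal Rank Fusion (RRF), which merges ranked lists using positions only. This is a different technique from the one in `hybrid_search`, sketched here in plain Python with made-up document IDs; `k=60` is a conventional default, not a tuned value:

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_k=5):
    """Merge several ranked lists of document IDs using reciprocal ranks."""
    scores = {}
    for ranked_ids in ranked_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

keyword_ids = ["a", "b", "c"]   # hypothetical keyword search ranking
vector_ids = ["b", "d", "a"]    # hypothetical vector search ranking
fused = reciprocal_rank_fusion([keyword_ids, vector_ids])
```

Documents that appear near the top of both lists ("b" and "a" here) rise above documents that only one method found.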

📊 Evaluating Vector Search Performance

🎯 Why Evaluation Matters

When building a search system, you need to measure how well it works. Different embedding models, search parameters, and techniques can dramatically affect results.

📈 Key Metrics

1⃣ Mean Reciprocal Rank (MRR)

📊 What it measures: How high the first relevant result appears on average
🧮 Formula: MRR = (1/|Q|) × Σ (1/rank_i)
📈 Range: 0 to 1 (higher is better)

📝 Example:

  • Query 1: Relevant result at position 1 → 1/1 = 1.0
  • Query 2: Relevant result at position 3 → 1/3 = 0.33
  • Query 3: Relevant result at position 2 → 1/2 = 0.5
  • MRR = (1.0 + 0.33 + 0.5) / 3 = 0.61
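
The worked example above can be reproduced in a few lines. This is a minimal sketch; the function name and the convention of using `None` for "no relevant result found" are assumptions for illustration:

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """Average of 1/rank over all queries; a query with no relevant result counts as 0."""
    reciprocal = [1.0 / rank if rank is not None else 0.0
                  for rank in first_relevant_ranks]
    return sum(reciprocal) / len(reciprocal)

# Ranks of the first relevant result for the three example queries
mrr = mean_reciprocal_rank([1, 3, 2])
print(round(mrr, 2))  # 0.61
```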

2⃣ Hit Rate @ K (Recall @ K)

📊 What it measures: Percentage of queries that have at least one relevant result in top K
🧮 Formula: HR@k = (Number of queries with relevant results in top k) / Total queries
📈 Range: 0 to 1 (higher is better)

📝 Example:

  • 100 queries total
  • 85 queries have relevant results in top 5
  • Hit Rate @ 5 = 85/100 = 0.85
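
Hit rate can be computed from the same per-query ranks. A small sketch, again using `None` for queries where no relevant result was returned (an assumed convention) and made-up ranks:

```python
def hit_rate_at_k(first_relevant_ranks, k):
    """Fraction of queries whose first relevant result is within the top k."""
    hits = sum(1 for rank in first_relevant_ranks
               if rank is not None and rank <= k)
    return hits / len(first_relevant_ranks)

ranks = [1, 3, 2, None, 7]          # five hypothetical queries
hit_rate = hit_rate_at_k(ranks, 5)  # three of the five land in the top 5
```

Unlike MRR, this metric does not care whether the hit is at position 1 or position 5, so the two are usually reported together.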

🏗 Creating Ground Truth Data

To evaluate your system, you need ground truth – known correct answers for test queries.

✍ Method 1: Manual Creation

ground_truth = [
    {
        "question": "How do I install Python?",
        "expected_doc_id": "python_installation_guide",
        "course": "data-engineering"
    },
    {
        "question": "What is a data pipeline?",
        "expected_doc_id": "pipeline_basics",
        "course": "data-engineering"
    }
    # ... more examples
]

🤖 Method 2: LLM-Generated Questions

def generate_questions_for_document(doc_text, num_questions=5):
    """
    Use an LLM to generate questions that this document should answer
    """
    prompt = f"""
    Based on the following document, generate {num_questions} questions that this document would answer well. 
    Make the questions natural and varied - don't just copy words from the document.

    Document: {doc_text}

    Questions:
    """

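    # Call your LLM here (OpenAI, Anthropic, etc.). call_llm is a placeholder
    # you need to supply; it should return the model's reply as a string.
    response_text = call_llm(prompt)

    # Return a list with one question per non-empty line
    questions = [line.strip() for line in response_text.splitlines() if line.strip()]
    return questions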

# Generate ground truth
ground_truth = []
for doc in documents:
    questions = generate_questions_for_document(doc['text'])
    for q in questions:
        ground_truth.append({
            "question": q,
            "expected_doc_id": doc['id'],
            "course": doc['course']
        })
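
An LLM's reply usually arrives as a single string, often with numbering or bullets in front of each question. A minimal sketch of turning such a reply into a clean list; the function name `parse_questions` and the sample reply are illustrative, and real model output may need more handling:

```python
import re

def parse_questions(raw_text):
    """Split an LLM reply into questions, dropping list markers and blank lines."""
    questions = []
    for line in raw_text.splitlines():
        # Remove leading "1.", "2)", "-", "*", or "•" markers
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions

sample_reply = "1. How do I install Python?\n2) What is pip?\n\n- Where are packages stored?"
parsed = parse_questions(sample_reply)
```

It is worth spot-checking generated questions by hand, since a question that copies the document's wording makes the retrieval task artificially easy.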

🔬 Evaluation Implementation

def evaluate_search_system(search_function, ground_truth_data, top_k=5):
    """
    Evaluate a search system using MRR and Hit Rate
    """
    relevance_scores = []

    for item in ground_truth_data:
        query = item["question"]
        expected_id = item["expected_doc_id"]

        # Get search results
        results = search_function(query, top_k)

        # Check if expected document is in results
        relevance = []
        for i, result in enumerate(results):
            # .get avoids a KeyError if the search function did not return the "id" field
            is_relevant = result["_source"].get("id") == expected_id
            relevance.append(is_relevant)

        relevance_scores.append(relevance)

    # Calculate metrics
    mrr = calculate_mrr(relevance_scores)
    hit_rate = calculate_hit_rate(relevance_scores)

    return {
        "MRR": mrr,
        "Hit_Rate": hit_rate,
        "num_queries": len(ground_truth_data)
    }

def calculate_mrr(relevance_scores):
    """Calculate Mean Reciprocal Rank"""
    total_reciprocal_rank = 0

    for relevance in relevance_scores:
        for i, is_relevant in enumerate(relevance):
            if is_relevant:
                total_reciprocal_rank += 1 / (i + 1)  # +1 because rank starts at 1
                break

    return total_reciprocal_rank / len(relevance_scores)

def calculate_hit_rate(relevance_scores):
    """Calculate Hit Rate (any relevant result in top K)"""
    hits = 0

    for relevance in relevance_scores:
        if any(relevance):  # If any result is relevant
            hits += 1

    return hits / len(relevance_scores)

# Example evaluation
results = evaluate_search_system(vector_search, ground_truth)
print(f"📊 MRR: {results['MRR']:.3f}")
print(f"🎯 Hit Rate: {results['Hit_Rate']:.3f}")

🔍 Comparing Different Approaches

# Test different embedding models
models_to_test = [
    "all-mpnet-base-v2",
    "all-MiniLM-L6-v2", 
    "sentence-transformers/all-roberta-large-v1"
]

results = {}
for model_name in models_to_test:
    print(f"🧪 Testing {model_name}...")

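    # Recreate index with new model
    model = SentenceTransformer(model_name)
    # "dims" in the index mapping must equal model.get_sentence_embedding_dimension()
    # (768, 384, and 1024 for the three models above), so the index has to be recreated
    # ... reindex documents with new embeddings ...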

    # Evaluate
    metrics = evaluate_search_system(vector_search, ground_truth)
    results[model_name] = metrics

# Compare results
for model, metrics in results.items():
    print(f"🏆 {model}: MRR={metrics['MRR']:.3f}, HR={metrics['Hit_Rate']:.3f}")

🎯 Best Practices and Advanced Techniques

1⃣ Choosing the Right Embedding Model

🤔 Factors to Consider:

  • 🏥 Domain: Use domain-specific models when available (bio, legal, etc.)
  • 🌍 Language: Multilingual models for non-English content
  • ⚡ Performance: Balance accuracy vs. speed/size requirements
  • 📏 Input Length: Some models handle longer texts better

🏆 Popular Models by Use Case:

# 🌟 General purpose (good starting point)
"sentence-transformers/all-mpnet-base-v2"  # 768 dim, high quality

# ⚡ Fast and lightweight
"sentence-transformers/all-MiniLM-L6-v2"   # 384 dim, 5x faster

# 🌍 Multilingual
"sentence-transformers/paraphrase-multilingual-mpnet-base-v2"

# 💻 Code search
"microsoft/codebert-base"

# 📚 Long documents
"sentence-transformers/all-mpnet-base-v2"  # truncates input past 384 word pieces by default, so chunk longer texts

2⃣ Optimizing Vector Databases

⚙ Index Configuration:

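# FAISS optimization example
import faiss
import numpy as np

dimension = 768  # Must match the embedding model's output size
# Placeholder data; replace with your real float32 embeddings
vectors = np.random.rand(10000, dimension).astype("float32")

# For CPU
index = faiss.IndexFlatIP(dimension)  # Inner product (exact search)

# For GPU (much faster); only available in the GPU build of FAISS
if hasattr(faiss, "StandardGpuResources"):
    res = faiss.StandardGpuResources()
    index = faiss.index_cpu_to_gpu(res, 0, index)

# Approximate search for large datasets
nlist = 100  # Number of clusters the vectors are partitioned into
quantizer = faiss.IndexFlatIP(dimension)  # Assigns each vector to its nearest cluster
index = faiss.IndexIVFFlat(quantizer, dimension, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(vectors)  # Learn the cluster centroids
index.add(vectors)
index.nprobe = 10  # Number of clusters visited per query (higher = more accurate, slower)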

3⃣ Handling Large Datasets

✂ Chunking Strategy:

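def chunk_document(text, max_chunk_size=500, overlap=50):
    """
    Split long documents into overlapping chunks (sizes are counted in words)
    """
    if overlap >= max_chunk_size:
        raise ValueError("overlap must be smaller than max_chunk_size")

    words = text.split()
    chunks = []

    for i in range(0, len(words), max_chunk_size - overlap):
        chunk = ' '.join(words[i:i + max_chunk_size])
        chunks.append(chunk)
        # Stop once a chunk reaches the end, so the last chunk is never
        # just a repeat of the previous chunk's tail
        if i + max_chunk_size >= len(words):
            break

    return chunks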

# Process long documents
processed_docs = []
for doc in documents:
    if len(doc['text'].split()) > 500:  # Long document
        chunks = chunk_document(doc['text'])
        for i, chunk in enumerate(chunks):
            chunk_doc = doc.copy()
            chunk_doc['text'] = chunk
            chunk_doc['chunk_id'] = i
            processed_docs.append(chunk_doc)
    else:
        processed_docs.append(doc)
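
To see what the overlap does, it helps to run the idea on a tiny input. The sketch below is a self-contained variant with deliberately small sizes; `chunk_words` is a hypothetical name, and the ten "words" are placeholders:

```python
def chunk_words(text, max_chunk_size, overlap):
    """Split text into word chunks where neighbours share `overlap` words."""
    if overlap >= max_chunk_size:
        raise ValueError("overlap must be smaller than max_chunk_size")
    words = text.split()
    step = max_chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_chunk_size]))
        if start + max_chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks

text = " ".join(f"w{i}" for i in range(10))          # "w0 w1 ... w9"
chunks = chunk_words(text, max_chunk_size=4, overlap=1)
```

Each chunk starts with the last word of the previous one (`w3`, then `w6`), so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.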

📦 Batch Processing:

def process_embeddings_in_batches(texts, model, batch_size=32):
    """
    Process embeddings in batches to avoid memory issues
    """
    embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        batch_embeddings = model.encode(batch)
        embeddings.extend(batch_embeddings)

    return embeddings

4⃣ Query Enhancement

🔍 Query Expansion:

def expand_query(original_query, expansion_terms=3):
    """
    Add related terms to improve search coverage
    """
    # Use word embeddings to find similar terms
    similar_words = find_similar_words(original_query, n=expansion_terms)
    expanded_query = original_query + " " + " ".join(similar_words)
    return expanded_query
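
`find_similar_words` is left undefined above. As a stand-in, here is a minimal sketch that uses a small hand-written table of related terms instead of word embeddings. The table contents and its name `RELATED_TERMS` are made up for illustration; a real system would look up neighbours in an embedding model or a thesaurus:

```python
# Hypothetical lookup table; replace with embedding-based neighbours in practice
RELATED_TERMS = {
    "install": ["setup", "pip"],
    "python": ["interpreter"],
}

def find_similar_words(query, n=3):
    """Collect up to n related terms for the words in the query."""
    similar = []
    for word in query.lower().split():
        token = word.strip("?.,!")
        for term in RELATED_TERMS.get(token, []):
            if term not in similar:
                similar.append(term)
    return similar[:n]

def expand_query(original_query, expansion_terms=3):
    """Append related terms to the original query text."""
    similar_words = find_similar_words(original_query, n=expansion_terms)
    return (original_query + " " + " ".join(similar_words)).strip()

expanded = expand_query("How do I install Python?")
```

Expansion tends to help keyword search more than vector search, since the embedding already captures related meanings; measuring both variants is the only reliable way to tell.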

🎭 Multi-Vector Search:

def multi_vector_search(query, models, weights=None):
    """
    Combine results from multiple embedding models
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)

    all_results = []
    for model, weight in zip(models, weights):
        results = vector_search_with_model(query, model)
        weighted_results = [(r, r['_score'] * weight) for r in results]
        all_results.extend(weighted_results)

    # Combine and re-rank
    combined_results = combine_and_rerank(all_results)
    return combined_results
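
`combine_and_rerank` is not defined above. One simple possibility, sketched under the assumption that each hit carries the `_id` field Elasticsearch returns, is to sum the weighted scores per document and sort. The sample hits and scores are invented:

```python
def combine_and_rerank(weighted_results, top_k=5):
    """Sum weighted scores per document ID and return the best hits first."""
    combined = {}
    for result, weighted_score in weighted_results:
        doc_id = result["_id"]
        if doc_id not in combined:
            combined[doc_id] = {"result": result, "score": 0.0}
        combined[doc_id]["score"] += weighted_score
    ranked = sorted(combined.values(), key=lambda entry: entry["score"], reverse=True)
    return [(entry["result"], entry["score"]) for entry in ranked[:top_k]]

# Two hypothetical models, each weighted 0.5
weighted = [
    ({"_id": "doc1"}, 0.9 * 0.5), ({"_id": "doc2"}, 0.8 * 0.5),   # model A
    ({"_id": "doc2"}, 0.7 * 0.5), ({"_id": "doc3"}, 0.95 * 0.5),  # model B
]
reranked_ids = [result["_id"] for result, _ in combine_and_rerank(weighted)]
```

Summing raw scores assumes the models produce comparable score ranges; if they do not, a rank-based merge is a safer choice.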

5⃣ Monitoring and Debugging

📊 Search Analytics:

from datetime import datetime

def log_search_analytics(query, results, user_id=None):
    """
    Log search queries and results for analysis
    """
    analytics_data = {
        "timestamp": datetime.now(),
        "query": query,
        "user_id": user_id,
        "num_results": len(results),
        "top_score": results[0]['_score'] if results else 0,
        "result_ids": [r['_source']['id'] for r in results[:5]]
    }

    # Save to analytics database
    save_analytics(analytics_data)

def analyze_search_patterns():
    """
    Analyze common queries and failure patterns
    """
    # Common queries without good results
    low_score_queries = get_queries_with_low_scores()

    # Queries with no clicks
    no_click_queries = get_queries_without_clicks()

    return {
        "improvement_opportunities": low_score_queries,
        "potential_gaps": no_click_queries
    }
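
`save_analytics` is a placeholder in the code above. A minimal sketch of one option, appending each record as a line of JSON to a local file. The file name and helper names are assumptions, and a production system would more likely write to a database or log pipeline:

```python
import json
from datetime import datetime

def to_json_line(analytics_data):
    """Serialize one analytics record, converting the timestamp to ISO 8601 text."""
    record = dict(analytics_data)
    timestamp = record.get("timestamp")
    if isinstance(timestamp, datetime):
        record["timestamp"] = timestamp.isoformat()  # datetime is not JSON-serializable
    return json.dumps(record)

def save_analytics(analytics_data, path="search_analytics.jsonl"):
    """Append the record to a JSON Lines file (one record per line)."""
    with open(path, "a", encoding="utf-8") as f_out:
        f_out.write(to_json_line(analytics_data) + "\n")

line = to_json_line({"timestamp": datetime(2024, 1, 2, 3, 4, 5), "query": "install python"})
```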

6⃣ A/B Testing Search Systems

def ab_test_search_systems(system_a, system_b, test_queries, metric="MRR"):
    """
    Compare two search systems
    """
    results_a = evaluate_search_system(system_a, test_queries)
    results_b = evaluate_search_system(system_b, test_queries)

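    baseline = results_a[metric]
    # Relative change in percent; left as None when the baseline score is 0
    improvement = (results_b[metric] - baseline) / baseline * 100 if baseline else None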

    return {
        "system_a": results_a,
        "system_b": results_b,
        "improvement_percent": improvement,
        "winner": "B" if results_b[metric] > results_a[metric] else "A"
    }
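
A difference in average score can come from chance, especially with few test queries. One common way to get a feel for this is a paired bootstrap over per-query scores, which is an addition to the comparison above rather than part of it. The sketch assumes you have one score per query for each system (for example, reciprocal ranks); the function name and parameters are illustrative:

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of resampled query sets on which system B's mean beats system A's."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same queries")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_resamples):
        # Resample queries with replacement, using the same queries for both systems
        indices = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in indices) / n
        mean_b = sum(scores_b[i] for i in indices) / n
        if mean_b > mean_a:
            wins += 1
    return wins / n_resamples

always_better = paired_bootstrap([0.5, 0.5, 0.5], [1.0, 1.0, 1.0])
```

A value close to 1.0 suggests B's advantage holds up across resamples; a value near 0.5 suggests the two systems are hard to tell apart on this test set.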

🚀 Conclusion and Next Steps

🎓 What You’ve Learned

In this comprehensive guide, you’ve learned:

  1. 🔍 Fundamentals: What vector search is and why it’s powerful
  2. 🧮 Vector Representations: From one-hot encoding to dense embeddings
  3. ⚡ Search Techniques: Similarity metrics, hybrid search, and ANN algorithms
  4. 🗄 Vector Databases: How to choose and use specialized databases
  5. 🛠 Implementation: Hands-on setup with Elasticsearch
  6. 📊 Evaluation: How to measure and improve search performance
  7. 🎯 Best Practices: Optimization techniques and production considerations

🔑 Key Takeaways

✅ Vector search enables semantic understanding – finding meaning, not just keywords
✅ Embeddings capture relationships – similar items have similar vectors
✅ Hybrid search combines the best of both worlds – semantic + keyword matching
✅ Evaluation is crucial – always measure performance with proper metrics
✅ Choose the right tools – different databases and models for different needs

🛤 Next Steps for Your Journey

🎯 Immediate Actions:

  1. 🔬 Practice with Real Data: Try the code examples with your own dataset
  2. 🧪 Experiment with Models: Test different embedding models for your use case
  3. 🏗 Build a Simple Project: Create a search system for a specific domain
  4. 👥 Join Communities: Participate in vector search and LLM communities

🚀 Advanced Topics to Explore:

  1. 🎭 Multi-Modal Search: Combining text, image, and audio search
  2. ⚡ Real-Time Updates: Handling dynamic document collections
  3. 🌐 Federated Search: Searching across multiple vector databases
  4. 🎯 Custom Embeddings: Training domain-specific embedding models
  5. 🏭 Production Deployment: Scaling vector search for millions of users

📚 Recommended Resources:

  • 📄 Papers: “Attention Is All You Need”, “BERT”, “Sentence-BERT”
  • 🎓 Courses: Deep Learning Specialization, NLP courses
  • 🛠 Tools: Hugging Face, LangChain, Vector database documentation
  • 👥 Communities: Reddit r/MachineLearning, Discord servers, GitHub discussions

💭 Final Thoughts

Vector search is transforming how we find and interact with information. As LLMs and AI applications continue to grow, understanding vector search becomes increasingly valuable. The concepts you’ve learned here form the foundation for building intelligent search systems, recommendation engines, and AI applications.

Remember: Start simple, measure everything, and iterate based on real user needs. The best search system is one that actually helps users find what they’re looking for quickly and accurately.

Happy searching! 🔍✨

📖 Additional Resources

🛠 Tools and Libraries

  • 🧠 Embedding Models: Sentence Transformers, OpenAI, Cohere
  • 🗄 Vector Databases: Pinecone, Weaviate, Milvus, Chroma
  • 📊 Evaluation: BEIR benchmark, custom evaluation frameworks
  • 🚀 Production: Kubernetes deployments, monitoring tools

