This content originally appeared on DEV Community and was authored by Abdelrahman Adnan
Hands-On Learning: Building on Chapter 1’s foundations, this chapter dives deep into practical implementations. You’ll learn to build production-ready vector search systems using Elasticsearch, evaluate performance, and apply advanced optimization techniques.
Table of Contents
- Hands-On Implementation with Elasticsearch
- Evaluating Vector Search Performance
- Best Practices and Advanced Techniques
- Conclusion and Next Steps
Hands-On Implementation with Elasticsearch
Setting Up Elasticsearch for Vector Search
Step 1: Start Elasticsearch with Docker
```bash
docker run -it \
    --rm \
    --name elasticsearch \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.4.3
```
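Once the container is running, confirm Elasticsearch is reachable on port 9200 (the exact cluster name and version fields in the JSON response will vary):

```bash
# Elasticsearch answers with a JSON banner containing cluster and version info
curl http://localhost:9200
```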
Step 2: Install Required Libraries
```bash
pip install elasticsearch sentence-transformers pandas numpy
```
Complete Implementation
Step 1: Prepare Your Data
```python
import json

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

# Load your documents
with open('documents.json', 'rt') as f_in:
    docs_raw = json.load(f_in)

# Flatten the course/document structure into a single list
documents = []
for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)
```
Step 2: Generate Embeddings
```python
# Initialize the embedding model
model = SentenceTransformer("all-mpnet-base-v2")

# Generate embeddings for each document
for doc in documents:
    # Create embedding from the text field
    doc["text_vector"] = model.encode(doc["text"]).tolist()
```
Step 3: Create Elasticsearch Index
```python
# Connect to Elasticsearch
es_client = Elasticsearch('http://localhost:9200')

index_name = "course-questions"

# Define index settings and mappings
index_settings = {
    "settings": {
        "number_of_shards": 1,   # Number of primary shards
        "number_of_replicas": 0  # Number of replica shards
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},       # Main text content
            "section": {"type": "text"},    # Section information
            "question": {"type": "text"},   # Questions
            "course": {"type": "keyword"},  # Course identifier (exact match)
            "text_vector": {                # Vector field
                "type": "dense_vector",
                "dims": 768,                # Must match your model's output
                "index": True,              # Enable indexing
                "similarity": "cosine"      # Similarity metric
            }
        }
    }
}

# Create the index (delete any previous version first)
es_client.indices.delete(index=index_name, ignore_unavailable=True)
es_client.indices.create(index=index_name, body=index_settings)
```
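Before indexing anything, you can sanity-check that the mapping was stored the way you defined it:

```python
# Inspect the mapping Elasticsearch actually created for the vector field
mapping = es_client.indices.get_mapping(index=index_name)
print(mapping[index_name]["mappings"]["properties"]["text_vector"])
```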
Step 4: Index Documents
```python
# Add documents to the index
for doc in documents:
    try:
        es_client.index(index=index_name, document=doc)
    except Exception as e:
        print(f"Error indexing document: {e}")

print(f"Indexed {len(documents)} documents")
```
Step 5: Perform Vector Search
```python
def vector_search(query_text, top_k=5):
    """
    Perform vector search on Elasticsearch
    """
    # Encode the query (convert to a plain list so it serializes to JSON)
    query_vector = model.encode(query_text).tolist()

    # Define k-NN query
    knn_query = {
        "field": "text_vector",
        "query_vector": query_vector,
        "k": top_k,
        "num_candidates": 10000  # Number of candidates to consider
    }

    # Execute search; include "id" so evaluation code can match documents later
    response = es_client.search(
        index=index_name,
        knn=knn_query,
        source=["text", "section", "question", "course", "id"]
    )

    return response["hits"]["hits"]

# Example search
query = "How do I install Python packages?"
results = vector_search(query)

for i, result in enumerate(results):
    print(f"Result {i+1}:")
    print(f"Score: {result['_score']:.4f}")
    print(f"Text: {result['_source']['text']}")
    print(f"Course: {result['_source']['course']}")
    print("-" * 50)
```
Step 6: Combine with Keyword Search (Hybrid)
```python
def hybrid_search(query_text, top_k=5):
    """
    Combine vector and keyword search
    """
    # Convert to a plain list so the query serializes to JSON
    query_vector = model.encode(query_text).tolist()

    search_query = {
        "query": {
            "bool": {
                "should": [
                    # Keyword search component
                    {
                        "multi_match": {
                            "query": query_text,
                            "fields": ["text", "section", "question"],
                            "boost": 1.0
                        }
                    }
                ]
            }
        },
        "knn": {
            "field": "text_vector",
            "query_vector": query_vector,
            "k": top_k,
            "num_candidates": 10000,
            "boost": 1.0
        }
    }

    response = es_client.search(
        index=index_name,
        body=search_query,
        size=top_k
    )

    return response["hits"]["hits"]
```
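With both functions defined, it helps to run the same query through each and compare. Keep in mind that this simple hybrid sums BM25 and cosine scores, which live on different scales, so the `boost` values are how you tune the balance:

```python
# Compare pure vector search against the hybrid version on one query
query = "How do I install Python packages?"
for name, search_fn in [("vector", vector_search), ("hybrid", hybrid_search)]:
    hits = search_fn(query)
    top = hits[0]["_score"] if hits else None
    print(f"{name} search: {len(hits)} hits, top score = {top}")
```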
Evaluating Vector Search Performance
Why Evaluation Matters
When building a search system, you need to measure how well it works. Different embedding models, search parameters, and techniques can dramatically affect results.
Key Metrics
1⃣ Mean Reciprocal Rank (MRR)
What it measures: How high the first relevant result appears on average
Formula: MRR = (1/|Q|) × Σ(1/rank_i)
Range: 0 to 1 (higher is better)
Example:
- Query 1: Relevant result at position 1 → 1/1 = 1.0
- Query 2: Relevant result at position 3 → 1/3 ≈ 0.33
- Query 3: Relevant result at position 2 → 1/2 = 0.5
- MRR = (1.0 + 0.33 + 0.5) / 3 ≈ 0.61
2⃣ Hit Rate @ K (Recall @ K)
What it measures: Percentage of queries that have at least one relevant result in top K
Formula: HR@k = (Number of queries with relevant results in top k) / Total queries
Range: 0 to 1 (higher is better)
Example:
- 100 queries total
- 85 queries have relevant results in top 5
- Hit Rate @ 5 = 85/100 = 0.85
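Both metrics reduce to simple arithmetic. Here is a minimal sketch reproducing the worked examples above (full implementations follow in the evaluation section):

```python
# Rank of the first relevant result for each of the three example queries
first_relevant_ranks = [1, 3, 2]
mrr = sum(1 / rank for rank in first_relevant_ranks) / len(first_relevant_ranks)
print(f"MRR = {mrr:.2f}")  # 0.61

# Hit Rate @ 5: 85 of 100 queries had a relevant result in the top 5
hit_rate = 85 / 100
print(f"Hit Rate @ 5 = {hit_rate}")  # 0.85
```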
Creating Ground Truth Data
To evaluate your system, you need ground truth – known correct answers for test queries.
Method 1: Manual Creation
```python
ground_truth = [
    {
        "question": "How do I install Python?",
        "expected_doc_id": "python_installation_guide",
        "course": "data-engineering"
    },
    {
        "question": "What is a data pipeline?",
        "expected_doc_id": "pipeline_basics",
        "course": "data-engineering"
    }
    # ... more examples
]
```
Method 2: LLM-Generated Questions
```python
def generate_questions_for_document(doc_text, num_questions=5):
    """
    Use an LLM to generate questions that this document should answer
    """
    prompt = f"""
    Based on the following document, generate {num_questions} questions that this document would answer well.
    Make the questions natural and varied - don't just copy words from the document.

    Document: {doc_text}

    Questions:
    """

    # Call your LLM here (OpenAI, Anthropic, etc.);
    # call_llm is a placeholder for your provider's API call
    questions = call_llm(prompt)
    return questions

# Generate ground truth
ground_truth = []
for doc in documents:
    questions = generate_questions_for_document(doc['text'])
    for q in questions:
        ground_truth.append({
            "question": q,
            "expected_doc_id": doc['id'],
            "course": doc['course']
        })
```
Evaluation Implementation
```python
def evaluate_search_system(search_function, ground_truth_data, top_k=5):
    """
    Evaluate a search system using MRR and Hit Rate
    """
    relevance_scores = []

    for item in ground_truth_data:
        query = item["question"]
        expected_id = item["expected_doc_id"]

        # Get search results
        results = search_function(query, top_k)

        # Check if the expected document is in the results
        relevance = []
        for i, result in enumerate(results):
            is_relevant = result["_source"]["id"] == expected_id
            relevance.append(is_relevant)

        relevance_scores.append(relevance)

    # Calculate metrics
    mrr = calculate_mrr(relevance_scores)
    hit_rate = calculate_hit_rate(relevance_scores)

    return {
        "MRR": mrr,
        "Hit_Rate": hit_rate,
        "num_queries": len(ground_truth_data)
    }

def calculate_mrr(relevance_scores):
    """Calculate Mean Reciprocal Rank"""
    total_reciprocal_rank = 0
    for relevance in relevance_scores:
        for i, is_relevant in enumerate(relevance):
            if is_relevant:
                total_reciprocal_rank += 1 / (i + 1)  # +1 because rank starts at 1
                break
    return total_reciprocal_rank / len(relevance_scores)

def calculate_hit_rate(relevance_scores):
    """Calculate Hit Rate (any relevant result in top K)"""
    hits = 0
    for relevance in relevance_scores:
        if any(relevance):  # If any result is relevant
            hits += 1
    return hits / len(relevance_scores)

# Example evaluation
results = evaluate_search_system(vector_search, ground_truth)
print(f"📊 MRR: {results['MRR']:.3f}")
print(f"🎯 Hit Rate: {results['Hit_Rate']:.3f}")
```
Comparing Different Approaches
```python
# Test different embedding models
models_to_test = [
    "all-mpnet-base-v2",
    "all-MiniLM-L6-v2",
    "sentence-transformers/all-roberta-large-v1"
]

results = {}
for model_name in models_to_test:
    print(f"🧪 Testing {model_name}...")

    # Recreate the index with the new model
    # (remember: "dims" in the index mapping must match each model's output size)
    model = SentenceTransformer(model_name)
    # ... reindex documents with new embeddings ...

    # Evaluate
    metrics = evaluate_search_system(vector_search, ground_truth)
    results[model_name] = metrics

# Compare results
for model_name, metrics in results.items():
    print(f"🏆 {model_name}: MRR={metrics['MRR']:.3f}, HR={metrics['Hit_Rate']:.3f}")
```
Best Practices and Advanced Techniques
1⃣ Choosing the Right Embedding Model
Factors to Consider:
- Domain: Use domain-specific models when available (bio, legal, etc.)
- Language: Multilingual models for non-English content
- Performance: Balance accuracy vs. speed/size requirements
- Input Length: Some models handle longer texts better
Popular Models by Use Case:
```python
# 🌟 General purpose (good starting point)
"sentence-transformers/all-mpnet-base-v2"  # 768 dim, high quality

# ⚡ Fast and lightweight
"sentence-transformers/all-MiniLM-L6-v2"  # 384 dim, 5x faster

# 🌍 Multilingual
"sentence-transformers/paraphrase-multilingual-mpnet-base-v2"

# 💻 Code search
"microsoft/codebert-base"

# 📚 Long documents
"sentence-transformers/all-mpnet-base-v2"  # handles up to 512 tokens well
```
2⃣ Optimizing Vector Databases
Index Configuration:
```python
# FAISS optimization example
import faiss

dimension = 768  # must match your embedding model

# For CPU: exact search with inner product
index = faiss.IndexFlatIP(dimension)

# For GPU (much faster)
res = faiss.StandardGpuResources()
index = faiss.index_cpu_to_gpu(res, 0, index)

# Approximate search for large datasets (IVF needs a quantizer and a training step)
nlist = 100  # number of clusters
quantizer = faiss.IndexFlatIP(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
index.train(training_vectors)  # training_vectors: a sample of your embeddings
index.add(vectors)
index.nprobe = 10  # clusters to probe at query time (higher = better recall)
```
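As an alternative to IVF, FAISS also provides HNSW graph indexes, which need no training step and offer fast approximate search. A minimal sketch with made-up random vectors standing in for real embeddings:

```python
import faiss
import numpy as np

dimension = 768
vectors = np.random.rand(10_000, dimension).astype("float32")  # stand-in embeddings

# HNSW graph index: 32 links per node; unlike IVF, no train() call is needed
index = faiss.IndexHNSWFlat(dimension, 32)
index.hnsw.efSearch = 64  # higher = better recall, slower queries
index.add(vectors)

distances, ids = index.search(vectors[:1], 5)  # 5 nearest neighbors of the first vector
```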
3⃣ Handling Large Datasets
Chunking Strategy:
```python
def chunk_document(text, max_chunk_size=500, overlap=50):
    """
    Split long documents into overlapping chunks
    """
    words = text.split()
    chunks = []

    for i in range(0, len(words), max_chunk_size - overlap):
        chunk = ' '.join(words[i:i + max_chunk_size])
        chunks.append(chunk)

    return chunks

# Process long documents
processed_docs = []
for doc in documents:
    if len(doc['text'].split()) > 500:  # Long document
        chunks = chunk_document(doc['text'])
        for i, chunk in enumerate(chunks):
            chunk_doc = doc.copy()
            chunk_doc['text'] = chunk
            chunk_doc['chunk_id'] = i
            processed_docs.append(chunk_doc)
    else:
        processed_docs.append(doc)
```
Batch Processing:
```python
def process_embeddings_in_batches(texts, model, batch_size=32):
    """
    Process embeddings in batches to avoid memory issues
    """
    embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        batch_embeddings = model.encode(batch)
        embeddings.extend(batch_embeddings)
    return embeddings
```
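In practice, `encode()` in sentence-transformers already batches internally, so you can usually pass the full list and just tune `batch_size`:

```python
# encode() batches internally; batch_size and a progress bar are built in
embeddings = model.encode(texts, batch_size=32, show_progress_bar=True)
```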
4⃣ Query Enhancement
Query Expansion:
```python
def expand_query(original_query, expansion_terms=3):
    """
    Add related terms to improve search coverage
    """
    # Use word embeddings to find similar terms
    # (one possible implementation of find_similar_words is sketched below)
    similar_words = find_similar_words(original_query, n=expansion_terms)
    expanded_query = original_query + " " + " ".join(similar_words)
    return expanded_query
```
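`find_similar_words` above is a placeholder. One possible sketch embeds a candidate vocabulary with the same model and picks the nearest terms by cosine similarity; the vocabulary list here is hypothetical and would normally be built from your corpus:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")
vocabulary = ["pip", "conda", "virtualenv", "dependency", "module"]  # hypothetical terms
vocab_embeddings = model.encode(vocabulary, convert_to_tensor=True)

def find_similar_words(query, n=3):
    """Return the n vocabulary terms most similar to the query."""
    query_embedding = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, vocab_embeddings)[0]
    top_indices = scores.topk(n).indices.tolist()
    return [vocabulary[i] for i in top_indices]
```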
Multi-Vector Search:
```python
def multi_vector_search(query, models, weights=None):
    """
    Combine results from multiple embedding models
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)

    all_results = []
    for model, weight in zip(models, weights):
        # vector_search_with_model: like vector_search, but encodes with the given model
        results = vector_search_with_model(query, model)
        weighted_results = [(r, r['_score'] * weight) for r in results]
        all_results.extend(weighted_results)

    # Combine and re-rank (see the sketch below)
    combined_results = combine_and_rerank(all_results)
    return combined_results
```
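`combine_and_rerank` and `vector_search_with_model` above are placeholders. A simple hedged implementation of the re-ranking step sums the weighted scores per document and sorts descending; it assumes each hit carries an `id` field in `_source`:

```python
from collections import defaultdict

def combine_and_rerank(weighted_results):
    """Sum weighted scores per document ID and sort descending."""
    scores = defaultdict(float)
    docs = {}
    for result, weighted_score in weighted_results:
        doc_id = result["_source"]["id"]  # assumes documents carry an "id" field
        scores[doc_id] += weighted_score
        docs[doc_id] = result
    ranked_ids = sorted(scores, key=scores.get, reverse=True)
    return [docs[doc_id] for doc_id in ranked_ids]
```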
5⃣ Monitoring and Debugging
Search Analytics:
```python
from datetime import datetime

def log_search_analytics(query, results, user_id=None):
    """
    Log search queries and results for analysis
    """
    analytics_data = {
        "timestamp": datetime.now(),
        "query": query,
        "user_id": user_id,
        "num_results": len(results),
        "top_score": results[0]['_score'] if results else 0,
        "result_ids": [r['_source']['id'] for r in results[:5]]
    }

    # Save to analytics database (save_analytics is a placeholder for your storage layer)
    save_analytics(analytics_data)

def analyze_search_patterns():
    """
    Analyze common queries and failure patterns
    """
    # Common queries without good results (placeholder query against your analytics store)
    low_score_queries = get_queries_with_low_scores()

    # Queries with no clicks (placeholder)
    no_click_queries = get_queries_without_clicks()

    return {
        "improvement_opportunities": low_score_queries,
        "potential_gaps": no_click_queries
    }
```
6⃣ A/B Testing Search Systems
```python
def ab_test_search_systems(system_a, system_b, test_queries, metric="MRR"):
    """
    Compare two search systems
    """
    results_a = evaluate_search_system(system_a, test_queries)
    results_b = evaluate_search_system(system_b, test_queries)

    # metric must match a key returned by evaluate_search_system ("MRR" or "Hit_Rate")
    improvement = (results_b[metric] - results_a[metric]) / results_a[metric] * 100

    return {
        "system_a": results_a,
        "system_b": results_b,
        "improvement_percent": improvement,
        "winner": "B" if results_b[metric] > results_a[metric] else "A"
    }
```
Conclusion and Next Steps
What You’ve Learned
In this comprehensive guide, you’ve learned:
- Fundamentals: What vector search is and why it's powerful
- Vector Representations: From one-hot encoding to dense embeddings
- Search Techniques: Similarity metrics, hybrid search, and ANN algorithms
- Vector Databases: How to choose and use specialized databases
- Implementation: Hands-on setup with Elasticsearch
- Evaluation: How to measure and improve search performance
- Best Practices: Optimization techniques and production considerations
Key Takeaways
- Vector search enables semantic understanding – finding meaning, not just keywords
- Embeddings capture relationships – similar items have similar vectors
- Hybrid search combines the best of both worlds – semantic + keyword matching
- Evaluation is crucial – always measure performance with proper metrics
- Choose the right tools – different databases and models for different needs
Next Steps for Your Journey
Immediate Actions:
- Practice with Real Data: Try the code examples with your own dataset
- Experiment with Models: Test different embedding models for your use case
- Build a Simple Project: Create a search system for a specific domain
- Join Communities: Participate in vector search and LLM communities
Advanced Topics to Explore:
- Multi-Modal Search: Combining text, image, and audio search
- Real-Time Updates: Handling dynamic document collections
- Federated Search: Searching across multiple vector databases
- Custom Embeddings: Training domain-specific embedding models
- Production Deployment: Scaling vector search for millions of users
Recommended Resources:
- Papers: "Attention Is All You Need", "BERT", "Sentence-BERT"
- Courses: Deep Learning Specialization, NLP courses
- Tools: Hugging Face, LangChain, vector database documentation
- Communities: Reddit r/MachineLearning, Discord servers, GitHub discussions
Final Thoughts
Vector search is transforming how we find and interact with information. As LLMs and AI applications continue to grow, understanding vector search becomes increasingly valuable. The concepts you’ve learned here form the foundation for building intelligent search systems, recommendation engines, and AI applications.
Remember: Start simple, measure everything, and iterate based on real user needs. The best search system is one that actually helps users find what they’re looking for quickly and accurately.
Happy searching!
Additional Resources
Tools and Libraries
- Embedding Models: Sentence Transformers, OpenAI, Cohere
- Vector Databases: Pinecone, Weaviate, Milvus, Chroma
- Evaluation: BEIR benchmark, custom evaluation frameworks
- Production: Kubernetes deployments, monitoring tools