This content originally appeared on DEV Community and was authored by Sushil Baligar
Learning Objectives
By the end of this module, you will:
- Master Bazel’s caching mechanisms (local and remote)
- Implement efficient dependency management for Python projects
- Set up remote caching for team collaboration
- Optimize build performance through smart caching strategies
- Handle complex dependency scenarios and version conflicts
4.1 Understanding Bazel Caching
Local Caching Fundamentals
Bazel automatically caches build outputs locally. Understanding how this works is crucial for optimization:
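# Show where Bazel keeps its output base and repository cache
bazel info output_base
bazel info repository_cache
# Clean specific caches
bazel clean --expunge # Nuclear option: deletes the entire output base for this workspace
bazel clean # Removes build outputs; the repository cache and any --disk_cache are left in place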
Cache Key Components
Bazel creates cache keys based on:
- Input file contents (not timestamps)
- Build command and flags
- Toolchain configuration
- Environment variables that affect the build
# //BUILD - This will be cached efficiently
py_library(
    name = "utils",
    srcs = ["utils.py"],
    deps = ["@pypi//requests"],
)
# Changes to utils.py content will invalidate cache
# Changes to modification time won't affect cache
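To make the content-based behaviour concrete, here is a minimal Python sketch of how such a key could be derived. It is an illustration only, not Bazel's actual hashing scheme: the function name `action_key`, the sample command, and the sample environment are all assumptions made for this example.

```python
import hashlib
import os
import tempfile
import time


def action_key(input_paths, command, env):
    """Hash input contents, the command line, and the relevant env vars."""
    h = hashlib.sha256()
    for path in sorted(input_paths):
        with open(path, "rb") as f:
            # Only the bytes of the file matter, never its mtime.
            h.update(hashlib.sha256(f.read()).digest())
    h.update("\0".join(command).encode())
    for name in sorted(env):
        h.update(f"\0{name}={env[name]}".encode())
    return h.hexdigest()


def demo():
    """Return the key before a touch, after a touch, and after an edit."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "utils.py")
        with open(src, "w") as f:
            f.write("def helper():\n    return 1\n")
        cmd = ["python", "-m", "compileall", "utils.py"]  # hypothetical command
        env = {"PATH": "/usr/bin"}  # hypothetical environment

        original = action_key([src], cmd, env)

        # Bump the modification time without touching the contents.
        future = time.time() + 3600
        os.utime(src, (future, future))
        after_touch = action_key([src], cmd, env)

        # Now change the contents.
        with open(src, "a") as f:
            f.write("# edited\n")
        after_edit = action_key([src], cmd, env)

        return original, after_touch, after_edit
```

Calling `demo()` should return three keys where the first two are equal (the touch changed nothing that is hashed) and the third differs (the edit changed the file's bytes).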
4.2 Remote Caching Setup
Basic Remote Cache Configuration
Set up a simple HTTP remote cache:
# .bazelrc
build --remote_cache=https://storage.googleapis.com/my-bazel-cache
build --remote_upload_local_results=true
build --remote_timeout=60
build --remote_retries=3
# For authentication
build --google_default_credentials=true
# OR
build --remote_header="Authorization=Bearer your-token"
Advanced Remote Cache Setup
Configure a more sophisticated remote cache with build event publishing:
# .bazelrc.remote (load it from .bazelrc with: try-import %workspace%/.bazelrc.remote)
# Remote cache configuration (grpcs:// enables TLS, which port 443 normally expects)
build:remote --remote_cache=grpcs://cache.example.com:443
build:remote --remote_timeout=600
build:remote --remote_retries=5
build:remote --remote_upload_local_results=true
# Build event publishing
build:remote --bes_backend=grpcs://analytics.example.com:443
build:remote --bes_results_url=https://analytics.example.com/build/
# Remote execution (if available)
build:remote --remote_executor=grpcs://executor.example.com:443
build:remote --jobs=50
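# Platform configuration: both flags take labels of platform() targets.
# //platforms:linux_x86_64 is a placeholder for a platform defined in your own workspace.
build:remote --host_platform=//platforms:linux_x86_64
build:remote --platforms=//platforms:linux_x86_64
# Use remote config
build --config=remote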
Docker-based Remote Cache
Run a self-hosted remote cache with Docker for team development:
# cache-server.dockerfile
FROM nginx:alpine
# The stock nginx.conf already includes /etc/nginx/conf.d/*.conf
COPY cache.conf /etc/nginx/conf.d/default.conf
# The nginx worker user must be able to write uploaded blobs
RUN mkdir -p /var/cache/bazel && chown -R nginx:nginx /var/cache/bazel
EXPOSE 8080
# cache.conf
server {
    listen 8080;
    client_max_body_size 2G;
    location / {
        root /var/cache/bazel;
        dav_methods PUT DELETE;
        create_full_put_path on;
        dav_access user:rw group:rw all:r;
    }
}
# docker-compose.yml for team cache
version: '3.8'
services:
  bazel-cache:
    build:
      context: .
      dockerfile: cache-server.dockerfile
    ports:
      - "8080:8080"
    volumes:
      - bazel-cache-data:/var/cache/bazel
    restart: unless-stopped
volumes:
  bazel-cache-data:
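A plain HTTP cache like this one is addressed by path: Bazel's HTTP caching protocol stores action results under `/ac/` and output blobs under `/cas/`, keyed by hash. The sketch below builds those URLs so you can probe the server by hand. The helper names `cas_url` and `ac_url`, and the `http://localhost:8080` address, are assumptions for this example.

```python
import hashlib


def cas_url(base_url: str, blob: bytes) -> str:
    """URL under which a blob with these contents would be stored."""
    digest = hashlib.sha256(blob).hexdigest()
    return f"{base_url.rstrip('/')}/cas/{digest}"


def ac_url(base_url: str, action_digest: str) -> str:
    """URL under which the result of an action with this digest would be stored."""
    return f"{base_url.rstrip('/')}/ac/{action_digest}"


if __name__ == "__main__":
    # Hypothetical local cache started with docker-compose.
    print(cas_url("http://localhost:8080", b"hello"))
```

With the container running, a `PUT` followed by a `GET` on such a URL is a quick smoke test of the server's write permissions before pointing Bazel at it with `--remote_cache=http://localhost:8080`.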
4.3 Python Dependency Management
Advanced pip_parse Configuration
Set up sophisticated Python dependency management:
# WORKSPACE
load("@rules_python//python:repositories.bzl", "python_register_toolchains")
load("@rules_python//python:pip.bzl", "pip_parse")

python_register_toolchains(
    name = "python3_11",
    python_version = "3.11.4",
)

# Main dependencies
pip_parse(
    name = "pypi",
    requirements_lock = "//third_party:requirements.lock",
    python_interpreter_target = "@python3_11_host//:python",
)
load("@pypi//:requirements.bzl", "install_deps")
install_deps()

# Development dependencies (separate namespace)
pip_parse(
    name = "pypi_dev",
    requirements_lock = "//third_party:requirements-dev.lock",
    python_interpreter_target = "@python3_11_host//:python",
)
load("@pypi_dev//:requirements.bzl", dev_install_deps = "install_deps")
dev_install_deps()

# Testing dependencies
pip_parse(
    name = "pypi_test",
    requirements_lock = "//third_party:requirements-test.lock",
    python_interpreter_target = "@python3_11_host//:python",
)
load("@pypi_test//:requirements.bzl", test_install_deps = "install_deps")
test_install_deps()
Dependency Lock Files
Declare version ranges in a requirements file, then compile them into a fully pinned lock file for reproducible builds:
# //third_party/requirements.txt
# Production dependencies
fastapi>=0.104.0,<0.105.0
uvicorn[standard]>=0.24.0,<0.25.0
pydantic>=2.5.0,<3.0.0
sqlalchemy>=2.0.0,<2.1.0
alembic>=1.13.0,<1.14.0
redis>=5.0.0,<6.0.0
celery>=5.3.0,<5.4.0
# //third_party/requirements.lock
# Auto-generated - DO NOT EDIT
# This file was generated by pip-compile with python 3.11
# To update, run: pip-compile --output-file=requirements.lock requirements.txt
fastapi==0.104.1
    # via -r requirements.txt
starlette==0.27.0
    # via fastapi
pydantic==2.5.2
    # via fastapi
pydantic-core==2.14.5
    # via pydantic
typing-extensions==4.8.0
    # via pydantic
uvicorn==0.24.0
    # via -r requirements.txt
# ... complete locked versions
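A lock file only gives reproducible builds if every entry is an exact pin. As a rough guard, a small check like the following could run in CI. It is a sketch under simple assumptions: the function name `unpinned_requirements` is made up for this example, and the parsing covers only the basic `name==version` shape that pip-compile emits, not every form pip accepts.

```python
import re

# A pinned line looks like "name==1.2.3" or "name[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._\-\[\],]*==[^\s;]+")


def unpinned_requirements(lock_text: str) -> list:
    """Return the requirement lines that are not pinned with '=='."""
    problems = []
    for raw in lock_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments ("# via ..."), and option lines ("--hash=...").
        if not line or line.startswith("#") or line.startswith("--"):
            continue
        # Drop a trailing line-continuation backslash if hashes follow.
        line = line.rstrip("\\").strip()
        if not PINNED.match(line):
            problems.append(line)
    return problems


if __name__ == "__main__":
    sample = "fastapi==0.104.1\n    # via -r requirements.txt\npydantic>=2.5.0\n"
    print(unpinned_requirements(sample))
```

An empty result means every requirement line carries an exact version, and anything returned is a candidate for re-running pip-compile.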
Multi-Platform Dependencies
Handle platform-specific dependencies:
# //third_party/BUILD
load("@rules_python//python:defs.bzl", "py_library")

# Platform-specific dependencies
py_library(
    name = "platform_deps",
    deps = select({
        "@platforms//os:linux": [
            "@pypi//psutil",
            "@pypi//linux_specific_lib",
        ],
        "@platforms//os:macos": [
            "@pypi//psutil",
            "@pypi//pyobjc_framework_cocoa",
        ],
        "@platforms//os:windows": [
            "@pypi//psutil",
            "@pypi//pywin32",
        ],
        "//conditions:default": ["@pypi//psutil"],
    }),
)
4.4 Advanced Caching Strategies
Repository Caching
Configure repository-level caching for external dependencies:
# .bazelrc
build --repository_cache=/home/user/.cache/bazel/repos
build --experimental_repository_cache_hardlinks=true
# To force a re-fetch of external repositories in a WORKSPACE-based setup, run: bazel sync
Action Caching vs Output Caching
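The action cache maps each action's key (a hash of its command line, inputs, and environment) to metadata about its outputs, while the output files themselves are stored by content hash. Non-deterministic actions defeat both, so keep rules such as genrules unstamped:
# //tools/cache_optimization.bzl
def cache_friendly_genrule(name, srcs, cmd, **kwargs):
    """Genrule optimized for caching."""
    native.genrule(
        name = name,
        srcs = srcs,
        cmd = cmd,
        # Ensure deterministic output
        stamp = 0,
        # Pass through outs and any other genrule attributes
        **kwargs
    )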
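The split between the two stores can be modelled in a few lines of Python. This is a toy model for intuition, not Bazel's implementation. The class name `TwoLevelCache` and its methods are invented for this sketch.

```python
import hashlib


class TwoLevelCache:
    """Toy model: an action cache pointing into a content-addressable store."""

    def __init__(self):
        self.action_cache = {}  # action key -> {output path: content digest}
        self.cas = {}           # content digest -> bytes
        self.executions = 0     # how many times an action really ran

    def run(self, action_key, execute):
        """Return the action's outputs, executing only on a cache miss."""
        entry = self.action_cache.get(action_key)
        if entry is not None and all(d in self.cas for d in entry.values()):
            # Hit: rebuild the outputs from stored blobs without executing.
            return {path: self.cas[d] for path, d in entry.items()}

        self.executions += 1
        outputs = execute()
        entry = {}
        for path, data in outputs.items():
            digest = hashlib.sha256(data).hexdigest()
            self.cas[digest] = data  # identical contents are stored once
            entry[path] = digest
        self.action_cache[action_key] = entry
        return outputs


if __name__ == "__main__":
    cache = TwoLevelCache()
    cache.run("key-1", lambda: {"out.txt": b"hello"})
    cache.run("key-1", lambda: {"out.txt": b"hello"})
    print(cache.executions)
```

In this model, two different actions that happen to produce byte-identical output share one blob in the content store, which is why deterministic outputs matter as much as stable action keys.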
Cache Warming Strategies
Implement cache warming for CI/CD:
#!/bin/bash
# scripts/warm_cache.sh
# Warm cache with common targets
bazel build //... --keep_going
bazel test //... --test_tag_filters=-slow --keep_going
# Pre-build common development targets
bazel build //src/main:app //src/tests:unit_tests
# Cache commonly used external dependencies
bazel build @pypi//requests @pypi//fastapi @pypi_test//pytest
echo "Cache warming complete"
4.5 Dependency Resolution and Conflicts
Version Conflict Resolution
Handle complex version conflicts systematically:
# //third_party/overrides.bzl
def apply_dependency_overrides():
    """Return label overrides; macros must consult this mapping when assembling deps."""
    # Override conflicting versions
    override_targets = {
        # Force specific numpy version across all deps
        "@pypi//numpy": "@pypi_pinned//numpy",
        # Use our patched version of requests
        "@pypi//requests": "//third_party/patched:requests",
    }
    return override_targets

# //third_party/patched/BUILD
load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "requests",
    srcs = ["requests_patched.py"],
    deps = [
        "@pypi//urllib3",
        "@pypi//certifi",
        "@pypi//charset_normalizer",
    ],
    visibility = ["//visibility:public"],
)
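The override mapping has no effect until something rewrites a target's `deps` with it. One way a wrapper macro could do that is sketched below in plain Python, which mirrors what the Starlark would look like. The function name `apply_overrides` is hypothetical and not part of rules_python.

```python
def apply_overrides(deps, overrides):
    """Replace overridden labels and drop duplicates, preserving order."""
    seen = set()
    result = []
    for dep in deps:
        resolved = overrides.get(dep, dep)
        if resolved not in seen:
            seen.add(resolved)
            result.append(resolved)
    return result


if __name__ == "__main__":
    overrides = {
        "@pypi//numpy": "@pypi_pinned//numpy",
        "@pypi//requests": "//third_party/patched:requests",
    }
    print(apply_overrides(["@pypi//requests", "//common:config"], overrides))
```

Deduplicating after substitution matters because a target may list both the original and the pinned label, and Bazel rejects duplicate entries in `deps`.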
Custom Dependency Resolution
Implement custom resolution for complex scenarios:
# //tools/custom_deps.bzl
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def custom_python_deps():
    """Install Python dependencies with custom resolution."""
    # Custom ML library not available on PyPI
    http_archive(
        name = "custom_ml_lib",
        urls = ["https://github.com/company/ml-lib/archive/v2.1.0.tar.gz"],
        sha256 = "abcd1234...",
        strip_prefix = "ml-lib-2.1.0",
        build_file = "//third_party:custom_ml_lib.BUILD",
    )
    # Forked dependency with patches
    http_archive(
        name = "patched_fastapi",
        urls = ["https://github.com/our-org/fastapi/archive/patched-v0.104.1.tar.gz"],
        sha256 = "efgh5678...",
        strip_prefix = "fastapi-patched-v0.104.1",
        build_file = "//third_party:patched_fastapi.BUILD",
    )
4.6 Performance Monitoring and Optimization
Build Performance Analysis
Monitor and analyze build performance:
# Generate build profile
bazel build //... --profile=build_profile.json
bazel analyze-profile build_profile.json
# Memory usage analysis
bazel build //... --memory_profile=memory_profile.json
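# Detailed timing: record extra task types in the profile (the flag requires a task-type value; see `bazel help build`)
bazel build //... --profile=build_profile.json --experimental_profile_additional_tasks=<task>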
Cache Hit Rate Monitoring
Track cache effectiveness:
#!/bin/bash
# scripts/monitor_cache.sh
echo "Cache Statistics:"
echo "=================="
# Local cache info
echo "Local cache location: $(bazel info output_base)"
echo "Repository cache: $(bazel info repository_cache)"
# Build with cache stats
bazel build //... --profile=cache_profile.json 2>&1 | tee build.log
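# Bazel reports cache hits as counts on its "INFO: N processes: ..." summary line
grep -E "INFO: [0-9]+ process(es)?:" build.log
echo "Remote cache hits: $(grep -oE '[0-9]+ remote cache hit' build.log | awk '{s+=$1} END {print s+0}')"
echo "Disk cache hits: $(grep -oE '[0-9]+ disk cache hit' build.log | awk '{s+=$1} END {print s+0}')"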
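For tracking the hit rate over time, the summary line can be parsed into numbers. The sketch below assumes the line has the shape `INFO: 20 processes: 12 remote cache hit, 3 disk cache hit, 2 internal, 3 linux-sandbox.`; the exact wording can differ between Bazel versions, so treat the pattern as an assumption to verify against your own logs. The function name `cache_hit_rate` is made up for this example.

```python
import re

# Assumed shape: "INFO: <total> processes: <n> <label>, <n> <label>, ..."
SUMMARY = re.compile(r"INFO: (\d+) process(?:es)?: (.*?)\.?$")


def cache_hit_rate(log_text):
    """Return (total, hits, rate) from the first summary line, or None."""
    for line in log_text.splitlines():
        match = SUMMARY.search(line.strip())
        if not match:
            continue
        total = int(match.group(1))
        hits = 0
        for part in match.group(2).split(","):
            count, _, label = part.strip().partition(" ")
            # Count both remote and disk cache hits.
            if count.isdigit() and label.endswith("cache hit"):
                hits += int(count)
        rate = hits / total if total else 0.0
        return total, hits, rate
    return None


if __name__ == "__main__":
    with open("build.log") as f:
        print(cache_hit_rate(f.read()))
```

The returned rate is relative to all processes on the summary line, including internal ones, so it is best used for comparing builds against each other and not as an absolute figure.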
Optimizing for Cache Efficiency
Best practices for cache-friendly builds:
# //BUILD - Cache-optimized targets
py_library(
    name = "stable_utils",
    srcs = ["utils.py"],
    # Stable dependencies cache better
    deps = [
        "@pypi//requests",  # Pinned version
        "//common:constants",  # Rarely changing
    ],
)

py_library(
    name = "feature_code",
    srcs = ["feature.py"],
    # Separate frequently changing code
    deps = [
        ":stable_utils",  # Reuse cached stable parts
        "//config:dynamic_config",  # Accept this changes often
    ],
)
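The reason this split pays off is that a change invalidates the changed target and everything that depends on it, and nothing else. The following Python sketch walks reverse dependencies over a hand-written copy of the graph above. It is a simplified model of invalidation, and `GRAPH` and `invalidated_targets` are names introduced only for this illustration.

```python
from collections import defaultdict, deque

# Hand-written copy of the two targets above: target -> direct deps.
GRAPH = {
    ":feature_code": [":stable_utils", "//config:dynamic_config"],
    ":stable_utils": ["@pypi//requests", "//common:constants"],
}


def invalidated_targets(deps, changed):
    """Return the changed label plus everything that transitively depends on it."""
    # Invert the graph: dep -> set of targets that list it directly.
    rdeps = defaultdict(set)
    for target, target_deps in deps.items():
        for dep in target_deps:
            rdeps[dep].add(target)

    dirty = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in rdeps[current]:
            if dependent not in dirty:
                dirty.add(dependent)
                queue.append(dependent)
    return dirty


if __name__ == "__main__":
    print(sorted(invalidated_targets(GRAPH, "//config:dynamic_config")))
```

Editing `//config:dynamic_config` dirties only `:feature_code`, so `:stable_utils` stays cached. Editing `//common:constants` dirties both, which is why rarely changing code belongs at the bottom of the graph.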
4.7 Practical Examples
Complete Web Application Setup
Real-world example with optimized caching and dependencies:
# //web_app/BUILD
load("@rules_python//python:defs.bzl", "py_binary", "py_library", "py_test")

# Core application library (stable, caches well)
py_library(
    name = "app_core",
    srcs = [
        "core/__init__.py",
        "core/models.py",
        "core/database.py",
        "core/auth.py",
    ],
    deps = [
        "@pypi//fastapi",
        "@pypi//sqlalchemy",
        "@pypi//pydantic",
        "@pypi//passlib",
        "//common:config",
    ],
)

# API routes (changes more frequently)
py_library(
    name = "api_routes",
    srcs = [
        "api/__init__.py",
        "api/users.py",
        "api/posts.py",
        "api/auth.py",
    ],
    deps = [
        ":app_core",
        "@pypi//fastapi",
        "//common:validators",
    ],
)

# Main application
py_binary(
    name = "web_app",
    srcs = ["main.py"],
    deps = [
        ":app_core",
        ":api_routes",
        "@pypi//uvicorn",
    ],
    main = "main.py",
)
# Comprehensive test suite
py_test(
    name = "integration_test",
    srcs = ["test_integration.py"],
    # Required because no source file is named integration_test.py
    main = "test_integration.py",
    deps = [
        ":web_app",
        "@pypi_test//pytest",
        "@pypi_test//httpx",
        "@pypi_test//pytest_asyncio",
    ],
    data = ["test_data.json"],
)
CI/CD Cache Configuration
Optimize CI/CD with proper cache configuration:
# .github/workflows/build.yml
name: Build and Test
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Mount bazel cache
        uses: actions/cache@v3
        with:
          path: |
            ~/.cache/bazel
            ~/.cache/bazel-repo
          key: bazel-${{ runner.os }}-${{ hashFiles('WORKSPACE', '**/*.bzl', 'third_party/requirements*') }}
          restore-keys: |
            bazel-${{ runner.os }}-
      - name: Configure Bazel
        run: |
          # Define the "ci" config that the build and test steps select
          echo "build:ci --repository_cache=/home/runner/.cache/bazel-repo" >> .bazelrc.ci
          echo "build:ci --disk_cache=/home/runner/.cache/bazel" >> .bazelrc.ci
          echo "build:ci --remote_cache=${{ secrets.REMOTE_CACHE_URL }}" >> .bazelrc.ci
          echo "build:ci --remote_upload_local_results=true" >> .bazelrc.ci
      - name: Build
        run: bazel --bazelrc=.bazelrc.ci build //... --config=ci
      - name: Test
        run: bazel --bazelrc=.bazelrc.ci test //... --config=ci --test_output=errors
4.8 Troubleshooting Common Issues
Cache Invalidation Problems
Debug cache invalidation issues:
# Check why target was rebuilt
bazel build //target:name --explain=explain.log --verbose_explanations
# Compare action keys
bazel aquery //target:name --output=textproto > action1.txt
# Make change and run again
bazel aquery //target:name --output=textproto > action2.txt
diff action1.txt action2.txt
Dependency Resolution Issues
Debug complex dependency problems:
# Analyze dependency graph
bazel query "deps(//your:target)" --output=graph | dot -Tpng > deps.png
# Find targets in the workspace that depend on a given package
bazel query "rdeps(//..., @pypi//problematic_package)"
# Inspect how the package's repository was defined (WORKSPACE-based setups)
bazel query --output=build //external:pypi_problematic_package
Remote Cache Issues
Troubleshoot remote cache problems:
# Test remote cache connectivity
bazel build //simple:target --remote_cache=your-cache-url --execution_log_json_file=exec.json
# Check cache upload/download
bazel build //target --remote_cache=your-cache-url --experimental_remote_cache_eviction_retries=3
# Verify authentication (the /status path is an example; use an endpoint your cache server exposes)
curl -H "Authorization: Bearer $TOKEN" https://your-cache-url/status
4.9 Best Practices Summary
Caching Best Practices
- Use remote caching for team collaboration
- Separate stable and volatile dependencies
- Monitor cache hit rates regularly
- Implement cache warming in CI/CD
- Use repository caching for external dependencies
Dependency Management Best Practices
- Pin all dependency versions in lock files
- Separate production, development, and test dependencies
- Handle platform-specific dependencies explicitly
- Use custom resolution for complex scenarios
- Monitor for security vulnerabilities in dependencies
Performance Optimization
- Profile builds regularly to identify bottlenecks
- Structure targets to maximize cache reuse
- Use hermetic builds for reproducibility
- Implement incremental build strategies
- Monitor and optimize resource usage
Module 4 Exercises
Exercise 1: Remote Cache Setup
Set up a remote cache using Google Cloud Storage or AWS S3 and measure the cache hit rate improvement.
Exercise 2: Dependency Conflict Resolution
Create a scenario with conflicting dependency versions and resolve it using custom overrides.
Exercise 3: Cache Performance Analysis
Profile a complex build, identify cache inefficiencies, and implement optimizations.
Exercise 4: CI/CD Cache Integration
Set up cache optimization in a CI/CD pipeline and measure build time improvements.
Next Steps
In Module 5, we’ll cover “Advanced Python Rules and Toolchains” where you’ll learn to:
- Create custom Python rules and macros
- Configure multiple Python toolchains
- Implement hermetic Python builds
- Use aspects for code analysis
Key Takeaways
- Bazel’s caching is content-based, not timestamp-based
- Remote caching enables massive build speedups for teams
- Proper dependency management prevents version conflicts
- Cache performance should be monitored and optimized
- Structured targets maximize cache reuse efficiency
- Lock files ensure reproducible builds across environments