Modern Bazel with Python - Module 4: Caching and Dependencies



This content originally appeared on DEV Community and was authored by Sushil Baligar

Learning Objectives

By the end of this module, you will:

  • Master Bazel’s caching mechanisms (local and remote)
  • Implement efficient dependency management for Python projects
  • Set up remote caching for team collaboration
  • Optimize build performance through smart caching strategies
  • Handle complex dependency scenarios and version conflicts

4.1 Understanding Bazel Caching

Local Caching Fundamentals

Bazel automatically caches build outputs locally. Understanding how this works is crucial for optimization:

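# Show where Bazel keeps its local state
bazel info output_base       # per-workspace outputs, local action cache, fetched external repos
bazel info repository_cache  # shared cache of downloaded archives

# Clean local state
bazel clean --expunge  # Nuclear option: deletes the whole output base, including fetched external repos
bazel clean            # Deletes build outputs and the local action cache; keeps fetched external repos
# Neither command touches the repository cache or a --disk_cache directory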

Cache Key Components

Bazel derives each action's cache key from:

  • Input file contents (digests, not timestamps) and their paths
  • The action's command line, which reflects the flags that affect it
  • Toolchain configuration
  • Environment variables passed to the action (for example via --action_env)

# //BUILD - This will be cached efficiently
load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "utils",
    srcs = ["utils.py"],
    deps = ["@pypi//requests"],
)

# Changing the content of utils.py invalidates the cache entry
# Changing only its modification time (e.g. `touch utils.py`) does not
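The toy model below hashes an action's command line, environment, and input file contents in Python. It is a sketch, not Bazel's real key format, and `file_digest`, `action_key`, and `demo` are hypothetical names. It shows why touching a file is harmless while editing it forces a rebuild:

```python
from __future__ import annotations

import hashlib
import os
import tempfile


def file_digest(path: str) -> str:
    """SHA-256 of a file's bytes; timestamps and other metadata are never read."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def action_key(argv: list[str], env: dict[str, str], inputs: list[str]) -> str:
    """Toy action key over the command line, environment and input contents."""
    h = hashlib.sha256()
    for arg in argv:
        h.update(b"arg\0" + arg.encode() + b"\0")
    for name in sorted(env):
        h.update(b"env\0" + f"{name}={env[name]}".encode() + b"\0")
    for path in sorted(inputs):
        h.update(b"in\0" + path.encode() + b"\0" + file_digest(path).encode() + b"\0")
    return h.hexdigest()


def demo() -> tuple[bool, bool]:
    """Returns (key unchanged after touch, key unchanged after edit)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "utils.py")
        with open(src, "w") as f:
            f.write("def greet():\n    return 'hi'\n")
        argv = ["python3", "-m", "py_compile", src]
        env = {"PATH": "/usr/bin:/bin"}
        before = action_key(argv, env, [src])

        os.utime(src, (0, 0))  # like `touch`: only the timestamps change
        after_touch = action_key(argv, env, [src])

        with open(src, "a") as f:
            f.write("# edited\n")  # a real content change
        after_edit = action_key(argv, env, [src])
    return before == after_touch, before == after_edit
```

`demo()` returns `(True, False)`. The key survives a timestamp change but not a content change.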

4.2 Remote Caching Setup

Basic Remote Cache Configuration

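Set up a simple HTTP remote cache. Here it is a Google Cloud Storage bucket, which Bazel can use directly as an HTTP cache:

# .bazelrc
build --remote_cache=https://storage.googleapis.com/my-bazel-cache
build --remote_upload_local_results=true  # the default, shown for clarity
build --remote_timeout=60
build --remote_retries=3

# For authentication (GCS)
build --google_default_credentials=true
# OR, for a cache that accepts bearer tokens
build --remote_header="Authorization=Bearer your-token"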

Advanced Remote Cache Setup

Configure a more sophisticated remote cache with build event publishing:

# .bazelrc.remote (not read automatically; add "import %workspace%/.bazelrc.remote" to .bazelrc)
# Remote cache configuration (grpcs:// = gRPC over TLS)
build:remote --remote_cache=grpcs://cache.example.com:443
build:remote --remote_timeout=600
build:remote --remote_retries=5
build:remote --remote_upload_local_results=true

# Build event publishing
build:remote --bes_backend=grpcs://analytics.example.com:443
build:remote --bes_results_url=https://analytics.example.com/build/

# Remote execution (if available)
build:remote --remote_executor=grpcs://executor.example.com:443
build:remote --jobs=50

# Platform configuration: //platforms:linux_x86_64 stands for a platform() target
# you define yourself to describe the remote workers
build:remote --host_platform=//platforms:linux_x86_64
build:remote --platforms=//platforms:linux_x86_64

# Use the remote config by default (or pass --config=remote per invocation)
build --config=remote
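Named configs such as `build:remote` only take effect when `--config=remote` is expanded. The Python sketch below models that expansion under simplifying assumptions: it ignores `import` lines and command inheritance (such as `test` picking up `build` flags). `effective_flags` is a hypothetical helper, not a Bazel API:

```python
from __future__ import annotations

import shlex


def _parse_rc(rc_text: str, command: str) -> tuple[list[str], dict[str, list[str]]]:
    """Split rc lines for `command` into always-on flags and named configs."""
    plain: list[str] = []
    named: dict[str, list[str]] = {}
    for line in rc_text.splitlines():
        tokens = shlex.split(line, comments=True)  # honours quotes and '#' comments
        if not tokens:
            continue  # blank line or comment
        head, flags = tokens[0], tokens[1:]
        cmd, _, config = head.partition(":")
        if cmd != command:
            continue  # other commands, import/try-import lines, ...
        if config:
            named.setdefault(config, []).extend(flags)
        else:
            plain.extend(flags)
    return plain, named


def _expand(flags: list[str], named: dict[str, list[str]], active: frozenset[str]) -> list[str]:
    """Replace each --config=NAME with that config's flags, recursively."""
    out: list[str] = []
    for flag in flags:
        if flag.startswith("--config="):
            name = flag.split("=", 1)[1]
            if name in active:
                raise ValueError(f"--config={name} expands to itself")
            if name not in named:
                raise ValueError(f"config {name!r} is not defined in the rc text")
            out.extend(_expand(named[name], named, active | {name}))
        else:
            out.append(flag)
    return out


def effective_flags(rc_text: str, command: str, cli_configs: tuple[str, ...] | list[str] = ()) -> list[str]:
    """Flags for `command`: rc lines first, then --config values from the command line."""
    plain, named = _parse_rc(rc_text, command)
    requested = [f"--config={name}" for name in cli_configs]
    return _expand(plain + requested, named, frozenset())
```

An undefined config raises `ValueError`, mirroring the error Bazel reports for an unknown `--config` value.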

Docker-based Remote Cache

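Set up a self-hosted remote cache for team development, using nginx's WebDAV support in Docker:

# cache-server.dockerfile
FROM nginx:alpine

# The stock /etc/nginx/nginx.conf includes conf.d/*.conf, so only the server block is replaced
COPY cache.conf /etc/nginx/conf.d/default.conf

# The storage directory must exist and be writable by the nginx worker user
RUN mkdir -p /var/cache/bazel && chown nginx:nginx /var/cache/bazel

EXPOSE 8080

# cache.conf
server {
    listen 8080;
    client_max_body_size 2G;

    location / {
        root /var/cache/bazel;
        dav_methods PUT DELETE;
        create_full_put_path on;
        dav_access user:rw group:rw all:r;
    }
}

# docker-compose.yml for team cache
version: '3.8'
services:
  bazel-cache:
    build:
      context: .
      dockerfile: cache-server.dockerfile
    ports:
      - "8080:8080"
    volumes:
      - bazel-cache-data:/var/cache/bazel
    restart: unless-stopped

volumes:
  bazel-cache-data:

Bazel reads and writes entries under /ac/ and /cas/ with plain GET and PUT requests. Point builds at the server with build --remote_cache=http://cache-host:8080 in .bazelrc, where cache-host is wherever the container runs.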
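The next block is a Python standard-library stand-in for the nginx container, useful for seeing what a cache server has to do. It assumes Bazel's HTTP cache layout (`/ac/<sha256>` for action results, `/cas/<sha256>` for blobs) and that clients send `Content-Length` on PUT. It keeps everything in memory with no authentication or eviction, so treat it as a local experiment only. `CacheHandler`, `start_server`, `put`, and `get` are hypothetical names:

```python
from __future__ import annotations

import hashlib
import re
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Layout assumed here: /ac/<sha256> for action results, /cas/<sha256> for blobs
_PATH = re.compile(r"^/(ac|cas)/([0-9a-f]{64})$")
# Bypass any HTTP proxy settings for these local requests
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class CacheHandler(BaseHTTPRequestHandler):
    store: dict[str, bytes] = {}  # request path -> body, shared by all requests
    lock = threading.Lock()

    def do_GET(self) -> None:
        with self.lock:
            data = self.store.get(self.path)
        if data is None:
            self.send_error(404)  # a cache miss
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_PUT(self) -> None:
        match = _PATH.match(self.path)
        if match is None:
            self.send_error(400, "expected /ac/<sha256> or /cas/<sha256>")
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        kind, digest = match.groups()
        # A CAS entry must be stored under the SHA-256 of its own bytes
        if kind == "cas" and hashlib.sha256(body).hexdigest() != digest:
            self.send_error(400, "digest mismatch")
            return
        with self.lock:
            self.store[self.path] = body
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def start_server(host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Serve in a background daemon thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put(base: str, kind: str, digest: str, data: bytes) -> int:
    """Upload one entry and return the HTTP status."""
    req = urllib.request.Request(f"{base}/{kind}/{digest}", data=data, method="PUT")
    with _OPENER.open(req) as resp:
        return resp.status


def get(base: str, kind: str, digest: str) -> bytes | None:
    """Download one entry, or return None on a 404 miss."""
    try:
        with _OPENER.open(f"{base}/{kind}/{digest}") as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

For a long-running local cache, `ThreadingHTTPServer(("0.0.0.0", 8080), CacheHandler).serve_forever()` blocks in the foreground. Bazel can then use it with `--remote_cache=http://localhost:8080`.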

4.3 Python Dependency Management

Advanced pip_parse Configuration

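Set up separate dependency sets for production, development, and testing. This uses the legacy WORKSPACE API. On Bazel versions that default to Bzlmod (MODULE.bazel), the equivalent is rules_python's pip.parse module extension: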

# WORKSPACE (rules_python itself must already be declared above, e.g. with http_archive)
load("@rules_python//python:repositories.bzl", "py_repositories", "python_register_toolchains")
load("@rules_python//python:pip.bzl", "pip_parse")

py_repositories()

python_register_toolchains(
    name = "python3_11",
    python_version = "3.11.4",
)

# Main dependencies
pip_parse(
    name = "pypi",
    requirements_lock = "//third_party:requirements.lock",
    python_interpreter_target = "@python3_11_host//:python",
)

load("@pypi//:requirements.bzl", "install_deps")
install_deps()

# Development dependencies (separate namespace)
pip_parse(
    name = "pypi_dev",
    requirements_lock = "//third_party:requirements-dev.lock",
    python_interpreter_target = "@python3_11_host//:python",
)

load("@pypi_dev//:requirements.bzl", dev_install_deps = "install_deps")
dev_install_deps()

# Testing dependencies
pip_parse(
    name = "pypi_test",
    requirements_lock = "//third_party:requirements-test.lock",
    python_interpreter_target = "@python3_11_host//:python",
)

load("@pypi_test//:requirements.bzl", test_install_deps = "install_deps")
test_install_deps()
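rules_python exposes each locked package in a hub repository under a normalized name. That is why labels such as `@pypi_test//pytest_asyncio` and `@pypi//charset_normalizer` use underscores. The Python sketch below reproduces that naming. It assumes the normalization is lower-casing plus collapsing runs of `-`, `_`, and `.` into `_`, and both helpers are hypothetical:

```python
from __future__ import annotations

import re

# Leading distribution name of a requirement line (stops at [extras], ==, >=, ...)
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def normalize_name(name: str) -> str:
    """Lower-case and collapse runs of '-', '_' and '.' into '_'."""
    return re.sub(r"[-_.]+", "_", name).lower()


def pypi_label(hub: str, requirement: str) -> str:
    """Turn a requirement line such as 'uvicorn[standard]==0.24.0' into a hub label."""
    match = _NAME.match(requirement.strip())
    if match is None:
        raise ValueError(f"cannot find a package name in {requirement!r}")
    return f"@{hub}//{normalize_name(match.group(0))}"
```

For example, `pypi_label("pypi", "uvicorn[standard]==0.24.0")` gives `@pypi//uvicorn`. Extras do not appear in the label.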

Dependency Lock Files

Create lock files for reproducible builds. Keep version ranges in requirements.txt and generate a fully pinned requirements.lock from it with pip-compile. Generate requirements-dev.lock and requirements-test.lock the same way from their own input files:

# //third_party/requirements.txt
# Production dependencies
fastapi>=0.104.0,<0.105.0
uvicorn[standard]>=0.24.0,<0.25.0
pydantic>=2.5.0,<3.0.0
sqlalchemy>=2.0.0,<2.1.0
alembic>=1.13.0,<1.14.0
redis>=5.0.0,<6.0.0
celery>=5.3.0,<5.4.0

# //third_party/requirements.lock
# Auto-generated - DO NOT EDIT
# This file was generated by pip-compile with Python 3.11
# To update, run: pip-compile --output-file=requirements.lock requirements.txt

fastapi==0.104.1
    # via -r requirements.txt
starlette==0.27.0
    # via fastapi
pydantic==2.5.2
    # via
    #   -r requirements.txt
    #   fastapi
pydantic-core==2.14.5
    # via pydantic
typing-extensions==4.8.0
    # via pydantic
uvicorn[standard]==0.24.0
    # via -r requirements.txt
# ... complete locked versions
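Reproducibility depends on every requirement in a lock file being an exact `==` pin. The minimal Python check below assumes pip-compile style output: comments, `--hash` option lines, and `\` continuations. URL or VCS requirements would be reported as unpinned, and `unpinned_requirements` is a hypothetical helper:

```python
from __future__ import annotations

import re

# Package name, optional [extras], then an exact "==" pin
_PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==\S+")


def unpinned_requirements(lock_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    problems: list[str] = []
    for raw in lock_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "-")):
            continue  # blank lines, comments, and options such as --hash or -r
        line = line.rstrip("\\").strip()  # drop pip's line-continuation marker
        if not _PINNED.match(line):
            problems.append(line)
    return problems
```

Running it in CI over requirements.lock, requirements-dev.lock, and requirements-test.lock catches hand-edited ranges before they reach pip_parse.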

Multi-Platform Dependencies

Handle platform-specific dependencies:

# //third_party/BUILD
load("@rules_python//python:defs.bzl", "py_library")

# Platform-specific dependencies: psutil everywhere, plus per-OS extras.
# Every package named here must also be present in the lock file.
py_library(
    name = "platform_deps",
    deps = ["@pypi//psutil"] + select({
        "@platforms//os:linux": [
            "@pypi//linux_specific_lib",  # placeholder for a Linux-only package
        ],
        "@platforms//os:macos": [
            "@pypi//pyobjc_framework_cocoa",
        ],
        "@platforms//os:windows": [
            "@pypi//pywin32",
        ],
        "//conditions:default": [],
    }),
)
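`select()` is resolved against the constraints of the target platform when Bazel analyzes the build, and `//conditions:default` applies when no other key matches. The next block is a simplified Python model of that choice. Real Bazel picks the most specialized of several matching conditions, while this sketch just reports ambiguity. All names are hypothetical:

```python
from __future__ import annotations

DEFAULT = "//conditions:default"


def resolve_select(branches: dict[str, list[str]], active: set[str]) -> list[str]:
    """Pick the branch whose condition is active, else the default branch."""
    matches = [cond for cond in branches if cond != DEFAULT and cond in active]
    if len(matches) > 1:
        raise ValueError(f"ambiguous select: {matches}")  # simplification, see above
    if matches:
        return branches[matches[0]]
    if DEFAULT in branches:
        return branches[DEFAULT]
    raise ValueError("no matching condition and no default branch")


# Mirrors the platform_deps target: psutil on every OS plus per-OS extras
PLATFORM_DEPS = {
    "@platforms//os:linux": ["@pypi//linux_specific_lib"],
    "@platforms//os:macos": ["@pypi//pyobjc_framework_cocoa"],
    "@platforms//os:windows": ["@pypi//pywin32"],
    DEFAULT: [],
}


def platform_deps(target_os_constraint: str) -> list[str]:
    """Deps seen by a build whose target platform has the given OS constraint."""
    return ["@pypi//psutil"] + resolve_select(PLATFORM_DEPS, {target_os_constraint})
```

A build for an OS without its own branch, such as FreeBSD, falls through to the default and gets only psutil.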

4.4 Advanced Caching Strategies

Repository Caching

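Configure where Bazel stores downloaded archives for external dependencies. By default this cache is already shared across a user's workspaces. An explicit path helps in CI, where the directory has to be saved and restored:

# .bazelrc
build --repository_cache=/home/user/.cache/bazel/repos
build --experimental_repository_cache_hardlinks=true

# To force WORKSPACE-defined repositories to be re-fetched, run:
#   bazel sync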

Action Caching vs Output Caching

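Bazel's disk and remote caches have two parts:

  • The action cache (AC) maps an action's key to the digests of its outputs. The key is derived from the action's command line, input digests, and environment.
  • The content-addressable store (CAS) holds the output files themselves, addressed by their digest.

An action is served from cache only when its key is unchanged, so keep commands deterministic:

# //tools/cache_optimization.bzl
def cache_friendly_genrule(name, srcs, outs, cmd, **kwargs):
    """Genrule wrapper that keeps its action key stable between builds."""
    native.genrule(
        name = name,
        srcs = srcs,
        outs = outs,
        cmd = cmd,
        # Don't add workspace status (build stamping) files as inputs:
        # they change on every build and would defeat caching
        stamp = 0,
        **kwargs
    )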
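The next block is a toy Python model of the AC/CAS split. The class is hypothetical and leaves out everything except the lookup logic. It shows two consequences: identical outputs from different actions are stored once in the CAS, and an AC entry whose blobs have been evicted counts as a miss:

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyCache:
    """In-memory model of a disk/remote cache: an AC and a CAS."""

    ac: dict[str, dict[str, str]] = field(default_factory=dict)  # action key -> {output path: digest}
    cas: dict[str, bytes] = field(default_factory=dict)  # digest -> file contents

    def put_blob(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.cas[digest] = data  # identical outputs land on the same key
        return digest

    def record(self, action_key: str, outputs: dict[str, bytes]) -> None:
        """Store outputs in the CAS and point the AC entry at their digests."""
        self.ac[action_key] = {path: self.put_blob(data) for path, data in outputs.items()}

    def lookup(self, action_key: str) -> dict[str, bytes] | None:
        entry = self.ac.get(action_key)
        if entry is None:
            return None  # miss: the action has to run
        if any(digest not in self.cas for digest in entry.values()):
            return None  # outputs were evicted: also a miss
        return {path: self.cas[digest] for path, digest in entry.items()}
```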

Cache Warming Strategies

Implement cache warming for CI/CD:

#!/bin/bash
# scripts/warm_cache.sh

# Warm cache with common targets
bazel build //... --keep_going
bazel test //... --test_tag_filters=-slow --keep_going

# Pre-build common development targets
bazel build //src/main:app //src/tests:unit_tests

# Cache commonly used external dependencies
bazel build @pypi//requests @pypi//fastapi @pypi//pytest

echo "Cache warming complete"

4.5 Dependency Resolution and Conflicts

Version Conflict Resolution

Handle complex version conflicts systematically:

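# //third_party/overrides.bzl
# Bazel has no built-in override for @pypi labels: this mapping only affects
# targets whose deps are passed through apply_dependency_overrides(). It does not
# change the dependencies that pip packages declare on each other.
_OVERRIDE_TARGETS = {
    # Force a specific numpy version (assumes a separate "pypi_pinned" pip_parse hub)
    "@pypi//numpy": "@pypi_pinned//numpy",
    # Use our patched version of requests
    "@pypi//requests": "//third_party/patched:requests",
}

def apply_dependency_overrides(deps):
    """Returns deps with overridden labels substituted."""
    return [_OVERRIDE_TARGETS.get(dep, dep) for dep in deps]

# Usage in a BUILD file:
#   load("//third_party:overrides.bzl", "apply_dependency_overrides")
#   py_library(name = "x", srcs = ["x.py"], deps = apply_dependency_overrides(["@pypi//requests"]))

# //third_party/patched/BUILD
load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "requests",
    srcs = glob(["requests/**/*.py"]),  # vendored, patched copy of the requests package
    imports = ["."],  # makes "import requests" resolve to the vendored copy
    deps = [
        "@pypi//urllib3",
        "@pypi//certifi",
        "@pypi//charset_normalizer",
        "@pypi//idna",
    ],
    visibility = ["//visibility:public"],
)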
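One target can mix hubs; the web application example later in this module combines `@pypi` with `@pypi_test`. A package pinned to different versions in two lock files is therefore a common source of conflicts. The Python sketch below reports such packages. It assumes pip-compile style `name==version` lines, and the helper names are hypothetical:

```python
from __future__ import annotations

import re

# name, optional [extras], "==", version (up to whitespace, ';' or '\')
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?==([^\s;\\]+)")


def pins(lock_text: str) -> dict[str, str]:
    """Map normalized package name -> pinned version for one lock file."""
    result: dict[str, str] = {}
    for raw in lock_text.splitlines():
        match = _PIN.match(raw.strip())
        if match:
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            result[name] = match.group(2)
    return result


def version_conflicts(locks: dict[str, str]) -> dict[str, dict[str, str]]:
    """Packages pinned to different versions in different lock files."""
    seen: dict[str, dict[str, str]] = {}
    for lock_name, text in locks.items():
        for package, version in pins(text).items():
            seen.setdefault(package, {})[lock_name] = version
    return {pkg: by_lock for pkg, by_lock in seen.items() if len(set(by_lock.values())) > 1}
```

Each reported package can then be aligned by tightening the constraint in one of the input requirements files and re-running pip-compile.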

Custom Dependency Resolution

Implement custom resolution for complex scenarios:

# //tools/custom_deps.bzl
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

def custom_python_deps():
    """Declare Python dependencies that are not resolved through pip_parse. Call from WORKSPACE."""

    # Custom ML library not available on PyPI
    http_archive(
        name = "custom_ml_lib",
        urls = ["https://github.com/company/ml-lib/archive/v2.1.0.tar.gz"],
        sha256 = "abcd1234...",
        strip_prefix = "ml-lib-2.1.0",
        build_file = "//third_party:custom_ml_lib.BUILD",
    )

    # Forked dependency with patches
    http_archive(
        name = "patched_fastapi",
        urls = ["https://github.com/our-org/fastapi/archive/patched-v0.104.1.tar.gz"],
        sha256 = "efgh5678...",
        strip_prefix = "fastapi-patched-v0.104.1", 
        build_file = "//third_party:patched_fastapi.BUILD",
    )
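The `sha256` values above are elided placeholders, and each must be the real digest of the downloaded archive. The Python sketch below computes it, along with the Subresource Integrity string that `http_archive` also accepts through its `integrity` attribute. The helper names are hypothetical:

```python
from __future__ import annotations

import base64
import hashlib


def _format(digest: bytes) -> dict[str, str]:
    return {
        "sha256": digest.hex(),  # for http_archive(sha256 = ...)
        "integrity": "sha256-" + base64.b64encode(digest).decode("ascii"),  # for integrity = ...
    }


def checksums_of_bytes(data: bytes) -> dict[str, str]:
    return _format(hashlib.sha256(data).digest())


def checksums(path: str) -> dict[str, str]:
    """Hash a downloaded archive in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return _format(h.digest())
```

Download the archive (for example with curl) and pass its path to `checksums()`. Use either value, but not a guessed one: a mismatch makes the fetch fail, which is the point.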

4.6 Performance Monitoring and Optimization

Build Performance Analysis

Monitor and analyze build performance:

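# Generate a build profile (JSON trace) and summarize it
bazel build //... --profile=build_profile.json
bazel analyze-profile build_profile.json

# Memory usage analysis (plain-text report)
bazel build //... --memory_profile=memory_profile.txt

# More detailed profile: keep all events and label them with their targets
bazel build //... --profile=detailed_profile.json --noslim_profile --experimental_profile_include_target_label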
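The `--profile` output is a JSON trace in the Trace Event Format. `bazel analyze-profile` summarizes it, and chrome://tracing or Perfetto can display it. The Python sketch below pulls out the slowest events yourself. It assumes complete events carry `"ph": "X"` and a `dur` in microseconds, as the format specifies. The helper names are hypothetical:

```python
from __future__ import annotations

import gzip
import json


def load_trace_events(path: str) -> list[dict]:
    """Read a Trace Event Format file, gzipped if the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    # The format allows an object with a "traceEvents" list or a bare list
    return data["traceEvents"] if isinstance(data, dict) else data


def slowest_events(events: list[dict], top: int = 10) -> list[tuple[str, float]]:
    """(name, seconds) of the longest complete ("ph": "X") events."""
    complete = [e for e in events if e.get("ph") == "X" and "dur" in e]
    complete.sort(key=lambda e: e["dur"], reverse=True)
    return [(e.get("name", "?"), e["dur"] / 1_000_000) for e in complete[:top]]
```

For example, `slowest_events(load_trace_events("build_profile.json"))` lists the ten longest events in seconds.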

Cache Hit Rate Monitoring

Track cache effectiveness:

#!/bin/bash
# scripts/monitor_cache.sh

echo "Cache Statistics:"
echo "=================="

# Local cache info
echo "Local cache location: $(bazel info output_base)"
echo "Repository cache: $(bazel info repository_cache)"

# Build and capture Bazel's output (progress and summaries go to stderr)
bazel build //... 2>&1 | tee build.log

# Bazel reports per-build totals in a process summary line, for example
#   INFO: 1200 processes: 950 remote cache hit, 200 internal, 50 linux-sandbox.
# Actions already up to date in the local action cache are not run, so they are not counted there
grep -E "INFO: [0-9]+ process(es)?:" build.log
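Bazel ends a build with a process summary such as `INFO: 1200 processes: 950 remote cache hit, 200 internal, 50 linux-sandbox.` (illustrative numbers). The Python sketch below turns that line into a hit rate, assuming this wording. It excludes `internal` actions, which Bazel handles itself rather than running as separate processes. It also counts `disk cache hit` entries if your Bazel version reports them. The function names are hypothetical:

```python
from __future__ import annotations

import re

_SUMMARY = re.compile(r"^INFO: (\d+) process(?:es)?: (.*?)\.?$", re.MULTILINE)


def process_summary(log_text: str) -> dict[str, int]:
    """Parse the last 'INFO: N processes: ...' line of a Bazel log."""
    found = _SUMMARY.findall(log_text)
    if not found:
        return {}
    total, breakdown = found[-1]
    counts = {"total": int(total)}
    for part in breakdown.split(","):
        number, _, kind = part.strip().partition(" ")
        if number.isdigit():
            counts[kind] = int(number)
    return counts


def cache_hit_rate(counts: dict[str, int]) -> float | None:
    """Share of non-internal processes served from the disk or remote cache."""
    executed = counts.get("total", 0) - counts.get("internal", 0)
    if executed <= 0:
        return None  # nothing was eligible for caching
    hits = counts.get("remote cache hit", 0) + counts.get("disk cache hit", 0)
    return hits / executed
```

Feeding it the `build.log` produced by the script above and tracking the rate over time shows whether cache changes actually help.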

Optimizing for Cache Efficiency

Best practices for cache-friendly builds:

# //BUILD - Cache-optimized targets
py_library(
    name = "stable_utils",
    srcs = ["utils.py"],
    # Stable dependencies cache better
    deps = [
        "@pypi//requests",  # Pinned version
        "//common:constants",  # Rarely changing
    ],
)

py_library(
    name = "feature_code",
    srcs = ["feature.py"],
    # Separate frequently changing code
    deps = [
        ":stable_utils",  # Reuse cached stable parts
        "//config:dynamic_config",  # Expected to change often
    ],
)

4.7 Practical Examples

Complete Web Application Setup

Real-world example with optimized caching and dependencies:

# //web_app/BUILD
load("@rules_python//python:defs.bzl", "py_binary", "py_library", "py_test")

# Core application library (stable, caches well)
py_library(
    name = "app_core",
    srcs = [
        "core/__init__.py",
        "core/models.py", 
        "core/database.py",
        "core/auth.py",
    ],
    deps = [
        "@pypi//fastapi",
        "@pypi//sqlalchemy", 
        "@pypi//pydantic",
        "@pypi//passlib",
        "//common:config",
    ],
)

# API routes (changes more frequently)
py_library(
    name = "api_routes",
    srcs = [
        "api/__init__.py",
        "api/users.py",
        "api/posts.py", 
        "api/auth.py",
    ],
    deps = [
        ":app_core",
        "@pypi//fastapi",
        "//common:validators",
    ],
)

# Main application
py_binary(
    name = "web_app",
    srcs = ["main.py"],
    deps = [
        ":app_core",
        ":api_routes", 
        "@pypi//uvicorn",
    ],
    main = "main.py",
)

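# Comprehensive test suite. py_test runs test_integration.py directly, so the
# file must invoke pytest itself (e.g. sys.exit(pytest.main([__file__])))
py_test(
    name = "integration_test",
    srcs = ["test_integration.py"],
    deps = [
        ":app_core",
        ":api_routes",
        "@pypi_test//pytest",
        "@pypi_test//httpx",
        "@pypi_test//pytest_asyncio",
    ],
    data = ["test_data.json"],
)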

CI/CD Cache Configuration

Optimize CI/CD with proper cache configuration:

# .github/workflows/build.yml
name: Build and Test

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v4

    - name: Mount bazel cache
      uses: actions/cache@v4
      with:
        path: |
          ~/.cache/bazel-disk
          ~/.cache/bazel-repo
        key: bazel-${{ runner.os }}-${{ hashFiles('WORKSPACE', '**/*.bzl', '**/requirements*.lock') }}
        restore-keys: |
          bazel-${{ runner.os }}-

    - name: Configure Bazel
      run: |
        echo "build:ci --repository_cache=/home/runner/.cache/bazel-repo" >> .bazelrc.ci
        echo "build:ci --disk_cache=/home/runner/.cache/bazel-disk" >> .bazelrc.ci
        echo "build:ci --remote_cache=${{ secrets.REMOTE_CACHE_URL }}" >> .bazelrc.ci
        echo "build:ci --remote_upload_local_results=true" >> .bazelrc.ci
        # Bazel does not read .bazelrc.ci by itself; import it so --config=ci is defined
        echo "try-import %workspace%/.bazelrc.ci" >> .bazelrc

    - name: Build
      run: bazel build //... --config=ci

    - name: Test
      run: bazel test //... --config=ci --test_output=errors

4.8 Troubleshooting Common Issues

Cache Invalidation Problems

Debug cache invalidation issues:

# Explain why targets were rebuilt (--verbose_explanations only works together with --explain)
bazel build //target:name --explain=explain.log --verbose_explanations

# Compare action keys
bazel aquery //target:name --output=textproto > action1.txt
# Make change and run again
bazel aquery //target:name --output=textproto > action2.txt
diff action1.txt action2.txt

Dependency Resolution Issues

Debug complex dependency problems:

# Analyze dependency graph
bazel query "deps(//your:target)" --output=graph | dot -Tpng > deps.png

# Find which of your targets depend on a package
bazel query "rdeps(//..., @pypi//problematic_package)"

# Show how the package's repository is defined (WORKSPACE mode only)
bazel query --output=build //external:pypi_problematic_package

Remote Cache Issues

Troubleshoot remote cache problems:

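# Test remote cache connectivity and record which actions hit the cache
bazel build //simple:target --remote_cache=your-cache-url --execution_log_json_file=exec.json

# Retry the build if cached blobs are evicted while it runs
bazel build //target --remote_cache=your-cache-url --experimental_remote_cache_eviction_retries=3

# Log every gRPC call to the cache as binary protobufs (gRPC caches only)
bazel build //target --remote_cache=your-cache-url --experimental_remote_grpc_log=grpc.log

# Verify authentication (status endpoints are server-specific; bazel-remote provides /status)
curl -H "Authorization: Bearer $TOKEN" https://your-cache-url/status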
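The execution log written by `--execution_log_json_file=exec.json` records every spawn. The Python sketch below counts remote cache hits and lists the misses. It assumes the log is a stream of concatenated JSON objects with proto3 camelCase fields such as `remoteCacheHit` and `listedOutputs`; check these against your Bazel version. The helpers are hypothetical:

```python
from __future__ import annotations

import json
from collections.abc import Iterator


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield each JSON object from a stream of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1  # skip whitespace between objects
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def cache_report(log_text: str) -> tuple[int, int, list[str]]:
    """(spawns, remote cache hits, first listed output of each miss)."""
    spawns = list(iter_json_objects(log_text))
    # proto3 JSON omits false booleans, so a missing field means "not a hit"
    hits = [s for s in spawns if s.get("remoteCacheHit")]
    misses = [
        (s.get("listedOutputs") or ["<no outputs>"])[0]
        for s in spawns
        if not s.get("remoteCacheHit")
    ]
    return len(spawns), len(hits), misses
```

Running `cache_report(open("exec.json").read())` on two machines and comparing the miss lists is a quick way to spot actions whose inputs differ between environments.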

4.9 Best Practices Summary

Caching Best Practices

  • Use remote caching for team collaboration
  • Separate stable and volatile dependencies
  • Monitor cache hit rates regularly
  • Implement cache warming in CI/CD
  • Use repository caching for external dependencies

Dependency Management Best Practices

  • Pin all dependency versions in lock files
  • Separate production, development, and test dependencies
  • Handle platform-specific dependencies explicitly
  • Use custom resolution for complex scenarios
  • Monitor for security vulnerabilities in dependencies

Performance Optimization

  • Profile builds regularly to identify bottlenecks
  • Structure targets to maximize cache reuse
  • Use hermetic builds for reproducibility
  • Implement incremental build strategies
  • Monitor and optimize resource usage

Module 4 Exercises

Exercise 1: Remote Cache Setup

Set up a remote cache using Google Cloud Storage or AWS S3 and measure the cache hit rate improvement. S3 needs a proxy such as bazel-remote, because Bazel cannot sign S3 requests itself.

Exercise 2: Dependency Conflict Resolution

Create a scenario with conflicting dependency versions and resolve it using custom overrides.

Exercise 3: Cache Performance Analysis

Profile a complex build, identify cache inefficiencies, and implement optimizations.

Exercise 4: CI/CD Cache Integration

Set up cache optimization in a CI/CD pipeline and measure build time improvements.

Next Steps

In Module 5, we’ll cover “Advanced Python Rules and Toolchains” where you’ll learn to:

  • Create custom Python rules and macros
  • Configure multiple Python toolchains
  • Implement hermetic Python builds
  • Use aspects for code analysis

Key Takeaways

  • Bazel’s caching is content-based, not timestamp-based
  • Remote caching lets a team reuse outputs that teammates or CI already built, which can cut build times substantially
  • Proper dependency management prevents version conflicts
  • Cache performance should be monitored and optimized
  • Structured targets maximize cache reuse efficiency
  • Lock files ensure reproducible builds across environments

https://www.linkedin.com/in/sushilbaligar/
https://github.com/sushilbaligar
https://dev.to/sushilbaligar
https://medium.com/@sushilbaligar

