This content originally appeared on DEV Community and was authored by Latchu@DevOps
When running workloads on Google Compute Engine (GCE), monitoring and logging are critical to keeping your systems healthy and your applications reliable. Google now recommends using the Ops Agent — a modern, unified solution for collecting logs, metrics, and traces from your VMs.
Let’s break it down.
Why Ops Agent?
Google's legacy agents for logging and monitoring are still around, but:
- No new feature development
- No support for newer OS versions
- Maintenance-only mode
That’s why Ops Agent is the recommended choice for all new workloads. If you’re still running the old agents, it’s time to migrate.
What is Ops Agent?
Ops Agent is a single agent that runs on Compute Engine VMs to:
- Collect logs → send to Cloud Logging
- Collect metrics → send to Cloud Monitoring
- Collect traces → send to Cloud Trace
Under the hood, it uses Fluent Bit for logs and the OpenTelemetry Collector for metrics and traces.
It’s designed for both Linux and Windows VMs, with flexible installation options.
Key Features
Installation & Management
You can deploy Ops Agent in multiple ways:
- Auto-install during VM creation
- Fleet installation using gcloud or automation tools like Ansible, Chef, Puppet, Terraform
- Agent policies via the gcloud CLI
- Manual install on individual VMs
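For the manual route, Google's documented install on a single Linux VM is two shell steps: download `add-google-cloud-ops-agent-repo.sh` and run it with `--also-install`. The Python sketch below only assembles those commands (dry-run by default) and wraps them in `gcloud compute ssh` for a small list of VMs. The instance names and zones are hypothetical, and a naive SSH loop like this is no substitute for agent policies or real automation tooling at fleet scale.

```python
from __future__ import annotations

import shlex
import subprocess

# Google's documented repo script for installing the Ops Agent on Linux.
INSTALL_SCRIPT_URL = "https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh"


def install_steps() -> list[list[str]]:
    """Manual install on one Linux VM, as argv lists (run these on the VM)."""
    script = INSTALL_SCRIPT_URL.rsplit("/", 1)[-1]
    return [
        ["curl", "-sSO", INSTALL_SCRIPT_URL],
        ["sudo", "bash", script, "--also-install"],
    ]


def fleet_steps(instances: dict[str, str]) -> list[list[str]]:
    """Naive fleet install: run the same steps over `gcloud compute ssh`.

    `instances` maps hypothetical VM names to their zones.
    """
    one_liner = " && ".join(shlex.join(argv) for argv in install_steps())
    return [
        ["gcloud", "compute", "ssh", name, f"--zone={zone}", f"--command={one_liner}"]
        for name, zone in instances.items()
    ]


def run(steps: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for argv in steps:
        cmd = shlex.join(argv)
        rendered.append(cmd)
        print(("[dry-run] " if dry_run else "") + cmd)
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return rendered


if __name__ == "__main__":
    run(install_steps())
    run(fleet_steps({"web-1": "us-central1-a", "web-2": "us-central1-b"}))
```

Flip `dry_run=False` only on a machine where you actually intend to install.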
YAML-based Configuration
- Simple and flexible config files
- Easy customization for log collection, parsing, and filtering
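On Linux, the agent reads its user configuration from `/etc/google-cloud-ops-agent/config.yaml`, and you restart it with `sudo systemctl restart google-cloud-ops-agent` after edits. The file has `logging` and `metrics` sections, each with `receivers`, `processors`, and `service.pipelines` that wire them together. Here's a minimal sketch, assuming a hypothetical app that writes JSON lines to `/var/log/myapp/*.log`. It builds the config as a Python dict, renders it to YAML with a tiny stdlib-only emitter (PyYAML would do the same job), and lints that every ID a pipeline references is declared in the same file.

```python
from __future__ import annotations

import json

# Minimal user config: tail a hypothetical app's JSON-lines log files and
# parse each line into a structured entry. IDs and paths are made up.
CONFIG = {
    "logging": {
        "receivers": {
            "myapp_logs": {
                "type": "files",
                "include_paths": ["/var/log/myapp/*.log"],
            },
        },
        "processors": {
            "myapp_json": {"type": "parse_json"},
        },
        "service": {
            "pipelines": {
                "myapp_pipeline": {
                    "receivers": ["myapp_logs"],
                    "processors": ["myapp_json"],
                },
            },
        },
    },
}


def to_yaml(node: dict | list, indent: int = 0) -> str:
    """Tiny block-style YAML emitter for non-empty dicts and lists of scalars.

    Scalars are written with json.dumps, so strings come out double-quoted,
    which YAML also accepts.
    """
    pad = "  " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.append(to_yaml(value, indent + 1))
            else:
                lines.append(f"{pad}{key}: {json.dumps(value)}")
    else:
        for item in node:
            lines.append(f"{pad}- {json.dumps(item)}")
    return "\n".join(lines)


def check_references(config: dict) -> list[str]:
    """Flag pipeline entries that point at receiver/processor IDs not declared here.

    This is a local lint only: a pipeline that reuses an ID from the agent's
    built-in configuration would be flagged even though the agent may accept it.
    """
    problems = []
    for section in ("logging", "metrics"):
        block = config.get(section, {})
        pipelines = block.get("service", {}).get("pipelines", {})
        for name, pipe in pipelines.items():
            for kind in ("receivers", "processors"):
                for ref in pipe.get(kind, []):
                    if ref not in block.get(kind, {}):
                        problems.append(f"{section}.{name}: unknown {kind[:-1]} '{ref}'")
    return problems


if __name__ == "__main__":
    issues = check_references(CONFIG)
    if issues:
        raise SystemExit("\n".join(issues))
    print(to_yaml(CONFIG))
```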
Logging Features
- Better performance than the legacy logging agent
- Collects logs from:
  - System logs (/var/log/syslog, /var/log/messages)
  - File-based logs (customizable paths)
  - TCP protocol streams
  - Forward protocol (Fluent Bit/Fluentd)
- Flexible processing:
  - Parse unstructured logs into structured JSON
  - Regex-based parsing
  - Exclude logs with labels/regex
- Third-party app support: Apache Kafka, Nginx, Hadoop, MongoDB, MySQL, Redis, Oracle DB, SAP HANA, and more.
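To make "parse unstructured logs into structured JSON" concrete: a `parse_regex` processor applies a regex with named capture groups to each line and turns the captures into fields, while an `exclude_logs` processor drops entries matching any condition in its `match_any` list. The Python below is only an analog of that behavior, not the agent's implementation, and it assumes a hypothetical `<timestamp> <LEVEL> <message>` log format. One syntax detail worth knowing: the agent's Fluent Bit regexes write named groups as `(?<name>...)`, while Python spells them `(?P<name>...)`.

```python
from __future__ import annotations

import json
import re

# Hypothetical log format: "<timestamp> <LEVEL> <message>".
# In an Ops Agent parse_regex processor you'd write the groups as (?<time>...);
# Python's re module needs (?P<time>...).
LINE_RE = re.compile(r"^(?P<time>\S+) (?P<severity>[A-Z]+) (?P<message>.*)$")

SAMPLE_LINES = [
    "2024-05-01T12:00:00Z INFO service started",
    "2024-05-01T12:00:01Z DEBUG cache warmup detail",
    "2024-05-01T12:00:02Z ERROR upstream timeout",
    "free-form line with no recognizable prefix",
]


def parse_line(line: str) -> dict[str, str] | None:
    """Return the named captures as a dict, or None when the pattern doesn't match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def exclude(records: list[dict], match_any) -> list[dict]:
    """Keep only records for which no predicate in match_any is true."""
    return [r for r in records if not any(pred(r) for pred in match_any)]


def process(lines: list[str]) -> list[dict]:
    # This sketch keeps unmatched lines as a raw "message" field.
    records = [parse_line(line) or {"message": line} for line in lines]
    # Analog of an exclusion rule that drops DEBUG entries.
    return exclude(records, [lambda r: r.get("severity") == "DEBUG"])


if __name__ == "__main__":
    for record in process(SAMPLE_LINES):
        print(json.dumps(record))
```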
Monitoring Features
- System metrics out of the box:
  - CPU, disk, memory, processes, networking, swap
  - GPU (Linux)
  - IIS, MSSQL, Pagefile (Windows)
- Third-party app integrations (Kafka, Nginx, MariaDB, MongoDB, Redis, WildFly, etc.)
- Prometheus metrics collection for apps running on Compute Engine
- NVIDIA GPU monitoring with DCGM integration
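For Prometheus collection, the agent scrapes an HTTP endpoint your app exposes, configured through a `prometheus` receiver whose `config` block takes Prometheus-style `scrape_configs`. Here's a sketch of both halves, assuming a hypothetical app called `my_app` listening on port 8000: the receiver YAML is kept as a Python string for reference, and a stdlib HTTP server exposes one counter in the Prometheus text exposition format.

```python
from __future__ import annotations

import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Matching receiver for the agent's config.yaml (job name and port are hypothetical).
OPS_AGENT_SNIPPET = """\
metrics:
  receivers:
    prometheus:
      type: prometheus
      config:
        scrape_configs:
          - job_name: my_app
            scrape_interval: 30s
            static_configs:
              - targets: ['127.0.0.1:8000']
  service:
    pipelines:
      prometheus_pipeline:
        receivers: [prometheus]
"""


def render_metrics(requests_total: int) -> str:
    """One counter in the Prometheus text exposition format."""
    return (
        "# HELP my_app_requests_total Requests handled by my_app.\n"
        "# TYPE my_app_requests_total counter\n"
        f"my_app_requests_total {requests_total}\n"
    )


class Handler(BaseHTTPRequestHandler):
    requests_total = 0
    lock = threading.Lock()

    def do_GET(self) -> None:
        if self.path == "/metrics":
            with Handler.lock:
                body = render_metrics(Handler.requests_total).encode()
            content_type = "text/plain; version=0.0.4"
        else:  # any other path counts as an app request
            with Handler.lock:
                Handler.requests_total += 1
            body = b"ok\n"
            content_type = "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def start(port: int = 8000) -> HTTPServer:
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, path: str) -> str:
    """GET a path from the local server (roughly what a scrape does)."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as resp:
        return resp.read().decode()


if __name__ == "__main__":
    server = start(0)  # demo uses a free port; the agent snippet assumes 8000
    port = server.server_address[1]
    fetch(port, "/")
    print(fetch(port, "/metrics"), end="")
    server.shutdown()
    server.server_close()
```

The snippet doesn't set `metrics_path` because Prometheus scrape configs default it to `/metrics`, which is the path this server exposes.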
Final Thoughts
If you’re running workloads on GCE, adopting Ops Agent is a no-brainer:
- One agent for both logs & metrics
- Actively developed and future-proof
- Better performance & third-party support
- Flexible deployment at scale
Google has made it clear: transition your workloads to Ops Agent now and unlock better observability for your infrastructure.
Have you already migrated from the legacy agents? What was your experience with Ops Agent so far?