Part-19: GCE Ops Agent: Logging & Monitoring in Google Cloud Platform (GCP)
When running workloads on Google Compute Engine (GCE), monitoring and logging are critical to keeping your systems healthy and your applications reliable. Google now recommends using the Ops Agent, a modern, unified solution for collecting logs, metrics, and traces from your VMs.
Let's break it down.
Why Ops Agent?
Google had legacy agents for logging and monitoring, but:
- No new feature development
- No support for newer OS versions
- Maintenance-only mode
That's why Ops Agent is the recommended choice for all new workloads. If you're still running the old agents, it's time to migrate.
What is Ops Agent?
Ops Agent is a single agent that runs on Compute Engine VMs to:
- Collect logs and send them to Cloud Logging
- Collect metrics and send them to Cloud Monitoring (traces go to Cloud Trace)
Under the hood, it uses:
- Fluent Bit for logs
- OpenTelemetry Collector for metrics & traces
It's designed for both Linux and Windows VMs, with flexible installation options.
Key Features
Installation & Management
You can deploy Ops Agent in multiple ways:
- Auto-install during VM creation
- Fleet installation using gcloud or automation tools like Ansible, Chef, Puppet, Terraform
- Agent policies via CLI
- Manual install on individual VMs
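To make the fleet option a bit more concrete, here is a minimal dry-run sketch in Python. It only builds and prints the `gcloud compute ssh` command lines that would run Google's published Linux install script on each VM; it does not execute anything. The VM names and zones in `FLEET` are hypothetical placeholders, and for larger fleets agent policies or a config-management tool are likely a better fit than looping over SSH.

```python
import shlex

# Linux install commands as published in Google's Ops Agent install guide.
INSTALL_CMD = (
    "curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh"
    " && sudo bash add-google-cloud-ops-agent-repo.sh --also-install"
)


def ssh_install_argv(vm_name, zone):
    """Build the gcloud argv that would run the installer on one VM over SSH."""
    return [
        "gcloud", "compute", "ssh", vm_name,
        "--zone", zone,
        "--command", INSTALL_CMD,
    ]


def plan(vms):
    """Return one shell-quoted command line per (vm_name, zone) pair."""
    return [shlex.join(ssh_install_argv(name, zone)) for name, zone in vms]


# Hypothetical inventory: replace with your own VM names and zones.
FLEET = [("web-1", "us-central1-a"), ("web-2", "us-central1-b")]

for command in plan(FLEET):
    print(command)  # dry run: printed only, nothing is executed
```

Once the printed commands look right, each argv list could be handed to `subprocess.run` to actually perform the install.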
YAML-based Configuration
- Simple and flexible config files
- Easy customization for log collection, parsing, and filtering
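As a rough illustration of what such a config can look like, the sketch below keeps a small logging configuration as a Python string and writes it to a file. The receiver, processor, and pipeline IDs (`myapp_files`, `myapp_parse`, `myapp_pipeline`) and the `/var/log/myapp/*.log` path are made up for the example; the `files` receiver and `parse_regex` processor types are the ones Ops Agent documents, but treat the exact fields as something to check against the current docs.

```python
import tempfile
from pathlib import Path

# On Linux, Ops Agent reads user configuration from this file.
DEFAULT_CONFIG_PATH = "/etc/google-cloud-ops-agent/config.yaml"

# Hypothetical IDs (myapp_files, myapp_parse, myapp_pipeline) and log path.
CONFIG_YAML = """\
logging:
  receivers:
    myapp_files:
      type: files
      include_paths:
        - /var/log/myapp/*.log
  processors:
    myapp_parse:
      type: parse_regex
      regex: '^(?<severity>[A-Z]+) (?<message>.*)$'
  service:
    pipelines:
      myapp_pipeline:
        receivers: [myapp_files]
        processors: [myapp_parse]
"""


def write_config(path):
    """Write the sample configuration to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(CONFIG_YAML, encoding="utf-8")
    return target


# Written to a temp dir here; on a real VM you would target
# DEFAULT_CONFIG_PATH (as root) and then restart the agent.
with tempfile.TemporaryDirectory() as tmp:
    written = write_config(Path(tmp) / "config.yaml")
    print(written.read_text(encoding="utf-8"))
```

On a Linux VM, the agent typically picks up a changed config after `sudo systemctl restart google-cloud-ops-agent`.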
Logging Features
Better performance than the legacy logging agent
Collects logs from:
- System logs (/var/log/syslog, /var/log/messages)
- File-based logs (customizable paths)
- TCP protocol streams
- Forward protocol (Fluent Bit/Fluentd)
Flexible processing:
- Parse unstructured logs into structured JSON
- Regex-based parsing
- Exclude logs with labels/regex
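If regex parsing and exclusion feel abstract, this small Python sketch mimics the idea locally: it turns unstructured lines into structured records and drops the ones matching an exclusion pattern. It is only an illustration of the concept, not how the agent is implemented; the log format, the `healthcheck` filter, and the function names are assumptions. Note that Python spells named groups `(?P<name>...)`, while Ops Agent's `parse_regex` uses `(?<name>...)`.

```python
import json
import re

# Assumed line format: "<timestamp> <SEVERITY> <free-text message>".
LINE_RE = re.compile(r"^(?P<time>\S+) (?P<severity>[A-Z]+) (?P<message>.*)$")

# Assumed noise pattern to exclude, e.g. load-balancer health checks.
EXCLUDE_RE = re.compile(r"healthcheck")


def parse_line(line):
    """Return a structured record, or a message-only record if no match."""
    match = LINE_RE.match(line)
    if match is None:
        return {"message": line}
    return match.groupdict()


def process(lines):
    """Parse each line and drop records whose message matches EXCLUDE_RE."""
    records = []
    for line in lines:
        record = parse_line(line.rstrip("\n"))
        if EXCLUDE_RE.search(record["message"]):
            continue
        records.append(record)
    return records


SAMPLE = [
    "2024-01-01T00:00:00Z ERROR disk full",
    "2024-01-01T00:00:01Z INFO healthcheck ok",
    "not a structured line",
]

for record in process(SAMPLE):
    print(json.dumps(record))
```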
Third-party app support: Apache Kafka, Nginx, Hadoop, MongoDB, MySQL, Redis, Oracle DB, SAP HANA, and more.
Monitoring Features
System metrics out of the box:
- CPU, disk, memory, processes, networking, swap
- GPU (Linux)
- IIS, MSSQL, Pagefile (Windows)
Third-party app integrations (Kafka, Nginx, MariaDB, MongoDB, Redis, WildFly, etc.)
Prometheus metrics collection for apps running on Compute Engine
NVIDIA GPU monitoring with DCGM integration
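For the Prometheus option, it helps to see the app side: the agent scrapes an HTTP endpoint that serves metrics in the Prometheus text format. Below is a minimal standard-library sketch of such an endpoint. The metric names, the `/metrics` path, and port 8000 are assumptions for illustration; a real app would more likely use an official Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real app would update these as it runs.
COUNTERS = {"myapp_requests_total": 0, "myapp_errors_total": 0}


def render_metrics(counters):
    """Render counters in the Prometheus text exposition format."""
    lines = []
    for name in sorted(counters):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {counters[name]}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(COUNTERS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()


# Show what a scrape would return; call serve() to actually start the server.
print(render_metrics(COUNTERS), end="")
```

With `serve()` running on the VM, the agent's Prometheus receiver would be pointed at `localhost:8000` in its scrape configuration.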
Final Thoughts
If you're running workloads on GCE, adopting Ops Agent is a no-brainer:
- One agent for both logs & metrics
- Actively developed and future-proof
- Better performance & third-party support
- Flexible deployment at scale
Google has made it clear: transition your workloads to Ops Agent now and unlock better observability for your infrastructure.
Have you already migrated from the legacy agents? What has your experience with Ops Agent been so far?