
We are looking for an Observability specialist to lead the design and implementation of our telemetry pipeline using OpenTelemetry (OTel) Collectors to monitor our GCP infrastructure. You won't just turn on monitoring; you will curate a high-signal environment by identifying value-add metrics and implementing sophisticated label enrichment strategies to ensure our data is actionable, cost-effective, and context-rich.
● OTel Collector Architecture: Design, deploy, and maintain OTel Collectors (Sidecars, DaemonSets, and Gateway clusters) across GKE and GCE environments.
● Pipeline Optimization: Configure receivers (Google Cloud Monitoring, OTLP, Host Metrics), processors (Batch, Memory Limiter, Resource Detection), and exporters (GoogleCloud, Prometheus).
● GCP Metric Curation: Distinguish between noise and signal by identifying and collecting high-value GCP metrics (e.g., compute.googleapis.com/instance/cpu/scheduler_wait_time vs. simple utilization).
● Metadata & Label Enrichment: Use the Resource Detection Processor and Transform Processor to automatically inject GCP-specific metadata (Project ID, Zone, Instance ID, Custom Labels) into all telemetry signals.
● Cost Management: Implement filtering and dropped-label strategies to manage cardinality explosions and optimize Google Cloud Observability (Stackdriver) costs.
● GCP Expertise: Deep understanding of GCP resource hierarchies and the Cloud Monitoring API (v3).
● OpenTelemetry Proficiency: Advanced configuration of the otel-collector-contrib distribution, specifically for infrastructure monitoring.
● Metric Strategy: Knowledge of which Golden Signals (Latency, Errors, Saturation, Traffic) are most relevant for specific GCP services like Cloud SQL, GKE, and Pub/Sub.
● Contextual Enrichment: Experience using OTel to bridge the gap between infra metrics and application context (e.g., mapping a GCE instance ID to a specific Business Unit via labels).
● Infrastructure as Code: Proficiency in Terraform or Helm for deploying observability as a standard part of the landing zone.
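A minimal Collector configuration sketch tying the responsibilities above together: host metrics and OTLP in, GCP resource detection and batching in the middle, Google Cloud Monitoring out. The project ID and intervals are placeholders, not values from this posting.

```yaml
receivers:
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
      memory:
  otlp:
    protocols:
      grpc:

processors:
  # Protects the Collector from OOM before any other processing.
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
  # Auto-injects GCP metadata (project ID, zone, instance ID) as resource attributes.
  resourcedetection:
    detectors: [gcp]
  batch: {}

exporters:
  googlecloud:
    project: my-gcp-project  # placeholder

service:
  pipelines:
    metrics:
      receivers: [hostmetrics, otlp]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [googlecloud]
```

The processor order matters: the memory limiter runs first, and enrichment happens before batching so every exported point already carries its GCP context.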
We're mostly looking for a Data Curator rather than just a Systems Admin. Most people can install a collector; very few know how to make the data coming out of it actually useful and cost-efficient.
We need an Observability Engineer who acts like a filter. They should know which specific GCP metrics actually matter (so we don't drown in noise) and how to tag (enrich) those metrics using OpenTelemetry so that when an alert goes off, we know exactly which team, project, and environment it belongs to.
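As a sketch of that enrichment, the contrib Transform Processor can stamp ownership attributes onto datapoints. The team name and hostname below are hypothetical, purely to illustrate the pattern.

```yaml
processors:
  transform/ownership:
    metric_statements:
      - context: datapoint
        statements:
          # Hypothetical mapping: tag metrics from a known host with its owning team.
          - set(attributes["team"], "payments")
              where resource.attributes["host.name"] == "gce-payments-01"
```

In practice the mapping usually comes from GCE instance labels or GKE namespace conventions rather than hard-coded hostnames.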
Keep an eye out for these on a resume or listen for them in a screening:
1. The Optimizer: They talk about Cost Management or Cardinality.
Why: Storing every single metric in GCP is expensive. A good candidate mentions filtering or dropping low-value metrics to save money.
2. The Context King: They mention Resource Detection or Attribute Mapping.
Why: This is the label-enrichment part. It means they know how to automatically attach metadata (like Owner: Payments-Team) to a raw metric.
3. The Contrib Pro: They mention using the Contrib distribution of the OTel Collector.
Why: The core OTel Collector distribution is deliberately minimal; the Contrib distribution contains the Google Cloud receivers and processors needed for this kind of monitoring.
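For point 1, a candidate should be able to sketch a filter like the following, which drops metrics before they are exported and billed. The metric-name pattern here is hypothetical.

```yaml
processors:
  filter/drop-noise:
    metrics:
      exclude:
        match_type: regexp
        metric_names:
          # Hypothetical example: drop per-container count metrics we never alert on.
          - "kubernetes\\.io/container/.*_count"
```

Dropping whole metrics (or deleting high-cardinality labels such as pod UIDs) is the main lever for controlling Cloud Monitoring ingestion costs.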
● OpenTelemetry (OTel) Collector Contrib: This is the primary engine where the magic happens; it contains the specific processors required to automatically enrich metrics with GCP metadata and transform raw data into business-ready signals.
● Google Cloud Observability (Stackdriver): The candidate must deeply understand the destination's proprietary data model and billing logic to ensure that the metrics they collect are formatted correctly and don't cause a cardinality explosion that spikes your monthly bill.
● Kubernetes (GKE) & Helm: Since the collector typically runs as a DaemonSet or Sidecar in GCP, mastery here ensures the monitoring pipeline is resilient, scales with your clusters, and is easily updated via standardized charts.
● Terraform / Terragrunt: High-quality observability must be codified rather than manual; using IaC ensures that every new GCP project or resource is automatically onboarded with the correct labels and alert policies from day one.
● PromQL & MQL (Query Languages): Collecting the data is only half the battle; the candidate needs these languages to build the complex dashboards and SLO (Service Level Objective) ratios that actually prove the value-add to your engineering teams.
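A typical SLO ratio of the kind described above, written in PromQL. The metric and label names are illustrative, not taken from this posting:

```promql
# Error-rate SLI: fraction of requests returning 5xx over the last 5 minutes.
sum(rate(http_requests_total{code=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
```

With the enrichment described earlier in place, the same query can be sliced by team or environment labels to route ownership of a burning error budget.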
Artech is the largest Women & Minority owned IT staffing firm in the US, with US$ 800 million annual revenue run rate in 2021 and a footprint across the globe. With nearly three decades of experience, Artech empowers businesses through applied human intelligence and offers a spectrum of services that include Workforce Solutions (Contingent Staffing, Bulk/ Project Staffing, Master Vendor, RPO, Direct Hire and Payroll Transition) and Project-Based Solutions (Digital Experience, Technical Operations, Technical Development, Business Operations & Digital Platforms). Artech works with over 90 Fortune 500 clients across USA, Canada, India, and China.
At Artech, we are empowering talent by connecting potential with opportunities through applied human intelligence. We empower our teams to maximize the impact of their intellect, through a performance oriented, diverse, flexible, and inclusive work environment supported by our continuous learning and development focus.
Job ID: 147027071