DevOn

Lead CloudOps Engineer

This job is no longer accepting applications

  • Posted 2 months ago

Job Description

Lead Operations Engineer

Experience: 8+ years

  • Own operational oversight for services running on a Java-based microservices platform.
  • Act as the primary escalation point for production incidents; lead incident response and communication.
  • Drive post-incident reviews (blameless RCAs) and embed learnings through preventive actions.
  • Maintain service dashboards, alerts, and incident tooling (e.g., PagerDuty, Datadog).

Technical Leadership

  • Guide operational practices across services built using Java (Spring Boot), Kafka, MongoDB and related technologies.
  • Oversee monitoring, observability, and performance tuning using Datadog, ELK, Prometheus, or similar tooling.

Problem Management & Root Cause Elimination

  • Lead proactive and reactive problem management efforts.
  • Identify recurring production issues and collaborate with engineering to design permanent solutions.
  • Track and reduce operational toil via automation and tooling improvements.

Change Enablement & Service Onboarding

  • Partner with development teams to onboard new services with production readiness standards.
  • Ensure all services meet requirements for monitoring, logging, documentation, support, and resilience before go-live.
  • Support safe, rapid change practices including canary releases, feature flags, and progressive delivery.

Team Management & Leadership

  • Lead and mentor a team of operations engineers and/or SREs.
  • Manage performance reviews, career development, and day-to-day team workload.
  • Foster a high-performance culture with strong accountability, collaboration, and a learning mindset.

Continuous Improvement & DevOps Practices

  • Drive automation and self-service initiatives to reduce manual intervention and operational burden.
  • Champion observability best practices (metrics, traces, logs) and error budget tracking.
  • Promote DevOps culture and continuous feedback loops between engineering and operations.

Governance, Risk & Compliance

  • Ensure operational processes comply with security, privacy, and regulatory requirements (e.g., SOC 2, ISO 27001).
  • Manage operational risks, service continuity plans, and audit readiness.

Job ID: 127551089