We are looking for a high-caliber DevOps / SRE / Platform Engineer who brings strong engineering fundamentals and a deep sense of ownership. You will design, build, and operate the platforms and infrastructure that enable product teams to ship quickly and safely at scale.
What You Will Do
Platform & Cloud Engineering
- Design and operate cloud-native platforms that abstract complexity and improve developer productivity.
- Build reusable, opinionated infrastructure using Infrastructure as Code.
- Own Kubernetes clusters, networking, service orchestration, and workload reliability.
Reliability Engineering
- Define and drive SLIs, SLOs, and error budgets for business-critical services.
- Participate in on-call rotations, lead incident response, and write clear, blameless RCAs.
- Continuously reduce operational toil through automation and engineering solutions.
CI/CD & DevOps
- Build and evolve secure, automated CI/CD pipelines using GitOps principles.
- Enable safe, frequent production deployments backed by reliable rollback mechanisms and strong observability.
- Partner with application teams to embed reliability and operational excellence early in the lifecycle.
Observability & Operations
- Implement best-in-class logging, metrics, tracing, and alerting.
- Ensure alerts are actionable and tied to service health rather than noise.
- Build dashboards, runbooks, and self-healing mechanisms to reduce MTTR.
Architecture & Collaboration
- Work closely with software engineers, architects, and security teams to influence system design.
- Review infrastructure and architecture through the lens of scale, resilience, and cost efficiency.
- Champion DevOps, SRE, and cloud-native best practices across the organization.
What We're Looking For
Core Engineering
- Strong foundations in Linux, networking (DNS, TCP/IP), and distributed systems.
- Proficiency in Python or Go for automation and tooling.
- Clean Git practices and strong software engineering discipline.
Cloud & Containers
- Hands-on experience with AWS (primary); exposure to GCP or Azure is a plus.
- Strong experience operating Kubernetes in production environments (required).
- Experience with Helm and containerized workloads at scale (required).
Infrastructure & Tooling
- Infrastructure as Code using Terraform (required).
- Configuration and automation using Chef/Ansible (preferred).
- CI/CD & GitOps: Argo CD, plus GitHub Actions, Jenkins, or GitLab CI.
Observability & Reliability
- Metrics and alerting: Prometheus, Grafana, Alertmanager.
- Tracing/APM: Datadog, New Relic, OpenTelemetry.
- Incident management experience (PagerDuty or equivalent).
Data & Messaging (Working Knowledge)
- Datastores: PostgreSQL, MySQL, MongoDB.
- Streaming & search: Kafka, Elasticsearch.
- Caching: Redis.
Educational Background
- Bachelor's or Master's degree in Computer Science, Information Technology, Engineering, or a related technical field from a recognized institution.
- Strong grounding in operating systems, computer networks, distributed systems, databases, and software engineering principles.
- Formal coursework or academic projects involving cloud computing, containerization, Linux systems, networking, or automation are highly valued.
- Equivalent practical experience in building and operating large-scale, production-grade platforms may be considered in lieu of formal education.
- Relevant certifications such as AWS Certified Solutions Architect or DevOps Engineer, CKA/CKAD, or Terraform certifications are a plus but not mandatory.
Nice to Have
- Experience building internal developer platforms.
- Exposure to DevSecOps and cloud security.
- Experience operating high-traffic, consumer-facing systems.
- Strong written communication (design docs, RFCs, runbooks).