The Snapmint DevOps team is looking for a Sr. DevOps Engineer with a passion for working on cutting-edge technology who thrives on the challenge of building something new that will operate at massive scale. We are looking for a DevOps Engineer with 3-4 years of hands-on experience in managing modern cloud infrastructure and CI/CD pipelines. The ideal candidate should be well-versed in Kubernetes, cloud services (AWS/GCP), and observability tools like Grafana, Loki, and Fluentd, and have strong Linux, scripting, and automation skills. You will build and fully own one of the following areas of DevOps: CI/CD; scaling microservices and distributed applications using containers and Kubernetes or related technologies; database clusters; the Data Lake platform; centralized logging and monitoring; and security.
Responsibilities
- Manage and maintain Kubernetes clusters (EKS/GKE) using Karpenter and self-managed node groups.
- Handle deployments and scaling for multi-language applications (React/Node.js, Django, Ruby on Rails).
- Design and maintain CI/CD pipelines using Jenkins and GitHub Actions.
- Automate deployment workflows and manage environment configurations.
- Set up and manage monitoring and alerting using Grafana, CloudWatch, and Prometheus.
- Implement log aggregation using Fluentd, Loki, and Elasticsearch.
- Configure and maintain Sentry for error tracking and OpenTelemetry for distributed tracing.
- Manage NGINX ingress controllers, CoreDNS custom configurations, and Cloudflare WAF/CDN.
- Maintain secure networking across VPCs, NAT Gateways, and subnets in AWS.
- Work with RDS (MySQL/PostgreSQL), including replication, backups, and read-replica optimization for analytics.
- Keep the backends of analytics platforms like Redash, Metabase, and Jupyter reliable.
- Write scripts in Bash, Python, or Go for automation and infrastructure maintenance tasks.
- Use Docker extensively for containerization and local development environments.
Requirements
- 3-6 years of hands-on experience in a DevOps/Site Reliability role.
- Strong knowledge of Linux, Kubernetes, Docker, and Helm.
- Proficiency with CI/CD tools (e.g., Jenkins, GitHub Actions).
- Experience with an observability stack: Grafana, Loki, Fluentd, Prometheus, and Sentry.
- Good understanding of networking fundamentals (DNS, Load Balancing, Firewalls).
- Working knowledge of at least one public cloud (AWS or GCP).
- Familiarity with monitoring distributed applications and log management.
- Basic understanding of application deployment (Node.js, PHP, Django, Ruby on Rails).
- Good understanding of Terraform.
- Good understanding of networking layers.
This job was posted by Parvinder Kaur from Snapmint.