We are looking for a DevOps Lead Engineer to design, build, and scale our cloud infrastructure and CI/CD ecosystem. The ideal candidate should have strong AWS expertise (including Control Tower, Step Functions, and Lambda) along with automation and integration skills in Python. Exposure to GenAI-based automation or ML pipeline deployment is a strong plus.
Responsibilities
- Lead and own the DevOps strategy, architecture, and implementation across environments (Dev, QA, Prod).
- Design and manage AWS environments using AWS Control Tower, Organizations, Service Catalog, and Step Functions for orchestration and governance.
- Implement and optimize CI/CD pipelines using tools like GitHub Actions, Jenkins, or Azure DevOps.
- Automate provisioning and configuration using Terraform or CloudFormation.
- Integrate Python-based automation scripts for deployments, monitoring, and cloud operations.
- Collaborate with Data and AI teams to deploy GenAI models and APIs securely and efficiently.
- Manage containerization and orchestration using Docker and Kubernetes (EKS preferred).
- Ensure security, compliance, and cost optimization across multi-account AWS setups.
- Drive observability using CloudWatch, Grafana, or Prometheus for proactive monitoring and alerting.
- Mentor and guide junior DevOps engineers; establish best practices for infrastructure-as-code, branching, and release management.
Requirements
- 7+ years in DevOps, Cloud, or Platform Engineering roles.
- Strong hands-on experience with AWS services: EC2, Lambda, S3, CloudFormation, IAM, Control Tower, Step Functions, and CloudWatch.
- Proven experience with Terraform, Python, and CI/CD tools (GitHub Actions, Jenkins, etc.).
- Knowledge of Kubernetes / EKS and container orchestration best practices.
- Familiarity with ML/GenAI pipelines or deployment of AI APIs in production.
- Strong grasp of networking, security groups, IAM policies, and least-privilege access design.
- Excellent problem-solving and communication skills.
Nice-to-Have
- Experience with Databricks, SageMaker, or AI model deployment frameworks.
- Exposure to Azure or GCP cloud environments.
- Scripting knowledge in Bash or Go in addition to Python.
- Familiarity with monitoring and cost optimization tools like CloudHealth, Datadog, or New Relic.
This job was posted by Chetna Joshi from UsefulBI.