About the Role
We are looking for a Senior DevOps Engineer to own and operate our entire production infrastructure on AWS. You will be responsible for end-to-end cloud infrastructure management, CI/CD, observability, security, reliability, and cost optimization.
This role also includes strong MLOps responsibilities, ensuring reliable operation of ML and data pipelines, reproducible deployments, and stable batch or near-real-time workloads.
Key Responsibilities
1. AWS Infrastructure Ownership
- Manage AWS environments (prod, staging, dev), including networking (VPCs, subnets, routing), IAM, and security groups.
- Operate compute workloads (EC2/ECS), including autoscaling and service health.
- Own PostgreSQL (RDS): performance tuning, backups, disaster recovery, upgrades, and optimization.
- Manage Redis and S3 (lifecycle policies, encryption, retention).
- Define infrastructure standards (naming, tagging, access control).
2. CI/CD & Deployment
- Build and maintain CI/CD pipelines across services.
- Implement safe deployment strategies (blue/green, canary, rollback).
- Maintain Infrastructure-as-Code (Terraform preferred).
3. Observability & Reliability
- Own monitoring (metrics, logs, traces), alerting, and dashboards.
- Define SLIs/SLOs and drive incident response and postmortems.
- Proactively improve system performance and reliability.
4. Security & Governance
- Implement secure networking, secrets management, IAM best practices, vulnerability scanning, and patching.
- Enforce a secure SDLC and production access controls.
5. MLOps & Data Pipeline Operations
- Operate workflow orchestration systems (Airflow preferred).
- Manage batch and inference workloads efficiently and reliably.
- Support model/artifact versioning and reproducibility.
- Monitor ML/data workloads and ensure operational stability.
6. Cost Optimization
- Monitor and optimize AWS costs (right-sizing, autoscaling, storage lifecycle, spot/reserved usage).
- Establish cost visibility, budgets, and a tagging strategy.
Required Qualifications
- 6–7 years of experience in DevOps / Platform Engineering / SRE with production ownership.
- Strong hands-on experience with AWS (IAM, VPC, EC2/ECS, RDS, S3, CloudWatch, load balancers, autoscaling).
- Experience building CI/CD pipelines (GitHub Actions, Jenkins, or GitLab CI) and deployment automation.
- Production experience with Terraform or similar IaC tools.
- Observability experience across metrics, logs, and traces, plus ownership of incident response.
- Solid Linux and networking fundamentals.
- Knowledge of security best practices (least privilege, secrets management, vulnerability management).
- Experience operating ML/data pipelines in production.
Nice to Have
- Kubernetes/EKS experience.
- Kafka operations experience.
- Experience with large-scale batch data processing or ML platforms.
- Familiarity with agentic/LLM tool ecosystems.