We are looking for a proactive and detail-oriented DevOps Engineer to drive the scalability, reliability, and evolution of our cloud infrastructure. This role demands strong ownership across platform automation, cloud migration, observability, performance tuning, and team enablement. You will also work closely with our InfoSec team to proactively identify and eliminate infrastructure vulnerabilities, ensuring compliance and security best practices. The ideal candidate has experience working on high-scale platforms that serve millions of users or process large volumes of real-time transactions with strict uptime and latency requirements.
Responsibilities
- Manage infrastructure for mission-critical, high-volume platforms that process thousands of transactions per second, with tight latency requirements and 99.9999% uptime SLAs.
- Own and operate infrastructure in Azure across environments (Dev, Staging, Production, DR).
- Lead cloud-to-cloud migrations (e.g., Azure to GCP).
- Manage CI/CD pipelines, versioned deployments, and environment isolation.
- Drive migration from VM/VMSS to Kubernetes (AKS) with minimal disruption.
- Set up and manage an observability stack, including Azure Monitor, Log Analytics, Datadog, Sentry, and proactive alerting.
- Monitor and triage Azure alerts; ensure fast incident response and resolution.
- Track and implement Azure platform updates and upgrades (including but not limited to Application Gateway, Load Balancer, AKS, VMs, Storage, and databases) with minimal downtime.
- Plan and execute regular infrastructure patching and maintenance across environments.
- Assist Development, Support, and QA teams with secure and reliable access to Azure VMs, containers, and cloud databases.
- Optimize and implement backups and disaster recovery processes.
- Monitor and tune database performance (PostgreSQL, MySQL, MongoDB).
- Drive infrastructure cost optimization and budget monitoring.
- Ensure cloud security, compliance, and governance practices are in place.
- Configure and manage Cloudflare for DNS, CDN, WAF, and edge security integration.
- Set up and maintain Azure Traffic Manager for global traffic routing and failover.
- Manage Azure Application Gateway for layer 7 load balancing, SSL termination, and WAF policies.
- Manage Kubernetes internals, including pods, replica sets, deployments, services, ingress controllers, and autoscaling.
- Implement and manage caching layers (e.g., Redis) and data indexing with Elasticsearch for performance and observability.
Requirements
- Hands-on experience in cloud-to-cloud migrations.
- Solid understanding of observability principles and tooling (OpenTelemetry, Prometheus, Grafana, ELK, Azure Monitor, etc.).
- Experience with Kubernetes, containerization (Docker), and Helm.
- In-depth experience with Kubernetes pods, replicas, ingress, the Horizontal Pod Autoscaler (HPA), and cluster autoscaling.
- Experience with Elasticsearch for observability and data indexing.
- Knowledge of caching strategies using Redis, Memcached, or equivalent.
- Proficiency in Infrastructure as Code (Terraform, Bicep, Ansible).
- Strong scripting skills (Bash, Python, PowerShell).
- Proficiency in setting up CI/CD pipelines (GitHub Actions, Azure DevOps, Jenkins).
- Expertise in database performance tuning: query optimization, indexing, and connection pooling.
- Familiarity with Azure Advisor, Service Health, and resource upgrade planning.
- Experience supporting engineering teams with VPN, firewall rules, and database/VNet connectivity.
- Hands-on experience with Cloudflare (DNS, CDN, WAF, edge security).
- Strong understanding of Azure Traffic Manager and Application Gateway.
Strong Experience With Azure, Including
- Compute: VMs, VMSS, AKS, App Services.
- Networking: Application Gateway, Azure Traffic Manager, Load Balancer, NSGs, Private Link.
- Storage: Blob Storage, File Shares, Managed Disks, Azure Backup.
- Monitoring: Azure Monitor, Log Analytics, Alerts, Application Insights.
- Cost Optimization: Cost Management, Budgets, Azure Advisor.
- Security: Key Vault, Defender for Cloud, RBAC, Azure Policies.
- Deployment & IaC: Bicep, ARM templates, Azure DevOps.
Nice-to-Have
- Certifications in Azure or Kubernetes.
- Knowledge of retail technology stacks.
This job was posted by Ritika Uttmani from Mishipay.