- Design, deploy, and manage scalable infrastructure across cloud and hybrid environments (AWS, Azure, GCP, Yotta).
- Implement and maintain continuous integration and delivery (CI/CD) pipelines.
- Manage containerization and orchestration using Docker and Kubernetes.
- Ensure security, compliance, and vulnerability management (SOC 2, GDPR, ISO 27001).
- Oversee system monitoring, logging, alerting, and incident response.
- Optimize infrastructure performance and cost-efficiency.
- Implement disaster recovery, backup strategies, and business continuity planning.
- Manage cloud cost, budgeting, and forecasting.
- Mentor and provide leadership to DevOps and infrastructure team members.
- Collaborate cross-functionally with developers, QA, product management, and senior stakeholders.
- Document infrastructure architecture, configurations, and processes, and maintain operational reporting.
Good-to-have skills:
- Certifications in AWS, Azure, GCP, or Kubernetes (CKA/CKAD)
- Experience with GPU-intensive workloads and AI/ML infrastructure
- Proficiency in Infrastructure as Code (IaC) tools such as Terraform and CloudFormation
- Familiarity with MLOps practices and AI model deployment
Qualifications and Experience:
- Bachelor's/Master's degree in Computer Science, IT, or a related field
- Minimum 15 years of experience in infrastructure management and DevOps
- Extensive experience with cloud platforms, CI/CD tools, and Kubernetes
- Proven experience in infrastructure security and compliance frameworks
- Strong scripting and automation skills (Bash, Python, Ansible)