Title: Cloud & DevOps Engineer (Infrastructure Platform)
Work Location: Bangalore
Job Type: Full time
Work Type: Onsite, Monday-Friday
Shift: UK Shift - 1:30 PM to 10:30 PM IST
Overview:
TekWissen is a global workforce management provider operating in India and many other countries.
Job Description:
- We are seeking a Cloud, DevOps & MLOps Engineer with strong hands-on experience in cloud infrastructure, automation, CI/CD, container platforms, and machine learning platform operations.
- This role requires professionals who can own cloud environments end-to-end while also supporting AI/ML workloads, model deployment pipelines, and scalable AI infrastructure.
- The ideal candidate brings practical production experience in DevOps practices and ML platform enablement, strong troubleshooting skills, and the ability to improve operational maturity across cloud, DevOps, and MLOps practices.
- The role involves collaboration with data scientists, ML engineers, and application teams to enable scalable and reliable AI-powered solutions.
Key Responsibilities:
Cloud Infrastructure Ownership
- Design, provision, and manage infrastructure workloads across AWS, Azure, or GCP environments
- Own lifecycle management of compute, networking, storage, and platform services
- Support infrastructure required for AI/ML training, inference, and data pipelines
- Manage compute environments including GPU/accelerated workloads for machine learning
- Ensure infrastructure availability, scalability, and operational stability
- Implement infrastructure standards, templates, and reusable deployment patterns
Infrastructure as Code & Automation
- Develop and maintain infrastructure using Terraform or similar IaC tools
- Automate provisioning of environments for data science and ML experimentation
- Automate provisioning, configuration, and deployment workflows
CI/CD & Release Enablement
- Design and maintain robust CI/CD pipelines using GitHub Actions, GitLab CI, Azure DevOps, or Jenkins
- Enable ML model CI/CD pipelines (MLOps) for model versioning, validation, and deployment
- Automate build, test, security scan, and deployment pipelines for both applications and ML models
Containerization & Kubernetes
- Build, deploy, and manage containerized applications using Docker
- Support Kubernetes clusters for microservices and ML inference workloads
- Manage scalable deployment of AI model APIs
ML Platform Support
- Support infrastructure for machine learning workflows and model lifecycle
- Enable model training, experiment tracking, and model deployment pipelines
- Collaborate with data scientists and ML engineers to operationalize models
- Support frameworks such as MLflow, Kubeflow, Azure ML, and SageMaker
System Administration & Platform Reliability
- Manage Linux / Windows server environments including patching, performance tuning, and security hardening
- Support high-availability environments for AI applications and data pipelines
- Participate in incident response, root cause analysis, and resolution activities
- Improve monitoring, alerting, and operational readiness practices
- Maintain documentation for infrastructure and operational runbooks
Security & Access Management
- Implement IAM policies, RBAC controls, and secure access models
- Secure ML pipelines and data access
- Ensure secure handling of secrets, certificates, and credentials
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field
- 6-12 years of experience in Cloud Engineering, DevOps, Infrastructure Engineering, or Platform Support roles
- Strong hands-on experience with at least one public cloud (AWS / Azure / GCP)
- Proven experience implementing Infrastructure as Code using Terraform
- Experience building and maintaining CI/CD pipelines
- Hands-on exposure to Docker and Kubernetes environments
- Strong scripting skills (Bash / Python / PowerShell)
- Understanding of cloud infrastructure for AI workloads
Preferred Experience:
- Experience supporting multi-region or multi-environment cloud deployments
- Exposure to cloud monitoring tools such as CloudWatch, Azure Monitor, Prometheus, Grafana
- Understanding of model deployment pipelines
- Experience with vector databases or AI workloads
- Understanding of cost optimization and cloud governance practices
- Experience working in global delivery or production support environments
- Exposure to platform engineering or SRE practices
Certifications (Preferred):
- AWS Associate / Azure Administrator / GCP Associate Cloud Engineer
- Terraform Associate Certification
- Kubernetes and Cloud Native Associate (KCNA) or CKA
- CompTIA Security+
- Linux Foundation Certification (LFCS / LFCE)
Key Competencies:
- Strong ownership mindset and execution discipline
- Ability to troubleshoot complex infrastructure issues
- Structured thinking and documentation capability
- Collaboration with distributed global teams
- Continuous learning and improvement mindset
Work Environment:
- Structured office-based engineering collaboration
- Exposure to AI platforms, ML pipelines, and production AI deployments
- Participation in incident troubleshooting and operational reviews
- Adherence to enterprise security and compliance standards
TekWissen Group is an equal opportunity employer supporting workforce diversity.