Job Summary
We are seeking a DevOps Engineer with strong experience to design, implement, and maintain CI/CD pipelines and cloud infrastructure for application, data, and AI workloads. The role will also support basic MLOps and LLMOps practices, enabling reliable deployment and monitoring of machine learning models and large language model (LLM)-based services in collaboration with data science and AI teams.
Key Responsibilities
- Design, implement, and maintain CI/CD pipelines using Azure DevOps (Repos, Pipelines, Artifacts, Releases) for multiple applications and services.
- Manage and monitor Azure/Snowflake resources (VMs, App Services, AKS, storage, networking) for performance, reliability, security, and cost optimization.
- Implement logging, monitoring, and alerting using Azure Monitor, Application Insights, and related observability tools.
- Collaborate with development teams and the Client to streamline build, deployment, and release processes following DevOps and Agile practices.
- Troubleshoot and resolve issues in build, deployment, and production environments, including incident response and root cause analysis.
- Maintain documentation for environments, pipelines, and standard operating procedures, especially for Client projects.
- Collaborate with data scientists and ML engineers to integrate machine learning workflows into existing CI/CD pipelines.
- Work with AI/LLM engineers to operationalize LLM-based applications and APIs, including environment setup, deployment, and runtime monitoring.
- Configure and manage infrastructure and integrations required for LLM workloads and use tools like MLflow or TruLens to increase observability and tracking.
Must Have Skills
Technical Proficiency:
- 3-4 years of hands-on experience as a DevOps / Cloud Engineer with a strong focus on Azure services or Snowflake (compute, networking, storage, PaaS).
- Solid experience building and maintaining CI/CD pipelines, preferably using Azure DevOps (or similar tools like Jenkins/GitHub Actions).
- Proficiency with Infrastructure-as-Code (ARM/Bicep/Terraform) and scripting (PowerShell or Bash) for automation.
- Good understanding of containerization and orchestration concepts (Docker, Kubernetes/AKS) in production or pre-production environments.
- Strong knowledge of Git-based workflows and branching strategies.
- Basic understanding of ML lifecycle concepts (training, validation, deployment, monitoring) and how they fit into CI/CD and environment promotion.
- Familiarity with deploying and monitoring API-based AI/LLM services (e.g., Azure OpenAI, OpenAI, other LLM providers) is a strong plus.
- Strong problem-solving and debugging skills across infrastructure, pipelines, and runtime services.
- Ability to work closely with developers, data scientists, and stakeholders, and communicate clearly about deployment and release status.
Good to Have Skills
- Exposure to AWS services and tooling for multi-cloud or migration scenarios.
- Experience integrating or automating deployments for Snowflake or other data platforms (e.g., running migration scripts, managing secrets, scheduling jobs).
- Experience with advanced MLOps tooling (e.g., MLflow, feature stores, model registries) and LLMOps practices (prompt management, vector databases, RAG pipelines) is a plus.
- Experience with security best practices (identity and access management, secrets management such as Key Vault, network security, compliance checks) in cloud environments.
- Familiarity with observability stacks (Prometheus, Grafana, ELK) in addition to Azure-native monitoring tools.
- Experience working in Agile/Scrum teams and using work management tools (Azure Boards/Jira).
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field; or equivalent practical experience in DevOps / Cloud Engineering.
- Any relevant cloud certifications (e.g., Azure Administrator, Azure DevOps Engineer Expert, Azure Solutions Architect, or Azure Data/AI certifications).
You will enable development teams to deploy applications reliably, securely, and repeatably in production environments while optimizing for cost and performance. This role requires hands-on technical depth, a problem-solving mindset, and collaboration with cross-functional teams in an Agile environment. MLOps and LLMOps engineering experience is also desirable.