Our client is building the engine that powers world-class software delivery — and we need a Director – DevOps to drive it.
This is a rare opportunity to help shape and build the developer platform: cloud-native infrastructure, deployment pipelines, GitOps workflows, and the observability systems that keep our engineers fast and our software reliable.
You'll be shaping how hundreds of engineers build, ship, and operate software across a modern, multi-environment Kubernetes estate on Azure — in a regulated environment where quality and security aren't optional.
But this role is bigger than the platform. You'll be on the ground floor of something genuinely new: helping our client establish Hyderabad as a world-class engineering hub. That means GCC bootstrapping, building a developer ecosystem that spans geographies, and making critical early decisions that will define how we operate globally for years to come.
We're looking for a leader who can operate in both spheres — technically deep enough to make the right architectural calls, and people-focused enough to build and grow high-performing teams. If you want to make a profound difference in how engineers work — and through them, in the lives of the financial advisors and investors we serve — this is the role.
Key Responsibilities
Stakeholder Management & Team Development
- Cross-Functional Collaboration: Act as the primary liaison and technical partner, collaborating closely with Product Managers to define requirements, Software Engineering teams to ensure platform and service interoperability, and Senior Business Stakeholders to align platform initiatives with commercial priorities.
- Talent & Mentorship: Build, mentor, and lead a high-performing team of developer experience engineers and architects, fostering a culture of ownership, technical rigor, and continuous learning. Guide career development and succession planning within the platform engineering function.
CI/CD & GitOps Pipeline Management
- Design, build, and maintain CI/CD pipelines using Azure DevOps Pipelines and/or GitHub Actions
- Implement and manage GitOps-driven multi-environment deployment workflows using ArgoCD and Helm, enforcing environment-specific configurations across dev, staging, and production clusters
- Maintain Helm charts, values files, and environment overlays for containerized application deployments
- Ensure deployment repeatability, traceability, and rollback capabilities across all environments
Kubernetes & Container Platform Operations
- Manage Azure Kubernetes Service (AKS) clusters, including node pools, namespaces, resource quotas, limit ranges, and policy constraints
- Configure and manage RBAC, network policies, pod security standards, and admission controllers
- Implement and tune KEDA-based autoscaling and horizontal pod autoscaling (HPA) strategies
- Manage service mesh configurations (e.g., Istio or Linkerd) for traffic management, mTLS, and observability
- Oversee container image lifecycle: build pipelines, image registries, and Docker image security scanning using tools such as Trivy, Microsoft Defender for Containers, or Snyk
Infrastructure as Code & Configuration Management
- Write and maintain Terraform modules for provisioning Azure infrastructure (AKS, networking, storage, databases, Key Vault, etc.)
- Use Ansible and other configuration management tools for system configuration and drift remediation
- Manage secrets securely using Azure Key Vault and/or HashiCorp Vault, integrating with Kubernetes
- Enforce infrastructure policies to enable automated remediation, compliance reporting, and security guardrails, using tools such as OPA/Gatekeeper or Azure Policy
Observability, Monitoring & Incident Response
- Configure alerting pipelines using Prometheus Alertmanager and PagerDuty for timely incident response
- Manage centralized logging with Azure Log Analytics and Application Insights; Splunk experience is a plus
- Implement and operate distributed tracing using OpenTelemetry, integrated with Jaeger and Azure Application Insights
- Participate in on-call rotation; lead incident response, RCA documentation, and post-mortem processes
Security & Compliance
- Enforce security best practices including image scanning, runtime threat detection, secrets rotation, and least-privilege access
- Collaborate with Security teams to integrate security scanning tools (e.g., Microsoft Defender for Cloud, Arnica) into CI/CD pipelines
- Maintain compliance with financial services regulatory requirements across all infrastructure components
Collaboration & Automation
- Partner closely with development, SRE, and platform engineering teams to support smooth deployments and resolve infrastructure blockers
- Write automation scripts in Bash, PowerShell, or Python to reduce toil and improve operational efficiency
- Contribute to runbooks, architecture documentation, and internal knowledge bases
- Evaluate and adopt new tools and technologies to improve the DevOps posture of the organization
Required Skills & Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent practical experience)
- 14+ years of experience in DevOps, Site Reliability Engineering, or infrastructure engineering roles
- Hands-on experience with the majority of the tools listed in the Key Responsibilities section above
- Leadership: Demonstrated success in setting coding standards, leading technical design reviews, and driving technical consensus across multiple development teams.
- Excellent communication and cross-functional collaboration skills
Preferred Qualifications
- Experience with service mesh technologies (Istio, Linkerd)
- Familiarity with Azure Logic Apps, Azure Functions, or serverless computing patterns
- Experience managing RabbitMQ, Kafka, or similar event-driven messaging systems
- Knowledge of SQL Server, PostgreSQL, Redis, or MongoDB operations in containerized environments
- Exposure to Ansible for configuration management
- Experience in financial services or other regulated industry environments
- Certifications such as Microsoft Certified: Azure Administrator Associate, Microsoft Certified: DevOps Engineer Expert, or CNCF Kubernetes certifications (CKA/CKAD)
- Experience with OpenTelemetry, Jaeger, or other distributed tracing platforms