Your IT Future, Delivered
Principal Infrastructure Engineer.
With a global team of around 5600 IT experts, DHL IT Services connects people and keeps the world economy running by continuously developing innovations and creating sustainable digital solutions. We work across global boundaries and push limits in all dimensions of logistics. You can leave your mark and help shape the technological backbone of the world's largest logistics company.
Digitalization. Simply delivered.
In the P&P Strategy & Enabling Services area, we develop applications for business customers. Reliability, performance, and unique, user-friendly experiences are especially important to us. Our applications cover services such as letter announcements, automated personalized print mailings, and letter tracking.
#DHL #DHLITServices
Role Overview
As Principal Infrastructure Engineer, you are responsible for leading a team of Cloud Platform Engineers and driving the design, build, and operation of a secure, scalable, and reliable Cloud Platform. You combine strong technical expertise with people leadership to enable product and development teams through standardized, automated, and compliant cloud services.
You act as a technical role model, coach your team, and collaborate closely with architecture, security, DevOps, and application teams.
Grow together
We rely on DevOps and work closely with application, product, and DevOps teams as an enabler. Our way of working is shaped by agile methods in interdisciplinary teams. We remain curious and open to new technologies and best practices, placing high value on continuous learning.
What You Can Achieve with Us:
- Opportunity to own architectural strategy for mission‑critical systems.
- A dynamic and innovative engineering environment.
- Professional development opportunities, including certifications and training.
- Collaborative culture with strong emphasis on continuous improvement.
- Translate business and non-functional requirements into platform capabilities, while continuously developing yourself and leading your own engineering team.
- Contribute to adherence to our methodological (agile) guidelines. Together with the team, you will develop delivery plans and ensure their implementation.
- Support stakeholders in planning and implementing their backlog, keeping the product roadmap in view together with the Product Owner.
- Actively support the team in removing obstacles.
- Analyze existing operational standards, processes, and governance to identify and implement improvements.
Ready to embark on the journey
What You Should Bring:
- 5+ years of experience in the Cloud Native Computing Foundation (CNCF) ecosystem
- 5+ years of experience in cloud platform engineering or SRE roles
- Expert-level Terraform - modular architecture, Terraform Cloud, state management, provider development
- Deep Azure knowledge - AKS, Key Vault, Storage, PostgreSQL, Networking, RBAC, Service Principals
- Kubernetes expertise - Helm, operators, policy engines (Kyverno/OPA), ingress controllers, secrets management
- Observability stack - Grafana, Prometheus, Mimir, Tempo, OpenTelemetry, or equivalent distributed tracing/metrics systems
- CI/CD - GitHub Actions (writing custom actions and reusable workflows), container registries, GitOps
- Git & version control - GitHub Enterprise, branching strategies, code review processes
- Networking - DNS, TCP/IP, firewalls, VPNs, ingress controllers
- Scripting & development - Shell scripting, Go, Python, Dockerfile authoring
- Security mindset - Container scanning, secret management, network segmentation, WAF
- Strong communication skills in cross-functional and international setups
- Strong understanding of multithreading, concurrency, and performance tuning
- Strong leadership presence and ability to motivate diverse engineering teams.
- Strong understanding of Agile and Scrum
- Strategic thinker with the ability to make informed technical decisions.
- Proactive problem-solver with a focus on outcomes and quality.
- Background working in large / global enterprise environments and multicultural teams.
- Familiarity with tools such as Jira and Confluence.
- Strong stakeholder management and communication skills.
- Interest in taking responsibility for exciting and innovative projects that further digitalize Deutsche Post / DHL.
- Excellent verbal and written communication skills in English.
What Would Be Nice for You to Bring:
- Experience with Splunk or Sentry
- Knowledge of Terraform Cloud at enterprise scale
- Experience operating Mattermost or similar self-hosted platforms
- Familiarity with Telepresence for Kubernetes dev workflows
- Experience with Azure capacity reservations and cost optimization
- Azure certifications (AZ-900, AZ-104, AZ-305)
Key Responsibilities
Leadership & Stakeholder Management
- Lead, mentor, and develop a team of platform engineers, including goal setting, performance reviews, and skill development.
- Foster a collaborative culture focused on innovation, ownership, quality, and continuous improvement across global teams.
- Prioritize workload, allocate tasks, and ensure high-quality and timely delivery.
Infrastructure as Code & Cloud Architecture
- Design, implement, and maintain Azure infrastructure using Terraform and Terraform Cloud
- Develop and maintain reusable Terraform modules (AKS, Key Vault, Storage Accounts, PostgreSQL Flexible Server, Log Analytics, VMs, Capacity Reservations, Diagnostic Settings, Delete Locks)
- Manage multi-environment deployments with proper state management and drift detection
- Operate and evolve infrastructure for the Observability Platform (O11y) and shared platform services (SDB)
Kubernetes & Container Platform
- Operate and maintain Azure Kubernetes Service (AKS) clusters
- Manage Helm-based deployments for platform components
- Implement and enforce cluster policies with Kyverno
- Operate ingress solutions (Ingress-NGINX, Traefik)
- Manage secrets with External Secrets Operator (ESO) and Azure Key Vault
- Ensure cluster resilience through backup/recovery with Velero
- Build and maintain container images and Docker build pipelines
Observability & Monitoring
- Operate the full observability stack: Grafana Enterprise, Grafana Mimir (metrics), Grafana Tempo (tracing), Prometheus, Splunk, Sentry (APM)
- Implement and manage OpenTelemetry collectors and pipelines
- Maintain observability-as-code practices (dashboards, alerting rules, IRM configuration via Terraform)
- Operate custom monitoring solutions (prometheus-redis-exporter, blackbox-exporter, security monitoring)
CI/CD & Automation
- Build and maintain GitHub Actions workflows and reusable actions
- Operate self-hosted GitHub Actions runners (infrastructure and application runners)
- Manage automated dependency updates with Renovate
- Maintain Docker image build and caching pipelines
Security & Compliance
- Operate container vulnerability scanning with Trivy and Twistlock
- Monitor and manage Azure Service Principals and their lifecycle
- Implement network security (WAF with Coraza, SOCKS proxies, SFTP exchange)
- Manage secrets lifecycle and rotation strategies
Collaboration & Platform Services
- Operate Mattermost as a self-hosted collaboration platform
- Maintain database infrastructure (PostgreSQL, Redis) including DBA tooling and pgAdmin
- Support disaster recovery processes and tooling
An array of benefits for you:
- Hybrid work arrangements to balance in-office collaboration and home flexibility.
- Annual Leave: 42 days off apart from Public / National Holidays.
- Medical Insurance: Self + Spouse + 2 children. Option to add Voluntary Parental Insurance (parents / parents-in-law) at a nominal premium, covering pre-existing conditions.
- In-house training programs: professional and technical training and certifications.