

Location - Remote
Timezone - Mandatory 4-hour overlap with the EST time zone
Mandatory Skills -
This is an infrastructure-centric role. The successful candidate will collaborate directly with SRE and Cloud Centers of Excellence (CoE) to manage, upgrade, and maintain the infrastructure that powers backend applications. The primary objective is ensuring seamless application deployment and infrastructure stability, rather than feature development.
Primary Responsibilities
-Manage and upgrade enterprise-grade infrastructure, focusing on high availability and scalability.
-Lead application deployments across Kubernetes clusters using GitOps principles.
-Coordinate with SRE teams to maintain system uptime and implement infrastructure-level patches and upgrades.
-Provide expert-level troubleshooting for backend applications (J2EE/Spring/Hibernate) from an operational and performance perspective.
-Configure and optimize CDN layers to ensure global delivery performance.
Technical Requirements
-Infrastructure Management: Expert knowledge of AWS EKS and Kubernetes administration.
-Automation/CD: Mastery of GitOps tools, specifically ArgoCD or Flux, for automated deployments.
-Operational Troubleshooting: Deep understanding of J2EE, Spring, and Hibernate to diagnose performance issues and system crashes.
-CDN Governance: Technical proficiency in managing and configuring Cloudflare or CloudFront.
-Observability Mastery: Advanced use of Splunk, Dynatrace, or Datadog for proactive system monitoring and alerting.
Preferred Qualifications
-Experience working within a formal Cloud Center of Excellence (CoE) or SRE team.
-Demonstrated experience performing major version upgrades of Kubernetes clusters or critical middleware.
Professional Attributes
-Operational Mindset: Prioritizes stability, security, and scalability over feature delivery.
-Collaboration: Ability to work effectively with global Ops teams and handle technical handovers.
-Reliability: Proven track record of managing production-critical infrastructure without downtime.
Job ID: 148881697
Skills:
Java, AppDynamics, CloudWatch, Prometheus, Dynatrace, Bash, AWS Cloud, Splunk, Grafana, Python