cloudethix

Spark on Kubernetes Support Engineer

3-10 Years
  • Posted 6 days ago

Job Description

Role Overview
We are looking for a skilled Spark on Kubernetes Support Engineer to provide L1/L2 support for large-scale data platforms. This role involves monitoring, troubleshooting, and optimizing Spark workloads running on Kubernetes, ensuring high availability and performance of data pipelines.

Key Responsibilities
  • Serve as the first-level escalation point for 24×7 monitoring of Spark (batch & streaming) workloads on Kubernetes
  • Troubleshoot Spark job failures, performance issues, and resource bottlenecks
  • Diagnose Kubernetes issues (pod failures, OOMKilled, evictions, DiskPressure, scaling issues)
  • Monitor Spark UI, cluster health, and resource utilization
  • Collaborate with development teams to debug and optimize pipelines
  • Handle Sev1/Sev2 incidents, including RCA and war-room coordination
  • Build and maintain monitoring dashboards and alerting frameworks (Prometheus/Grafana/ELK)
  • Support CI/CD pipelines and deployment automation using Azure DevOps
  • Maintain SOPs and runbooks, and drive continuous improvement

Required Skills
  • 3–10 years of experience in Big Data / Distributed Systems / Cloud Support
  • Strong expertise in Apache Spark (Core, SQL, Structured Streaming)
  • Hands-on experience with Spark on Kubernetes
  • Good understanding of Kubernetes architecture & troubleshooting
  • Experience with Azure DevOps (CI/CD pipelines, Git, deployments)
  • Strong knowledge of Linux, SQL, and scripting (Python/Shell)
  • Familiarity with monitoring tools: Prometheus, Grafana, ELK

Good to Have
  • Experience with Kafka / streaming ecosystems
  • Exposure to cloud platforms (Azure/AWS/GCP)

Job ID: 147167437