

Requirements:
We are seeking a proactive and technically strong Site Reliability Engineer (SRE) to ensure
the stability, performance, and scalability of our Data Engineering Platform. You will work on
cutting-edge technologies including Cloudera Hadoop, Spark, Airflow, NiFi, and
Kubernetes, ensuring high availability and driving automation to support massive-scale data
workloads, especially in the telecom domain.
Key Responsibilities
• Ensure platform uptime and application health as per SLOs/KPIs
• Monitor infrastructure and applications using ELK, Prometheus, Zabbix, etc.
• Debug and resolve complex production issues, performing root cause analysis
• Automate routine tasks and implement self-healing systems
• Design and maintain dashboards, alerts, and operational playbooks
• Participate in incident management, problem resolution, and RCA documentation
• Own and update SOPs for repeatable processes
• Collaborate with L3 and Product teams for deeper issue resolution
• Support and guide L1 operations team
• Conduct periodic system maintenance and performance tuning
• Respond to user data requests and ensure timely resolution
• Address and mitigate security vulnerabilities and compliance issues
Technical Skillset
• Hands-on with Spark, Hive, Cloudera Hadoop, Kafka, Ranger
• Strong Linux fundamentals and scripting (Python, Shell)
• Experience with Apache NiFi, Airflow, YARN, and ZooKeeper
• Proficient in monitoring and observability tools: ELK Stack, Prometheus, Loki
• Working knowledge of Kubernetes, Docker, Jenkins CI/CD pipelines
• Strong SQL skills (Oracle/Exadata preferred)
Job ID: 114645379
Skills:
Hive, Hadoop, PySpark, Shell Scripting, Python, Airflow