
Artech Infosystems Private Limited

Bigdata Platform Engineer

8-9 Years
25 - 28 LPA
  • Posted 8 hours ago

Job Description

Site Reliability Engineer

The Site Reliability Engineering team plays a critical role in ensuring the stability, performance, and reliability of Client's internal platforms. We are responsible for troubleshooting and maintaining applications built on a complex, distributed, and cloud-native infrastructure, heavily leveraging technologies like Apache Spark and Apache Airflow. Our mission is to support cross-functional teams across Client by ensuring their Spark jobs run smoothly and efficiently, providing essential operational support and expertise. We are a team of curious problem-solvers, always striving to understand the why behind the what, and digging deep into the internals of systems to truly understand how they work.

Role Summary

As a Site Reliability Engineer, you will be at the forefront of operational excellence, directly impacting the productivity of numerous teams within Client. You will be responsible for diagnosing and resolving complex issues related to Spark applications and workflows running on our internal platform. This role requires a strong problem-solver with a deep, intrinsic curiosity to understand how applications function at a detailed level, and the ability to troubleshoot effectively when things aren't working right. You are someone who isn't satisfied with just knowing how to do something but rather seeks to dig in and truly understand the underlying mechanisms and internals of the system.

Key Responsibilities

  • Troubleshoot and resolve complex application issues, providing detailed root cause analysis and preventative measures.
  • Provide direct support to individual users and cross-functional teams, diagnosing and resolving their Spark job-related problems with a focus on understanding the core issue.
  • Maintain the reliability and performance of the critical infrastructure that powers Spark and Airflow.
  • Automate operational tasks and improve system efficiency through scripting, always looking for opportunities to enhance stability and reduce manual intervention.
  • Collaborate with development and other SRE teams to identify root causes of issues and implement robust, long-term solutions.
  • Participate in on-call rotations to ensure continuous availability and rapid response to incidents.

Required Qualifications

  • Strong understanding and hands-on experience troubleshooting applications deployed on Kubernetes.
  • Basic proficiency in Python for scripting and automation tasks.
  • Deep and practical knowledge of networking principles, with the ability to diagnose network issues using standard command-line tools.
  • Experience with containerization technologies (e.g., Docker) and their orchestration.
  • Proficiency in Linux operating systems, including:
      • Advanced command-line tools for system diagnostics and troubleshooting (e.g., for inspecting network routes, open files, process information).
      • Scripting and system administration.
      • A strong desire to understand the internals of the OS.

Preferred Qualifications

  • Familiarity with Apache Spark and Apache Airflow, given their central role in day-to-day troubleshooting.
  • Basic understanding of Java, which may be occasionally required for specific jobs, though deeper Java expertise is often handled by cross-functional teams.

Tech Stack

Python scripting, Linux, Networking, Kubernetes, Docker, Previous SRE experience

Shift timing: CST hours, which is 7:30 pm to 4:30 am.

More Info

Open to candidates from:
Indian

About Company

Artech is the largest women- and minority-owned IT staffing firm in the US, with a US$800 million annual revenue run rate in 2021 and a footprint across the globe. With nearly three decades of experience, Artech empowers businesses through applied human intelligence and offers a spectrum of services that include Workforce Solutions (Contingent Staffing, Bulk/Project Staffing, Master Vendor, RPO, Direct Hire and Payroll Transition) and Project-Based Solutions (Digital Experience, Technical Operations, Technical Development, Business Operations and Digital Platforms). Artech works with over 90 Fortune 500 clients across the USA, Canada, India, and China.
At Artech, we are empowering talent by connecting potential with opportunities through applied human intelligence. We empower our teams to maximize the impact of their intellect through a performance-oriented, diverse, flexible, and inclusive work environment supported by our focus on continuous learning and development.

Job ID: 145017271