SRE Capability Practice: Standardize and monitor SRE practices to ensure effective implementation across teams.
Collaboration: Work closely with development squads to ensure alignment on application architecture (e.g., microservices, API gateways) and operational excellence.
Reliability Systems Architecture: Utilize expertise in cloud-based distributed computing to enhance system reliability and resilience.
Software Engineering: Design and implement solutions that improve system performance and reliability.
Continuous Improvement: Oversee DevOps processes, ensuring quality throughout the software development lifecycle.
Education
Bachelor's degree/university degree or equivalent experience.
Required Qualifications
Good knowledge of programming languages such as Scala, Python, PySpark, and SQL or PL/SQL.
Knowledge of batch and real-time Spark processing pipelines.
Good knowledge of the Spark framework: Spark Core, Spark DataFrames, and Spark Streaming.
Knowledge of working in an Agile/DevOps environment.
Proficiency in cloud computing platforms (e.g., AWS) and understanding of reliability systems architecture.
Strong software engineering skills with experience in designing reliability-focused solutions.
Excellent communication skills for effective collaboration with various teams.
Skills and Competencies
Strong leadership abilities to motivate and engage team members.
Technical expertise in automation tools and CI/CD processes.
Ability to analyze metrics for performance tuning and fault finding.
Experience with incident management and disaster recovery planning.