Job Description

Key Responsibilities

Reliability & Incident Management

  • Ensure high availability, performance, and reliability of production systems
  • Define and manage SLIs, SLOs, and error budgets
  • Lead incident response, root cause analysis (RCA), and post-incident reviews
  • Proactively identify and mitigate reliability risks

Software Engineering & Automation

  • Develop software solutions to automate operational tasks
  • Build and maintain tools, frameworks, and platforms for deployment, monitoring, and reliability
  • Reduce toil through automation and self-service systems
  • Write clean, maintainable, and testable code

Infrastructure & Cloud

  • Design and manage cloud-native infrastructure (AWS, Azure, or GCP)
  • Implement Infrastructure as Code (IaC) using tools like Terraform or CloudFormation
  • Support containerized workloads using Docker and Kubernetes
  • Optimize systems for scalability and cost efficiency

Observability & Performance

  • Implement monitoring, logging, and tracing solutions
  • Analyze system metrics to improve performance and capacity planning
  • Establish dashboards and alerts aligned with business impact

CI/CD & DevOps Practices

  • Build and maintain CI/CD pipelines
  • Collaborate with development teams to improve release safety
  • Promote best practices in testing, deployment, and rollback strategies

Collaboration & Culture

  • Partner with product and engineering teams to design reliable solutions that follow architectural best practices
  • Advocate for SRE best practices and a reliability-first mindset
  • Contribute to documentation and knowledge sharing

Required Qualifications

  • Bachelor's degree in Computer Science or equivalent
  • Strong programming skills in Python, Java, Go, or similar languages
  • Experience with distributed systems and microservices
  • Hands-on experience with Linux/Unix systems
  • Familiarity with cloud platforms (AWS, Azure, or GCP)
  • Experience with containers and orchestration (Docker, Kubernetes)
  • Knowledge of CI/CD tools (GitHub Actions, Jenkins, GitLab CI, etc.)

Preferred Qualifications

  • Experience in an SRE or DevOps role
  • Knowledge of service meshes, load balancing, and traffic management
  • Experience with chaos engineering or resilience testing
  • Background in security best practices (IAM, secrets management)
  • Experience supporting regulated or mission-critical systems

More Info

Job ID: 145109845