Happiest Minds Technologies

PARTNER CONSULTANT - Reliability Analysis

  • Posted 4 hours ago

Job Description

Site Reliability Engineer
What you will do:
You'll be a key part of our Infrastructure Platform team, focusing on the critical infrastructure that powers . Beyond core infrastructure work, you'll also collaborate closely with a product development team, offering your expertise to coach and guide them on infrastructure and architectural decisions.

In your day-to-day, you will:

  • Build and maintain our production infrastructure to ensure scalability and high availability, while maximizing development team efficiency.
  • Troubleshoot and debug issues related to both product and infrastructure.
  • Automate everything! If something's worth doing, it's definitely worth automating.
  • Improve and extend our Kubernetes platform, which leverages EKS.
  • Provide crucial insights into scalability for our developers.
  • Participate in an on-call rotation to support our production systems.

Who you are:

  • You're someone who loves ownership: you design it, you build it, you own it! You're a self-motivated individual and a strong team player within the Infrastructure Platform team.
  • You have at least 2 years of experience working as a DevOps Engineer or in a similar role, such as Software Engineer or Cloud Engineer.
  • You have proven experience in architecting systems based on both functional and non-functional requirements.

Your qualifications

You should be proficient in, or have solid knowledge of:

Observability & Reliability

  • SLO/SLI Management: Experience defining and implementing Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure service health.
  • Modern Observability: Proficiency with high-cardinality observability platforms; Honeycomb experience is a major plus, but experience with similar tools (e.g., New Relic, Datadog) is welcome.
  • Proactive Monitoring: Proven ability to move beyond basic threshold alerts toward trend-based, proactive alerting and distributed tracing.
  • Incident Response: Experience with blameless post-mortems and a focus on reducing toil through automation.

Infrastructure & Orchestration

  • Containerization: Proficiency in container orchestration and related technologies such as Kubernetes and Docker.
  • Service Mesh: Experience with Istio for traffic management, security, and microservices observability.
  • Public Cloud: Strong hands-on experience with AWS.
  • Linux: Deep knowledge of Linux-based systems.

Automation & Data

  • CI/CD: Experience with Jenkins or GitHub Actions.
  • Cloud Orchestration: Proficiency in Terraform and Ansible for automation and service configuration.
  • Data Engines: Familiarity with SQL, NoSQL, OpenSearch, and AWS S3.
  • Programming: Proficiency in at least one of our core languages: Python, TypeScript, or Java.

More Info

Job ID: 149089795