Site Reliability Engineer

Albert Invent

Hyderabad, India

2-4 Years

Posted 20 days ago

Job Description

The Site Reliability Engineer (SRE) will be responsible for building and maintaining the highly reliable, scalable, and secure infrastructure that powers the Albert platform. The role focuses on automation, observability, and operational excellence to ensure seamless deployment, performance, and reliability of core platform services.

Key Responsibilities

Act as a passionate representative of the Albert product and brand.
Collaborate with Product Engineering and other stakeholders to plan and deliver core platform capabilities that enable scalability, reliability, and developer productivity.
Work with the Site Reliability Engineering (SRE) team on shared full-stack ownership of a collection of services and/or technology areas.
Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of all microservices.
Design and deliver the mission-critical stack, focusing on security, resiliency, scale, and performance.
Take ownership of end-to-end performance and operability.
Apply strong knowledge of automation and orchestration principles.
Serve as the ultimate escalation point for complex or critical issues not yet documented as Standard Operating Procedures (SOPs).
Troubleshoot and define mitigations using a deep understanding of service topology and dependencies.

Requirements

Bachelor's degree in Computer Science, Engineering, or equivalent experience.

2+ years of software engineering experience, with at least 1 year in an SRE role focused on automation.

Strong experience in Infrastructure as Code (IaC), preferably using Terraform.

Proficiency in Python or Node.js, with experience designing RESTful APIs and working in microservices architecture.

Solid expertise in AWS cloud infrastructure and platform technologies including APIs, distributed systems, and microservices.

Hands-on experience with observability stacks, including centralized log management, metrics, and tracing.

Familiarity with CI/CD tools (e.g., CircleCI) and performance testing tools such as k6.

Passion for bringing automation and standardization to engineering operations.

Ability to build high-performance APIs with low latency (

Ability to work in a fast-paced environment, learning from peers and leaders.

Demonstrated ability to mentor other engineers and contribute to team growth, including participation in recruiting activities.

Good to Have

Experience with Kubernetes and container orchestration.
Familiarity with observability tools such as Prometheus, Grafana, OpenTelemetry, or Datadog.
Experience building Internal Developer Platforms (IDPs) or reusable frameworks for engineering teams.
Exposure to ML infrastructure or data engineering workflows.
Experience working in compliance-heavy environments (e.g., SOC 2, HIPAA).

Skills: Automation, Terraform, Python, Node.js, and Amazon Web Services (AWS)