Senior SRE Engineer

Albert Invent

Bengaluru, India

4-6 Years

This job is no longer accepting applications

Posted 2 months ago

Job Description

Drive the design, automation, and reliability of Albert Invent's core platform to support scalable, high-performance AI applications.

You will partner closely with Product Engineering and SRE teams to ensure security, resiliency, and developer productivity while owning end-to-end service operability.

Key Responsibilities

Own the design, reliability, and operability of Albert's mission-critical platform.
Work closely with Product Engineering and SRE to build scalable, secure, and high-performance services.
Plan and deliver core platform capabilities that improve developer velocity, system resilience, and scalability.
Maintain a deep understanding of microservices topology, dependencies, and behavior.
Act as the technical authority for performance, reliability, and availability across services.
Drive automation and orchestration across infrastructure and operations.
Serve as the final escalation point for complex or undocumented production issues.
Lead root-cause analysis, mitigation strategies, and long-term system improvements.
Mentor engineers in building robust, automated, and production-grade systems.
Champion best practices in SRE, reliability, and platform engineering.

Must-Have Requirements

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

4+ years of strong backend coding experience in Python or Node.js.

4+ years of overall software engineering experience, including 2+ years in an SRE or automation-focused role.

Strong hands-on experience with Infrastructure as Code (Terraform preferred).

Deep experience with AWS cloud infrastructure and distributed systems (microservices, APIs, service-to-service communication).

Experience with observability systems – logs, metrics, and tracing.

Experience using CI/CD pipelines (e.g., CircleCI).

Performance testing experience using k6 or similar tools.

Strong focus on automation, standards, and operational excellence.

Experience building low-latency APIs.

Ability to work in fast-paced, high-ownership environments.

Proven ability to lead technically, mentor engineers, and influence engineering quality.

Good-to-Have Skills

Kubernetes and container orchestration experience.
Observability tools such as Prometheus, Grafana, OpenTelemetry, Datadog.
Experience building Internal Developer Platforms (IDPs) or reusable engineering frameworks.
Exposure to ML infrastructure or data engineering pipelines.
Experience working in compliance-driven environments (SOC 2, HIPAA, etc.).

Skills: Automation, Terraform, Python, Node.js, and Amazon Web Services (AWS)