Senior SRE Engineer

IDEMIA

Noida, India

5-8 Years

Posted a day ago

Job Description

We are looking for a highly skilled and proactive Site Reliability Engineer (SRE) to support our application at the customer site. The ideal candidate will have strong hands-on experience with Linux, MariaDB/Cassandra databases, Docker-Compose deployments, and containerized application operations.

In this role, you will be responsible for managing day-to-day operational activities, ensuring platform reliability, improving observability, and coordinating closely with customer teams, development teams, and internal stakeholders.

Key Responsibilities

Monitor end-to-end performance and health of the application and its supporting services.

Manage and operate containerized deployments using Docker and Docker-Compose.

Handle day-to-day tasks at the customer site, including incident response, troubleshooting, and ticket management.

Perform detailed Root Cause Analysis (RCA) for incidents and provide clear, actionable reports.

Improve observability by developing dashboards, metrics, and alerting mechanisms.

Work closely with development teams to provide reliability inputs for new features, deployments, and change planning.

Support database operations for MariaDB and Cassandra, including basic troubleshooting and performance monitoring.

Coordinate planned changes, deployments, and production activities with customer teams, ensuring clear communication.

Manage and automate SSL / mutual SSL (mSSL) certificate renewals, track certificate expirations, and maintain secure configuration across environments.

Mandatory Skills

5 to 8 years of experience required.

Strong hands-on experience with Docker and Docker-Compose in production environments.

Solid experience working on Linux-based systems (RHEL/Ubuntu) and with MySQL databases.

Working knowledge of MariaDB and Cassandra databases.

Adept at troubleshooting and problem solving.

Experience operating and supporting highly available, distributed applications.

Proficiency in building and managing dashboards and alerts using tools such as Grafana and Prometheus.

Strong understanding of incident management, troubleshooting, and writing clear RCA documentation.

Good understanding of SSL/mSSL configuration and secure service-to-service communication.

Excellent communication, coordination, and customer-facing skills.