
Job Description:
· Intangles Lab is looking for a hands-on Site Reliability Engineer with a FinTech background to manage large-scale 24×7 Cloud Operations.
· The ideal candidate has 2+ years of experience as a Site Reliability Engineer and hands-on experience with the following technologies and skills:
Required Skills:
· AWS Cloud (Advanced): Certification is preferred.
· Networking (Intermediate): Proficiency in networking concepts is necessary.
· Ubuntu/Linux & OS (Advanced): Strong Linux and networking fundamentals; prior working experience is preferred.
· Database (Basic Knowledge): Familiarity with SQL and NoSQL databases is required, with working experience in at least one.
· Database Administration (MongoDB, PostgreSQL, Elasticsearch): Hands-on experience with at least one is required.
· Containerization Tools: Docker
· Kubernetes (Advanced)
· Knowledge of Amazon EKS is compulsory.
· Working knowledge of StatefulSets is required.
· Familiarity with Helm charts is necessary.
CI/CD (Advanced):
· Proficiency in at least one CI/CD tool, such as CircleCI, Argo Project, GitHub Actions, or similar, is essential.
· Programming:
a. Basic programming knowledge is required, with the ability to write code.
b. Scripting Languages: Python, Shell
Monitoring Stack:
· Prometheus, Grafana, Alertmanager, Istio, Jaeger, Datadog, Elastic APM, PagerDuty (or similar).
Optional Skills:
· Intermediate to advanced application development experience in languages like JavaScript, Python, and Java will be a bonus.
· Understanding of N-tier Architectures
· Understanding of REST & gRPC API Frameworks
· Understanding of Web Servers in Node.js
Responsibilities:
· To work in a production environment with technologies like Linux, AWS, Terraform, Kubernetes, MongoDB, Elasticsearch & PostgreSQL Administration.
· To keep the production environment up & running, i.e. ensuring the reliability of the production environment.
· To troubleshoot, debug and fix issues when the production or QA environment fails, and provide technical solutions.
· To own the responsibilities of on-call as per the team's policy.
· To write and enhance automations as and when needed.
· To work closely with internal teams and customers to follow processes and meet uptime SLAs.
· To write, update and enhance documentation, including runbooks/playbooks, and prepare postmortem reports for production incidents.
· To be ready to work in a 24×7 environment when required, since the role is to ensure the platform's reliability.
Additional Requirements:
· Should be familiar with change, incident, problem, issue and risk management, as well as escalation procedures.
· Should be flexible about working rotational shifts and night hours (including weekends).
· Excellent thinking and problem-solving skills.
Job ID: 147136589