Job Description for Chief Manager - Site Reliability Engineer at Bajaj Broking:
Responsibilities
As the Chief Manager - Site Reliability Engineer at Bajaj Broking, you will play a crucial role in ensuring the availability, performance, and reliability of our services. You will lead a team of engineers focused on building and maintaining the infrastructure that supports our applications and systems. Your responsibilities will include:
- Designing, implementing, and managing scalable and reliable systems that support our trading platform and associated services.
- Collaborating with development teams to ensure that application designs support operational requirements, including scalability and performance monitoring.
- Developing and implementing automation tools and frameworks for deploying and managing applications in production environments.
- Establishing and maintaining service level objectives (SLOs) and service level agreements (SLAs) to ensure that performance metrics are aligned with business goals.
- Monitoring systems to proactively identify performance bottlenecks and resolve issues before they impact users.
- Leading incident response efforts, conducting root cause analyses, and implementing preventative measures to enhance overall system reliability.
- Researching and implementing new technologies that enhance our reliability and operational efficiency.
- Mentoring and guiding team members to foster a culture of continuous improvement and learning within the Site Reliability Engineering team.
Skills Required
- Strong experience in system administration, networking, and cloud infrastructure (AWS, Azure, or GCP).
- Proficiency in programming and scripting languages such as Python, Go, or Bash.
- Deep understanding of containerization and container orchestration technologies (Docker, Kubernetes).
- Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or similar.
- Familiarity with configuration management tools (Ansible, Puppet, Chef).
- Knowledge of database management (SQL and NoSQL databases).
- Strong problem-solving skills and the ability to troubleshoot complex systems and applications.
- Excellent communication and leadership skills to foster collaboration among cross-functional teams.
Tools Required
- Cloud Platforms: AWS, Azure, Google Cloud Platform
- Containerization: Docker, Kubernetes
- Monitoring and Logging: Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana)
- Configuration Management: Ansible, Puppet, Chef
- Programming/Scripting: Python, Go, Bash
- Databases: MySQL, PostgreSQL, MongoDB, Redis
This position offers an exciting opportunity to be at the forefront of technology within the financial services sector, contributing to the reliability and performance of our key trading systems.