The Site Reliability Engineer III (SRE III) plays a critical role in ensuring that Emburse's systems are highly available, scalable, and performant. This role blends deep technical expertise with strong collaboration and leadership skills to drive operational excellence across distributed systems. The ideal candidate is passionate about automation, cloud infrastructure, observability, and continuous improvement, and will mentor junior engineers and champion a culture of reliability across the organization.
Education:
- Required: Bachelor's degree in Computer Science or a STEM field.
Experience:
- Minimum 6 years of experience in an engineering or operations role with a focus on reliability, scalability, and automation.
Required Skills:
- Strong proficiency in Linux-based distributed environments (the role involves up to 70% hands-on technical work).
- Deep experience with cloud platforms (AWS or Azure) and Infrastructure-as-Code (Terraform).
- Excellent scripting skills (Python, Bash, PowerShell); object-oriented programming experience is a plus.
- Demonstrated ability to develop and maintain internal tools and automation solutions.
- Excellent written and verbal communication skills in English.
- Strong project management and organizational abilities with a bias for action.
- Experience collaborating with offshore or globally distributed teams.
- Expertise in containerization and orchestration technologies (Docker, Kubernetes).
- Strong understanding of DevOps principles and modern CI/CD pipelines.
- Experience with observability stacks (Prometheus, Grafana, OpenTelemetry).
- Familiarity with self-healing systems and site reliability best practices.
- Background in SaaS environments or large-scale distributed applications.