
Movius

Senior Staff Site Reliability Engineer

Job Description

Job Title: Senior Staff Site Reliability Engineer

Location: Bangalore

About Movius

At Movius, we solve a critical gap companies face with employee-to-client communication over voice and messaging. We are the leading global provider of Secure Communication as a Service (SCaaS). Our flagship solution, MultiLine, enhances workflows, resolves compliance gaps, and unifies cross-channel messaging. Movius's AI-powered solutions enable businesses to build strong and lasting relationships with their customers in a company-owned, controllable system. Welcome to Phone 3.0.

Headquartered in Alpharetta, GA, with offices in Silicon Valley, Bangalore (India), New York, and London, Movius partners with leading global wireless carriers such as T-Mobile, Vodafone, TELUS, BT, Singtel, and more. To learn more about Movius, visit www.movius.ai.

Your Opportunity

We are looking for a Senior Staff Site Reliability Engineer (SRE) with strong technical expertise in distributed systems, cloud infrastructure, observability, and automation.

In this role, you will be responsible for improving the reliability, scalability, and performance of our production and pre-production systems. You will work hands-on in designing and implementing SRE frameworks, automating key reliability workflows, and building a culture of operational excellence.

You will also work closely with product engineering, QA, and DevOps teams to define SLOs/SLIs, enhance monitoring and alerting, and strengthen our overall reliability practices.

What You'll Do

Reliability Engineering & Architecture
  • Design and maintain highly available, fault-tolerant systems on AWS.
  • Implement service reliability models based on SLOs, SLIs, and error budgets.
  • Continuously improve system performance, scalability, and resilience.

Automation & Infrastructure-as-Code (IaC)
  • Build and maintain automation pipelines using Terraform, Ansible, Bitbucket, and Jenkins.
  • Develop reusable IaC modules for multi-account and multi-environment AWS setups.
  • Automate operational processes for provisioning, scaling, monitoring, and recovery.

Observability & Monitoring
  • Define observability standards and create dashboards using Elastic Stack, Grafana, or Prometheus.
  • Implement intelligent alerting using AIOps and anomaly detection tools.
  • Work with development teams to ensure proper telemetry and trace coverage.

Incident Management & RCA
  • Lead major incident response and ensure quick service restoration.
  • Conduct blameless post-incident reviews and implement preventive actions.
  • Create and maintain runbooks, escalation matrices, and reliability playbooks.

Performance & Capacity Planning
  • Analyse performance bottlenecks and propose tuning or optimization strategies.
  • Lead capacity forecasting and ensure the system can handle growth demands.

Collaboration & Mentorship
  • Partner with development, QA, and DevOps teams to embed SRE principles.
  • Coach and mentor junior engineers on reliability engineering and automation.

Documentation & Knowledge Management
  • Maintain detailed architecture diagrams, design documents, and operational procedures.
  • Document SLOs, automation workflows, and change management reports.

Technical Leadership
  • Lead technical discussions, reliability reviews, and performance retrospectives.
  • Promote a code-driven, automation-first reliability culture across teams.

What You Bring


Education

  • Bachelor's degree in Computer Science, Information Technology, or equivalent experience.

Experience


  • 8+ years in SRE or DevOps roles managing large-scale distributed systems.
  • Proven hands-on experience in cloud operations (AWS preferred), automation, and CI/CD pipelines.
  • Experience in the Telecom domain is an added advantage.

Technical Skills


  • Deep knowledge of AWS (EKS, EC2, RDS, IAM, VPC, Amazon MSK/Kafka, CloudWatch, API Gateway, Lambda, WAF, KMS).
  • Strong Linux administration and networking fundamentals.
  • Skilled in Terraform, Jenkins, Git, and scripting (Python, Bash).
  • Solid understanding of observability tools (Grafana, Elastic Stack, Prometheus).
  • Experience with container orchestration (Kubernetes) and microservices-based systems.

Certifications (Preferred)


  • AWS Certified DevOps Engineer (Professional) or AWS Certified Solutions Architect (Associate).
  • HashiCorp Terraform Associate or Certified Kubernetes Administrator (CKA).
  • SRE Foundation or Google SRE certification is desirable.

Why Join Movius


  • Work on a global-scale platform serving enterprise customers.
  • Be part of a high-performing, innovation-driven engineering team.
  • Competitive pay, benefits, and opportunities for professional growth.

Ready to build the future of reliable, secure, and intelligent communication?


Apply now at www.movius.ai

Job ID: 136398863
