Experience
- Minimum of 5 years of experience in DevOps, SRE, or Cloud Operations roles.
- Hands-on experience operating customer-facing production systems.
- Experience supporting cloud-hosted services (OCI, AWS, or Azure).
Technical Skills
- Strong Linux fundamentals (processes, memory, networking basics).
- Scripting experience in Python and/or Bash.
- Understanding of CI/CD concepts and tools (Jenkins, GitHub Actions, GitLab CI, etc.).
- Working knowledge of containers (Docker) and basic Kubernetes concepts.
- Familiarity with monitoring and logging tools.
- Basic understanding of TCP/IP, DNS, load balancing, and HTTP/HTTPS.
Cloud & Infrastructure
- Experience using cloud APIs and consoles.
- Exposure to infrastructure automation and configuration management.
- Understanding of security basics (IAM, certificates, secrets management).
Nice to Have (Plus Skills)
- Exposure to Contact Center or CCaaS platforms (Genesys, Zoom, Avaya, etc.).
- Understanding of SIP, RTP, SBCs, or voice infrastructure.
- Experience with OCI services (Compute, Networking, Load Balancers).
- Basic knowledge of databases and storage systems.
What Success Looks Like in This Role
- Reduced manual operational effort through automation.
- Faster incident detection and resolution.
- Well-maintained documentation and repeatable processes.
- Reliable execution of day-to-day operational tasks.
- Strong collaboration with senior engineers and platform teams.
IC3 Scope Clarity (Important)
- This role focuses on hands-on execution rather than primary architecture ownership.
- Works under guidance for complex designs and major incidents.
- Expected to grow toward higher ownership over time.
DevOps & Automation
- Build and maintain automation scripts to reduce manual operational work.
- Contribute to Infrastructure as Code (IaC) using tools such as Terraform, Azure Resource Manager (ARM) templates, or OCI Resource Manager.
- Support CI/CD pipelines for application and platform components.
- Assist in improving zero-touch deployments and configuration consistency.
Operations & Reliability
- Monitor production systems for availability, latency, performance, and capacity.
- Participate in incident response, troubleshooting, and root cause analysis.
- Execute runbooks and SOPs for common operational scenarios.
- Assist with disaster recovery drills, failover testing, and validation activities.
- Perform routine platform maintenance, patching, and upgrades.
Observability & Monitoring
- Implement and maintain monitoring, alerting, and dashboards.
- Analyze trends in system metrics and logs to proactively identify risks.
Collaboration & Documentation
- Work with application, security, and network teams to resolve issues.
- Create and update technical documentation, SOPs, and operational guides.
- Communicate clearly during incidents and operational reviews.
Career Level - IC3