In one sentence
As the SRE Lead, you will be responsible for the reliability, operational excellence, and release governance of amAIz (Telco Agentic Suite). You will lead a cross-functional team of non-functional testing (NFT), QA, and DevOps engineers, driving best practices in observability, automation, performance optimization, and quality assurance, and orchestrating smooth, predictable releases across environments.
All you need is...
- Bachelor's degree in Science/IT/Computing or equivalent.
- 5+ years of experience in SRE, DevOps, or infrastructure engineering roles.
- Proven leadership experience managing cross-functional engineering teams.
- Excellent communication and stakeholder management skills.
- Strong understanding of cloud platforms (AWS, GCP, or Azure).
- Experience with container orchestration (Kubernetes), CI/CD, and Infrastructure as Code.
- Knowledge of ArgoCD is an advantage.
- SaaS experience is an advantage.
- Proficiency in monitoring tools (Prometheus, Grafana, Datadog, etc.).
- Solid scripting/coding skills (Python, Go, Bash).
- Experience with QA methodologies, test automation, and E2E testing frameworks.
- Experience in Release Management: planning, scheduling, and coordinating releases in complex environments.
- GenAI experience is an advantage.
What will your job look like?
- Lead and mentor the DevOps team to build scalable, secure, and automated infrastructure for amAIz (Telco Agentic Suite).
- Automate CI/CD pipelines to streamline deployments and ensure fast, reliable delivery of features and fixes.
- Establish and maintain observability systems (monitoring, alerting, logging) to enable proactive issue detection and resolution.
- Promote and integrate GenAI capabilities into the SDLC, ensuring all R&D teams leverage these tools effectively.
- Drive FinOps practices to optimize cloud costs and resource utilization.
- Guide the QA team in designing and executing E2E test cases for Generative AI workflows and platform features.
- Integrate test automation into CI/CD pipelines to support continuous quality validation.
- Define and track quality metrics to ensure release readiness and platform stability.
- Lead NFT efforts for performance, scalability, and reliability testing to validate system behavior under load.
- Define and maintain non-functional requirements in collaboration with product and architecture teams.
- Analyze NFT results and drive optimizations to meet enterprise-grade standards and SLAs.
- Coordinate release planning, scope definition, and risk assessment with stakeholders.
- Govern release processes, including approvals, documentation, and compliance with change management policies.
Why you will love this job:
- You will have the chance to serve as a specialist in software and technology.
- You will take an active role in technical mentoring within the team.
- We provide stellar benefits from health to dental to paid time off and parental leave!