You'll be a good fit if you have:
- Mastery of AI-native development and automation as a default working mode.
- Expert-level proficiency in Python and Go.
- Deep technical expertise in at least four of the following domains:
  - Cloud Architecture: Multi-region, compliance-heavy enterprise setups.
  - Kubernetes: Platform engineering at massive scale.
  - Observability: Unified metrics, agentic workflow tracing, and LLM monitoring.
  - LLM Ops: Model serving, routing, and cost optimization for AI workloads.
  - Compliance Automation: Infrastructure security, secrets management, and zero-trust networking.
- Exceptional architectural judgment with the ability to make high-stakes technical trade-offs that align with long-term business goals.
- Strong communication skills, with the ability to translate complex infrastructure concepts for both technical teams and business stakeholders.
- A natural leader with a passion for mentoring and elevating senior and lead engineers.
Key Responsibilities:
- Define the long-term global roadmap for multi-region, multi-tenant cloud infrastructure that meets Fortress-level standards for reliability, compliance, and enterprise isolation.
- Architect and own a unified CI/CD optimization strategy across application code, infrastructure-as-code, ML models, and RAG pipelines, with a strong focus on Kubernetes architecture, including GPU federation and cost optimization.