About Loop
Loop AI is a San Francisco-based tech company founded in 2022. We provide a Delivery Intelligence Platform for data-driven digital food brands, helping them optimize operations and make informed decisions. Loop is at the forefront of innovation in the food-tech sector. At Loop, we value a growth mindset, passion for startups, product curiosity, and a hunger for continuous learning. We believe that the collaborative spirit among our peers is the lifeblood of our startup.
Role Overview
We are seeking a highly motivated and experienced Senior DevOps Engineer to join our engineering team at Loop AI. In this critical role, you will define, architect, build, and scale the entire cloud infrastructure that powers our Delivery Intelligence Platform. This is a 0-to-1 opportunity to influence our technical roadmap, automate everything from deployment to monitoring, and ensure our systems can scale alongside our rapidly growing customer base. The ideal candidate is a proactive problem-solver who thrives in a fast-paced environment and is passionate about developer experience and operational excellence.
Key Responsibilities
- Infrastructure as Code (IaC): Design, provision, and manage cloud infrastructure using tools like Terraform or Pulumi to ensure consistency across environments
- Container Orchestration: Manage and optimize our Kubernetes clusters (GKE), including scaling, security hardening, and resource optimization
- CI/CD Automation: Build and maintain robust GitHub Actions pipelines to enable rapid, reliable, and secure software delivery
- Observability: Implement and evolve our monitoring, logging, and alerting stack (e.g., Prometheus, Grafana, Datadog) to provide deep insights into system performance
- Security & Compliance: Integrate security best practices (DevSecOps) into the SDLC, including secret management, IAM, and vulnerability scanning
- Collaboration: Partner with SDEs and Product teams to optimize application performance, troubleshoot complex production issues, and improve developer workflows
- Self-Service Infrastructure: Enable engineering teams to provision resources independently through standardized, pre-approved Terraform modules and internal APIs
- Local Development: Standardize and optimize local development environments (e.g., using Docker Compose, Dev Containers) to ensure parity with production and minimize "works on my machine" issues
- Internal Tooling: Build and maintain CLI tools or internal dashboards that simplify routine tasks like secret management, log access, and environment switching
- Innovation: Leverage emerging technologies, including GenAI tooling, to automate routine infrastructure tasks and improve operational efficiency
- GCP Infrastructure Best Practices:
  - Identity & Access: Enforce the Principle of Least Privilege (PoLP) using granular IAM roles rather than primitive roles
  - Network Security: Secure our perimeter using VPC Service Controls, Cloud Armor, and strict firewall rules
  - Cost Optimization: Drive FinOps initiatives by leveraging Committed Use Discounts and identifying zombie instances or unoptimized storage
  - Data Integrity: Ensure robust backup and disaster recovery plans for Cloud SQL and Cloud Storage, including object versioning
Essential Skills
- Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field
- Experience: 4 to 7 years of experience in a DevOps, SRE, or Infrastructure role, preferably in a high-growth startup environment
- Cloud Proficiency: Hands-on experience with Google Cloud Platform (GCP) and its core services (Compute Engine, GKE, BigQuery, Cloud Run, Cloud SQL)
- Automation: Strong proficiency in scripting and automation using Python or Go
- Orchestration: Deep expertise in Kubernetes and Docker containerization
- Infrastructure: Solid understanding of Linux internals, networking (TCP/IP, DNS, VPNs), and distributed systems
Desired Skills (Plus Points)
- Professional GCP certifications (e.g., Professional Cloud DevOps Engineer or Professional Cloud Architect)
- Familiarity with database management for PostgreSQL, Redis, or NoSQL databases
- Knowledge of FinOps practices to optimize cloud costs
Key Competencies
- Exceptional problem-solving and analytical skills
- Excellent written and verbal communication skills for cross-functional collaboration
- Ability to work independently and take full ownership of infrastructure components
- A security-first mindset and a commitment to high system availability
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.