
Role: Engineering Lead
Location: India - Remote
Exp: 6+ Years
Immediate Joiners
Early Stage AI Product Startup
We are hiring an experienced, hands-on Engineering Lead to enforce engineering best practices and improve reliability across our India engineering team. You will lead and mentor an initial team of 3-4 engineers, enforce the SDLC and code quality standards, conduct code reviews and merge readiness checks, coordinate incident response and on-call practices, define SLIs/SLOs, create runbooks/playbooks, and drive resilience and operational automation. You'll be both a technical contributor and a force-multiplier for engineering quality and reliability while preparing the org for scale under a future Director.
What You'll Do
- Lead and mentor a small engineering team (3-4 engineers initially); run regular 1:1s, provide performance feedback, and support career growth.
- Enforce the SDLC and development best practices across the team: branching strategy, PR workflow, testing standards, CI/CD rules, release and rollback procedures.
- Conduct and enforce thorough code reviews and PR quality gates; approve merges only when readiness criteria are met.
- Contribute hands-on: implement features, pair-program, and resolve critical production issues when required.
- Define, track, and maintain SLIs and SLOs for critical services; work with product and business stakeholders to align targets and error budgets.
- Coordinate incident response: lead on-call rotations, triage incidents, run blameless postmortems, and drive remediation with clear action plans.
- Create and maintain runbooks, playbooks, and automated recovery procedures for common failure modes; ensure runbooks are tested and up to date.
- Improve system resiliency through capacity planning, load testing, fault-tolerance design, and chaos/failure-mode experiments.
- Implement and tune observability (metrics, tracing, logs) and alerting to cut alert noise and keep pages actionable.
- Automate operational toil (deployments, rollbacks, backups, DB maintenance) using CI/CD and Infrastructure-as-Code.
- Monitor cloud and vector DB costs; recommend and implement cost-optimization strategies without sacrificing reliability.
- Manage sprint planning, capacity allocation, and delivery commitments; ensure predictable delivery cadence and transparent reporting.
- Document architecture, ownership, runbooks, and onboarding materials, and prepare knowledge-transfer artifacts for the future Director.
- Participate in hiring and onboarding; contribute to interview loops and candidate evaluation.
Security & Compliance
- Understand and adhere to the company's information security policies.
- Immediately report any suspected security incidents, phishing attempts, or data breaches.
- Participate in required security awareness training and compliance activities.
What We're Looking For in You
Technical Skills
- 6+ years of software engineering experience with 2+ years in a lead/mentor role (or equivalent proven leadership on projects).
- Strong coding ability (Python preferred; TypeScript/Node.js experience a plus) and proven experience performing high-quality code reviews.
- Practical experience with CI/CD pipelines, Git workflows, branch protection, and release processes.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP), containerization (Docker) and orchestration (Kubernetes/GKE/EKS, Cloud Run, or ECS).
- Strong troubleshooting skills across application, infrastructure, and networking layers.
- Experience with observability and incident management tools (Prometheus, Grafana, Sentry, ELK/Loki, Jaeger/Tempo, PagerDuty/OpsGenie, etc.).
- Experience with IaC (Terraform) and secrets management (Vault, AWS Secrets Manager).
SRE / Reliability Experience (required or strong preference)
- Defined and tracked SLIs/SLOs and used error budgets to drive prioritization.
- Led incident response and postmortem processes with actionable remediation.
- Built and maintained runbooks and automated recovery procedures.
- Performed capacity planning, load testing, and reliability improvements.
- Automated operational tasks to reduce toil and improve MTTR.
Soft Skills
- Strong communicator and teacher who can explain technical concepts to both engineers and non-technical stakeholders.
- Able to enforce standards and processes through influence rather than by directive.
- Pragmatic problem solver who balances quality, speed, and business needs.
- Comfortable working in a fast-changing startup environment and adapting priorities.
Preferred / Nice-to-Have
- Experience with AI/ML systems, vector databases, and LLM integrations (LangChain, Weaviate, Pinecone, n8n, MCP, Bedrock, OpenAI, Anthropic, Gemini, Llama, etc.).
- Prior experience scaling early-stage engineering teams in India.
- Experience with chaos engineering tools and practices.
Job ID: 141738783