We are looking for a lead data engineer to own our data integration track. You will design robust integration architectures and lead their implementation by guiding a team of 2-3 engineers. This is a player-coach role: you are expected to remain hands-on with code and architecture while ensuring the team delivers high-quality data pipelines that meet strict SLAs.
Today, you'll lead a team of about 2 engineers. Over the next 12-14 months, you'll grow that into a team of 5+. You will guide the team in the design, development, and deployment of pipelines and integrations, and the role is extremely hands-on: you will also develop, validate, and deploy components yourself as we grow.
Technical Leadership And Implementation
The candidate will have responsibilities across the following functions:
- Lead the Squad: Manage, mentor, and conduct code reviews for a team of 2-3 data engineers. Drive sprint planning, estimation, and task delegation to ensure successful delivery.
- Integration Architecture: Design scalable, fault-tolerant ETL/ELT frameworks to ingest complex data from diverse sources (REST APIs, streaming logs, CRM/ERP systems) into our central repository.
- Implementation Ownership: Take full accountability for the implementation phase of the software lifecycle. Ensure that architectural designs are translated into functioning, production-grade code by the team.
Engineering And Optimisation
- Advanced Pipeline Development: Handle the most complex transformations and architectural challenges. Move beyond simple ingestion to building self-healing and idempotent pipelines.
- Performance Tuning: Write and optimise complex SQL queries and Python scripts. Identify bottlenecks in the data warehouse/lake and implement indexing, partitioning, or schema changes to improve performance.
- Code Quality Standards: Enforce version control best practices and CI/CD workflows, and run data validations within the team.
- AI/ML: Collaborate directly with data scientists and ML engineers to understand their feature requirements and build high-quality, production-ready pipelines. Engineer and manage the data infrastructure required for model training datasets, including versioning, lineage tracking, and compliance.
Reliability And Stakeholder Management
- SLA Management and RCA: Lead the resolution of critical incidents (P0/P1). Move beyond debugging to performing root cause analysis (RCA) to prevent recurrence and ensure customer SLAs are met.
- Data Quality Governance: Define the strategy for monitoring and alerting. Ensure the team implements automated checks for data accuracy, freshness, and completeness.
- Collaboration: Act as the technical point of contact for product managers and architects. Translate high-level business requirements into technical tickets for your team.
Requirements
- 5+ years of professional experience in data engineering.
- Minimum 2 years of experience leading, mentoring, or managing a small team (formal or informal).
- Must be willing to work extended hours to overlap with the US time zone. You will be the primary technical lead during these hours, ensuring unblocked development and rapid incident response.
- Still an active coder: You've shipped production-grade pipelines and orchestrated their workflows in the last 6 months, not just managed people who did.
- Strong communication skills: You can explain complex technical decisions to non-technical stakeholders clearly.
Technical Competencies
- Database Mastery: Expert-level proficiency in SQL (PostgreSQL, ClickHouse, and MySQL) and good experience with data warehousing modelling (star/snowflake schemas and SCDs).
- Code Proficiency: Good programming skills in Python (Pandas, PySpark, async libraries).
- Orchestration and Integration: Hands-on experience with modern data stack tools (e.g., Airflow, NiFi) is mandatory.
- Cloud Native: Proven experience implementing pipelines on hyperscalers (AWS, Azure, or GCP) using services like S3, Lambda/Functions, EMR, or Redshift/Synapse.
Soft Skills
- Delivery Focused: A mindset geared towards getting things done the right way: shipping working code and enabling the team to deliver without compromising on quality.
- Communication: Ability to explain complex technical issues to non-technical stakeholders during US business hours.
Nice To Have
- Experience with Infrastructure as Code (Terraform, CloudFormation).
- Experience implementing data quality tools.
- Knowledge of containerisation (Docker, Kubernetes) for deploying data apps.
This job was posted by Saravana Kumar Rajasekaran from Terrantic.