Job Summary:
We are seeking an experienced Principal Data Engineer to help build the next generation of our identity graph and data platform. This role is focused on designing, developing, and optimizing large-scale data pipelines and systems that ingest, process, and unify complex datasets from diverse sources (web, mobile, AdTech, government, and proprietary data).
This is a highly hands-on, technical role for someone who can quickly understand existing systems, operate independently, and deliver high-quality solutions at scale. The ideal candidate is deeply analytical, detail-oriented, and experienced with building performant data pipelines and systems handling billions of records.
Key Responsibilities:
- Design, build, and optimize scalable data pipelines and ETL/ELT workflows for large, complex datasets
- Design and implement foundational data architecture supporting identity resolution and ID graph systems
- Develop and enhance systems supporting identity resolution and ID graph construction (data ingestion, normalization, matching, and deduplication)
- Process and unify multi-source datasets (cookies, device IDs, behavioral data, third-party and proprietary data)
- Write efficient, testable, and maintainable code using Python and SQL for large-scale data processing
- Optimize data models, queries, and storage strategies for performance, scalability, and cost efficiency
- Build and maintain data validation, monitoring, and alerting systems to ensure data quality and reliability
- Troubleshoot, debug, and improve existing data pipelines and infrastructure
- Own and drive complex data problems end-to-end, from initial design through production deployment
- Make and influence key technical decisions related to data architecture, scalability, and system design
- Collaborate with data, platform, DevOps, and product teams to deliver scalable, production-ready solutions
- Translate business and product requirements into practical, performant data solutions
- Document data pipelines, systems, and workflows clearly
- Continuously improve system performance, data quality, and pipeline resilience
- Contribute to building new capabilities that improve how customers understand and leverage data insights
Key Skills:
- 8–12+ years of hands-on experience in data engineering or large-scale data processing
- Proven experience building and maintaining production-grade data pipelines and distributed systems
- Demonstrated experience architecting and delivering large-scale data platforms or mission-critical data systems
- Strong expertise in:
  - SQL and relational databases (Postgres, BigQuery, Redshift, etc.)
  - Python for data processing and analysis
- Experience with Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Storage, Cloud Functions) and/or AWS (S3, Redshift, EMR, RDS)
- Experience working with large-scale datasets (hundreds of millions to billions of records)
- Strong understanding of data modeling, partitioning, indexing, and query optimization
- Experience with distributed data processing and parallelization techniques
- Experience moving large volumes of data across systems and architectures
- Familiarity with CI/CD, containerization, and orchestration tools (Docker, Kubernetes, GitHub Actions, etc.)
- Strong debugging and troubleshooting skills in complex data environments
- Experience with version control (Git) and Agile tools (Jira, Confluence, etc.)
- Highly analytical with strong attention to detail and a data-driven mindset
- Ability to hit the ground running, quickly understand systems, and deliver independently
- Comfortable working in a remote, fast-paced, and collaborative environment
- Proven ability to drive system design and implementation
Preferred:
- Experience with identity graphs, entity resolution, or record linkage systems
- Background in AdTech, digital identity, cookies, or audience data platforms
- Experience with real-time or streaming data systems
- Familiarity with data quality, observability, and monitoring frameworks
- Experience with data visualization tools (Looker, Tableau, Power BI)
- Knowledge of data privacy, compliance, and governance considerations
- Experience with modern data platforms such as Snowflake and Databricks
- Exposure to AI/ML technologies, including working with or integrating agentic frameworks