The Data Engineering team within the CT org supports development of large-scale data pipelines for machine learning and analytical solutions across both structured and unstructured data. You'll have the opportunity to gain hands-on experience with all kinds of systems in the data platform ecosystem. Your work will have a direct impact on the applications that millions of our customers interact with every day: search results, homepage content, emails, auto-complete suggestions, browse pages, and product carousels. You will also build and scale data platforms that measure the effectiveness of Wayfair's ad spend and power media attribution, informing both day-to-day and major marketing spend decisions.
About the Role:
As a Staff Data Engineer, you will be part of the Data Engineering team. This role is inherently multi-functional: the ideal candidate will work with Client Experience, Data Scientists, Analysts, and Application teams across the company, as well as all other Data Engineering squads at Wayfair. We are looking for someone with a love for data, comfort with ambiguous requirements, and the ability to iterate quickly. Successful candidates will have strong engineering and communication skills and a belief that data-driven processes lead to phenomenal products.
What you'll do:
- Drive the end-to-end design and evolution of data models, pipelines, and data products for Search, Recommendations, and Marketing—operating at scale and influencing multiple domains.
- Own the development of scalable, batch-first data systems that ingest and transform structured, semi-structured, and unstructured data into high-quality, AI-consumable representations (e.g., curated datasets, embeddings-ready data, feature layers, and semantic abstractions).
- Design and build high-fidelity data pipelines optimized for reliability, cost, and performance, with a focus on efficient retrieval, data freshness, and contextual usability for downstream systems.
- Build and maintain robust data models including fact/dimension models, SCDs, CDC pipelines, and data versioning strategies to ensure consistency and reproducibility.
- Contribute to the development of a unified semantic layer that bridges raw data and AI/ML systems, enabling standardized metrics, reusable data definitions, and improved data access patterns.
- Work with metadata, lineage, and data discovery frameworks to improve transparency, governance, and usability of data across the organization.
- Partner cross-functionally with Product, Analytics, and Data Science to translate ambiguous business problems into well-defined data solutions and reusable data assets.
- Define and enforce data modeling standards, data contracts, and quality frameworks across teams.
- Drive improvements in data observability, SLA/SLO adherence, and pipeline reliability across the ecosystem.
- Make architectural decisions and trade-offs across storage, compute, and orchestration layers within a GCP-native stack.
What You'll Need:
- Bachelor's/Master's degree in Computer Science or related field, or equivalent experience.
- 12 years of experience in Data Engineering, building and owning large-scale data platforms and datasets.
- Deep expertise in data modeling (dimensional models, SCDs, wide tables), along with strong understanding of CDC, data versioning, and incremental processing strategies.
- Strong experience designing and building data pipelines on Google Cloud Platform using Google BigQuery and Google Cloud Storage.
- Advanced proficiency in SQL and Python, with a strong focus on query optimization, cost efficiency, and large-scale data processing.
- Solid understanding of data lakehouse principles, storage formats (e.g., Parquet), partitioning, clustering, and performance tuning.
- Experience building reliable, production-grade data systems, including ingestion, transformation, serving layers, and strong data quality and observability practices (SLAs/SLOs).
- Experience with event-driven and streaming architectures (e.g., Pub/Sub, Kafka), with the ability to apply them pragmatically alongside batch systems where needed.
- Experience enabling AI/ML use cases from a data perspective, including preparing high-quality datasets for model consumption, supporting feature engineering workflows, and building semantic or context-rich data layers that improve downstream usability.
- Familiarity with concepts such as metadata management, data lineage, and data discovery, and their role in improving trust and usability of data platforms.
- Proven ability to translate ambiguous business requirements into scalable data models and systems, especially in domains like search, recommendations, or marketing analytics.
- Demonstrated ownership of large problem spaces end-to-end, with the ability to influence architecture, drive standards, and align multiple teams.
- Experience providing technical leadership and mentorship, setting best practices, and raising the bar for engineering quality.
- Strong communication skills with the ability to articulate technical decisions, trade-offs, and system designs to diverse stakeholders.
Good to have:
- Exposure to NoSQL databases.
- Familiarity with BI tools such as Looker, Tableau, AtScale, PowerBI, or similar.