
Dun & Bradstreet

Principal Data Engineer

  • Posted 2 days ago

Job Description

Job Summary:

We are seeking an experienced Principal Data Engineer to help build the next generation of our identity graph and data platform. This role is focused on designing, developing, and optimizing large-scale data pipelines and systems that ingest, process, and unify complex datasets from diverse sources (web, mobile, AdTech, government, and proprietary data).

This is a highly hands-on, technical role for someone who can quickly understand existing systems, operate independently, and deliver high-quality solutions at scale. The ideal candidate is deeply analytical, detail-oriented, and experienced with building performant data pipelines and systems handling billions of records.

Key Responsibilities:

  • Design, build, and optimize scalable data pipelines and ETL/ELT workflows for large, complex datasets
  • Design and implement foundational data architecture supporting identity resolution and ID graph systems
  • Develop and enhance systems supporting identity resolution and ID graph construction (data ingestion, normalization, matching, and deduplication)
  • Process and unify multi-source datasets (cookies, device IDs, behavioral data, third-party and proprietary data)
  • Write efficient, testable, and maintainable code using Python and SQL for large-scale data processing
  • Optimize data models, queries, and storage strategies for performance, scalability, and cost efficiency
  • Build and maintain data validation, monitoring, and alerting systems to ensure data quality and reliability
  • Troubleshoot, debug, and improve existing data pipelines and infrastructure
  • Own and drive complex data problems end-to-end, from initial design through production deployment
  • Make and influence key technical decisions related to data architecture, scalability, and system design
  • Collaborate with data, platform, DevOps, and product teams to deliver scalable, production-ready solutions
  • Translate business and product requirements into practical, performant data solutions
  • Document data pipelines, systems, and workflows clearly
  • Continuously improve system performance, data quality, and pipeline resilience
  • Contribute to building new capabilities that improve how customers understand and leverage data insights.

Key Skills:

  • 8–12+ years of hands-on experience in data engineering or large-scale data processing.
  • Proven experience building and maintaining production-grade data pipelines and distributed systems
  • Demonstrated experience architecting and delivering large-scale data platforms or mission-critical data systems
  • Strong expertise in:
      o SQL and relational databases (Postgres, BigQuery, Redshift, etc.)
      o Python for data processing and analysis
  • Experience with Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Storage, Cloud Functions) and/or AWS (S3, Redshift, EMR, RDS)
  • Experience working with large-scale datasets (hundreds of millions to billions of records)
  • Strong understanding of data modeling, partitioning, indexing, and query optimization.
  • Experience with distributed data processing and parallelization techniques.
  • Experience moving large volumes of data across systems and architectures
  • Familiarity with CI/CD, containerization, and orchestration tools (Docker, Kubernetes, GitHub Actions, etc.)
  • Strong debugging and troubleshooting skills in complex data environments
  • Experience with version control (Git) and Agile tools (Jira, Confluence, etc.).
  • Highly analytical with strong attention to detail and a data-driven mindset.
  • Ability to hit the ground running, quickly understand systems, and deliver independently
  • Comfortable working in a remote, fast-paced, and collaborative environment
  • Proven ability to drive system design and implementation.

Preferred:

  • Experience with identity graphs, entity resolution, or record linkage systems.
  • Background in AdTech, digital identity, cookies, or audience data platforms
  • Experience with real-time or streaming data systems.
  • Familiarity with data quality, observability, and monitoring frameworks
  • Experience with data visualization tools (Looker, Tableau, Power BI)
  • Knowledge of data privacy, compliance, and governance considerations
  • Experience with modern data platforms such as Snowflake and Databricks
  • Exposure to AI/ML technologies, including experience working with or integrating agentic frameworks.

Job ID: 149194435
