

Multiple roles for Senior Data Engineer / Lead Data Engineer
Location: DLF Cyber City, Gurgaon
Work model: Hybrid
Shift: 1:00 PM - 9:30 PM IST
Position Summary:
You will play a pivotal role in building and delivering data products, from simple to complex, and in supporting McCormick business units with their data and analytics needs.
Your responsibilities will include delivering and supporting data for existing analytics solutions and tooling, researching new features, and implementing automations. You will support business users, Data Scientists, and Data Analysts in converting business expectations into data products and data models that the business can use to deliver AI, analysis, reporting, and data-driven recommendations to stakeholders and executives.
This role will be accountable for building and maintaining scalable data pipelines from source systems. The Data Engineer will ensure the availability, reliability, and performance of data products by integrating raw data from various sources. Key responsibilities include data modeling, ETL (Extract, Transform, Load) development, and ensuring data quality and security.
Responsibilities:
Design and Execute
• Partner with data product managers to gather requirements and deliver data pipelines.
• Design ETL solutions including data quality, data security, and data pipeline resiliency.
• Execute ETL solutions including data security, data quality and performance requirements.
Data Extraction, Load and Transformation
• Design, build and implement ELT pipelines to efficiently ingest and transform data from a wide variety of data sources and deliver datasets that meet business requirements.
• Optimize large datasets and data workflows for performance, scalability, and reliability to support business needs.
• Develop and maintain scalable data pipelines leveraging Azure Synapse, PySpark, APIs, and SQL, performing advanced data cleaning, transformation, and manipulation to ensure high-quality, reliable data flows.
• Implement CI/CD processes to streamline and automate data pipeline deployment.
• Apply data validation frameworks (Great Expectations, Fabric-native tools) to maintain accuracy.
• Utilize partitioning, indexing, and clustering strategies to enhance query performance.
Process Improvement, Performance and Cost Optimization Tuning
• Collaborate with Data Science, AI, and Data product teams to optimize performance and cost effectiveness of their solutions.
• Identify and support the design of internal process improvements, including automating manual processes, optimizing data product delivery, and redesigning solutions for enhanced scalability.
• Implement solution adjustments to improve performance and cost-effectiveness of data products.
Desired key skills/ background of the candidate:
Job ID: 148996405
Skills:
PySpark, Apache Spark, Automation, Data Quality, GitLab, Databricks, Data Governance, Python, CI/CD Pipelines, AI/ML Workflows, LLMOps, RAG Pipelines, Vector-Space Architectures, Vector Search, SQL Optimization, Metadata, Delta Lake, Spark Performance Optimization, Databricks REST APIs, Distributed Data Processing, Scalable Data Platform Architecture
Skills:
Java, S3, Storm, Hadoop, Cassandra, PySpark, Scala, Kafka, Bash, Redis, MapReduce, Spark Streaming, Hive, Spark, Python, HBase, AWS, Flink