Job Title: Data Engineer (Big Data & ETL)
Location: Gurgaon
Duration: 1-year contract-to-hire (C2H)
Experience Level: 4-8 years
Role Overview
We are looking for a highly skilled Data Engineer to design, build, and maintain scalable data pipelines. You will use Java to develop complex data transformations and GCP's suite of data tools to build robust data warehousing solutions that drive business intelligence and analytics.
Key Responsibilities
- Pipeline Development: Design and implement high-volume, low-latency ETL/ELT pipelines to ingest data from heterogeneous sources (structured and unstructured).
- Data Engineering: Develop and maintain data processing applications using Java to handle custom transformations and business logic.
- GCP Ecosystem: Architect solutions using GCP services such as BigQuery, Dataflow, Dataproc, and Pub/Sub (see the pipeline sketch after this list).
- Data Modeling: Design efficient schemas and data warehouse structures in BigQuery to support analytical queries.
- Optimization: Optimize data processing jobs for performance, cost-efficiency, and scalability.
- Cloud Infrastructure: Utilize Infrastructure-as-Code (Terraform) to manage and deploy GCP data resources.
- Collaboration: Work closely with Data Scientists and Analysts to ensure data quality and accessibility.
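To give a concrete sense of the Java-on-Dataflow work described above, here is a minimal sketch of a streaming Apache Beam pipeline that reads JSON strings from Pub/Sub and appends them to a BigQuery table. The project, subscription, and table names are hypothetical placeholders, not part of this role's actual environment.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class PubSubToBigQuery {
  public static void main(String[] args) {
    // Pipeline options come from the command line, e.g. --runner=DataflowRunner.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        // "events-sub" is a hypothetical Pub/Sub subscription.
        .apply("ReadFromPubSub", PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/events-sub"))
        // Wrap each raw JSON payload in a TableRow; a real pipeline would parse and validate here.
        .apply("ToTableRow", MapElements.into(TypeDescriptor.of(TableRow.class))
            .via((String json) -> new TableRow().set("payload", json)))
        // Append rows to a hypothetical BigQuery table; the table must already exist.
        .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
            .to("my-project:analytics.events")
            .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER));

    pipeline.run();
  }
}
```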
Required Qualifications & Technical Skills
- Programming: Strong proficiency in Java (mandatory); experience with Python or Scala is a strong plus.
- Big Data Frameworks: Deep experience with Apache Beam (for Dataflow), Apache Spark, or Hadoop.
- GCP Expertise:
  - BigQuery: Mastery of SQL, partitioning, clustering, and materialized views (see the table-definition sketch after this list).
  - Cloud Dataflow: Building and monitoring stream/batch pipelines.
  - Cloud Storage: Managing data lakes/buckets.
- Data Warehousing: Solid understanding of dimensional modeling (Star/Snowflake schemas) and Data Vault.
- Workflow Orchestration: Experience with Cloud Composer (Airflow) for job scheduling and dependency management.
- Database Knowledge: Experience with both RDBMS (PostgreSQL/MySQL) and NoSQL databases.
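As a small illustration of the partitioning and clustering expertise listed above, here is a sketch using the google-cloud-bigquery Java client to define a day-partitioned, clustered table. The dataset, table, and column names are hypothetical.

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Clustering;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;
import com.google.cloud.bigquery.TimePartitioning;
import java.util.List;

public class CreatePartitionedTable {
  public static void main(String[] args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Hypothetical fact-table schema, for illustration only.
    Schema schema = Schema.of(
        Field.of("event_ts", StandardSQLTypeName.TIMESTAMP),
        Field.of("customer_id", StandardSQLTypeName.STRING),
        Field.of("amount", StandardSQLTypeName.NUMERIC));

    // Partition by day on the event timestamp so queries can prune by date.
    TimePartitioning partitioning = TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
        .setField("event_ts")
        .build();

    // Cluster on customer_id so filters on that column scan fewer blocks.
    Clustering clustering = Clustering.newBuilder()
        .setFields(List.of("customer_id"))
        .build();

    StandardTableDefinition tableDefinition = StandardTableDefinition.newBuilder()
        .setSchema(schema)
        .setTimePartitioning(partitioning)
        .setClustering(clustering)
        .build();

    bigquery.create(TableInfo.of(TableId.of("analytics", "fact_sales"), tableDefinition));
  }
}
```

Partition pruning combined with clustering is the standard lever for the cost-efficiency and performance goals listed under Optimization above.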
Preferred Qualifications
- GCP Professional Data Engineer Certification.
- Experience with streaming data architectures (Kafka or Pub/Sub); a minimal publisher sketch follows this list.
- Knowledge of CI/CD pipelines (GitHub Actions, Cloud Build).
- Experience in migrating legacy on-premise data warehouses to GCP.
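For candidates newer to Pub/Sub, the following sketch shows the shape of a minimal Java publisher using the google-cloud-pubsub client. The project name, topic name, and message payload are hypothetical.

```java
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import java.util.concurrent.TimeUnit;

public class OrderEventPublisher {
  public static void main(String[] args) throws Exception {
    // "my-project" and "order-events" are hypothetical placeholders.
    TopicName topic = TopicName.of("my-project", "order-events");
    Publisher publisher = Publisher.newBuilder(topic).build();
    try {
      PubsubMessage message = PubsubMessage.newBuilder()
          .setData(ByteString.copyFromUtf8("{\"order_id\": 42}"))
          .build();
      // publish() returns a future; get() blocks until the server acks the message.
      String messageId = publisher.publish(message).get();
      System.out.println("Published message " + messageId);
    } finally {
      publisher.shutdown();
      publisher.awaitTermination(1, TimeUnit.MINUTES);
    }
  }
}
```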