Job Summary
We are looking for a skilled Data Engineer with strong experience in Google Cloud Platform (GCP) to design and build scalable data pipelines. The ideal candidate will have hands-on expertise in processing JSON data structures (including complex nested and 3×3 matrix-like formats) and in developing efficient data ingestion and transformation workflows.
Key Responsibilities
- Design, develop, and maintain end-to-end data pipelines on GCP
- Ingest, parse, and transform JSON datasets (including nested and 3×3 matrix-like structures) into usable data models; an illustrative flattening sketch follows this list
- Build scalable ETL/ELT pipelines using tools such as Cloud Dataflow, Dataproc, or Cloud Composer
- Optimize data workflows for performance, cost, and reliability
- Work with structured and semi-structured data from multiple sources (APIs, files, streaming data)
- Implement data quality checks and validation mechanisms
- Collaborate with analytics and data science teams to enable downstream use cases
- Manage data storage and messaging services such as BigQuery, Cloud Storage, and Pub/Sub
- Ensure data security, governance, and compliance standards are met
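For illustration only, the sketch below shows the kind of JSON flattening work described above, using plain Python: a record containing a 3×3 nested matrix and nested metadata is turned into a single flat row suitable for loading into a warehouse table. The record shape and field names (sensor_id, readings, meta) are hypothetical, not part of any specific pipeline.

```python
import json

# Hypothetical record: a 3x3 readings matrix plus nested metadata.
raw = '''
{
  "sensor_id": "s-42",
  "readings": [[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0],
               [7.0, 8.0, 9.0]],
  "meta": {"site": "plant-a", "unit": "celsius"}
}
'''

def flatten(record: dict) -> dict:
    """Flatten nested metadata and the 3x3 readings matrix into flat columns."""
    row = {"sensor_id": record["sensor_id"]}
    # Nested object -> prefixed columns (meta.site -> meta_site, ...)
    for key, value in record["meta"].items():
        row[f"meta_{key}"] = value
    # 3x3 matrix -> one column per cell (readings_0_0 ... readings_2_2)
    for i, matrix_row in enumerate(record["readings"]):
        for j, value in enumerate(matrix_row):
            row[f"readings_{i}_{j}"] = value
    return row

print(flatten(json.loads(raw)))
```

In practice the same flattening logic would typically run inside a Dataflow/Beam transform or a BigQuery load step rather than as a standalone script.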
Required Skills
- Strong experience with Google Cloud Platform (GCP) services
- Hands-on experience in building data pipelines (batch & streaming)
- Expertise in handling JSON data formats, including nested and matrix-like (3×3) structures
- Proficiency in SQL and Python (or Java/Scala)
- Experience with BigQuery for data warehousing and querying
- Familiarity with Cloud Dataflow / Apache Beam or similar frameworks (a minimal Beam sketch follows this list)
- Understanding of data modeling and ETL design patterns
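As an illustration of the batch-pipeline skills listed above, here is a minimal Apache Beam (Python SDK) sketch that reads JSON lines from Cloud Storage, parses them, and writes rows to BigQuery. The bucket path, project/dataset/table names, and record fields are placeholders, and the target table is assumed to already exist.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_record(line: str) -> dict:
    """Parse one JSON line and keep only the fields the target table expects."""
    record = json.loads(line)
    return {
        "sensor_id": record["sensor_id"],
        "site": record["meta"]["site"],
    }


def run():
    # Runs locally by default; pass --runner=DataflowRunner plus project/region
    # options to execute on Cloud Dataflow.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadJsonLines" >> beam.io.ReadFromText("gs://example-bucket/events/*.json")
            | "ParseJson" >> beam.Map(parse_record)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```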
Good to Have
- Experience with real-time streaming (Pub/Sub, Kafka)
- Knowledge of Airflow / Cloud Composer for orchestration (an example DAG sketch follows this list)
- Exposure to data lake architectures
- Understanding of CI/CD pipelines in data engineering
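For orientation, a minimal Airflow DAG sketch of the orchestration pattern referenced above. It assumes Airflow 2.4+ (for the `schedule` argument); the dag_id, task ids, and bash commands are placeholders standing in for real extract/load/transform steps.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Placeholder DAG: dag_id, task ids, and commands are illustrative only.
with DAG(
    dag_id="example_daily_ingest",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older releases use schedule_interval
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract",
                           bash_command="echo 'pull raw JSON from the source API'")
    load = BashOperator(task_id="load",
                        bash_command="echo 'load files into a BigQuery staging table'")
    transform = BashOperator(task_id="transform",
                             bash_command="echo 'run SQL transforms to the reporting layer'")

    # Linear dependency chain: extract, then load, then transform.
    extract >> load >> transform
```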
Key Competencies
- Strong problem-solving and analytical skills
- Ability to handle complex data structures (especially JSON transformations)
- Good communication and stakeholder management skills