Experience: 10+ years
Job Description:
We are seeking a highly skilled Data Engineer to design, develop, and maintain robust data pipelines and AI-driven data products on Google Cloud Platform (GCP).
The ideal candidate will have expertise in BigQuery, Python, and cloud-native tools, along with hands-on experience in machine learning, retrieval-augmented generation (RAG), and data governance.
You will play a key role in supporting scalable, secure, and intelligent data systems that power business insights and AI applications.
Must-Have Skills:
- 10+ years of experience in data engineering, including at least 4 years on Google Cloud Platform.
- Strong programming skills in Python and experience with cloud-based data engineering.
- Hands-on experience with GCP, especially BigQuery, Dataflow, Cloud Composer, and Cloud Functions.
- Familiarity with Vertex AI, BigQuery ML, Dialogflow, and AI model deployment.
- Proficient in building and maintaining ETL/ELT pipelines.
- Experience with CI/CD practices and tools in data workflows.
- Knowledge of retrieval-augmented generation (RAG), vector databases/search, and prompt engineering is a plus.
- Solid understanding of data governance, privacy, and compliance standards.
- Strong SQL skills with a focus on BigQuery SQL (see the sketch after this list).
- Experience with Spark or other distributed data processing frameworks (good to have).
- Comfortable with API development, webhooks, and using tools like Postman.
- Excellent communication and documentation skills.
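For illustration, a minimal sketch of the Python-plus-BigQuery work this role centers on: running a parameterized query with the google-cloud-bigquery client library. It assumes Application Default Credentials are configured; the project, dataset, and table names are hypothetical placeholders.

```python
# Minimal sketch: a parameterized BigQuery SQL query run from Python.
# Assumes the google-cloud-bigquery library and Application Default
# Credentials; `my_project.sales.transactions` is a hypothetical table.
from google.cloud import bigquery

client = bigquery.Client()  # project/credentials come from the environment

sql = """
    SELECT customer_id, SUM(amount) AS total_spend
    FROM `my_project.sales.transactions`
    WHERE order_date >= @start_date
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
"""

# Query parameters keep the SQL injection-safe and cache-friendly.
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("start_date", "DATE", "2024-01-01"),
    ]
)

for row in client.query(sql, job_config=job_config).result():
    print(row.customer_id, row.total_spend)
```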
Good-to-Have Skills:
- Familiarity with other cloud technologies
- Working experience with JIRA and Agile methodologies
- Stakeholder communication
- Proficiency with Microsoft Office
- Cross-functional teamwork, both internally and with external clients
- Team leadership experience
- Requirements gathering
Key Responsibilities:
- Data Pipeline Development: Build and optimize scalable ETL/ELT data pipelines using Dataflow, Cloud Composer, and BigQuery (see the first sketch after this list).
- Cloud Infrastructure: Design and deploy data solutions on Google Cloud Platform (GCP), using services such as Cloud Functions, Vertex AI, BigQuery, and GCS.
- Data Management: Ensure high data quality, governance, and privacy compliance across all stages of the data lifecycle.
- API Development: Design and consume APIs for data interaction and AI integration; use tools like Postman for testing and documentation.
- Collaboration: Work closely with cross-functional teams, including data science, product, engineering, and compliance.
- AI & ML Integration: Collaborate with data scientists and ML engineers to deploy models on Vertex AI, support BigQuery ML, and integrate sentiment analysis and prompt engineering workflows.
- Conversational AI & RAG Systems: Support development of intelligent applications using Dialogflow, webhooks, vector search, and retrieval-augmented generation methods.
- Automation & CI/CD: Develop and manage CI/CD pipelines for data engineering workflows, ensuring reliability and reproducibility.
- Performance & Optimization: Optimize BigQuery SQL and Spark jobs for cost and speed on large datasets (see the second sketch after this list).
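For illustration, a minimal sketch of the pipeline-development responsibility above: a Cloud Composer (Airflow) DAG that runs one daily ELT step in BigQuery. It assumes Airflow 2.4+ with the Google provider package installed; the DAG id and table names are hypothetical placeholders.

```python
# Minimal sketch: a Cloud Composer (Airflow) DAG with one daily ELT step.
# Assumes Airflow 2.4+ and apache-airflow-providers-google; all names are
# hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="daily_sales_elt",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+ argument name
    catchup=False,
) as dag:
    # Aggregate one day's transactions into a reporting table; {{ ds }} is
    # Airflow's templated execution date (YYYY-MM-DD).
    load_daily_aggregates = BigQueryInsertJobOperator(
        task_id="load_daily_aggregates",
        configuration={
            "query": {
                "query": """
                    INSERT INTO `my_project.analytics.daily_sales`
                    SELECT order_date, SUM(amount) AS revenue
                    FROM `my_project.sales.transactions`
                    WHERE order_date = DATE('{{ ds }}')
                    GROUP BY order_date
                """,
                "useLegacySql": False,
            }
        },
    )
```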
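And a sketch of the performance-and-optimization responsibility: creating a date-partitioned, clustered BigQuery table so downstream queries prune partitions instead of scanning the whole table. It reuses the google-cloud-bigquery client from the earlier sketch, and the table names are again hypothetical.

```python
# Minimal sketch: partitioning and clustering as a BigQuery cost/speed lever.
# Table names are hypothetical; assumes google-cloud-bigquery as above.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
    CREATE TABLE IF NOT EXISTS `my_project.sales.transactions_partitioned`
    PARTITION BY order_date      -- date filters now scan only matching days
    CLUSTER BY customer_id       -- co-locates rows for selective filters
    AS
    SELECT * FROM `my_project.sales.transactions`
"""
client.query(ddl).result()
```

Because on-demand BigQuery pricing bills by bytes scanned, partition pruning of this kind is typically the single largest cost lever on large tables.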
Certification:
- Google Cloud Professional Data Engineer or Professional Cloud Architect certification.
- Snowflake SnowPro Core / Associate certification.