About the Role
We are looking for a curious and driven Data Engineer to join our team. In this role, you will architect and build scalable data lakehouse solutions across major cloud providers. You will move beyond simple script writing to engineer robust, production-grade pipelines using the Modern Data Stack. You will act as both an individual contributor and a technical mentor, ensuring high code quality and leveraging AI-assisted development tools (such as Cursor, Claude, or GitHub Copilot) to maximize efficiency and innovation.
Key Responsibilities
- Big Data Engineering: Design, build, and maintain scalable ETL/ELT pipelines using PySpark and advanced SQL to process massive datasets.
- Platform Architecture: Architect and implement infrastructure on Databricks or Snowflake, leveraging cloud storage (S3/ADLS), serverless services, and modern data warehouses.
- Transformation & Orchestration: Use DBT (Data Build Tool) for effective data transformation and manage job orchestration and scheduling.
- Code Quality & Best Practices: Champion software engineering best practices, including version control (Git), comprehensive unit tests, and up-to-date design/API documentation.
- AI-Augmented Development: Actively use AI coding assistants (Cursor, Copilot, etc.) to accelerate development cycles and improve code efficiency.
- Collaboration & Review: Conduct rigorous peer code reviews covering PySpark logic, ETL analytics, and machine learning integration.
- Mentorship: Lead and mentor junior developers, ensuring the team adheres to coding best practices and helping them grow into future leaders.
Qualifications

Must-Have (Core Competencies):
- Expert PySpark Proficiency: Deep experience processing large-scale data using Spark/PySpark. You understand how distributed computing works under the hood.
- Advanced SQL: You can write complex, performant queries and have a deep understanding of database optimization.
- Python Scripting: Strong ability to write clean, modular, and efficient Python code for data engineering pipelines.
- Platform Experience: Proven track record working within Databricks or Snowflake environments.
- Data Modeling: Strong understanding of database systems, data modeling (star schema, snowflake schema), and data architecture.
- Engineering Mindset: Experience with CI/CD, unit testing, and integrating with existing codebases.
- AI Adaptability: Comfort with AI-enabled software development. You should be comfortable using IDEs with GenAI tools (Cursor, VS Code with Copilot, etc.) to iterate faster.

Good-to-Have (Preferred Qualifications):
- Experience with DLT Hub and orchestration platforms like Airflow/Prefect.
- Experience with the Modern Data Stack (Fivetran, Airbyte, DBT, etc.).
- Familiarity with DuckDB for analytical processing.
- Exposure to building applications using LLMs/GenAI (OpenAI SDK, Gemini, Anthropic).