Key Responsibilities
- Design and implement scalable data platforms leveraging Data Lake, Lakehouse, and Data Mesh architectures
- Build and optimize data pipelines for batch and real-time processing using tools like Databricks, Spark, DBT, and cloud-native services
- Develop robust data ingestion frameworks for structured, semi-structured, and unstructured data (APIs, files, streaming sources)
- Design and develop scalable Python-based microservices to enable secure and efficient data sharing across systems and applications
- Build RESTful APIs and/or event-driven services for exposing curated datasets from Data Lake/Lakehouse platforms
- Work extensively with Python, PySpark, and SQL for data transformation and processing
- Implement streaming pipelines using Kafka / Kinesis and integrate with downstream analytics systems
- Design and manage large-scale datasets, including sensor/IoT data, in formats such as Parquet, JSON, and CSV
- Optimize data storage, partitioning, and query performance for high-volume analytical workloads
- Collaborate with cross-functional teams (Data Architects, Analysts, BI teams) to operationalize data lake solutions
- Contribute to data modeling in Lakehouse environments (medallion architecture, dimensional modeling)
- Ensure data quality, reliability, and observability across pipelines
- Implement serverless data processing solutions where applicable
Required Skills & Experience
- 4–6 years of experience in Data Engineering or related roles
- Strong expertise in Python, PySpark, and SQL
- Hands-on experience with Databricks, Delta Lake, or Snowflake
- Experience with ETL / ELT frameworks and orchestration tools
- Practical exposure to streaming technologies like Kafka or Kinesis
- Strong understanding of batch processing frameworks such as Spark, AWS Glue, or DBT
- Proficiency in handling structured, semi-structured, and unstructured data
- Experience with modern data modeling techniques (Star Schema, Snowflake Schema, 3NF)
- Good understanding of Data Lakehouse concepts and implementation patterns
- Familiarity with serverless architectures (e.g., Python-based serverless pipelines)