About the Role
We are looking for a hands-on Data Engineer with strong expertise in Python, PySpark, and cloud data services (AWS and/or Azure) to design, build, and optimize scalable data pipelines and lakehouse solutions. You'll work closely with data architects, analysts, and product teams to deliver high-quality, reliable, and secure data for analytics, AI/ML, and reporting use cases.
Responsibilities
- Design, build, and maintain batch and streaming data pipelines using PySpark/Spark and Python (see the illustrative pipeline sketch after this list).
- Develop data lake/lakehouse architectures including Delta Lake/Iceberg/Hudi where applicable.
- Orchestrate pipelines using tools like Airflow, AWS Step Functions, Azure Data Factory, or Databricks Workflows.
- Build and optimize ETL/ELT workflows for large-scale datasets with a focus on performance, reliability, and cost.
- Implement data quality (DQ) checks, observability/monitoring, and error handling.
- Collaborate on data modeling (star/snowflake), CDC, SCD, and partitioning/bucketing strategies.
- Enforce security best practices: IAM/roles, encryption, secrets management, and data governance.
- Contribute to CI/CD for data code (e.g., Git, Azure DevOps, GitHub Actions, Jenkins) and infra-as-code (Terraform/CloudFormation/Bicep).
- Partner with stakeholders to translate business needs into scalable technical solutions; document designs and runbooks.
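
To illustrate the kind of pipeline work this role involves, here is a minimal PySpark batch sketch that reads raw files from cloud storage and writes a partitioned Delta table. It is a sketch only: every path, column name, and job name below is a hypothetical placeholder, and the delta-spark package is assumed to be available on the cluster.

# Minimal sketch; all paths and names are placeholders, delta-spark assumed installed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-daily-batch").getOrCreate()

# Read raw CSV files landed in cloud storage (s3a:// on AWS, abfss:// on Azure).
raw = (
    spark.read
    .option("header", "true")
    .csv("s3a://example-bucket/raw/orders/")
)

# Light cleansing: drop rows missing the business key, derive a partition column.
clean = (
    raw.dropna(subset=["order_id"])
    .withColumn("order_date", F.to_date("order_ts"))
)

# Persist as a partitioned Delta table for downstream analytics, AI/ML, and reporting.
(
    clean.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("s3a://example-bucket/curated/orders/")
)

In practice, a job like this would be parameterized and scheduled from one of the orchestrators listed above (Airflow, Step Functions, ADF, or Databricks Workflows).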
Qualifications
- Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or equivalent experience.
- Relevant certifications are a plus (e.g., AWS Data Analytics Specialty, AWS Developer/Architect, Azure Data Engineer Associate (DP-203), Databricks Data Engineer Associate/Professional).
Required Skills
- 5–8 years of professional experience as a Data Engineer or similar.
- Strong programming in Python (data processing, packaging, unit testing, typing).
- Advanced PySpark/Spark: RDD/DataFrame APIs, Spark SQL, performance tuning (joins, shuffle, partitions, broadcast, caching); a broadcast-join sketch follows this list.
- Cloud (AWS and/or Azure) experience (at least one end-to-end project):
  - AWS: S3, Glue, EMR, Lambda, Athena, Redshift (or Spectrum), Step Functions, IAM, CloudWatch/CloudTrail, Kinesis (nice to have).
  - Azure: ADLS Gen2, Databricks, Synapse (Spark/SQL), ADF, Azure Functions, Event Hub, Key Vault, Purview (nice to have).
- Databricks (or Spark on EMR/Synapse): notebooks, jobs, clusters, Delta Lake, Unity Catalog (preferred).
- Data modeling & SQL (complex queries, performance optimization).
- Orchestration: Airflow/ADF/Step Functions/Databricks Jobs.
- Version control (Git) and CI/CD for data projects.
- Solid understanding of data quality, lineage, metadata, and observability concepts.
- Experience with cost optimization and security on cloud data platforms.
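
As a small illustration of the performance-tuning expectation above, the sketch below hints Spark to broadcast a small dimension table so the large fact table is not shuffled during the join. Both table names are hypothetical.

# Minimal sketch; both table names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-demo").getOrCreate()

fact = spark.table("sales_fact")   # large fact table
dim = spark.table("country_dim")   # small lookup/dimension table

# Ship the small table to every executor instead of shuffling both sides.
joined = fact.join(broadcast(dim), on="country_code", how="left")

# Inspect the physical plan: expect BroadcastHashJoin rather than SortMergeJoin.
joined.explain()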
Preferred Skills
- Streaming: Spark Structured Streaming, Kafka/Event Hubs/Kinesis (see the streaming sketch after this list).
- Infra-as-Code: Terraform/CloudFormation/Bicep.
- Containers: Docker; basics of Kubernetes (AKS/EKS) a plus.
- Warehouse/Lakehouse: Redshift, Snowflake, Synapse SQL Pools.
- Testing: Great Expectations, dbt tests (if dbt is used), pytest.
- ML Pipelines: feature-engineering pipelines that feed ML models (MLOps exposure is beneficial).
- Compliance/Governance: GDPR/PII handling, masking, tokenization.
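
For the streaming skills above, here is a minimal Structured Streaming sketch that reads from Kafka and appends to a Delta table with checkpointing for recovery. The broker address, topic, and storage paths are hypothetical, and the spark-sql-kafka connector and delta-spark are assumed to be on the classpath.

# Minimal sketch; broker, topic, and storage paths are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Read a stream of events from Kafka; the message value arrives as bytes.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(F.col("value").cast("string").alias("payload"))
)

# Continuously append records to a Delta table; the checkpoint lets the job
# resume with exactly-once sink semantics after a restart.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .start("s3a://example-bucket/streaming/events/")
)
query.awaitTermination()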
Location and Work Details
- Location: Domlur, Bangalore
- Work Mode: 4 days work-from-office (WFO), 1 day work-from-home (WFH), as per policy/project needs
- Interview Availability: As per the schedule shared by the Talent Acquisition (TA) team
Equal Opportunity Statement
We are an equal opportunity employer committed to diversity and inclusivity.