Job Summary
We are seeking a Senior Data Architect with deep Big Data Engineering expertise to design and modernize large-scale, cloud-native data platforms. This role emphasizes distributed data processing, real-time pipelines, data platform automation, and GenAI enablement on top of strong Big Data foundations.
Key Responsibilities
- Architect and govern enterprise Big Data platforms (data lake, lakehouse, warehouse, real-time).
- Design high-volume, high-velocity data pipelines using batch and streaming frameworks.
- Lead the implementation of distributed processing architectures (Spark, PySpark, EMR).
- Build event-driven and real-time streaming solutions (Kafka, Kinesis, Flink).
- Define ETL/ELT patterns, metadata-driven pipelines, and reusable ingestion frameworks.
- Drive data platform automation: orchestration (Airflow, Step Functions), CI/CD, data quality checks, and observability.
- Optimize performance, scalability, fault tolerance, and cost across Big Data workloads.
- Integrate GenAI architectures (LLMs, embeddings, vector databases, RAG) with enterprise data lakes.
- Ensure security, governance, lineage, and compliance across data platforms.
- Provide hands-on leadership and technical mentoring to data engineering teams.
Required Technical Skills & Experience
- 12+ years of experience in Big Data Engineering or Data Architecture roles.
- Expert-level experience with Spark, PySpark, SQL, and distributed compute engines.
- Strong knowledge of AWS Big Data stack: S3, EMR, Glue, Athena, Redshift, Lambda, Step Functions.
- Hands-on experience with Snowflake (performance tuning, data sharing, optimization).
- Expertise in streaming platforms: Kafka, Kinesis, Flink, or Spark Streaming.
- Strong experience with data modeling (dimensional, Data Vault 2.0).
- Proficiency in Python, with practical experience in schema evolution, partitioning, and data versioning.
- Experience with orchestration and automation tooling (Airflow, Dagster, CI/CD pipelines).
- Working knowledge of GenAI data integration (feature stores, vector DBs, RAG pipelines).
- Experience with Agile delivery and leading globally distributed engineering teams.