Key Responsibilities:
Cloud Data Platform & Architecture:
- Develop and maintain infrastructure-as-code using Terraform for Azure Kubernetes Service (AKS) clusters and Databricks environments
- Architect data storage strategies considering processing, accessibility, availability, and cloud cost optimization
- Manage federated access to cloud resources based on organizational roles (see the access sketch after this list)
- Implement shared access controls to support multi-tenancy and self-service tooling
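As a rough illustration of the federated-access bullets above (the account, container, and folder names are placeholders, not actual team resources), the sketch below reads an ADLS Gen2 container with an Azure AD identity via DefaultAzureCredential instead of a shared account key, so access is decided by the caller's RBAC role assignments:

```python
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Resolve an Azure AD identity (managed identity, workload identity,
# or a developer login) rather than a shared account key, so storage
# access is governed by RBAC role assignments on the account.
credential = DefaultAzureCredential()

# Placeholder account and container names.
service = DataLakeServiceClient(
    account_url="https://examplelake.dfs.core.windows.net",
    credential=credential,
)
filesystem = service.get_file_system_client("bronze")

# List paths under a domain folder; this fails with 403 if the caller's
# role lacks (for example) Storage Blob Data Reader at this scope.
for path in filesystem.get_paths(path="sales"):
    print(path.name)
```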
Data Pipelines & Processing:
- Design, implement, and maintain data pipelines using Azure Data Factory, AWS Data Pipeline, Apache Spark, and Databricks
- Work with stream and batch processing systems such as Kafka (Confluent or open source), Apache Storm, Spark Streaming, and Apache Flink, including Kappa-style architectures
- Perform data transformation using Kafka Streams applications, ksqlDB, and the Processor API
- Handle data ingestion and distribution through sources and sinks such as Azure Event Hubs, Kafka topics, ADLS Gen2, and REST APIs (a streaming sketch follows this list)
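By way of example only (not a prescribed pipeline), a minimal Structured Streaming job of the kind these bullets describe might read a Kafka topic, flatten the JSON payload, and land the result in ADLS Gen2; the broker, topic, schema, and paths below are illustrative placeholders, and the Delta sink assumes a Databricks/Delta Lake runtime:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Placeholder schema for the JSON payload on the topic.
schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

# Read from a Kafka topic (broker address and topic are placeholders).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka delivers raw bytes; cast, parse, and flatten the JSON payload.
orders = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), schema).alias("o"))
    .select("o.*")
)

# Write to ADLS Gen2 as Delta; the checkpoint location gives the sink
# exactly-once semantics across restarts.
query = (
    orders.writeStream.format("delta")
    .option("checkpointLocation", "abfss://bronze@examplelake.dfs.core.windows.net/_chk/orders")
    .start("abfss://bronze@examplelake.dfs.core.windows.net/orders")
)
query.awaitTermination()
```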
Platform Automation & Orchestration:
- Set up and manage open-source stacks, including Apache Airflow, Druid, Kafka, OpenSearch, and Superset
- Utilize Python scripting for automation and integration tasks
- Build and deploy high-performance APIs using FastAPI (a minimal sketch follows this list)
- Implement DevSecOps practices throughout the product lifecycle, including Day 2 operations with monitoring via Datadog
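As a minimal sketch of the FastAPI point above (the endpoint names and model are hypothetical, not an existing service), a small platform-metadata API might look like this:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="platform-api")

class Dataset(BaseModel):
    name: str
    domain: str
    owner: str

# In-memory stand-in for a real backing store.
_DATASETS = {
    "orders": Dataset(name="orders", domain="sales", owner="data-eng"),
}

@app.get("/health")
def health() -> dict:
    """Liveness probe for Kubernetes and Datadog synthetic checks."""
    return {"status": "ok"}

@app.get("/datasets/{name}", response_model=Dataset)
def get_dataset(name: str) -> Dataset:
    """Look up one dataset by name, or return 404."""
    dataset = _DATASETS.get(name)
    if dataset is None:
        raise HTTPException(status_code=404, detail="dataset not found")
    return dataset
```

Saved as main.py, this runs locally with `uvicorn main:app --reload`.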
Collaboration & Innovation:
- Research, evaluate, and introduce new technologies to enhance platform capabilities
- Work closely with engineering teams under Agile/Scrum methodology
- Maintain and manage the data catalog per topic/domain based on service use cases, as sketched below
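To make the catalog bullet concrete (the structure and field names are assumptions, not a prescribed schema), one simple way to model a per-domain, per-topic catalog entry is:

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One catalogued dataset, keyed by domain and topic."""
    domain: str                  # e.g. "sales"
    topic: str                   # Kafka topic or dataset name
    owner: str                   # accountable team
    use_cases: list[str] = field(default_factory=list)
    location: str = ""           # e.g. an abfss:// path

# Registry keyed by (domain, topic).
catalog: dict[tuple[str, str], CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    """Add or replace the catalog entry for (domain, topic)."""
    catalog[(entry.domain, entry.topic)] = entry

register(CatalogEntry(
    domain="sales",
    topic="orders",
    owner="data-eng",
    use_cases=["fulfilment dashboard"],
    location="abfss://bronze@examplelake.dfs.core.windows.net/orders",
))
```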