Key Skills: AWS, Databricks, PySpark
Roles and Responsibilities:
- Design, build, and maintain cloud-based data solutions using Databricks on AWS.
- Develop and optimize ETL pipelines using Python, PySpark, and SQL (see the pipeline sketch after this list).
- Work with Delta Lake, Unity Catalog, and other Databricks features for scalable data processing.
- Integrate AWS services such as S3, Lambda, SNS, Step Functions, and Glue into data workflows (see the notification sketch after this list).
- Perform performance tuning and cost optimization for Spark and Databricks jobs.
- Support CI/CD pipelines and infrastructure automation using Terraform and GitHub.
- Conduct code reviews and troubleshoot Spark and data pipeline issues.
- Collaborate with business and technical teams to understand requirements and deliver solutions.
- Participate in Agile ceremonies and maintain technical documentation.
- Evaluate new Databricks features and contribute to proof-of-concept activities.
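For illustration, a minimal sketch of the kind of PySpark ETL pipeline this role involves: reading raw files from S3 and writing a Delta table registered in Unity Catalog. The bucket, catalog, schema, and column names below are hypothetical placeholders, not a prescribed implementation:

    from pyspark.sql import SparkSession, functions as F

    # On Databricks a SparkSession named `spark` is provided; getOrCreate()
    # simply returns that existing session inside a notebook or job.
    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Hypothetical raw-zone path in S3.
    raw = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("s3://example-raw-bucket/orders/")
    )

    # Basic cleansing: de-duplicate on the key and stamp ingestion time.
    cleaned = (
        raw.dropDuplicates(["order_id"])
           .filter(F.col("order_id").isNotNull())
           .withColumn("ingested_at", F.current_timestamp())
    )

    # Write as a Delta table under a Unity Catalog three-level namespace
    # (catalog.schema.table); all names here are placeholders.
    (
        cleaned.write
        .format("delta")
        .mode("overwrite")
        .saveAsTable("example_catalog.sales.orders_clean")
    )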
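The AWS integration work is often small glue steps around such pipelines. A sketch of publishing a completion event to SNS with boto3; the region and topic ARN are placeholders that would normally come from configuration:

    import json
    import boto3

    # Placeholder region and topic ARN, for illustration only.
    sns = boto3.client("sns", region_name="us-east-1")
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:pipeline-status",
        Subject="orders_etl succeeded",
        Message=json.dumps({"table": "example_catalog.sales.orders_clean"}),
    )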
Skills Required:
- Strong experience with Databricks on AWS.
- Good knowledge of Python, PySpark, SQL, and ETL development.
- Experience with Delta Lake, Unity Catalog, and data warehouse concepts.
- Familiarity with AWS services including S3, Lambda, SNS, Step Functions, and Glue.
- Knowledge of CI/CD tools, Terraform, and GitHub.
- Understanding of Spark optimization and scalable data processing (see the tuning sketch after this list).
- Good analytical, troubleshooting, and communication skills.
- Ability to work in Agile and collaborative environments.
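As an example of the Spark optimization referred to above, a sketch that enables Adaptive Query Execution and broadcasts a small dimension table so the large fact table is not shuffled; the table names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("tuning_demo").getOrCreate()

    # AQE coalesces shuffle partitions and mitigates skew at runtime; it is
    # on by default in recent Databricks runtimes, shown here explicitly.
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

    orders = spark.table("example_catalog.sales.orders_clean")  # large fact (hypothetical)
    countries = spark.table("example_catalog.ref.countries")    # small dimension (hypothetical)

    # Broadcasting the small side replaces a shuffle join with a
    # BroadcastHashJoin, often the single biggest join speedup.
    joined = orders.join(broadcast(countries), "country_code")
    joined.explain()  # inspect the physical plan to confirm the join strategy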
Education: Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.