Caliber Hunt is seeking an experienced Data Engineer to join our team. The ideal candidate will have hands-on experience developing and optimizing scalable data pipelines for ingestion and transformation. The role centers on building robust data infrastructure, working with technologies like PySpark and AWS cloud services, and collaborating with various teams to deliver high-quality, fault-tolerant solutions.
Responsibilities
- Develop fault-tolerant data pipelines that run on distributed clusters.
- Write and optimize SQL queries, using Python and Hive, to handle large datasets in big-data environments.
- Produce technical design documents for given requirements or JIRA stories.
- Work closely with the overall Enterprise Data & Analytics Architect and Engineering leads to ensure adherence to best practices.
- Ensure quality, security, and compliance requirements are met for the supported areas.
- Communicate results and business impacts of data initiatives to key stakeholders to collaboratively solve business problems.
- Debug, tune, and optimize PySpark data pipelines.
- Develop scalable and modular solutions.
- Coordinate with users, technical teams, and Data/Solution architects.
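The responsibilities above emphasize fault tolerance and modular design. As an illustration only (the posting specifies no code; the function and stage names below are hypothetical, and a real pipeline would use PySpark on a cluster), the pattern can be sketched in plain Python: small, composable stages wrapped in a retry helper.

```python
import time

def retry(fn, attempts=3, backoff_s=0.1):
    """Run fn, retrying on failure with exponential backoff (a common fault-tolerance pattern)."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(backoff_s * (2 ** i))

# Modular stages: each is a small, pure function, so stages stay testable and composable.
def ingest():
    # Hypothetical source; in practice this might read from S3 or a queue.
    return [{"id": 1, "amount": "10"}, {"id": 2, "amount": "5"}]

def transform(rows):
    # Normalize types before loading downstream.
    return [{**r, "amount": int(r["amount"])} for r in rows]

def pipeline():
    # Wrap the flaky boundary (ingestion) in retry; keep pure transforms unwrapped.
    return transform(retry(ingest))

print(pipeline())
```

The same shape carries over to PySpark jobs: keep each stage a separate function and confine retries to I/O boundaries.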
Required Skills & Qualifications
- 1-8 years of hands-on experience developing data pipelines for ingestion or transformation using Python (PySpark) / Spark SQL in AWS cloud.
- Advanced experience in writing and optimizing efficient SQL queries.
- Experience in development and processing of data at scale using technologies like EMR, Lambda, Glue, Athena, Redshift, and Step Functions.
- Experience with Git and CI/CD pipelines to deploy cloud applications.
- Strong understanding of, and hands-on experience with, PySpark DataFrames, joins, partitioning, and parallelism.
- Understanding of the Spark UI, event timelines, DAGs, and Spark configuration parameters for tuning pipelines.
- Experience with data modeling, big data, Hadoop, Hive, and ETL pipelines.
- Familiarity with IaC tools like Terraform.
- Experience working in Agile implementations.
- Good knowledge of designing partitioned Hive tables for query performance.
- Excellent communication skills to coordinate with various stakeholders.
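Several of the skills above (efficient SQL, partitioned Hive tables) come down to letting the engine prune the data it scans. As a hedged, self-contained sketch, the idea can be shown with stdlib `sqlite3`, where an index plays a role analogous to a Hive partition key; the table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("2024-01-%02d" % (i % 28 + 1), i, float(i)) for i in range(1000)],
)

query = "SELECT SUM(amount) FROM events WHERE event_date = '2024-01-15'"

# Without an index, the date filter forces a full-table scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Adding an index on the filter column lets the engine prune,
# much as Hive skips partitions when queries filter on the partition key.
conn.execute("CREATE INDEX idx_date ON events(event_date)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before)  # plan typically reports a full SCAN of events
print(plan_after)   # plan typically reports a SEARCH using idx_date
```

In Hive the equivalent move is `PARTITIONED BY (event_date)` plus filtering on the partition column, so only the matching partition directories are read.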