About The Opportunity
We are a leading technology services provider in the IT and data analytics sector, focused on delivering data engineering solutions to a global clientele. We specialize in building robust data platforms, integrating diverse data sources, and enabling data-driven decision-making across industries.
Role & Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark to support business intelligence and analytics applications (a representative sketch follows this list).
- Collaborate with cross-functional teams to understand data requirements and translate them into efficient ETL processes.
- Optimize Spark jobs for performance and reliability in a production environment.
- Assist in data modeling, schema design, and data governance practices to ensure data quality and integrity.
- Implement automation and monitoring solutions for data workflows using Apache Airflow and other tools.
- Document data architecture and processes, and provide technical support for troubleshooting and performance tuning.
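To give candidates a concrete sense of the day-to-day work, here is a minimal sketch of the kind of PySpark batch pipeline this role involves. The dataset, column names, and storage paths are hypothetical and purely illustrative, not an actual company workload.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily_orders_etl").getOrCreate()

# Extract: read raw order events (bucket and schema are hypothetical)
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: keep completed orders and build a daily regional aggregate
daily = (
    orders
    .filter(F.col("status") == "completed")
    .withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date", "region")
    .agg(
        F.sum("amount").alias("revenue"),
        F.count("*").alias("order_count"),
    )
)

# Load: write partitioned Parquet for downstream BI and analytics tools
(
    daily.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_orders/")
)

spark.stop()
```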
Skills & Qualifications
- Must-Have
  - Proficiency in PySpark for large-scale data processing
  - Hands-on experience with Apache Spark and the Hadoop ecosystem
  - Strong SQL skills and experience with data warehousing concepts
  - Experience developing and automating ETL pipelines with orchestration tools such as Apache Airflow (see the sketch after this section)
  - Knowledge of streaming platforms such as Kafka
  - Understanding of data modeling, schema design, and data governance
  - Excellent problem-solving and analytical skills
  - Minimum of five years of relevant experience in data engineering
- Preferred
  - Experience with cloud platforms such as AWS or Azure
  - Knowledge of DevOps practices and CI/CD pipelines
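As a hedged illustration of the Airflow experience listed above, the following sketch shows a minimal DAG that schedules the hypothetical PySpark job from the responsibilities section via spark-submit. It uses Airflow 2.x syntax; the dag_id, script path, and schedule are assumptions for illustration only.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal daily DAG that submits the (hypothetical) PySpark job above.
with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    run_etl = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit /opt/jobs/daily_orders_etl.py",  # illustrative path
    )
```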
Benefits & Culture Highlights
- Dynamic and collaborative work environment with ample learning opportunities
- Opportunity to work on cutting-edge data technologies
- Competitive compensation package and professional growth support
Skills: Airflow, Hive, Apache Spark, SQL, Kafka, Python