About the Client
ARA's Client is a leading technology-driven organization focused on building scalable, data-centric solutions that power business intelligence and digital transformation. The company fosters innovation, collaboration, and continuous learning, enabling teams to work on cutting-edge data platforms and enterprise-scale systems.
Role Summary
We are seeking a highly skilled Data Engineer with strong PySpark expertise to design, build, and optimize large-scale data pipelines. This role involves working with distributed data systems, ensuring high data quality, and enabling seamless data flow across platforms. The ideal candidate will combine strong technical expertise with leadership capabilities to drive data engineering best practices.
Key Responsibilities
- Design, develop, and maintain scalable data pipelines using PySpark
- Build and optimize ETL processes for efficient data ingestion and transformation
- Ensure data quality, integrity, and governance across systems
- Collaborate with cross-functional teams to define data requirements and solutions
- Lead technical decision-making and contribute to architectural discussions
- Troubleshoot and optimize data workflows for performance and reliability
- Mentor junior engineers and promote knowledge sharing
- Ensure compliance with data governance and security standards
Must-Have Qualifications
- 5+ years of experience in Data Engineering
- Strong hands-on experience with PySpark
- Experience with distributed data processing frameworks
- Solid understanding of ETL processes and data integration techniques
- Experience with cloud-based data platforms (AWS / Azure / GCP)
- Strong problem-solving and debugging skills
Nice to Have
- Experience with data warehousing solutions
- Familiarity with workflow orchestration tools (e.g., Airflow)
- Knowledge of big data ecosystems (Hadoop, Hive, etc.)
- Exposure to real-time data processing frameworks
Tier 2 locations preferred.