Responsibilities may include the following; other duties may be assigned:
- Translate conceptual needs and business requirements into finalized architectural designs.
- Manage large projects or processes that span collaborative teams both within and beyond Digital Technology.
- Operate autonomously to define, describe, diagram, and document the roles and interactions of the high-level technological and human components that combine into cost-effective, innovative solutions for evolving business needs.
- Promote, guide, and govern good architectural practice through the application of well-defined, proven technology and human-interaction patterns and through architecture mentorship.
- Design, develop, and maintain scalable data pipelines, preferably using PySpark (see the pipeline sketch after this list).
- Work with structured and unstructured data from various sources.
- Optimize and tune PySpark applications for performance and scalability (a tuning sketch follows this list).
- Support full lifecycle management of the entire IT portfolio, including the selection, appropriate usage, enhancement, and replacement of information technology applications, infrastructure, and services.
- Implement data quality checks and ensure data integrity (a data quality sketch follows this list).
- Monitor and troubleshoot data pipeline issues and ensure timely resolution.
- Document technical specifications and maintain comprehensive documentation for data pipelines.
- The ideal candidate is immersed in the fast-paced world of Big Data technology and has experience building ETL/ELT data solutions with new and emerging technologies while maintaining platform stability.
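
As a concrete illustration of the pipeline responsibility above, here is a minimal PySpark batch pipeline sketch. The bucket paths, column names, and aggregation logic are hypothetical placeholders, not details from this posting.

```python
# A minimal PySpark batch pipeline sketch; paths, schema, and column
# names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Read structured source data (placeholder S3 path).
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Transform: derive the order date and aggregate daily revenue.
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Write to a curated zone, partitioned for downstream consumers.
(daily_revenue.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_revenue/"))
```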
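For the tuning responsibility, the sketch below shows a few common PySpark performance levers: shuffle-partition sizing, broadcast joins, and selective caching. The DataFrame names and configuration values are illustrative assumptions that would need profiling against a real workload.

```python
# Common PySpark tuning levers, shown on hypothetical DataFrames
# `facts` and `dims`; the numbers are placeholders, not defaults.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("tuning_sketch")
    # Right-size shuffle parallelism for the cluster (placeholder value).
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

facts = spark.read.parquet("s3://example-bucket/curated/facts/")
dims = spark.read.parquet("s3://example-bucket/curated/dims/")

# Broadcast the small dimension table to avoid a shuffle join.
joined = facts.join(F.broadcast(dims), "dim_id")

# Cache only when the result is reused by multiple downstream actions.
joined.cache()
print(joined.count())
```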
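For data quality checks, a minimal gate written in plain PySpark (no external data quality library assumed) might look like the following; the column names and rules are hypothetical.

```python
# A minimal data quality gate in plain PySpark; columns and
# thresholds are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()
df = spark.read.parquet("s3://example-bucket/curated/daily_revenue/")

total = df.count()
null_dates = df.filter(F.col("order_date").isNull()).count()
negative_revenue = df.filter(F.col("revenue") < 0).count()
duplicates = total - df.dropDuplicates(["order_date"]).count()

# Fail the run loudly rather than publishing bad data downstream.
assert null_dates == 0, f"{null_dates} rows with null order_date"
assert negative_revenue == 0, f"{negative_revenue} rows with negative revenue"
assert duplicates == 0, f"{duplicates} duplicate order_date rows"
```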
Required Knowledge and Experience:
- Strong programming knowledge in Java, Scala, or Python (including PySpark), plus SQL.
- 4-8 years of experience in data engineering, with a focus on PySpark.
- Proficiency in Python and Spark, with strong coding and debugging skills.
- Experience designing and building enterprise data solutions on AWS, Azure, or Google Cloud Platform (GCP).
- Experience with big data technologies such as Hadoop, Hive, and Kafka.
- Strong knowledge of SQL and experience with relational databases (e.g., PostgreSQL, MySQL, SQL Server); a JDBC read sketch follows this list.
- Experience with data warehousing solutions such as Redshift, Snowflake, Databricks, or Google BigQuery.
- Familiarity with data lake architectures and data storage solutions.
- Knowledge of CI/CD pipelines and version control systems (e.g., Git).
- Excellent problem-solving skills and the ability to troubleshoot complex issues.
- Strong communication and collaboration skills, with the ability to work effectively in a team environment.
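
As a small illustration of the SQL and relational-database requirement above, the sketch below reads a table into Spark over JDBC and queries it with Spark SQL. The connection URL, table name, and credentials are placeholders, and the appropriate JDBC driver is assumed to be on the classpath.

```python
# Sketch of reading a relational table into Spark via JDBC; the URL,
# table, and credentials are placeholders, and the PostgreSQL JDBC
# driver is assumed to be available on the Spark classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc_read").getOrCreate()

customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/sales")
    .option("dbtable", "public.customers")
    .option("user", "readonly_user")
    .option("password", "***")
    .option("fetchsize", "10000")
    .load()
)

# Query the loaded table with Spark SQL.
customers.createOrReplaceTempView("customers")
spark.sql("SELECT country, COUNT(*) AS n FROM customers GROUP BY country").show()
```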