Key Roles & Responsibilities:
- System Monitoring: Continuously monitor ETL jobs, workflows, and data pipelines for performance and reliability.
- Job Scheduling: Ensure jobs are scheduled appropriately and run according to their defined schedules.
- Issue Resolution: Troubleshoot and resolve issues related to data integration, ETL jobs, and workflows promptly.
- Root Cause Analysis: Perform root cause analysis for recurring issues and implement preventive measures.
- Tuning: Optimize ETL jobs and workflows for better performance and efficiency.
- Resource Management: Ensure optimal use of system resources like CPU, memory, and storage.
- Data Validation: Ensure data accuracy, completeness, and consistency across different systems.
- Error Handling: Implement robust error handling and recovery mechanisms in ETL processes.
- Data Reconciliation: Perform data reconciliation to verify that the data loaded into the target systems matches the source data.
- Code Deployment: Deploy ETL code and configurations to production environments following change management procedures.
- Automation: Develop scripts and automation tools to streamline routine tasks and processes.
- Cross-Functional Teams: Work closely with development teams, DBAs, system administrators, and business users to resolve issues and implement changes.
- Status Reporting: Provide regular updates and reports on system status, performance, and incidents to stakeholders.
- Documentation: Maintain up-to-date documentation for ETL jobs, workflows, processes, and standard operating procedures.
Technical Requirements:
- Advanced working knowledge of relational database systems and SQL, with an established command of a variety of databases (both SQL and NoSQL).
- Experience building and optimizing big data pipelines, architectures, and data sets, both batch-oriented and real-time.
- Extensive experience working with big data tools and building data solutions for advanced analytics, e.g. Cloudera, BigQuery, Dataproc.
- Practical knowledge of ETL tools (Informatica, Talend) as well as more recent big data tools.
- Experience with the following tools and technologies:
- Big data stack, including Hadoop, Spark, Kafka, Informatica
- Relational SQL and NoSQL databases
- Data pipeline/workflow management and quality tools.
- Real-time data ingestion
- Object-oriented, procedural, and functional programming languages such as Python, Java, C++, and Scala.
Qualifications:
- Education: Bachelor's degree in Computer Science or equivalent; Master's preferred.
- Experience: 3-5 years of experience in a similar Data Engineer role, working with data platforms and design.
- Ability to work and communicate collaboratively with business stakeholders.
- Excellent analytical, problem-solving, conflict-resolution, communication, and interpersonal skills.