We are looking for a highly motivated and experienced Senior Data Engineer to own the design and development of complex data pipelines, solutions, and frameworks. The ideal candidate will be responsible for designing, developing, and optimizing data pipelines, data integration frameworks, and metadata-driven architectures that enable seamless data access and analytics. This role requires deep expertise in big data processing, distributed computing, data modeling, and governance frameworks to support self-service analytics, AI-driven insights, and enterprise-wide data management.
Roles & Responsibilities
- Design, develop, and maintain scalable ETL/ELT pipelines to support structured, semi-structured, and unstructured data processing across the Enterprise Data Fabric.
- Implement real-time and batch data processing solutions, integrating data from multiple sources into a unified, governed data fabric architecture.
- Optimize big data processing frameworks using Apache Spark, Hadoop, or similar distributed computing technologies to ensure high availability and cost efficiency.
- Work with metadata management and data lineage tracking tools to enable enterprise-wide data discovery and governance.
- Ensure data security, compliance, and role-based access control (RBAC) across data environments.
- Optimize query performance, indexing strategies, partitioning, and caching for large-scale datasets.
- Develop CI/CD pipelines for automated data pipeline deployments, version control, and monitoring.
- Implement data virtualization techniques to provide seamless access to data across multiple storage systems.
- Collaborate with cross-functional teams, including data architects, business analysts, and DevOps teams, to align data engineering strategies with enterprise goals.
- Stay up-to-date with emerging data technologies and best practices, ensuring continuous improvement of Enterprise Data Fabric architectures.
Functional Skills
Must-Have Skills
- Hands-on experience with data engineering technologies such as Databricks, Apache Spark (PySpark, Spark SQL), AWS, Python, and SQL, along with Scaled Agile methodologies.
- Proficiency in workflow orchestration and in performance tuning of big data processing workloads.
- Strong understanding of AWS services.
- Experience with Data Fabric, Data Mesh, or similar enterprise-wide data architectures.
- Ability to quickly learn, adapt, and apply new technologies.
- Strong problem-solving and analytical skills.
- Excellent communication and teamwork skills.
- Experience with Scaled Agile Framework (SAFe), Agile delivery practices, and DevOps practices.
Good-to-Have Skills
- Deep expertise in the biotech and pharmaceutical industries.
- Experience building APIs to make data available to consumers.
- Experience with SQL and NoSQL databases, as well as vector databases used with large language models.
- Experience with data modeling and performance tuning for both OLAP and OLTP databases.
- Experience with software engineering best practices, including but not limited to version control (Git, Subversion, etc.), CI/CD and build tooling (Jenkins, Maven, etc.), automated unit testing, and DevOps.
Education and Professional Certifications
- 9 to 12 years of experience in Computer Science, IT, or a related field.
- AWS Certified Data Engineer (Preferred).
- Databricks Certification (Preferred).
- Scaled Agile SAFe certification (Preferred).
Soft Skills
- Excellent analytical and troubleshooting skills.
- Strong verbal and written communication skills.
- Ability to work effectively with global, virtual teams.
- High degree of initiative and self-motivation.
- Ability to manage multiple priorities successfully.
- Team-oriented, with a focus on achieving team goals.
- Ability to learn quickly; organized and detail-oriented.
- Strong presentation and public speaking skills.