Roles & Responsibilities:
- Architect and maintain robust, scalable data pipelines using Databricks, Spark, and Delta Lake for both batch and real-time data processing.
- Lead technology evaluation and adoption initiatives to improve productivity, scalability, and data delivery.
- Optimize data processing performance through Spark tuning, job scheduling, and efficient resource utilization.
- Develop innovative solutions to enhance data ingestion, transformation, lineage tracking, and observability.
- Build metadata-driven frameworks to promote pipeline consistency and reuse.
- Promote a culture of engineering excellence, continuous improvement, and experimentation.
- Collaborate with architecture, platform, governance, and analytics teams to support the enterprise data strategy.
- Define and monitor SLOs, KPIs, and data quality metrics for production systems.
- Translate business requirements into scalable, governed data products in partnership with stakeholders.
- Mentor and guide engineers to adopt modern engineering tools and practices.
- Work closely with DevOps, architects, and analysts to ensure alignment of engineering strategies with business objectives.
- Stay current on data technology trends and best practices to continually enhance the data platform architecture.
Must-Have Skills:
- Strong hands-on experience with Databricks, PySpark, Spark SQL, Apache Spark, AWS, Python, and SQL.
- Deep understanding of workflow orchestration, job performance tuning, and big data processing.
- Proficient with AWS services relevant to data engineering.
- Knowledge of enterprise-wide data architecture patterns such as Data Fabric or Data Mesh.
- Demonstrated ability to learn and apply new technologies quickly.
- Strong problem-solving, analytical, and teamwork skills.
- Experience with Scaled Agile Framework (SAFe), Agile delivery, and DevOps practices.
Good-to-Have Skills:
- Industry expertise in the biotech or pharmaceutical sectors.
- Experience writing APIs to enable data access for consumers.
- Familiarity with SQL/NoSQL databases and vector databases for LLM use cases.
- Experience with OLAP and OLTP data modeling and performance tuning.
- Exposure to software engineering best practices, including Git, build and CI/CD tooling (e.g., Maven, Jenkins), and DevOps automation.
Education & Certifications:
- 12 to 17 years of experience in Computer Science, Information Technology, or a related field.
- AWS Certified Data Engineer (preferred).
- Databricks Certification (preferred).
- SAFe Certification (preferred).
Soft Skills:
- Excellent analytical and troubleshooting capabilities.
- Strong written and verbal communication skills.
- Able to work effectively in global, distributed teams.
- Highly self-motivated and proactive.
- Capable of managing multiple priorities simultaneously.
- Strong team player with a focus on collaboration and shared success.
- Quick learner with strong organizational and presentation skills.