Role: Engineering Manager, Data Science and Machine Learning
Location: Vashi
Shift: General
Role Overview
As an Engineering Manager, AI & ML (Data Collection), you will play a vital role in executing the company's AI and machine learning initiatives, with a strong focus on document ingestion and enrichment technologies. This position requires deep technical expertise in unstructured data processing, data collection pipeline engineering, and a hands-on approach to managing and mentoring engineers.
Your leadership will ensure that AI & ML data collection systems are developed and operationalized to the highest standards of performance, reliability, and security. You will work closely with individual contributors to ensure projects align with broader business goals and AI/ML strategies.
This role requires deep engagement in the design, development, and maintenance of document enrichment AI & ML models, solutions, architecture, and services. You will provide strong technical direction, solve complex technical challenges, and ensure the team consistently delivers high-quality, scalable solutions. You will leverage your deep knowledge of document understanding and enrichment, advanced natural language processing (NLP), OCR, entity extraction and enrichment, duplicate detection, generative AI (GenAI), large language models (LLMs), ML Operations (MLOps), data architecture, data pipelines, and cloud-managed services.
Your leadership will ensure AI/ML systems align with global business strategies while maintaining seamless integration and high performance. You will oversee the end-to-end lifecycle of AI/ML data systems, from research and development through deployment and operationalization.
You will mentor team members, resolve technical challenges, and foster a culture of innovation and collaboration, ensuring teams have the tools, frameworks, and guidance needed to succeed. This role offers a unique opportunity to drive impactful change in a fast-paced, dynamic environment, directly contributing to the success of global AI/ML initiatives.
Your ability to collaborate with cross-functional stakeholders, provide leadership across locations, set high standards, and hire, train, and retain exceptional talent will be foundational to your success. You will solicit feedback, engage others with empathy, inspire creative thinking, and help foster a culture of belonging, teamwork, and purpose.
Team Overview
You will lead a team of machine learning engineers responsible for building AI & ML solutions and services as part of robust document ingestion and enrichment pipelines handling large volumes of unstructured data. The team focuses on building scalable, reliable systems to process, enrich, and categorize data essential for downstream data collection and analytics.
Outline of Duties and Responsibilities
- AI & ML Data Collection Leadership: Drive the execution of AI & ML initiatives related to data collection, ensuring alignment with overall business goals and strategies.
- Document Enrichment Ownership: Own and evolve enrichment models for all incoming documents, including OCR, document structure extraction, entity extraction, entity resolution, and duplicate detection to ensure high-quality downstream data consumption.
- Technical Oversight: Provide hands-on technical leadership in the engineering of ML models and services, focusing on unstructured document processing, NLP, classifiers, and enrichment models. Oversee and contribute to scalable, reliable, and efficient solutions.
- Team Leadership & Development: Lead, mentor, and develop a high-performing team of engineers and data scientists. Foster a culture of innovation, continuous improvement, and effective communication across geographically dispersed teams.
- NLP Technologies: Contribute to the development and application of NLP techniques, including OCR post-processing, classifiers, transformers, LLMs, and other methodologies to process and enrich unstructured documents. Ensure seamless integration into the broader AI/ML infrastructure.
- Data Pipeline Engineering: Design, develop, and maintain advanced document ingestion and enrichment pipelines using orchestration, messaging, database, and data platform technologies. Ensure scalability, performance, and reliability.
- Cross-functional Collaboration: Work closely with other AI/ML teams, data collection engineering teams, and product management to ensure enrichment efforts support broader AI/ML and product objectives.
- Innovation & Continuous Improvement: Continuously explore and implement new technologies and methodologies to improve the efficiency, accuracy, and quality of document enrichment systems.
- System Integrity & Security: Ensure all data collection and enrichment systems meet high standards of integrity, security, and compliance. Implement best practices for data governance and model transparency.
- Talent Acquisition & Retention: Actively recruit, train, and retain top engineering talent. Foster an environment where team members feel valued, encouraged to innovate, and supported in reaching their full potential.
- Process Improvement: Apply Agile, Lean, and Fast-Flow principles to improve team efficiency and the delivery of high-quality data collection and enrichment solutions.
- Support Company Vision and Values: Model and promote behaviors aligned with the company's vision and values. Participate in company-wide initiatives and projects as required.
Experience, Skills, and Qualifications
- Bachelor's, Master's, or PhD in Computer Science, Mathematics, Data Science, or a related field.
- 6+ years of experience in software engineering, with a focus on AI & ML technologies, particularly in data collection and unstructured data processing.
- 3+ years of experience managing individual contributors in a leadership role.
- Strong expertise in NLP and machine learning applied to document understanding and enrichment, including classifiers, LLMs, GenAI, RAG, and/or Agentic AI.
- Hands-on experience building and operating document enrichment systems, including OCR, NER, entity resolution, document classification, and duplicate detection.
- Experience with data pipeline and messaging technologies such as Apache Kafka, Airflow, and cloud data platforms (e.g., Snowflake) is preferred.
- Proficiency in Python, Java, SQL, and other relevant programming languages and tools.
- Strong understanding of cloud-native technologies and containerization (e.g., Kubernetes, Docker), with experience managing these systems globally.
- Demonstrated ability to solve complex technical challenges and deliver scalable solutions.
- Excellent communication skills and a collaborative approach to working with global teams and stakeholders.
- Experience in fast-paced, data-intensive environments; fintech experience is highly desirable.
Working Conditions
This position is based in a standard office environment. Employees use PCs and phones throughout the day. Limited corporate travel may be required to remote offices, meetings, or events.
Morningstar is an equal opportunity employer.