Job Description:
You will
- Lead technical design and implementation of data engineering and MLOps solutions, ensuring best practices and high-quality deliverables.
- Mentor and guide junior engineers, conducting code reviews and technical sessions to foster team growth.
- Perform detailed analysis of raw data sources by applying business context, and collaborate with cross-functional teams to transform raw data into curated, certified data assets for ML and BI use cases.
- Create scalable, trusted data pipelines that generate curated data assets in centralized data lake/data warehouse ecosystems.
- Monitor and troubleshoot data pipeline performance, identifying and resolving bottlenecks and issues.
- Extract text data from a variety of sources (documents, logs, databases, web scraping) to support development of NLP/LLM solutions.
- Collaborate with data science and data engineering teams to build scalable and reproducible machine learning pipelines for training and inference.
- Lead development and maintenance of the end-to-end MLOps lifecycle to automate machine learning solution development and delivery.
- Implement robust data drift and model monitoring frameworks across pipelines.
- Develop real-time data solutions by creating new API endpoints or streaming frameworks.
- Develop, test, and maintain robust tools, frameworks, and libraries that standardize and streamline the data & machine learning lifecycle.
- Leverage public/private APIs to extract data and invoke functionality as required for use cases.
- Collaborate with cross-functional teams of Data Science, Data Engineering, business units, and IT teams.
- Create and maintain effective documentation for projects and practices, ensuring transparency and effective team communication.
- Provide technical leadership and mentorship on continuous improvement in building reusable and scalable solutions.
- Contribute to enhancing strategy for advanced data & ML engineering practices and lead execution of key initiatives of technical strategy.
- Stay up-to-date with the latest trends in modern data engineering, machine learning & AI.
You have
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, with 8+ years of experience.
- 5+ years of experience working with Python, SQL, PySpark, and Bash scripting, with proficiency in the software development lifecycle and software engineering practices.
- 3+ years of experience developing and maintaining robust data pipelines for both structured and unstructured data to be used by Data Scientists to build ML Models.
- 3+ years of experience working with Cloud Data Warehousing (Redshift, Snowflake, Databricks SQL or equivalent) platforms and distributed frameworks like Spark.
- 2+ years of hands-on experience using the Databricks platform for data engineering and MLOps, including MLflow, Model Registry, Databricks Workflows, Job Clusters, the Databricks CLI, and Workspace.
- 2+ years of experience leading a team of engineers, with a track record of delivering robust, scalable data solutions of the highest quality.
- Solid understanding of machine learning lifecycle, data mining, and ETL techniques.
- Experience with machine learning frameworks (scikit-learn, XGBoost, Keras, PyTorch) and operationalizing models in production.
- Proficiency with REST APIs and experience using different types of APIs to extract data or invoke functionality.
- Familiarity with Pythonic API development frameworks like Flask/FastAPI and containerization frameworks like Docker/Kubernetes.
- Hands-on experience building and maintaining tools and libraries used by multiple teams across the organization (e.g., Data Engineering utility libraries, DQ Libraries).
- Proficient in understanding and incorporating software engineering principles in the design and development process.
- Hands-on experience with CI/CD tools (e.g., Jenkins or equivalent), version control (GitHub, Bitbucket), and orchestration (Airflow, Prefect, or equivalent).
- Excellent communication skills and ability to work and collaborate with cross-functional teams across technology and business.
Good to have
- Understanding of Large Language Models (LLMs) and the MLOps lifecycle for operationalizing them.
- Familiarity with GPU compute for model training or inference.
- Familiarity with deep learning frameworks and deploying deep learning models for production use cases.
Location:
This position can be based in any of the following locations:
Chennai
Current Guardian Colleagues: Please apply through the internal Jobs Hub in Workday