AI/Machine Learning Engineer

The Hartford India

Hyderabad, India

5-7 Years

Job Description

This role is a collaborative partnership with Data Scientists and Data Engineers across the Commercial Lines Data Science teams. It focuses on ensuring the health and reliability of production-deployed machine learning and AI models by monitoring data drift, data quality, and model performance. This is accomplished by managing incidents through ServiceNow and by providing consultative feedback to model owners on observed performance issues.

The ideal candidate for this role would have:

Strong experience with the Data Science and MLOps tech stack.
Extensive experience in building predictive ML and NLP solutions.
Expertise in designing and building Model Monitoring/Observability Frameworks and supporting solutions for interpretation and visualization.
Strong foundations in the Cloud Platform stack for building and hosting solutions.
Solid knowledge of the CI/CD tech stack.
Experience in building scalable monitoring solutions for both traditional ML and advanced AI systems.

Key Responsibilities

Model Monitoring & Observability

Collaborate with Data Scientists to onboard models into monitoring systems (e.g., MHM database) and maintain configuration files for drift and performance checks.
Support retraining workflows when drift or degradation is detected.
Implement monitoring frameworks to track data drift, training-serving skew, and model performance metrics in production environments.
Use observability tools such as Arize to set up monitors for accuracy, precision, recall, and other KPIs.
Configure automated alerts for anomalies in input/output distributions and performance degradation.
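To make the drift-monitoring and alerting duties above concrete, here is a minimal sketch of one common drift statistic, the Population Stability Index (PSI), paired with a threshold alert. PSI itself, the bin edges, and the 0.2 alert cutoff are illustrative assumptions, not The Hartford's or Arize's method, and the function names `bin_fractions`, `psi`, and `drift_alert` are hypothetical. The same comparison can be run between training data and serving inputs to check for training-serving skew.

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges):
    """Fraction of values per bin, given sorted interior bin edges.

    k interior edges define k + 1 bins; a value equal to an edge
    falls into the bin to its right (bisect_right semantics).
    """
    if not values:
        raise ValueError("cannot bin an empty sample")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [c / total for c in counts]


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between a baseline and a production sample.

    PSI = sum over bins of (a - e) * ln(a / e); eps stands in for empty
    bins so the logarithm is always defined.
    """
    score = 0.0
    for e, a in zip(bin_fractions(expected, edges), bin_fractions(actual, edges)):
        e, a = max(e, eps), max(a, eps)
        score += (a - e) * math.log(a / e)
    return score


def drift_alert(expected, actual, edges, threshold=0.2):
    """Return (psi_value, alert_flag); the 0.2 cutoff is an assumed rule of thumb."""
    value = psi(expected, actual, edges)
    return value, value > threshold


if __name__ == "__main__":
    edges = [0.25, 0.5, 0.75]              # hypothetical bins for a score in [0, 1]
    baseline = [0.1, 0.3, 0.6, 0.9] * 25   # evenly spread reference sample
    shifted = [0.8, 0.9, 0.95, 0.6] * 25   # production sample skewed high
    print(drift_alert(baseline, baseline, edges))  # identical samples: no alert
    print(drift_alert(baseline, shifted, edges))   # shifted sample: alert raised
```

In practice the baseline would be the training or validation distribution for each monitored feature, and the alert flag would feed an incident workflow such as ServiceNow; that wiring is outside this sketch.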

Data Quality Assurance

Perform QC checks on incoming data to validate integrity and completeness before model scoring.
Develop pipelines for continuous validation of data sources and ensure compliance with quality standards.
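The pre-scoring QC checks above can be sketched as a small rule-based validator. This is a minimal illustration under stated assumptions: records arrive as a list of dicts, each column has a hypothetical `ColumnRule` with an allowed null fraction and an optional value range, and column names such as `annual_premium` are invented for the example. A missing key is treated the same as a null, so the completeness check also catches dropped columns.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ColumnRule:
    """Hypothetical per-column rule: allowed null fraction and optional value range."""
    max_null_frac: float = 0.0
    min_value: Optional[float] = None
    max_value: Optional[float] = None


@dataclass
class QCReport:
    passed: bool
    issues: List[str] = field(default_factory=list)


def run_qc(rows: List[dict], rules: Dict[str, ColumnRule]) -> QCReport:
    """Check completeness (null rates) and integrity (value ranges) of a batch."""
    if not rows:
        return QCReport(False, ["batch is empty"])
    issues = []
    n = len(rows)
    for col, rule in rules.items():
        values = [r.get(col) for r in rows]  # a missing key counts as null
        null_frac = sum(v is None for v in values) / n
        if null_frac > rule.max_null_frac:
            issues.append(f"{col}: null fraction {null_frac:.2f} exceeds {rule.max_null_frac:.2f}")
        # Report only the first out-of-range value per column to keep the report short.
        for v in values:
            if v is None:
                continue
            if rule.min_value is not None and v < rule.min_value:
                issues.append(f"{col}: value {v} below {rule.min_value}")
                break
            if rule.max_value is not None and v > rule.max_value:
                issues.append(f"{col}: value {v} above {rule.max_value}")
                break
    return QCReport(not issues, issues)


if __name__ == "__main__":
    rules = {
        "annual_premium": ColumnRule(min_value=0.0),  # hypothetical column
        "policy_tenure_years": ColumnRule(max_null_frac=0.5, min_value=0, max_value=100),
    }
    batch = [{"annual_premium": -5.0, "policy_tenure_years": 3}, {"policy_tenure_years": 2}]
    print(run_qc(batch, rules))
```

A batch that fails QC would be held back from scoring rather than passed to the model; the thresholds themselves would live in the kind of per-model configuration files mentioned under Model Monitoring & Observability.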

Performance Evaluation

Monitor KPIs such as accuracy, precision, recall, and fairness across demographic slices.
Conduct root-cause analysis for performance degradation and recommend retraining strategies.
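For the slice-level KPI monitoring above, here is a minimal pure-Python sketch that computes precision and recall per demographic slice and the largest recall gap between slices. Using the recall gap (an equal-opportunity style measure) as the fairness metric is an assumption, since the posting does not name one, and the function names are hypothetical.

```python
from collections import defaultdict


def precision_recall(y_true, y_pred):
    """Binary precision and recall; returns 0.0 when a denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def metrics_by_slice(y_true, y_pred, slices):
    """Group rows by slice label and compute (precision, recall) per group."""
    groups = defaultdict(lambda: ([], []))
    for t, p, s in zip(y_true, y_pred, slices):
        groups[s][0].append(t)
        groups[s][1].append(p)
    return {s: precision_recall(t, p) for s, (t, p) in groups.items()}


def recall_gap(per_slice):
    """Largest difference in recall between any two slices."""
    recalls = [r for _, r in per_slice.values()]
    return max(recalls) - min(recalls)


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
    slices = ["A"] * 4 + ["B"] * 4       # hypothetical slice labels
    per_slice = metrics_by_slice(y_true, y_pred, slices)
    print(per_slice)                     # A: (0.667, 1.0), B: (1.0, 0.5)
    print(recall_gap(per_slice))         # 0.5
```

A widening gap between slices, even when overall accuracy is stable, is the kind of signal that would prompt the root-cause analysis and retraining recommendations described above.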

Infrastructure & Automation

Build and maintain CI/CD pipelines for deploying monitoring solutions and model updates.
Leverage cloud technologies for scalable monitoring and orchestration.

Documentation & Reporting

Maintain detailed logs of monitoring activities, thresholds, and alerts.
Provide periodic reports on model health, including drift metrics and performance trends.

Required Skills & Experience

Bachelor's or Master's degree in Computer Science, Engineering, or a closely related field; 5+ years of professional experience with a Bachelor's degree, or 3+ years of experience with an advanced engineering degree combined with applied academic, internship, or hands-on project experience focused on machine learning, GenAI, or full-stack software development.

3+ years of experience collaborating with engineering leads and data science teams on the design, development, and scaling of machine learning systems, with a focus on reliability, scalability, and security.

4+ years of hands-on experience working with public cloud platforms such as Google Cloud Platform, AWS, and/or Azure to deploy, operate, and scale ML workloads.

2+ years of hands-on experience building Infrastructure as Code (IaC) using tools such as Terraform to support ML environments and pipelines.

5+ years of software development experience using programming languages such as Python, Java, or equivalent object-oriented or scripting languages, including production-grade ML or data applications.

3+ years of experience working with machine learning and AI frameworks or platforms such as TensorFlow, scikit-learn, Anaconda, SageMaker, Vertex AI, or Agentic AI tooling.

3+ years of experience working with model drift and data drift concepts, including contributing to the design and implementation of monitoring, validation, and retraining strategies for ML and AI systems.

3+ years of working knowledge of CI/CD and MLOps practices, including automated testing, model validation, automated deployments, and integration pipelines.

3+ years of experience working in agile development environments, with SAFe experience required.

4+ years of experience collaborating and partnering with business leaders, engineering teams, and data science stakeholders to deliver machine learning solutions aligned to business outcomes.

Excellent written and verbal communication skills, with the ability to explain technical concepts clearly to diverse audiences.

Familiarity with agentic productivity and AI developer tools such as Claude Code, Gemini CLI, OpenAI Codex, or similar tools is a plus.

Experience enabling or supporting enterprise AI services such as Gemini Enterprise, Amazon Q, Microsoft Copilot, or similar platforms is a plus.

Strong analytical, critical thinking, and problem-solving abilities, with a demonstrated willingness to challenge the status quo, take ownership, and drive innovation and continuous improvement in AI platform development.

Ability to work both independently and collaboratively in a fast-paced, agile environment, demonstrating accountability, initiative, and a bias for action.

Nice to Have