Key Responsibilities
- Design & Deliver Conversational AI Solutions: Build advanced statistical, machine learning, and generative/agentic AI solutions, with a strong emphasis on conversational interfaces. This includes RAG pipelines, intelligent chat/assistant systems, classification, forecasting, and recommendation engines, drawing on a fit-for-purpose toolkit that ranges from traditional predictive modeling to sophisticated agentic workflows.
- Business Intelligence & Semantic Layer Definition: Lead the definition and implementation of the semantic layer for critical insurance data. This involves meticulously designing fact tables, dimensions, and metrics to ensure data consistency, accuracy, and interpretability for AI agents and business intelligence tools. Focus on creating a unified and business-friendly view of complex insurance data.
- Deep Research & Domain Expertise: Conduct in-depth research into complex insurance concepts, regulatory landscapes, and market dynamics to inform the design and development of AI models and the semantic layer. Leverage this understanding to create highly relevant and accurate AI-driven insights.
- Regulatory Intelligence & Filing Automation: Design and deploy GenAI capabilities to automate regulatory filing support for the insurance industry, including DOI objection response generation and the ingestion of legacy filings into searchable knowledge bases. Partner closely with Legal and Compliance to ensure all outputs meet evolving standards and enable direct API integrations with regulatory bodies.
- Knowledge Base Engineering for Strategic Domains: Engineer and maintain robust, domain-specific knowledge bases (e.g., regulatory intelligence, competitive insights, customer sentiment specific to insurance) to power generative applications across underwriting, pricing, and service. This includes structuring knowledge for optimal retrieval by conversational AI systems.
- Domain & Compliance Integration: Develop a deep understanding of The Hartford's specific business structures, processes, and data sources within the insurance context. Embed domain taxonomies, regulatory constraints, access controls, and security directly into solution design and the semantic layer. Ensure adherence to responsible AI practices such as fairness, bias mitigation, transparency, and observability with compliance-by-design.
- Unstructured Data & Retrieval Design: Prepare multi-format content (PDF, Office, HTML, images, audio) relevant to insurance with normalization, robust metadata/lineage management, and PII detection/redaction. Design advanced retrieval strategies (e.g., chunking, embeddings, hybrid search) tailored to insurance domain knowledge, and tune for cost, latency, and domain fit, leveraging re-rankers where appropriate for conversational AI.
- Prompt & Agent Design: Author robust system prompts, few-shot patterns, and structured outputs (e.g., JSON schemas) for conversational AI agents. Define safe tool-use policies and function/structured calling for reliable and ethical agent behavior within the insurance context.
- Evaluation & Monitoring: Define comprehensive metrics across use cases for classification, information retrieval, RAG/chat performance, forecasting, and critical customer/operational KPIs for the insurance business. Build gold/synthetic test sets, support A/B testing, and monitor for data and model drift, providing economic, qualitative, and statistical analysis to support thresholds and business decisions.
- Synthetic Data Generation & Augmentation: Develop and validate synthetic data pipelines to alleviate sparsity and accelerate model convergence, especially for low-frequency perils and emerging segments within insurance, while rigorously preserving privacy and distributional fidelity.
- Architectural Collaboration & MLOps Integration: Partner with enterprise architects and platform teams to ensure scalable, secure deployments via unified systems. Standardize experiment tracking, registries, evaluation gates, and CI/CD patterns across clouds and services for all AI and BI solutions.
- Innovation & Continuous Learning: Identify and pilot emerging methods (e.g., OCR for insurance documents, advanced rerankers, PEFT/LoRA, distillation). Build reusable accelerators (e.g., chunking templates, prompt registries, evaluation harnesses). Stay current on the latest advancements in AI/ML, LLMOps, NLP, RAG, responsible AI, and Business Intelligence best practices.
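The retrieval-design responsibilities above (chunking, embeddings, hybrid search) can be sketched in miniature. This is an illustrative toy under stated assumptions, not a production design: it substitutes a bag-of-words cosine for a real embedding model and a simple keyword-overlap score for a proper sparse index such as BM25, and all function names are hypothetical.

```python
import math
from collections import Counter

def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size windows (a common RAG baseline)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def bow(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a real embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, passage: str, alpha: float = 0.5) -> float:
    """Blend a dense-style similarity with exact keyword overlap."""
    q, p = bow(query), bow(passage)
    keyword = len(set(q) & set(p)) / len(set(q)) if q else 0.0
    return alpha * cosine(q, p) + (1 - alpha) * keyword

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Return the top-k passages by hybrid score."""
    return sorted(passages, key=lambda p: hybrid_score(query, p), reverse=True)[:k]
```

In practice the chunk size, overlap, and dense/sparse blend weight are exactly the knobs this role would tune for cost, latency, and domain fit.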
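For the prompt and agent design work above, structured outputs are typically enforced by validating a model's JSON response against a schema before any downstream tool acts on it. A minimal standard-library sketch follows; the field names (`answer`, `citations`, `confidence`) are illustrative assumptions, not an actual production schema.

```python
import json

# Hypothetical response schema for a claims-assistant agent.
REQUIRED_FIELDS = {"answer": str, "citations": list, "confidence": float}

def parse_agent_output(raw: str) -> dict:
    """Parse and validate a model's structured output before acting on it."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], ftype):
            raise TypeError(f"field {field!r} must be {ftype.__name__}")
    if not 0.0 <= data["confidence"] <= 1.0:
        raise ValueError("confidence out of range")
    return data
```

Rejecting malformed or out-of-range responses at this boundary is one concrete form of the "safe tool-use policies" the bullet describes.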
Required Skills & Experience
- 5–7 years of professional experience with a Bachelor's degree, or fewer than 5 years of experience with a Master's or Ph.D.; Master's degree or Ph.D. in Machine Learning, Applied Mathematics, Data Science, Computer Science, or a closely related analytical field preferred, or demonstrated progress toward a relevant professional designation.
- 5+ years of experience in statistical modeling, machine learning, and advanced analytics using Python, including pandas, NumPy, scikit-learn, and strong SQL for complex data exploration, feature engineering, and knowledge preparation; familiarity with PyTorch and/or TensorFlow preferred.
- 5+ years of experience across the end-to-end modeling and analytics lifecycle, including requirements gathering, experiment design, offline evaluation, and basic production monitoring and validation of AI and BI solutions.
- 5+ years of practical experience applying core machine learning, deep learning, and natural language processing algorithms and architectures.
- 4+ years of experience designing and implementing business intelligence and semantic layer solutions, including dimensional modeling, fact tables, metrics definition, and operating within data warehouse or data lake environments.
- 4+ years of experience designing and operationalizing evaluation and monitoring strategies, including test set creation (gold and/or synthetic), defining and tracking metrics for classification, forecasting, ranking/IR, RAG faithfulness and truthfulness, and customer or operational KPIs, with support for A/B testing and drift detection.
- 3+ years of experience designing, developing, and deploying conversational AI solutions, including chatbots, virtual assistants, intelligent agents, and Retrieval-Augmented Generation (RAG) pipelines.
- 3+ years of experience working with unstructured data, including document parsing and OCR fundamentals, robust text normalization, metadata and lineage awareness, and PII detection or redaction considerations for insurance documents.
- 3+ years of experience working with cloud-based AI and analytics platforms such as Google Vertex AI, AWS SageMaker or Bedrock, or Azure AI Services, supporting experimentation, deployment, and conversational AI use cases.
- 5+ years of experience using Git and Unix-based environments, building reproducible notebooks or pipelines, and applying basic container and cloud concepts to support reliable analytics and AI workflows.
- 4+ years of experience communicating complex technical designs, trade-offs, evaluation results, and risks to both highly technical and non-technical business audiences, translating insights into clear business outcomes and strategy.
- Foundational knowledge of insurance products, processes (including underwriting, claims, and pricing), and regulatory environments, with demonstrated ability to apply analytics and AI solutions within a regulated industry context.
- 2+ years of experience with advanced NLP and Generative AI capabilities, including embeddings, hybrid and dense retrieval strategies, advanced chunking, prompt engineering, structured outputs, and integrating knowledge graphs into RAG solutions for improved grounding.
- 1+ year of experience with, or exposure to, advanced GenAI applications in insurance, such as compliance-aware prompt engineering, document generation, objection response automation, synthetic data generation for low-frequency events, and customer sentiment modeling from surveys, call transcripts, or inspection notes.
- 2+ years of experience working within enterprise AI governance frameworks, aligning conversational AI and analytics solutions with compliance, privacy, documentation, and ethical standards.
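The evaluation and monitoring experience above often reduces to a handful of standard metrics computed over gold test sets. As an illustration only (not any particular framework's API), recall@k for retrieval and F1 for classification can be written as:

```python
def recall_at_k(relevant: set[str], retrieved: list[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved[:k])) / len(relevant)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Thresholds on metrics like these, tracked over time, are what back the A/B testing and drift-detection decisions the requirement describes.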
Nice to Have
- RAG Expertise: Hands-on with vector databases and search (e.g., Vertex AI RAG Engine, OpenSearch, pgvector/Postgres), ANN indexing (HNSW), advanced rerankers (cross-encoders), and evaluation frameworks (RAGAS, TruLens, DeepEval) tailored for conversational AI in insurance.
- Document AI Tooling: Practical experience with PyMuPDF/pdfplumber, Apache Tika; advanced OCR (Tesseract); layout-aware models (LayoutLM); and table extraction (Camelot/Tabula) for processing insurance documents.
- Embedding Model Selection: Experience comparing and selecting embedding models (OpenAI/Cohere/Voyage vs. open-source like bge/e5/gte) for domain-specific insurance corpora; understanding dimension/quality/cost/latency trade-offs and multilingual needs.
- Orchestration Frameworks: Familiarity with LangChain, LangGraph, or LlamaIndex; structured tool/function calling and guardrails for complex AI agents.
- Cloud-Native ML/BI Platforms: Hands-on with Vertex AI, SageMaker, or Azure ML; experience tracking experiments (MLflow/W&B), registries, and CI evaluation gates for both AI and BI solutions.
- Responsible AI & Safety: Expertise in bias/fairness testing, hallucination mitigation, grounding checks, safety filters; and comprehensive model risk documentation for conversational AI.
- Broader Modalities: Experience with time-series forecasting, recommenders, anomaly/fraud detection, and speech/vision/multimodal applications within an insurance context.
- LLM Fine-tuning: Experience fine-tuning LLMs and Diffusion models using PEFT/LoRA, and practical experience with model distillation.