Straive

Senior Data Engineer

Job Description

AI Data Engineer

Experience Level: 3–5 Years

Location: Straive office locations in Chennai, Bangalore, Hyderabad, Gurgaon, Noida, Pune, Mumbai

About Straive:

Straive is a market-leading Content and Data Technology company providing data services, subject-matter expertise, and technology solutions across multiple domains. Data Analytics & AI Solutions, Data AI-Powered Operations, and Education & Learning form the core pillars of the company's long-term vision. The company is a specialized solutions provider to business information providers in finance, insurance, legal, real estate, life sciences, and logistics, and continues to be the leading content services provider to research and education publishers.

Data Analytics & AI Services: Our Data Solutions business has become critical to our clients' success. We use technology and AI, with human experts in the loop, to create data assets that our clients use to power their data products and their end customers' workflows. As our clients expect us to become their future-fit Analytics and AI partner, they look to us for help in building enterprise data analytics and AI capabilities.

With a client base spanning 30 countries worldwide, Straive's multi-geographical resource pool is strategically located across India, the Philippines, the USA, Nicaragua, Vietnam, and the United Kingdom, with company headquarters in Singapore.

Role Overview

We are seeking an AI Data Engineer who thrives at the intersection of data engineering and autonomous AI. You will move beyond traditional ETL to build AI-ready data pipelines and agentic systems. Your role is two-fold:

1. CoE Accelerator Development: Architect and build internal frameworks and autonomous agents that automate complex data lifecycle tasks.

2. Client Delivery: Partner with clients to design and deploy sophisticated RAG (Retrieval-Augmented Generation) systems and agentic workflows that can reason, plan, and execute data operations independently.

Key Responsibilities

● Agentic Workflow Development: Design and deploy autonomous agents (using LangGraph, AutoGen, or CrewAI) capable of orchestrating complex, multi-step data tasks such as self-healing pipelines, automated data quality remediation, or autonomous SQL generation and execution (see the sketch after this list).

● AI-Ready Data Pipelines: Architect robust pipelines using PySpark and Databricks to transform data into high-quality vectors and knowledge graphs optimized for agentic memory and reasoning.

● Accelerators & Frameworks: Develop and maintain modular, reusable data accelerators that standardize agentic orchestration, evaluation, and cost monitoring for our CoE.

● Vector Database Management: Engineer, deploy, and manage vector indices (e.g., Databricks Vector Search, Pinecone) to serve as the long-term memory for AI agents.

● LLMOps & Monitoring: Implement observability frameworks to track agent performance, reasoning accuracy, and token costs. Integrate MLflow for experiment tracking (a logging sketch also follows this list).

● Strategic Collaboration: Act as a subject matter expert for the Data Practice CoE, contributing to technical whitepapers and the adoption of cutting-edge agentic architectures.
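
To give a flavor of the agentic workflow work, here is a minimal sketch of a self-healing pipeline agent built as a LangGraph state machine. The state fields, node logic, and the "sales" table are hypothetical placeholders, not a prescribed design; a production agent would plug real data-quality checks and guarded, LLM-drafted remediation into the nodes.

    from typing import TypedDict
    from langgraph.graph import StateGraph, END

    class PipelineState(TypedDict):
        table: str
        issues: list
        remediated: bool

    def detect_issues(state: PipelineState) -> dict:
        # Hypothetical check: a real node would run data-quality rules
        # (row counts, null rates, schema drift) against the table.
        issues = ["null_spike_in_amount"] if state["table"] == "sales" else []
        return {"issues": issues}

    def remediate(state: PipelineState) -> dict:
        # Hypothetical fix: a real node would have an LLM draft corrective
        # SQL, validate it, and execute it under guardrails.
        return {"remediated": True}

    def route(state: PipelineState) -> str:
        # Branch: remediate only when the detection node found issues.
        return "remediate" if state["issues"] else END

    graph = StateGraph(PipelineState)
    graph.add_node("detect", detect_issues)
    graph.add_node("remediate", remediate)
    graph.set_entry_point("detect")
    graph.add_conditional_edges("detect", route)
    graph.add_edge("remediate", END)
    agent = graph.compile()

    print(agent.invoke({"table": "sales", "issues": [], "remediated": False}))

The conditional edge is what makes the pipeline "self-healing": the graph loops through remediation only when detection surfaces issues, and ends immediately otherwise.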
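
And a minimal sketch of the kind of MLflow experiment tracking the LLMOps bullet refers to. The experiment name, parameter names, and metric values are illustrative assumptions; the point is that every agent evaluation run logs its configuration and outcomes.

    import mlflow

    mlflow.set_experiment("agent-evals")

    with mlflow.start_run(run_name="rag_baseline"):
        # Log the configuration that produced this evaluation run.
        mlflow.log_param("embedding_model", "text-embedding-3-small")
        mlflow.log_param("chunk_size", 512)
        # Log outcome metrics: quality and token cost.
        mlflow.log_metric("answer_accuracy", 0.87)
        mlflow.log_metric("avg_tokens_per_query", 1450.0)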

Technical Requirements

● Core Engineering: Expert-level proficiency in Python, PySpark, and SQL.

● Databricks Mastery: Hands-on expertise with the full Databricks ecosystem: Unity Catalog, Delta Live Tables (DLT), Workflows, and serverless compute.

● Agentic & AI Orchestration: Strong experience building RAG pipelines and agentic workflows using LangGraph, CrewAI, AutoGen, or LlamaIndex. This is the key differentiator for this role.

● Vectorization & Embeddings: Understanding of embedding models, chunking strategies, and the lifecycle of managing vector datasets for enterprise AI (a chunking sketch follows this list).

● Cloud Architecture: Familiarity with deploying AI-driven data solutions on AWS, Azure, or GCP.

● Tools & Methodologies: Experience with CI/CD (Git/GitHub Actions), containerization (Docker), and test-driven development.
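
As a concrete example of the chunking work named above, here is a minimal PySpark sketch that explodes documents into retrieval-ready chunks. The toy corpus and the naive sentence-level splitting are assumptions for illustration; production pipelines would read from Delta tables and use token-aware splitters tuned to the embedding model's context window.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("ai_ready_chunks").getOrCreate()

    # Toy corpus; in practice this would be a Delta table of parsed documents.
    docs = spark.createDataFrame(
        [(1, "First sentence. Second sentence. Third sentence."),
         (2, "Another document. With its own sentences.")],
        ["doc_id", "text"],
    )

    # Naive sentence-level chunking via a regex split on sentence boundaries.
    chunks = (
        docs
        .withColumn("chunk", F.explode(F.split("text", r"(?<=\.)\s+")))
        .withColumn("chunk_id", F.monotonically_increasing_id())
        .select("doc_id", "chunk_id", "chunk")
    )

    chunks.show(truncate=False)
    # On Databricks, the next step would be to write `chunks` to a Delta
    # table and sync it to a Vector Search index for agent retrieval.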

Preferred Qualifications

● Agentic Expertise (Huge Plus): Demonstrable experience building autonomous agents that can troubleshoot, reason, or perform complex analytical tasks with minimal human intervention.

● Certifications: Databricks Certified Data Engineer Professional; Azure or AWS AI Engineer Associate certifications.

● Full-Stack GenAI: Experience with frontend frameworks (Streamlit/Flask) to build rapid prototypes/PoCs of data accelerators.

● Governance: Familiarity with data security and PII masking.

Job ID: 147128081