
AI Data Engineer
Experience Level: 3–5 Years
Location: Chennai, Bangalore, Hyderabad, Gurgaon, Noida, Pune, Mumbai (Straive office locations)
About Straive:
Straive is a market-leading Content and Data Technology company providing data services, subject matter expertise, and technology solutions across multiple domains. Data Analytics & AI Solutions, Data AI Powered Operations, and Education & Learning form the core pillars of the company's long-term vision. The company is a specialized solutions provider to business information providers in finance, insurance, legal, real estate, life sciences, and logistics, and continues to be the leading content services provider to research and education publishers.

Data Analytics & AI Services: Our Data Solutions business has become critical to our clients' success. We use technology and AI with human experts in the loop to create data assets that our clients use to power their data products and their end customers' workflows. As our clients expect us to become their future-fit Analytics and AI partner, they look to us for help in building enterprise data analytics and AI capabilities.

With a client base spanning 30 countries worldwide, Straive's multi-geographical resource pool is strategically located across eight countries, including India, the Philippines, the USA, Nicaragua, Vietnam, and the United Kingdom, with company headquarters in Singapore.
Role Overview
We are seeking an AI Data Engineer who thrives at the intersection of Data Engineering
and Autonomous AI. You will move beyond traditional ETL to build AI-Ready data pipelines
and Agentic systems. Your role is two-fold:
1. CoE Accelerator Development: Architect and build internal frameworks and
autonomous agents that automate complex data lifecycle tasks.
2. Client Delivery: Partner with clients to design and deploy sophisticated RAG
(Retrieval-Augmented Generation) systems and Agentic workflows that can reason,
plan, and execute data operations independently.
Key Responsibilities
● Agentic Workflow Development: Design and deploy autonomous agents (using
frameworks such as LangGraph, AutoGen, or CrewAI) capable of orchestrating
complex, multi-step data tasks, such as self-healing pipelines, automated data
quality remediation, or autonomous SQL generation and execution.
● AI-Ready Data Pipelines: Architect robust pipelines using PySpark and Databricks
to transform data into high-quality vectors and knowledge graphs optimized for
Agentic memory and reasoning.
● Accelerators & Frameworks: Develop and maintain modular, reusable Data
Accelerators that standardize Agentic orchestration, evaluation, and cost-monitoring
for our CoE.
● Vector Database Management: Engineer, deploy, and manage vector indices (e.g.,
Databricks Vector Search, Pinecone) to serve as the long-term memory for AI
agents.
● LLMOps & Monitoring: Implement observability frameworks to track agent
performance, reasoning accuracy, and token costs. Integrate MLflow for experiment
tracking.
● Strategic Collaboration: Act as a subject matter expert for the Data Practice CoE,
contributing to technical whitepapers and the adoption of cutting-edge Agentic
architectures.
Technical Requirements
● Core Engineering: Expert-level proficiency in Python, PySpark, and SQL.
● Databricks Mastery: Hands-on expertise with the full Databricks ecosystem: Unity
Catalog, Delta Live Tables (DLT), Workflows, and Serverless compute.
● Agentic & AI Orchestration: Strong experience building RAG pipelines and
Agentic workflows using LangGraph, CrewAI, AutoGen, or LlamaIndex. This is
the key differentiator for this role.
● Vectorization & Embeddings: Understanding of embedding models, chunking
strategies, and the lifecycle of managing vector datasets for enterprise AI.
● Cloud Architecture: Familiarity with deploying AI-driven data solutions on AWS,
Azure, or GCP.
● Tools & Methodologies: Experience with CI/CD (Git/GitHub Actions),
containerization (Docker), and test-driven development.
Preferred Qualifications
● Agentic Expertise (Huge Plus): Demonstrable experience in building autonomous
agents that can troubleshoot, reason, or perform complex analytical tasks with
minimal human intervention.
● Certifications: Databricks Certified Data Engineer Professional; Azure or AWS AI
Engineer Associate certifications.
● Full-Stack GenAI: Experience with frontend frameworks (Streamlit/Flask) to build
rapid prototypes/PoCs of data accelerators.
● Governance: Familiarity with data security and PII masking.
Job ID: 147128081