
Job Description - Data Engineer

Location: Noida, India

Experience: Minimum 3 years (relevant and hands-on)

About Codespire Solutions

Codespire Solutions is an enterprise AI product engineering organization building scalable SaaS platforms across manufacturing, legal tech, healthcare, and enterprise operations. Our work goes beyond experimentation - we focus on building production-ready systems used by real businesses. Our product ecosystem includes AI-driven platforms such as Smart RFQ AI, Supplier Match AI, Nyayra Law, and other enterprise automation tools designed with performance, security, and long-term scalability in mind.

Role Overview

We are looking for a strong Data Engineer with 3+ years of hands-on experience building production-grade Python-based data pipelines, working with observability and log data, and integrating multi-source enterprise systems into a centralized data lake.

This is not a junior or exploratory role. You will own the most critical layer of a live AI product: the data backbone. The AI is only as good as the data you feed it. You will work across a complex, multi-system environment and must be comfortable with high data volumes, strict compliance requirements, and evolving platform constraints.

What You'll Work On

• Building Python-based automated data pipelines to collect and normalize operational data from logs, metrics, traces, ITSM systems, CI/CD tools, and knowledge repositories
• Integrating with platforms such as Datadog, Splunk, ServiceNow, JIRA, Confluence, Jenkins, and SharePoint via REST APIs
• Designing and managing Snowflake schemas, ingestion workflows, and query patterns for a centralized data lake
• Building PGVector ingestion pipelines to support embedding-based retrieval for RAG-powered AI workflows
• Implementing data normalization and summarization logic to comply with LLM input size constraints
• Shipping audit-ready log structures to Splunk for compliance and traceability requirements
• Writing rigorous unit tests with 90%+ code coverage across all pipeline modules
• Working within CI/CD security pipelines (Nexus, Fortify, Sonar) and deploying to Kubernetes-based environments

Technology Environment

You'll be working across a modern data and AI engineering stack that includes:

• Data & Pipelines: Python 3.x, Snowflake, Apache Iceberg, PGVector / PostgreSQL, REST API integrations
• Observability & Monitoring: Datadog, Splunk, OpenTelemetry
• Infrastructure & CI/CD: AWS (SageMaker, S3, Lambda), Kubernetes, Docker, Jenkins, Nexus, Fortify, Sonar, Secrets Vault
• AI & Search: PGVector for vector search, OpenAI embedding models, RAG pipeline data preparation
• Enterprise Systems: ServiceNow, JIRA, Confluence, SharePoint

Who We're Looking For

• 3+ years of Python development for production data pipelines
• Strong ETL/ELT skills: schema design, data transformation, incremental loads, and data quality checks
• Hands-on Snowflake experience: ingestion, warehousing, and query optimization
• Familiarity with observability data: reading logs, interpreting traces, and understanding metrics such as CPU, memory, latency, and error rates
• Experience integrating REST APIs from enterprise platforms (JIRA, ServiceNow, Confluence, or equivalent)
• Strong unit testing discipline - comfortable writing tests to 90%+ coverage
• Comfortable with AWS cloud services and Kubernetes-based deployments

Good to Have

• Experience with AIOps, incident management platforms, or observability tooling (Datadog, Splunk, PagerDuty)
• Snowflake advanced features: Snowpark, Streams, Tasks, or Dynamic Tables
• CI/CD security experience: Nexus, Fortify, SonarQube
• Exposure to vector databases, RAG pipelines, or LLM data preprocessing
• Prior work in regulated industries (banking, healthcare, insurance) with data governance requirements
• AWS SageMaker experience for pipeline orchestration or ML experiments

What You Can Expect

• Work on real, live AI products - not demo projects
• Deep exposure to AIOps, observability-driven data engineering, and GenAI-integrated architectures
• A technically rigorous environment where data quality, security, and reliability are non-negotiable
• Direct mentorship and ownership from the earliest stages of the project
• Clear growth path into the GenAI and RAG layer as the product matures
