
Job Description - Data Engineer
Location: Noida, India
Experience: Minimum 3 Years (Relevant and Hands-on)
About Codespire Solutions
Codespire Solutions is an enterprise AI product engineering organization building scalable SaaS platforms across manufacturing, legal tech, healthcare, and enterprise operations. Our work goes beyond experimentation - we focus on building production-ready systems used by real businesses. Our product ecosystem includes AI-driven platforms such as Smart RFQ AI, Supplier Match AI, Nyayra Law, and other enterprise automation tools designed with performance, security, and long-term scalability in mind.
Role Overview
We are looking for a strong Data Engineer with 3+ years of hands-on experience building production-grade Python-based data pipelines, working with observability and log data, and integrating multi-source enterprise systems into a centralized data lake.
This is not a junior or exploratory role. You will own the most critical layer of a live AI product: the data backbone. The AI is only as good as the data you feed it. You will work across a complex, multi-system environment and must be comfortable with high data volumes, strict compliance requirements, and evolving platform constraints.
What You'll Work On
Building Python-based automated data pipelines to collect and normalize operational data from logs, metrics, traces, ITSM systems, CI/CD tools, and knowledge repositories
Integrating with platforms such as Datadog, Splunk, ServiceNow, JIRA, Confluence, Jenkins, and SharePoint via REST APIs
Designing and managing Snowflake schemas, ingestion workflows, and query patterns for a centralized data lake
Building PGVector ingestion pipelines to support embedding-based retrieval for RAG-powered AI workflows
Implementing data normalization and summarization logic to comply with LLM input size constraints
Shipping audit-ready log structures to Splunk for compliance and traceability requirements
Writing rigorous unit tests with 90%+ code coverage across all pipeline modules
Working within CI/CD security pipelines (Nexus, Fortify, Sonar) and deploying to Kubernetes-based environments
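One of the responsibilities above is normalizing and summarizing data to fit LLM input size constraints. A minimal sketch of what such a step can look like is below; the record fields and the roughly-4-characters-per-token heuristic are illustrative assumptions, not the actual pipeline contract:

```python
# Sketch: normalize a raw log record and trim its message to an LLM
# token budget. Field names and the 4-chars-per-token heuristic are
# illustrative assumptions, not the real pipeline's contract.

def normalize_log_record(record: dict, max_tokens: int = 2048) -> dict:
    """Flatten a raw log record and trim its message to a token budget."""
    approx_chars = max_tokens * 4  # rough heuristic: ~4 chars per token
    message = str(record.get("message", ""))
    if len(message) > approx_chars:
        # Keep the head and tail of long messages; errors often sit at the end.
        head = message[: approx_chars // 2]
        tail = message[-(approx_chars // 2):]
        message = head + "\n...[truncated]...\n" + tail
    return {
        "service": record.get("service", "unknown"),
        "level": str(record.get("level", "INFO")).upper(),
        "message": message,
    }
```

In a production pipeline this head-and-tail truncation would typically be replaced or supplemented by model-driven summarization, but the budget-enforcement idea is the same.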
Technology Environment
You'll be working across a modern data and AI engineering stack that includes:
Data & Pipelines:
Python 3.x, Snowflake, Apache Iceberg, PGVector / PostgreSQL, REST API integrations
Observability & Monitoring:
Datadog, Splunk, OpenTelemetry
Infrastructure & CI/CD:
AWS (SageMaker, S3, Lambda), Kubernetes, Docker, Jenkins, Nexus, Fortify, Sonar, Secrets Vault
AI & Search:
PGVector for vector search, OpenAI embedding models, RAG pipeline data preparation
Enterprise Systems:
ServiceNow, JIRA, Confluence, SharePoint
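Pulling data out of these enterprise systems via REST usually means paging through results. A generic sketch of offset-based pagination is below; `fetch_page` stands in for a real HTTP call (for example to a JIRA or ServiceNow search endpoint), and its signature here is an assumption for illustration:

```python
# Sketch: generic offset pagination over an enterprise REST API.
# `fetch_page` is a stand-in for a real HTTP call; the {"items": [...]}
# response shape is an assumption for illustration.
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], dict], page_size: int = 50) -> Iterator[dict]:
    """Yield items across pages until a short page signals the end."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)  # expected shape: {"items": [...]}
        items = page.get("items", [])
        yield from items
        if len(items) < page_size:
            break
        offset += page_size
```

Real platforms differ in the details (JIRA uses `startAt`/`maxResults`, Confluence uses cursor links), but separating the paging loop from the HTTP call keeps each integration small and testable.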
Who We're Looking For
3+ years of Python development for production data pipelines
Strong ETL/ELT skills: schema design, data transformation, incremental loads, and data quality checks
Hands-on Snowflake experience: ingestion, warehousing, and query optimization
Familiarity with observability data: reading logs, interpreting traces, understanding metrics such as CPU, memory, latency, and error rates
Experience integrating REST APIs from enterprise platforms (JIRA, ServiceNow, Confluence, or equivalent)
Strong unit testing discipline - comfortable writing tests to 90%+ coverage
Comfortable with AWS cloud services and Kubernetes-based deployments
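The ETL/ELT skills listed above include incremental loads and data quality checks. A minimal sketch of a watermark-based incremental step with a simple quality gate is below; the record shape and the ISO-8601 `updated_at` watermark field are illustrative assumptions:

```python
# Sketch: watermark-based incremental load with a data-quality gate.
# The record shape and the "updated_at" watermark field are assumptions
# made for illustration, not a real source-system schema.

def incremental_load(records: list[dict], last_watermark: str) -> tuple[list[dict], str]:
    """Select records newer than the watermark and validate required fields."""
    required = {"id", "updated_at"}
    fresh = []
    for rec in records:
        if not required.issubset(rec):
            # Quality gate: fail fast on malformed records.
            raise ValueError(f"record missing required fields: {rec}")
        if rec["updated_at"] > last_watermark:  # ISO-8601 strings compare lexically
            fresh.append(rec)
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark
```

Persisting `new_watermark` between runs (in Snowflake or a state table, for instance) is what makes the load incremental rather than a full reload.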
Good to Have
Experience with AIOps, incident management platforms, or observability tooling (Datadog, Splunk, PagerDuty)
Snowflake advanced features: Snowpark, Streams, Tasks, or Dynamic Tables
CI/CD security experience: Nexus, Fortify, SonarQube
Exposure to vector databases, RAG pipelines, or LLM data preprocessing
Prior work in regulated industries (banking, healthcare, insurance) with data governance requirements
AWS SageMaker experience for pipeline orchestration or ML experiments
What You Can Expect
Work on real, live AI products - not demo projects
Deep exposure to AIOps, observability-driven data engineering, and GenAI-integrated architectures
A technically rigorous environment where data quality, security, and reliability are non-negotiable
Direct mentorship and ownership from the earliest stages of the project
Clear growth path into the GenAI and RAG layer as the product matures
Job ID: 143915645