Job Description
About us:
Codvo.ai is a next-gen AI and engineering company helping global enterprises transform through Generative AI, Cloud-native platforms, and Product Engineering. With proprietary platforms like NeIO and Pulse, we're enabling faster, smarter, and scalable digital transformation for industries including Energy, Retail, Travel, BFSI, and Healthcare.
As we gear up to launch new AI-powered products and expand our global presence, we are seeking a data engineer to own the platform's data backbone, from ingestion through to the analytics layer.
Role Summary
Owns the data pipeline from building management system (BMS) ingestion through to the analytics layer. Responsible for data reliability, data quality, and the real-time streaming infrastructure that feeds the ML models.
Responsibilities
Data Ingestion & Streaming
Build and maintain the BMS protocol bridge: OPC-UA, BACnet, and MQTT connectors
Implement the MQTT ingestion pipeline: topic subscription, message parsing, schema validation, TimescaleDB insertion (a minimal sketch follows this list)
Monitor ingestion health: message rates, latency, dropped messages, reconnection logic
Implement the tag auto-mapping engine: pattern-based matching, confidence scoring, manual override workflow
Build the historian adapter: bulk data extraction from customer historian systems (AVEVA Historian, OSIsoft PI, InfluxDB) for baseline profiling
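To give a flavor of the ingestion work, here is a minimal sketch of the subscribe-parse-validate-insert path. It assumes a JSON payload carrying tag, ts (epoch seconds), and value fields, an invented bms/+/telemetry topic layout and readings table, and the paho-mqtt 2.x and psycopg2 libraries; the production pipeline would add batching, richer schema validation, and ingestion-health metrics.

```python
import json

import paho.mqtt.client as mqtt
import psycopg2

REQUIRED_KEYS = {"tag", "ts", "value"}             # assumed payload schema
conn = psycopg2.connect("dbname=bms user=ingest")  # hypothetical DSN

def on_connect(client, userdata, flags, reason_code, properties):
    client.subscribe("bms/+/telemetry")            # hypothetical topic layout

def on_message(client, userdata, msg):
    try:
        payload = json.loads(msg.payload)
    except json.JSONDecodeError:
        return  # a real pipeline would count this toward schema-violation alerts
    if not isinstance(payload, dict) or not REQUIRED_KEYS <= payload.keys():
        return  # reject payloads missing required fields
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO readings (ts, tag, value)"
            " VALUES (to_timestamp(%s), %s, %s)",  # ts assumed to be epoch seconds
            (payload["ts"], payload["tag"], payload["value"]),
        )
    conn.commit()  # production code would batch commits for throughput

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x API
client.on_connect = on_connect
client.on_message = on_message
client.connect("broker.example.com", 1883)
client.loop_forever()  # paho handles reconnection inside this loop
```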
Data Storage & Management
Design and maintain the TimescaleDB schema: hypertables, continuous aggregates, retention policies, compression (a schema sketch follows this list)
Implement the data partitioning strategy: per-tenant, per-site isolation
Build the training data snapshot pipeline: versioned dataset extraction with SHA-256 checksums and manifest tracking
Manage the MinIO/S3 data lake: dataset storage, MLflow artifact storage, backup strategy
Implement data retention and archival policies per customer requirements
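To make the storage bullets concrete, below is a sketch of one possible hypertable setup using standard TimescaleDB calls (create_hypertable, add_compression_policy, add_retention_policy). The readings table, column names, and policy intervals are assumptions for illustration, not the actual schema.

```python
import psycopg2

# Illustrative DDL: a single raw-readings hypertable, segmented by tenant,
# site, and tag, compressed after a week and dropped after a year. All names
# and intervals are assumptions, not the production schema.
DDL = """
CREATE TABLE IF NOT EXISTS readings (
    ts     TIMESTAMPTZ      NOT NULL,
    tenant TEXT             NOT NULL,  -- per-tenant isolation key
    site   TEXT             NOT NULL,  -- per-site isolation key
    tag    TEXT             NOT NULL,
    value  DOUBLE PRECISION
);
SELECT create_hypertable('readings', 'ts', if_not_exists => TRUE);
ALTER TABLE readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'tenant, site, tag'
);
SELECT add_compression_policy('readings', INTERVAL '7 days',
                              if_not_exists => TRUE);
SELECT add_retention_policy('readings', INTERVAL '365 days',
                            if_not_exists => TRUE);
"""

if __name__ == "__main__":
    with psycopg2.connect("dbname=bms user=admin") as conn:  # hypothetical DSN
        with conn.cursor() as cur:
            cur.execute(DDL)
```

Segmenting a shared hypertable is only one way to satisfy the per-tenant isolation bullet; a schema or database per tenant is the heavier alternative when customers require hard separation.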
Data Quality & Monitoring
Build the data quality monitoring pipeline: null rate tracking, stale timestamp detection, out-of-range value flagging, schema violation alerting
Implement the tag freshness monitor: detect when sensors stop reporting, alert on stale data (a query sketch follows this list)
Build the data lineage tracking system: from raw BMS reading through feature computation to model prediction
Monitor database performance: query latency, storage growth, index health, connection pool utilization
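The freshness monitor's core check can be a single query. The sketch below reuses the illustrative readings table from the storage section and an invented 15-minute threshold; at production scale the scan would run against a continuous aggregate or a last-seen cache rather than the raw hypertable, and tags that have never reported would be caught by joining against a tag inventory table instead.

```python
import psycopg2

STALE_AFTER = "15 minutes"  # hypothetical freshness threshold

# Flag every tag whose most recent reading is older than the threshold.
QUERY = """
SELECT tag, max(ts) AS last_seen
FROM readings
GROUP BY tag
HAVING max(ts) < now() - %s::interval
"""

def stale_tags(conn):
    with conn.cursor() as cur:
        cur.execute(QUERY, (STALE_AFTER,))
        return cur.fetchall()

if __name__ == "__main__":
    conn = psycopg2.connect("dbname=bms user=monitor")  # hypothetical DSN
    for tag, last_seen in stale_tags(conn):
        # In production this would feed the alerting pipeline, not stdout.
        print(f"STALE: {tag} last reported at {last_seen}")
```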
Integration
Build and maintain the CMMS (computerized maintenance management system) integration: ServiceNow/Maximo API connectors, work order creation, status sync
Implement the feedback ingestion worker: poll the CMMS for work order outcomes, match them to predictions, update ground-truth labels
Build the Prometheus/Grafana metrics export pipeline: platform health metrics, data quality dashboards (a minimal exporter sketch follows this list)
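For the Prometheus export, a minimal exporter sketch using the prometheus_client library is below. The metric names are invented for illustration, and in the real pipeline the ingestion and quality workers would update these metrics rather than the simulation loop shown here.

```python
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Hypothetical metric names; Grafana dashboards would chart these via Prometheus.
MESSAGES_INGESTED = Counter(
    "bms_messages_ingested_total", "MQTT messages successfully ingested"
)
SCHEMA_VIOLATIONS = Counter(
    "bms_schema_violations_total", "Payloads rejected by schema validation"
)
INGEST_LAG_SECONDS = Gauge(
    "bms_ingest_lag_seconds", "Delay between sensor timestamp and DB insert"
)

if __name__ == "__main__":
    start_http_server(9100)  # Prometheus scrapes http://host:9100/metrics
    while True:
        # Simulated updates so the endpoint serves data; the real workers
        # would call .inc() and .set() at their own points in the pipeline.
        MESSAGES_INGESTED.inc()
        INGEST_LAG_SECONDS.set(random.uniform(0.1, 2.0))
        time.sleep(1)
```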
Expected Background
5+ years in data engineering: ETL/ELT pipelines, streaming data, time-series databases
Strong Python skills; experience with PostgreSQL/TimescaleDB, MQTT, and message broker systems
Experience with Docker, Kubernetes, and CI/CD pipelines
Familiarity with OPC-UA, BACnet, or industrial data protocols is a strong advantage
Experience with data quality frameworks and monitoring
Note: Please apply via our official careers portal only, as applications sent directly to executives may not be considered.