Opkey | Series B Funded | Noida, India (In-Office) | Full-Time
The Opportunity
Opkey, a Series B funded enterprise application lifecycle management platform, is looking
for a Senior Data Engineer to join our team in Noida. We need someone who can build and
scale the data infrastructure (pipelines, storage systems, and processing engines) that
powers our platform.
We're not pitching a vision; we're scaling a reality. Our systems already process hundreds
of gigabytes of enterprise data. Now we need an engineer who can make that infrastructure
handle 10x more, 10x faster, with bulletproof reliability. This is your chance to be part of
building something that will define a category.
About Us
Opkey is redefining how enterprises manage the lifecycle of their most critical applications.
We've built the platform that takes organizations from Design to Configure to Test to Train,
powered by agentic AI.
Our customers already include Fortune 500 companies and top global system
integrators. They trust us with hundreds of gigabytes of their most sensitive enterprise
data (payroll files, configuration exports, test results) because we've proven we can
handle it.
We're already doing what others are only talking about. Our pipelines already process
massive payroll files in real time. Our systems already normalize chaotic enterprise data
formats into clean, queryable structures. Our infrastructure already powers AI and analytics
that enterprises depend on.
Now we're scaling. And we need exceptional people to help us go from category creator to
category leader.
This is founder mode, not corporate mode. We move fast, we solve hard problems, and
we ship things that matter.
Why This Role Matters
Data scientists can't build models on broken pipelines. Analysts can't find insights in dirty
data. The entire intelligence layer of our platform depends on rock-solid data infrastructure.
You'll build the foundation everything else depends on.
You'll design the pipelines that ingest data from dozens of enterprise formats. You'll build the
systems that diff millions of records in seconds. You'll create the infrastructure that lets our
data scientists focus on algorithms instead of wrestling with data quality.
When a Fortune 500 company validates their payroll migration, your infrastructure makes
that possible. When our ML models predict configuration failures, they're running on
pipelines you built.
This is already happening at Opkey. You'll help us scale it to the world.
What You'll Do
You'll join a team that's already built production data infrastructure handling enterprise-scale
workloads. Your job is to make it faster, more reliable, and ready for the next order of
magnitude.
- Build & Optimize Data Pipelines: Design and implement ETL/ELT pipelines that
ingest data from diverse enterprise sources (Excel files, CSVs, API exports,
database extracts, proprietary formats) and transform it into clean, queryable
structures.
- Design High-Performance Comparison Engines: Build systems that diff massive
datasets (payroll files with millions of records, configuration exports with thousands
of parameters) and surface differences in real time. A minimal sketch of this
ingest-and-diff flow follows this list.
- Architect Scalable Data Storage: Design and manage data warehouses, data
lakes, and databases that handle terabytes of enterprise data. Make decisions about
partitioning, indexing, and storage formats.
- Ensure Data Quality & Reliability: Implement validation, monitoring, and alerting
systems that catch data issues before they affect downstream consumers. Build
self-healing, observable pipelines; see the validation sketch after this list.
- Enable Analytics & ML Teams: Partner with data scientists to build the
infrastructure they need (feature stores, training data pipelines, model serving
infrastructure).
- Scale for Growth: Design systems that can handle 10x the data without 10x the cost
or complexity. Think ahead about bottlenecks and architect around them.
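
To make the pipeline and comparison-engine bullets concrete, here's a minimal sketch in
pandas. The file names, the employee_id key, the gross_pay column, and the normalization
rules are invented for illustration (they're not our actual schema), and the Excel read
assumes openpyxl is installed; a production version would stream and shard rather than
load whole files into memory:

```python
import pandas as pd

KEY = "employee_id"  # illustrative join key, not our real schema

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize headers and coerce amounts so two extracts become comparable."""
    df = df.rename(columns=lambda c: str(c).strip().lower().replace(" ", "_"))
    df["gross_pay"] = pd.to_numeric(df["gross_pay"], errors="coerce")
    return df.set_index(KEY).sort_index()

def diff(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (employee, column) cell whose value differs."""
    before, after = before.align(after, join="outer")
    changed = before.ne(after) & ~(before.isna() & after.isna())
    old = before.where(changed).stack().rename("before")
    new = after.where(changed).stack().rename("after")
    # Outer concat keeps cells that exist on only one side (adds and removals).
    return pd.concat([old, new], axis=1)

legacy = normalize(pd.read_csv("legacy_payroll.csv"))         # pre-migration extract
migrated = normalize(pd.read_excel("migrated_payroll.xlsx"))  # post-migration extract
print(diff(legacy, migrated))
```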
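The data-quality bullet, sketched the same way: cheap invariant checks that fail a batch
loudly before it reaches downstream consumers. The specific rules and the 1% null budget
are illustrative assumptions:

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch is clean."""
    failures = []
    if df["employee_id"].duplicated().any():
        failures.append("duplicate employee_id values")
    if df["gross_pay"].lt(0).any():
        failures.append("negative gross_pay values")
    null_rate = df["gross_pay"].isna().mean()
    if null_rate > 0.01:  # tolerate up to 1% unparseable amounts, then alert
        failures.append(f"gross_pay null rate {null_rate:.1%} exceeds budget")
    return failures

def publish(df: pd.DataFrame) -> None:
    failures = validate(df)
    if failures:
        # In production this would page the on-call and quarantine the batch.
        raise ValueError("; ".join(failures))
    df.to_parquet("payroll_clean.parquet")  # columnar output for consumers
```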
Skills & Qualifications
Required Technical Skills
- Python for Data Engineering: 4+ years of production experience writing clean,
maintainable, performant Python code for data processing and pipeline development.
- SQL Mastery: Expert-level SQL, including complex queries, query optimization, and
reading execution plans. You can look at a slow query and know how to fix it; see
the EXPLAIN sketch after this list.
- Data Pipeline Development: Hands-on experience building ETL/ELT pipelines that
run reliably in production. You've designed pipelines that process millions of records
without failing.
- Distributed Computing: Deep knowledge of frameworks like Apache Spark for
large-scale data processing. You understand partitioning strategies, shuffle
optimization, and memory management; see the PySpark sketch after this list.
- Data Modeling & Warehousing: Strong foundation in data modeling, including star
schemas, slowly changing dimensions, and normalization vs. denormalization tradeoffs.
- Database Technologies: Experience with relational databases (PostgreSQL,
MySQL) and data warehouses (Redshift, Snowflake, BigQuery). You know when to
use each.
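
On SQL mastery, the skill we mean looks roughly like this hypothetical Postgres session:
read the plan, spot the sequential scan, fix it with an index. The table, columns, and
index name are invented, and psycopg2 is just one client choice:

```python
import psycopg2

conn = psycopg2.connect("dbname=warehouse")
cur = conn.cursor()

# Step 1: read the plan. A Seq Scan over millions of rows behind a
# selective predicate is the classic tell.
cur.execute("""
    EXPLAIN ANALYZE
    SELECT * FROM payroll_records
    WHERE run_id = %s AND status = 'failed'
""", ("2024-06",))
for (line,) in cur.fetchall():
    print(line)

# Step 2: a composite index on the predicate columns turns the
# Seq Scan into an Index Scan.
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_payroll_run_status
    ON payroll_records (run_id, status)
""")
conn.commit()
```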
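And on Spark, a sketch of one shuffle-aware habit: co-partition both sides of a wide
join on the join key, and write output partitioned so downstream reads can prune. Paths,
column names, and the partition count are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("payroll-normalize").getOrCreate()

records = spark.read.parquet("s3://bucket/payroll/raw/")    # large fact table
employees = spark.read.parquet("s3://bucket/employees/")    # smaller dimension

# Hash-partitioning both sides on the join key lets the sort-merge join
# reuse that partitioning instead of shuffling again; for a genuinely small
# dimension table, a broadcast join would avoid the shuffle entirely.
joined = (
    records.repartition(200, "employee_id")
    .join(employees.repartition(200, "employee_id"), "employee_id")
)

# Partitioned output lets downstream queries prune by pay period instead
# of scanning the whole dataset.
joined.write.mode("overwrite").partitionBy("pay_period").parquet(
    "s3://bucket/payroll/normalized/"
)
```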
Nice to Have
- Experience with streaming data systems (Kafka, Kinesis)
- Cloud platform expertise (AWS, GCP, Azure)
- Knowledge of orchestration tools (Airflow, Dagster, Prefect)
- Background in data comparison/diffing algorithms
- Experience with containerization (Docker, Kubernetes)
- Exposure to enterprise data formats and systems
Mindset & Approach
- Reliability-Obsessed: You've been paged at 2am, and you've built systems that
don't page you at 2am. You understand what it takes to run production infrastructure.
- Systems Thinker: You see how individual components fit into the larger
architecture. You make tradeoffs that optimize for the whole system.
- Ownership Mentality: You don't treat data quality as someone else's problem. You
own the pipeline end to end, from ingestion to the data scientist's query.
- Pragmatic Engineer: You know when to build for flexibility and when to optimize for
performance. You don't chase shiny tools when proven ones work better.
- Founder Mentality: You thrive in ambiguity, make architectural decisions with
incomplete information, and care about outcomes over perfect documentation.
What We're NOT Looking For
- Engineers who only want to work with cutting-edge tools regardless of fit
- People who treat data quality as someone else's problem
- Those who need a detailed roadmap handed to them
- Candidates who've never owned production systems end-to-end
What We Offer
- Competitive salary + meaningful equity in a company that's already winning
- The chance to architect data infrastructure that Fortune 500 companies depend on
- A team that values speed, ownership, and results over politics
- Direct impact: your pipelines will process enterprise data at a scale most engineers
never see
- The opportunity to be part of history: building the data foundation that powers
how enterprises manage their most critical applications
We've proven our infrastructure works. Now we need someone to scale it to
the world.
Apply with your resume and a brief note about the most challenging data pipeline you've
built.
Opkey is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive
environment for all employees.
Skills: Python, SQL, ETL, Apache Spark, PostgreSQL