Opkey | Series B Funded | Noida, India (In-Office) | Full-Time
The Opportunity
Opkey, a Series B funded enterprise application lifecycle management platform, is looking
for a Senior Data Engineer to join our team in Noida. We need someone who can build and
scale the data infrastructure (pipelines, storage systems, and processing engines) that
powers our platform.
We're not pitching a vision; we're scaling a reality. Our systems already process hundreds
of gigabytes of enterprise data. Now we need an engineer who can make that infrastructure
handle 10x more, 10x faster, with bulletproof reliability. This is your chance to be part of
building something that will define a category.
About Us
Opkey is redefining how enterprises manage the lifecycle of their most critical applications.
We've built the platform that takes organizations from Design to Configure to Test to Train,
powered by agentic AI.
Our customers already include Fortune 500 companies and top global system
integrators. They trust us with hundreds of gigabytes of their most sensitive enterprise
data (payroll files, configuration exports, test results) because we've proven we can
handle it.
We're already doing what others are only talking about. Our pipelines already process
massive payroll files in real time. Our systems already normalize chaotic enterprise data
formats into clean, queryable structures. Our infrastructure already powers AI and analytics
that enterprises depend on.
Now we're scaling. And we need exceptional people to help us go from category creator to
category leader.
This is founder mode, not corporate mode. We move fast, we solve hard problems, and
we ship things that matter.
Why This Role Matters
Data scientists can't build models on broken pipelines. Analysts can't find insights in dirty
data. The entire intelligence layer of our platform depends on rock-solid data infrastructure.
You'll build the foundation everything else depends on.
You'll design the pipelines that ingest data from dozens of enterprise formats. You'll build the
systems that diff millions of records in seconds. You'll create the infrastructure that lets our
data scientists focus on algorithms instead of wrestling with data quality.
When a Fortune 500 company validates their payroll migration, your infrastructure makes
that possible. When our ML models predict configuration failures, they're running on
pipelines you built.
This is already happening at Opkey. You'll help us scale it to the world.
What You'll Do
You'll join a team that's already built production data infrastructure handling enterprise-scale
workloads. Your job is to make it faster, more reliable, and ready for the next order of
magnitude.
- Build & Optimize Data Pipelines: Design and implement ETL/ELT pipelines that
ingest data from diverse enterprise sources (Excel files, CSVs, API exports,
database extracts, proprietary formats) and transform it into clean, queryable
structures.
- Design High-Performance Comparison Engines: Build systems that diff massive
datasets (payroll files with millions of records, configuration exports with thousands
of parameters) and surface differences in real time. A minimal sketch of this
ingest-and-diff flow follows this list.
- Architect Scalable Data Storage: Design and manage data warehouses, data
lakes, and databases that handle terabytes of enterprise data. Make decisions about
partitioning, indexing, and storage formats.
- Ensure Data Quality & Reliability: Implement validation, monitoring, and alerting
systems that catch data issues before they affect downstream consumers. Build
self-healing, observable pipelines; see the validation sketch after this list.
- Enable Analytics & ML Teams: Partner with data scientists to build the
infrastructure they need (feature stores, training data pipelines, model serving
infrastructure).
- Scale for Growth: Design systems that can handle 10x the data without 10x the cost
or complexity. Think ahead about bottlenecks and architect around them.
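
To make the pipeline and comparison-engine bullets concrete, here's a minimal sketch in
pandas. The file names, the employee_id key, the gross_pay column, and the normalization
rules are invented for illustration (they're not our actual schema), and the Excel read
assumes openpyxl is installed; a production version would stream and shard rather than
load whole files into memory:

```python
import pandas as pd

KEY = "employee_id"  # illustrative join key, not our real schema

def normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize headers and coerce amounts so two extracts become comparable."""
    df = df.rename(columns=lambda c: str(c).strip().lower().replace(" ", "_"))
    df["gross_pay"] = pd.to_numeric(df["gross_pay"], errors="coerce")
    return df.set_index(KEY).sort_index()

def diff(before: pd.DataFrame, after: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (employee, column) cell whose value differs."""
    before, after = before.align(after, join="outer")
    changed = before.ne(after) & ~(before.isna() & after.isna())
    old = before.where(changed).stack().rename("before")
    new = after.where(changed).stack().rename("after")
    # Outer concat keeps cells that exist on only one side (adds and removals).
    return pd.concat([old, new], axis=1)

legacy = normalize(pd.read_csv("legacy_payroll.csv"))         # pre-migration extract
migrated = normalize(pd.read_excel("migrated_payroll.xlsx"))  # post-migration extract
print(diff(legacy, migrated))
```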
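The data-quality bullet, sketched the same way: cheap invariant checks that fail a batch
loudly before it reaches downstream consumers. The specific rules and the 1% null budget
are illustrative assumptions:

```python
import pandas as pd

def validate(df: pd.DataFrame) -> list[str]:
    """Return human-readable failures; an empty list means the batch is clean."""
    failures = []
    if df["employee_id"].duplicated().any():
        failures.append("duplicate employee_id values")
    if df["gross_pay"].lt(0).any():
        failures.append("negative gross_pay values")
    null_rate = df["gross_pay"].isna().mean()
    if null_rate > 0.01:  # tolerate up to 1% unparseable amounts, then alert
        failures.append(f"gross_pay null rate {null_rate:.1%} exceeds budget")
    return failures

def publish(df: pd.DataFrame) -> None:
    failures = validate(df)
    if failures:
        # In production this would page the on-call and quarantine the batch.
        raise ValueError("; ".join(failures))
    df.to_parquet("payroll_clean.parquet")  # columnar output for consumers
```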
Skills & Qualifications
Required Technical Skills
- Python for Data Engineering: 4+ years of production experience writing clean,
maintainable, performant Python code for data processing and pipeline development.
- SQL Mastery: Expert-level SQL, including complex queries, query optimization, and
reading execution plans. You can look at a slow query and know how to fix it; see
the EXPLAIN sketch after this list.
- Data Pipeline Development: Hands-on experience building ETL/ELT pipelines that
run reliably in production. You've designed pipelines that process millions of records
without failing.
- Distributed Computing: Deep knowledge of frameworks like Apache Spark for
large-scale data processing. You understand partitioning strategies, shuffle
optimization, and memory management; see the PySpark sketch after this list.
- Data Modeling & Warehousing: Strong foundation in data modeling, including star
schemas, slowly changing dimensions, and normalization vs. denormalization tradeoffs.
- Database Technologies: Experience with relational databases (PostgreSQL,
MySQL) and data warehouses (Redshift, Snowflake, BigQuery). You know when to
use each.
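
On SQL mastery, the skill we mean looks roughly like this hypothetical Postgres session:
read the plan, spot the sequential scan, fix it with an index. The table, columns, and
index name are invented, and psycopg2 is just one client choice:

```python
import psycopg2

conn = psycopg2.connect("dbname=warehouse")
cur = conn.cursor()

# Step 1: read the plan. A Seq Scan over millions of rows behind a
# selective predicate is the classic tell.
cur.execute("""
    EXPLAIN ANALYZE
    SELECT * FROM payroll_records
    WHERE run_id = %s AND status = 'failed'
""", ("2024-06",))
for (line,) in cur.fetchall():
    print(line)

# Step 2: a composite index on the predicate columns turns the
# Seq Scan into an Index Scan.
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_payroll_run_status
    ON payroll_records (run_id, status)
""")
conn.commit()
```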
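And on Spark, a sketch of one shuffle-aware habit: co-partition both sides of a wide
join on the join key, and write output partitioned so downstream reads can prune. Paths,
column names, and the partition count are invented for illustration:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("payroll-normalize").getOrCreate()

records = spark.read.parquet("s3://bucket/payroll/raw/")    # large fact table
employees = spark.read.parquet("s3://bucket/employees/")    # smaller dimension

# Hash-partitioning both sides on the join key lets the sort-merge join
# reuse that partitioning instead of shuffling again; for a genuinely small
# dimension table, a broadcast join would avoid the shuffle entirely.
joined = (
    records.repartition(200, "employee_id")
    .join(employees.repartition(200, "employee_id"), "employee_id")
)

# Partitioned output lets downstream queries prune by pay period instead
# of scanning the whole dataset.
joined.write.mode("overwrite").partitionBy("pay_period").parquet(
    "s3://bucket/payroll/normalized/"
)
```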
Nice to Have
- Experience with streaming data systems (Kafka, Kinesis)
- Cloud platform expertise (AWS, GCP, Azure)
- Knowledge of orchestration tools (Airflow, Dagster, Prefect)
- Background in data comparison/diffing algorithms
- Experience with containerization (Docker, Kubernetes)
- Exposure to enterprise data formats and systems
Mindset & Approach
- Reliability-Obsessed: You've been paged at 2am, and you've built systems that
don't page you at 2am. You understand what it takes to run production infrastructure.
- Systems Thinker: You see how individual components fit into the larger
architecture. You make tradeoffs that optimize for the whole system.
- Ownership Mentality: You don't treat data quality as someone else's problem. You
own the pipeline end to end, from ingestion to the data scientist's query.
- Pragmatic Engineer: You know when to build for flexibility and when to optimize for
performance. You don't chase shiny tools when proven ones work better.
- Founder Mentality: You thrive in ambiguity, make architectural decisions with
incomplete information, and care about outcomes over perfect documentation.
What We're NOT Looking For
- Engineers who only want to work with cutting-edge tools regardless of fit
- People who treat data quality as someone else's problem
- Those who need a detailed roadmap handed to them
- Candidates who've never owned production systems end-to-end
What We Offer
- Competitive salary + meaningful equity in a company that's already winning
- The chance to architect data infrastructure that Fortune 500 companies depend on
- A team that values speed, ownership, and results over politics
- Direct impact: your pipelines will process enterprise data at a scale most engineers
never see
- The opportunity to be part of history: building the data foundation that powers
how enterprises manage their most critical applications
We've proven our infrastructure works. Now we need someone to scale it to
the world.
Apply with your resume and a brief note about the most challenging data pipeline you've
built.
Opkey is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive
environment for all employees.
Skills: Python, SQL, ETL, Apache Spark, PostgreSQL