
Company: Kuinbee
Location: Pune, Maharashtra
Mode: Hybrid
Role Type: Full-Time
About Kuinbee
Kuinbee is building a unified data ecosystem that combines a scalable data marketplace with
an end-to-end AI-driven pipeline. Our platform enables automated ingestion,
transformation, quality checks, lineage tracking, modelling, and metadata intelligence,
allowing organisations to integrate, manage, and operationalise their data with minimal
engineering effort. By merging marketplace accessibility with intelligent automation,
Kuinbee aims to redefine how modern data systems are built, governed, and scaled.
Role Overview
The ideal candidate will have deep knowledge of end-to-end data workflows, strong
architectural thinking, and the ability to translate engineering processes into modular,
automated agents. You will work closely with the product and AI teams to formalise the
logic that powers Kuinbee's data automation platform.
Key Responsibilities
Document complete pipeline flows from source to serving, including raw, clean,
transformed, and model-ready stages.
Identify technical pain points in real-world pipelines, including failure modes, schema
drift, refresh inconsistencies, and orchestration issues.
Demonstrate how heterogeneous sources such as databases, APIs, files, and streams are
combined, validated, modelled, and monitored.
Present two to three real pipelines you have built, including architecture diagrams, key
design decisions, and failure-recovery strategies.
Collaborate with AI engineers to design agent equivalents for schema mapping, data
cleaning, transformations, validation, and lineage.
Define metadata requirements for Kuinbee's Supermemory Layer to support governance,
semantic consistency, and automated monitoring.
Core Requirements
5+ years of experience building and maintaining production data pipelines end to end.
Expertise with relational databases such as Postgres, MySQL, or SQL Server.
Experience with data warehouses including BigQuery, Snowflake, or Redshift.
Familiarity with file formats such as Parquet, CSV, and Excel, along with API-based and
streaming data sources.
Advanced skills in SQL, Python, and modern transformation frameworks such as dbt.
Hands-on experience with Spark, Dask, or other distributed compute engines.
Experience with data quality and observability tools such as Great Expectations, Soda, or
Deequ.
Knowledge of lineage systems such as OpenLineage, DataHub, or OpenMetadata.
Strong data modelling foundation, including star schemas, semantic layers, metrics, and
feature preparation.
Experience with orchestration frameworks such as Airflow, Dagster, or Prefect.
Understanding of performance optimisation, including partitioning, indexing, clustering,
and query planning.
Exposure to machine learning workflows integrated into data pipelines, such as feature
engineering and inference paths.
Ability to design, reason about, and evaluate modern data architecture.
Compensation: Paid (Contract Based)
How to Apply
Send your CV or portfolio to [Confidential Information].
Applicants who include examples of real pipelines or architecture documents will receive
priority consideration.
Job ID: 133385007