
Job Title: Data Architect
Position Summary
We are seeking a high-impact Data Architect to own the end-to-end design, execution, and strategic evolution of our multi-cloud data ecosystem. This is a leadership role requiring deep, polyglot technical expertise across data engineering, cloud architecture, and software development, combined with the strategic vision and people-management skills to lead a high-performing data engineering team.
You will be the primary technical authority for all data-at-rest and data-in-motion, responsible for designing scalable, resilient, and high-concurrency data models, storage solutions, and processing pipelines. The ideal candidate is a hands-on-keyboard architect who can write production-level Python code, optimize complex SQL, deploy infrastructure via Terraform, and mentor junior engineers, all while defining the long-term data roadmap to support our business-critical analytics, data science, and ML initiatives.
Core Technical Responsibilities
1. Data Architecture & Strategy:
Design & Blueprinting: Architect and document the canonical enterprise data model, data flow diagrams (DFDs), and architectural blueprints for our data platform.
Technology & Tool Selection: Lead the evaluation, PoC (Proof of Concept), and selection of all data platform technologies, balancing build-vs-buy decisions for ingestion, storage, processing, and governance.
Multi-Cloud Strategy: Design and implement a cohesive, abstracted data architecture that federates data and workloads across AWS, Azure, and GCP. Implement patterns for inter-cloud data movement, cost optimization, and security parity.
Modern Paradigms: Champion and implement modern data architecture patterns, including Data Mesh, Data Fabric, and Lakehouse (e.g., Databricks/Delta Lake), moving beyond traditional monolithic warehousing.
2. Data Engineering & Pipeline Orchestration:
ETL/ELT Frameworks: Engineer and optimize high-throughput, fault-tolerant data ingestion and transformation pipelines. Must be an expert in both batch and near-real-time streaming architectures (e.g., Kafka, Kinesis, Pub/Sub).
Modern ELT Stack: Demonstrate mastery of the modern data stack, including data transformation (e.g., dbt), ingestion (e.g., Fivetran, Airbyte), and orchestration (e.g., Airflow, Dagster, Prefect).
SQL & Database Design: Possess expert-level SQL skills, including query optimization, analytical functions, CTEs, and procedural SQL. Design and implement DDL for data warehouses (e.g., Snowflake, BigQuery, Redshift) and OLTP systems, ensuring normalization or denormalization is optimized for the use case.
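As a rough illustration of the analytical SQL style this role calls for (a CTE feeding a window function), here is a minimal sketch run against an in-memory SQLite database so it stays self-contained; the orders table and its columns are invented for the example, and real work would target Snowflake, BigQuery, or Redshift instead.

```python
# Minimal sketch only: the "orders" table and columns are hypothetical.
# Uses the stdlib sqlite3 module (SQLite 3.25+ for window functions) so the
# example is self-contained; the SQL pattern itself is engine-agnostic.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 100, 25.0, '2024-01-05'),
        (2, 100, 40.0, '2024-02-10'),
        (3, 200, 15.0, '2024-01-20');
""")

query = """
WITH customer_totals AS (          -- CTE: aggregate spend per customer
    SELECT customer_id,
           SUM(amount) AS total_spend,
           COUNT(*)    AS order_count
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id,
       total_spend,
       order_count,
       RANK() OVER (ORDER BY total_spend DESC) AS spend_rank   -- window function
FROM customer_totals;
"""

for row in conn.execute(query):
    print(row)
```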
3. Programming & Infrastructure:
Python Expertise: Utilize Python as a first-class language for data engineering. This includes writing custom ETL scripts, building data-centric microservices/APIs (e.g., using FastAPI), leveraging PySpark for distributed processing, and scripting for automation (a brief illustrative sketch follows the items in this section).
Infrastructure as Code (IaC): Own the data platform's infrastructure definitions using Terraform or CloudFormation. Implement and enforce CI/CD best practices (e.g., GitHub Actions, Jenkins) for all data pipeline and infrastructure code.
Containerization: Leverage Docker and Kubernetes (EKS, GKE, AKS) for deploying and scaling data services and applications.
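As a rough illustration of the Python and PySpark work described under "Python Expertise" above, the sketch below shows a minimal batch ETL step: read raw data, aggregate it, and write a partitioned, analytics-ready output. It assumes a working PySpark environment; the bucket paths and column names are hypothetical.

```python
# Minimal sketch only: paths and columns are hypothetical; assumes PySpark is
# installed and the cluster has credentials for the (invented) S3 buckets.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_rollup").getOrCreate()

raw = spark.read.parquet("s3://example-raw-bucket/orders/")   # hypothetical source

daily = (
    raw.withColumn("order_date", F.to_date("order_ts"))       # derive a partition key
       .groupBy("order_date", "customer_id")
       .agg(F.sum("amount").alias("daily_spend"),
            F.count("*").alias("order_count"))
)

(daily.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-curated-bucket/orders_daily/"))   # hypothetical target
```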
4. Leadership & People Management:
Team Leadership: Lead and mentor a team of data engineers, data modelers, and BI developers. Manage team velocity, sprint planning (Agile/Scrum), and performance reviews.
Code Quality & Best Practices: Enforce software engineering best practices within the data team, including rigorous code reviews, version control (Git), unit/integration testing, and comprehensive documentation.
Stakeholder Management: Act as the primary technical liaison to cross-functional leaders (Product, Engineering, Data Science). Translate complex business requirements into technical specifications and data models.
Required Qualifications & Technical Stack
Experience: 10+ years in data engineering/architecture, including 3+ years in a formal leadership or people-management role.
Python: Demonstrable, expert-level proficiency in Python for data manipulation (Pandas, Polars), distributed computing (PySpark, Dask), and API development.
SQL: Mastery of advanced SQL, DDL, DML, and query performance tuning on one or more major analytical databases (Snowflake, BigQuery, Redshift, Databricks SQL).
Cloud: 5+ years of hands-on experience designing and building data solutions on at least two of the major cloud providers (AWS, GCP, Azure). Must understand the native services (e.g., S3/ADLS/GCS, Redshift/BigQuery/Synapse, Glue/Data Factory, Kinesis/Event Hubs).
ETL/ELT Tools: Deep experience with modern data stack tooling. Must have hands-on experience with:
Orchestration: Airflow, Dagster, or Prefect.
Transformation: dbt (highly preferred). An illustrative Airflow-plus-dbt sketch follows this qualifications list.
Data Modeling: Expert in dimensional modeling (Kimball) and 3NF, with proven experience designing data models for large-scale data warehouses and data marts.
Leadership: Proven ability to build, manage, and motivate a technical team. Must be able to articulate a strategic technical vision and execute it.
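As a rough illustration of the orchestration and transformation tooling listed above, the following sketch wires a hypothetical extract script and a dbt run into a minimal Airflow DAG. It assumes Airflow 2.4+ and an existing dbt project; the DAG id, paths, and schedule are invented for the example, and Dagster or Prefect would express the same dependency just as well.

```python
# Minimal sketch only: assumes Apache Airflow 2.4+; the script path, dbt project
# directory, and schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_warehouse_build",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_raw",
        bash_command="python /opt/pipelines/extract_orders.py",   # hypothetical extract step
    )
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/warehouse",  # hypothetical dbt project
    )

    extract >> transform   # run the dbt build only after extraction succeeds
```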
Preferred Qualifications
Certifications: Professional-level cloud architect certifications (e.g., AWS Certified Solutions Architect Professional, Google Cloud Professional Data Engineer).
Streaming: Hands-on experience with Apache Kafka, Spark Structured Streaming, or Flink (a brief illustrative sketch follows this list).
Data Governance: Experience implementing data governance and cataloging tools (e.g., Collibra, Alation, Amundsen).
MLOps: Familiarity with MLOps pipelines and infrastructure to support data science model training and deployment.
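As a rough illustration of the streaming experience listed above, the sketch below consumes a hypothetical Kafka topic with Spark Structured Streaming. It assumes the Spark-Kafka connector is on the classpath; the broker address and topic name are invented, and the console sink is used only to keep the example self-contained.

```python
# Minimal sketch only: assumes PySpark with the spark-sql-kafka connector available;
# broker and topic are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
         .option("subscribe", "orders")                      # hypothetical topic
         .load()
         .select(F.col("value").cast("string").alias("payload"))
)

query = (
    events.writeStream
          .format("console")        # console sink keeps the sketch self-contained
          .outputMode("append")
          .start()
)
query.awaitTermination()
```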
Job ID: 131871845