GMG

Data Architect

Job Description

What we do:

GMG is a global well-being company retailing, distributing and manufacturing a portfolio of leading international and home-grown brands across sport, everyday goods, health and beauty, properties and logistics sectors. Under the ownership and management of the Baker family for over 45 years, GMG is a valued partner of choice for the world's most successful and respected brands in the well-being sector. Working across the Middle East, North Africa, and Asia, GMG has introduced more than 120 brands across 12 countries. These include notable home-grown brands such as Sun & Sand Sports, Dropkick, Supercare Pharmacy, Farm Fresh, Klassic, and international brands like Nike, Columbia, Converse, Timberland, Vans, Mama Sita's, and McCain.

What will you do:

We are hiring a Data Architect to own the end-to-end architecture and engineering standards of our data and AI platform. This is a hands-on individual contributor role with leadership responsibility for 2 engineers. You will design, implement, and operate scalable, secure, and cost-effective data infrastructure across Databricks on AWS, enabling analytics/BI, classical ML, and GenAI/Agentic AI workloads.

Role Summary:

- Own the data platform architecture (ingestion → lake/warehouse → serving) and its operating model.

- Lead implementation of infrastructure, orchestration, CI/CD, observability, quality, lineage, and governance.

- Architect and enable BI, MLOps, and Agentic AI platform capabilities.

- Evaluate and introduce fit-for-purpose tools (open-source preferred) to solve team challenges.

- Set engineering best practices and manage delivery through a small team.

Responsibilities:

Data platform & infrastructure ownership:

- Own platform architecture on AWS + Databricks, ensuring scalability, security, reliability, and cost efficiency.

- Define the target architecture across batch pipelines, streaming patterns, storage formats, and compute policies.

- Implement infrastructure-as-code using Terraform, including environments, networking dependencies (as needed), and platform configuration.

Architecture for BI, ML, and Agentic AI:

- Design architecture patterns for:

  - BI data serving and exports to downstream BI stacks (e.g., Fabric) through governed, performant datasets.

  - MLOps foundations: training/inference patterns (batch-first), model registry/versioning approach, monitoring integration (see the sketch after this list).

  - Agentic AI infrastructure: secure retrieval patterns, tool access boundaries, prompt/tool governance, and audit logs (platform-level enablers, not use-case specifics).

- Ensure architectural decisions support both experimentation and production-grade operation.
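
To make the MLOps bullet above concrete, here is a minimal, hypothetical sketch of the batch-first train-and-register pattern, assuming MLflow as the model registry (MLflow ships with Databricks, but the posting does not mandate a specific registry; experiment path and model name are illustrative):

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical experiment path; real values depend on the workspace.
mlflow.set_experiment("/Shared/demo-batch-training")

X, y = make_classification(n_samples=500, n_features=8, random_state=7)

with mlflow.start_run() as run:
    # Batch-first: train offline on a full data snapshot; no online serving assumed.
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")
    # Versioning: the registry assigns an incrementing version per registered name.
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo_classifier")
```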

Data engineering best practices & SDLC:

- Establish engineering standards: branching strategy, PR reviews, release/versioning, code quality gates, and documentation.

- Implement CI/CD for data pipelines and infrastructure; enforce Git-based workflows and environment promotion (a sketch of one CI quality gate follows this list).

- Promote modular, reusable pipeline patterns and templates for the team.
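
As one illustration of a merge-blocking quality gate, a pipeline repository might run unit tests like the hypothetical pandas-based check below on every pull request (the transformation and test names are assumptions, not from the posting):

```python
import pandas as pd

def dedupe_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Example pipeline step: keep the latest record per order_id."""
    return (
        df.sort_values("updated_at")
          .drop_duplicates(subset="order_id", keep="last")
          .reset_index(drop=True)
    )

def test_dedupe_orders_keeps_latest():
    # Run in CI (e.g., via pytest) so a failing transform blocks the merge.
    df = pd.DataFrame({
        "order_id": [1, 1, 2],
        "updated_at": ["2024-01-01", "2024-01-02", "2024-01-01"],
        "status": ["new", "shipped", "new"],
    })
    out = dedupe_orders(df)
    assert len(out) == 2
    assert out.loc[out.order_id == 1, "status"].item() == "shipped"
```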

Data quality, lineage, and governance:

- Implement quality frameworks: freshness/completeness/validity checks, anomaly detection on key measures (see the sketch after this list).

- Establish lineage and metadata management; define how datasets are documented and discoverable.

- Own data classification (PII/sensitive), retention policies, and secure access patterns (RBAC/ABAC).
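
For the quality-framework bullet above, a minimal hand-rolled sketch of freshness and completeness checks might look like the following (table, columns, and thresholds are illustrative assumptions; in practice a tool such as Great Expectations or Soda could replace this):

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

def check_freshness(df: pd.DataFrame, ts_col: str, max_lag: timedelta) -> bool:
    """Freshness: the newest row must be within max_lag of now."""
    newest = pd.to_datetime(df[ts_col], utc=True).max()
    return datetime.now(timezone.utc) - newest <= max_lag

def check_completeness(df: pd.DataFrame, required: list[str], max_null_rate: float) -> bool:
    """Completeness: required columns exist and stay under a null-rate threshold."""
    return all(
        col in df.columns and df[col].isna().mean() <= max_null_rate
        for col in required
    )

# Hypothetical dataset; a real check would read from the lake/warehouse.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "loaded_at": [datetime.now(timezone.utc)] * 3,
})
assert check_freshness(orders, "loaded_at", max_lag=timedelta(hours=2))
assert check_completeness(orders, ["order_id", "loaded_at"], max_null_rate=0.01)
```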

Tooling strategy (open-source preferred):

- Evaluate and introduce fit-for-purpose tools in areas like:

  - Observability/monitoring

  - Data quality and testing

  - Lineage/catalog

  - Orchestration enhancements

  - Secrets management and policy enforcement

- Make pragmatic build-vs-buy decisions with clear TCO and operational fit.

Data modeling (added advantage):

- Guide and review modeling patterns (dimensional/entity models) to ensure consistent, reusable datasets for reporting, analytics and ML.

What does success look like:

- A stable, scalable platform with clear architectural standards and high engineering quality.

- Pipelines are reliable with defined SLAs/SLOs, strong observability, and reduced incident frequency.

- CI/CD and Git-based SDLC are adopted; changes are predictable, versioned, and easy to roll back.

- BI/ML/GenAI platform foundations are in place and are enabling faster delivery across teams.

- Measurable cost/performance improvements (job runtimes, compute spend, data freshness reliability).

- The 2 engineers operate with clarity, quality, and autonomy under your guidance.

Technical Competencies:

- 10+ years in data engineering / data platform / data architecture roles with hands-on delivery.

- Proven ownership of end-to-end data platforms (lake/warehouse + orchestration + governance).

- Experience leading small teams and driving engineering standards and change management.

- Strong stakeholder management and ability to balance speed, quality, and control.

Required technical skills:

Mandatory:

- Databricks on AWS platform understanding (workloads, jobs, cluster policies, Delta/Lakehouse concepts).

- Strong Terraform (IaC) for cloud/platform infrastructure.

- Containerization & runtime: Docker, Kubernetes (deployment patterns, environment management).

- Orchestration: Airflow (DAG design, retries, backfills, SLAs; see the DAG sketch after this list).

- Data transformation practices (dbt familiarity preferred; tool-agnostic standards accepted).

- CI/CD implementation, Git workflows, branching/release strategy.

- Strong understanding of data platform concerns: ingestion, streaming concepts, outbound patterns, quality, lineage, retention, and classification.

- Security fundamentals: IAM/RBAC, secrets management, auditability, PII handling.
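
For the Airflow line above, a minimal DAG sketch illustrating retries, task SLAs, and backfill-friendly scheduling could look like this (DAG and task names are hypothetical):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_orders(**context):
    # Placeholder extract step; a real task would pull from a source system.
    print(f"extracting orders for {context['ds']}")

default_args = {
    "owner": "data-platform",
    "retries": 2,                          # automatic retry on transient failures
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),             # SLA misses feed alerting/observability
}

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=True,                          # enables backfills for missed intervals
    default_args=default_args,
    tags=["ingestion"],
) as dag:
    extract = PythonOperator(
        task_id="extract_orders",
        python_callable=extract_orders,
    )
```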

Good to have:

- Deep dbt experience (macros, tests, docs, environment promotion).

- Depth in Lakeflow Jobs / Databricks Workflows.

- Experience with open-source tools in:

  - Data quality (e.g., Great Expectations / Soda)

  - Lineage/catalog (e.g., OpenLineage / DataHub / Amundsen)

  - Observability (e.g., Prometheus/Grafana stack; see the metrics sketch after this list)

- Strong data modeling background (dimensional + metrics layer thinking).

- Experience with ML platform patterns and LLM/RAG platform guardrails.
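
As a small illustration of the observability item, the prometheus_client Python library can export a data-freshness metric for Grafana to chart (metric name and port are assumptions, not from the posting):

```python
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical metric: seconds since a dataset was last refreshed.
FRESHNESS = Gauge(
    "dataset_freshness_seconds",
    "Seconds since the last successful load",
    ["dataset"],
)

def record_load(dataset: str, loaded_at: float) -> None:
    """Update the gauge after each pipeline run; Prometheus scrapes /metrics."""
    FRESHNESS.labels(dataset=dataset).set(time.time() - loaded_at)

if __name__ == "__main__":
    start_http_server(9108)  # assumed exporter port
    record_load("orders", loaded_at=time.time() - 300)
    while True:
        time.sleep(30)
```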

Qualification & Experience:

- Bachelor's or Master's degree in Statistics, Mathematics, Computer Science, or equivalent.

- 10+ years in data engineering / data platform / data architecture roles with hands-on delivery
