GMG

Data Architect

Job Description

What we do:

GMG is a global well-being company retailing, distributing and manufacturing a portfolio of leading international and home-grown brands across sport, everyday goods, health and beauty, properties and logistics sectors. Under the ownership and management of the Baker family for over 45 years, GMG is a valued partner of choice for the world's most successful and respected brands in the well-being sector. Working across the Middle East, North Africa, and Asia, GMG has introduced more than 120 brands across 12 countries. These include notable home-grown brands such as Sun & Sand Sports, Dropkick, Supercare Pharmacy, Farm Fresh, Klassic, and international brands like Nike, Columbia, Converse, Timberland, Vans, Mama Sita's, and McCain.

What will you do:

We are hiring a Data Architect to own the end-to-end architecture and engineering standards of our data and AI platform. This is a hands-on individual contributor role with leadership responsibility for 2 engineers. You will design, implement, and operate scalable, secure, and cost-effective data infrastructure across Databricks on AWS, enabling analytics/BI, classical ML, and GenAI/Agentic AI workloads.

Role Summary:

- Own the data platform architecture (ingestion → lake/warehouse → serving) and its operating model.

- Lead implementation of infrastructure, orchestration, CI/CD, observability, quality, lineage, and governance.

- Architect and enable BI, MLOps, and Agentic AI platform capabilities.

- Evaluate and introduce fit-for-purpose tools (open-source preferred) to solve team challenges.

- Set engineering best practices and manage delivery through a small team.

Responsibilities:

Data platform & infrastructure ownership:

- Own platform architecture on AWS + Databricks, ensuring scalability, security, reliability, and cost efficiency.

- Define the target architecture across batch pipelines, streaming patterns, storage formats, and compute policies.

- Implement infrastructure-as-code using Terraform, including environments, networking dependencies (as needed), and platform configuration.

Architecture for BI, ML, and Agentic AI:

- Design architecture patterns for:

  - BI data serving and exports to downstream BI stacks (e.g., Fabric) through governed, performant datasets.

  - MLOps foundations: training/inference patterns (batch-first), model registry/versioning approach, monitoring integration (see the sketch after this list).

  - Agentic AI infrastructure: secure retrieval patterns, tool access boundaries, prompt/tool governance, and audit logs (platform-level enablers, not use-case specifics).

- Ensure architectural decisions support both experimentation and production-grade operation.
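
To make the MLOps bullet above concrete, here is a minimal, hypothetical sketch of the batch-first train-and-register pattern, assuming MLflow as the model registry (MLflow ships with Databricks, but the posting does not mandate a specific registry; experiment path and model name are illustrative):

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical experiment path; real values depend on the workspace.
mlflow.set_experiment("/Shared/demo-batch-training")

X, y = make_classification(n_samples=500, n_features=8, random_state=7)

with mlflow.start_run() as run:
    # Batch-first: train offline on a full data snapshot; no online serving assumed.
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(model, artifact_path="model")
    # Versioning: the registry assigns an incrementing version per registered name.
    mlflow.register_model(f"runs:/{run.info.run_id}/model", "demo_classifier")
```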

Data engineering best practices & SDLC:

- Establish engineering standards: branching strategy, PR reviews, release/versioning, code quality gates, and documentation.

- Implement CI/CD for data pipelines and infrastructure; enforce Git-based workflows and environment promotion (a sketch of one CI quality gate follows this list).

- Promote modular, reusable pipeline patterns and templates for the team.
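
As one illustration of a merge-blocking quality gate, a pipeline repository might run unit tests like the hypothetical pandas-based check below on every pull request (the transformation and test names are assumptions, not from the posting):

```python
import pandas as pd

def dedupe_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Example pipeline step: keep the latest record per order_id."""
    return (
        df.sort_values("updated_at")
          .drop_duplicates(subset="order_id", keep="last")
          .reset_index(drop=True)
    )

def test_dedupe_orders_keeps_latest():
    # Run in CI (e.g., via pytest) so a failing transform blocks the merge.
    df = pd.DataFrame({
        "order_id": [1, 1, 2],
        "updated_at": ["2024-01-01", "2024-01-02", "2024-01-01"],
        "status": ["new", "shipped", "new"],
    })
    out = dedupe_orders(df)
    assert len(out) == 2
    assert out.loc[out.order_id == 1, "status"].item() == "shipped"
```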

Data quality, lineage, and governance:

- Implement quality frameworks: freshness/completeness/validity checks, anomaly detection on key measures (see the sketch after this list).

- Establish lineage and metadata management; define how datasets are documented and discoverable.

- Own data classification (PII/sensitive), retention policies, and secure access patterns (RBAC/ABAC).
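
For the quality-framework bullet above, a minimal hand-rolled sketch of freshness and completeness checks might look like the following (table, columns, and thresholds are illustrative assumptions; in practice a tool such as Great Expectations or Soda could replace this):

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

def check_freshness(df: pd.DataFrame, ts_col: str, max_lag: timedelta) -> bool:
    """Freshness: the newest row must be within max_lag of now."""
    newest = pd.to_datetime(df[ts_col], utc=True).max()
    return datetime.now(timezone.utc) - newest <= max_lag

def check_completeness(df: pd.DataFrame, required: list[str], max_null_rate: float) -> bool:
    """Completeness: required columns exist and stay under a null-rate threshold."""
    return all(
        col in df.columns and df[col].isna().mean() <= max_null_rate
        for col in required
    )

# Hypothetical dataset; a real check would read from the lake/warehouse.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "loaded_at": [datetime.now(timezone.utc)] * 3,
})
assert check_freshness(orders, "loaded_at", max_lag=timedelta(hours=2))
assert check_completeness(orders, ["order_id", "loaded_at"], max_null_rate=0.01)
```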

Tooling strategy (open-source preferred):

- Evaluate and introduce fit-for-purpose tools in areas like:

  - Observability/monitoring

  - Data quality and testing

  - Lineage/catalog

  - Orchestration enhancements

  - Secrets management and policy enforcement

- Make pragmatic build-vs-buy decisions with clear TCO and operational fit.

Data modeling (added advantage):

- Guide and review modeling patterns (dimensional/entity models) to ensure consistent, reusable datasets for reporting, analytics and ML.

What does success look like:

- A stable, scalable platform with clear architectural standards and high engineering quality.

- Pipelines are reliable with defined SLAs/SLOs, strong observability, and reduced incident frequency.

- CI/CD and Git-based SDLC are adopted; changes are predictable, versioned, and easy to roll back.

- BI/ML/GenAI platform foundations are in place and are enabling faster delivery across teams.

- Measurable cost/performance improvements (job runtimes, compute spend, data freshness reliability).

- The 2 engineers operate with clarity, quality, and autonomy under your guidance.

Technical Competencies:

- 10+ years in data engineering / data platform / data architecture roles with hands-on delivery.

- Proven ownership of end-to-end data platforms (lake/warehouse + orchestration + governance).

- Experience leading small teams and driving engineering standards and change management.

- Strong stakeholder management and ability to balance speed, quality, and control.

Required technical skills:

Mandatory:

- Databricks on AWS platform understanding (workloads, jobs, cluster policies, Delta/Lakehouse concepts).

- Strong Terraform (IaC) for cloud/platform infrastructure.

- Containerization & runtime: Docker, Kubernetes (deployment patterns, environment management).

- Orchestration: Airflow (DAG design, retries, backfills, SLAs; see the DAG sketch after this list).

- Data transformation practices (dbt familiarity preferred; tool-agnostic standards accepted).

- CI/CD implementation, Git workflows, branching/release strategy.

- Strong understanding of data platform concerns: ingestion, streaming concepts, outbound patterns, quality, lineage, retention, and classification.

- Security fundamentals: IAM/RBAC, secrets management, auditability, PII handling.
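
For the Airflow line above, a minimal DAG sketch illustrating retries, task SLAs, and backfill-friendly scheduling could look like this (DAG and task names are hypothetical):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_orders(**context):
    # Placeholder extract step; a real task would pull from a source system.
    print(f"extracting orders for {context['ds']}")

default_args = {
    "owner": "data-platform",
    "retries": 2,                          # automatic retry on transient failures
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),             # SLA misses feed alerting/observability
}

with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=True,                          # enables backfills for missed intervals
    default_args=default_args,
    tags=["ingestion"],
) as dag:
    extract = PythonOperator(
        task_id="extract_orders",
        python_callable=extract_orders,
    )
```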

Good to have:

- Deep dbt experience (macros, tests, docs, environment promotion).

- Depth in Lakeflow Jobs / Databricks Workflows.

- Experience with open-source tools in:

  - Data quality (e.g., Great Expectations / Soda)

  - Lineage/catalog (e.g., OpenLineage / DataHub / Amundsen)

  - Observability (e.g., Prometheus/Grafana stack; see the metrics sketch after this list)

- Strong data modeling background (dimensional + metrics layer thinking).

- Experience with ML platform patterns and LLM/RAG platform guardrails.
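
As a small illustration of the observability item, the prometheus_client Python library can export a data-freshness metric for Grafana to chart (metric name and port are assumptions, not from the posting):

```python
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical metric: seconds since a dataset was last refreshed.
FRESHNESS = Gauge(
    "dataset_freshness_seconds",
    "Seconds since the last successful load",
    ["dataset"],
)

def record_load(dataset: str, loaded_at: float) -> None:
    """Update the gauge after each pipeline run; Prometheus scrapes /metrics."""
    FRESHNESS.labels(dataset=dataset).set(time.time() - loaded_at)

if __name__ == "__main__":
    start_http_server(9108)  # assumed exporter port
    record_load("orders", loaded_at=time.time() - 300)
    while True:
        time.sleep(30)
```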

Qualification & Experience:

- Bachelor's or Master's degree in Statistics, Mathematics, Computer Science, or equivalent.

- 10+ years in data engineering / data platform / data architecture roles with hands-on delivery
