
Job Title: Data Architect
Position Summary
We are seeking a high-impact Data Architect to own the end-to-end design, execution, and strategic evolution of our multi-cloud data ecosystem. This is a leadership role requiring deep, polyglot technical expertise across data engineering, cloud architecture, and software development, combined with the strategic vision and people-management skills to lead a high-performing data engineering team.
You will be the primary technical authority for all data-at-rest and data-in-motion, responsible for designing scalable, resilient, and high-concurrency data models, storage solutions, and processing pipelines. The ideal candidate is a hands-on-keyboard architect who can write production-level Python code, optimize complex SQL, deploy infrastructure via Terraform, and mentor junior engineers, all while defining the long-term data roadmap to support our business-critical analytics, data science, and ML initiatives.
Core Technical Responsibilities
1. Data Architecture & Strategy:
Design & Blueprinting: Architect and document the canonical enterprise data model, data flow diagrams (DFDs), and architectural blueprints for our data platform.
Technology & Tool Selection: Lead the evaluation, PoC (Proof of Concept), and selection of all data platform technologies, balancing build-vs-buy decisions for ingestion, storage, processing, and governance.
Multi-Cloud Strategy: Design and implement a cohesive, abstracted data architecture that federates data and workloads across AWS, Azure, and GCP. Implement patterns for inter-cloud data movement, cost optimization, and security parity.
Modern Paradigms: Champion and implement modern data architecture patterns, including Data Mesh, Data Fabric, and Lakehouse (e.g., Databricks/Delta Lake), moving beyond traditional monolithic warehousing.
2. Data Engineering & Pipeline Orchestration:
ETL/ELT Frameworks: Engineer and optimize high-throughput, fault-tolerant data ingestion and transformation pipelines. Must be an expert in both batch and near-real-time streaming architectures (e.g., Kafka, Kinesis, Pub/Sub).
Modern ELT Stack: Demonstrate mastery of the modern data stack, including data transformation (e.g., dbt), ingestion (e.g., Fivetran, Airbyte), and orchestration (e.g., Airflow, Dagster, Prefect).
SQL & Database Design: Possess expert-level SQL skills, including query optimization, analytical functions, CTEs, and procedural SQL. Design and implement DDL for data warehouses (e.g., Snowflake, BigQuery, Redshift) and OLTP systems, ensuring normalization or denormalization is optimized for the use case.
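As a rough illustration of the analytical SQL style this role calls for (a CTE feeding a window function), here is a minimal sketch run against an in-memory SQLite database so it stays self-contained; the orders table and its columns are invented for the example, and real work would target Snowflake, BigQuery, or Redshift instead.

```python
# Minimal sketch only: the "orders" table and columns are hypothetical.
# Uses the stdlib sqlite3 module (SQLite 3.25+ for window functions) so the
# example is self-contained; the SQL pattern itself is engine-agnostic.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL, order_date TEXT);
    INSERT INTO orders VALUES
        (1, 100, 25.0, '2024-01-05'),
        (2, 100, 40.0, '2024-02-10'),
        (3, 200, 15.0, '2024-01-20');
""")

query = """
WITH customer_totals AS (          -- CTE: aggregate spend per customer
    SELECT customer_id,
           SUM(amount) AS total_spend,
           COUNT(*)    AS order_count
    FROM orders
    GROUP BY customer_id
)
SELECT customer_id,
       total_spend,
       order_count,
       RANK() OVER (ORDER BY total_spend DESC) AS spend_rank   -- window function
FROM customer_totals;
"""

for row in conn.execute(query):
    print(row)
```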
3. Programming & Infrastructure:
Python Expertise: Utilize Python as a first-class language for data engineering. This includes writing custom ETL scripts, building data-centric microservices/APIs (e.g., using FastAPI), leveraging PySpark for distributed processing, and scripting for automation (a brief illustrative sketch follows the items in this section).
Infrastructure as Code (IaC): Own the data platform's infrastructure definitions using Terraform or CloudFormation. Implement and enforce CI/CD best practices (e.g., GitHub Actions, Jenkins) for all data pipeline and infrastructure code.
Containerization: Leverage Docker and Kubernetes (EKS, GKE, AKS) for deploying and scaling data services and applications.
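As a rough illustration of the Python and PySpark work described under "Python Expertise" above, the sketch below shows a minimal batch ETL step: read raw data, aggregate it, and write a partitioned, analytics-ready output. It assumes a working PySpark environment; the bucket paths and column names are hypothetical.

```python
# Minimal sketch only: paths and columns are hypothetical; assumes PySpark is
# installed and the cluster has credentials for the (invented) S3 buckets.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_rollup").getOrCreate()

raw = spark.read.parquet("s3://example-raw-bucket/orders/")   # hypothetical source

daily = (
    raw.withColumn("order_date", F.to_date("order_ts"))       # derive a partition key
       .groupBy("order_date", "customer_id")
       .agg(F.sum("amount").alias("daily_spend"),
            F.count("*").alias("order_count"))
)

(daily.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-curated-bucket/orders_daily/"))   # hypothetical target
```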
4. Leadership & People Management:
Team Leadership: Lead and mentor a team of data engineers, data modelers, and BI developers. Manage team velocity, sprint planning (Agile/Scrum), and performance reviews.
Code Quality & Best Practices: Enforce software engineering best practices within the data team, including rigorous code reviews, version control (Git), unit/integration testing, and comprehensive documentation.
Stakeholder Management: Act as the primary technical liaison to cross-functional leaders (Product, Engineering, Data Science). Translate complex business requirements into technical specifications and data models.
Required Qualifications & Technical Stack
Experience: 10+ years in data engineering/architecture, including 3+ years in a formal leadership or people-management role.
Python: Demonstrable, expert-level proficiency in Python for data manipulation (Pandas, Polars), distributed computing (PySpark, Dask), and API development.
SQL: Mastery of advanced SQL, DDL, DML, and query performance tuning on one or more major analytical databases (Snowflake, BigQuery, Redshift, Databricks SQL).
Cloud: 5+ years of hands-on experience designing and building data solutions on at least two of the major cloud providers (AWS, GCP, Azure). Must understand the native services (e.g., S3/ADLS/GCS, Redshift/BigQuery/Synapse, Glue/Data Factory, Kinesis/Event Hubs).
ETL/ELT Tools: Deep experience with modern data stack tooling. Must have hands-on experience with:
Orchestration: Airflow, Dagster, or Prefect.
Transformation: dbt (highly preferred). An illustrative Airflow-plus-dbt sketch follows this qualifications list.
Data Modeling: Expert in dimensional modeling (Kimball) and 3NF, with proven experience designing data models for large-scale data warehouses and data marts.
Leadership: Proven ability to build, manage, and motivate a technical team. Must be able to articulate a strategic technical vision and execute it.
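As a rough illustration of the orchestration and transformation tooling listed above, the following sketch wires a hypothetical extract script and a dbt run into a minimal Airflow DAG. It assumes Airflow 2.4+ and an existing dbt project; the DAG id, paths, and schedule are invented for the example, and Dagster or Prefect would express the same dependency just as well.

```python
# Minimal sketch only: assumes Apache Airflow 2.4+; the script path, dbt project
# directory, and schedule are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_warehouse_build",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="extract_raw",
        bash_command="python /opt/pipelines/extract_orders.py",   # hypothetical extract step
    )
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/warehouse",  # hypothetical dbt project
    )

    extract >> transform   # run the dbt build only after extraction succeeds
```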
Preferred Qualifications
Certifications: Professional-level cloud architect certifications (e.g., AWS Certified Solutions Architect Professional, Google Cloud Professional Data Engineer).
Streaming: Hands-on experience with Apache Kafka, Spark Structured Streaming, or Flink (a brief illustrative sketch follows this list).
Data Governance: Experience implementing data governance and cataloging tools (e.g., Collibra, Alation, Amundsen).
MLOps: Familiarity with MLOps pipelines and infrastructure to support data science model training and deployment.
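As a rough illustration of the streaming experience listed above, the sketch below consumes a hypothetical Kafka topic with Spark Structured Streaming. It assumes the Spark-Kafka connector is on the classpath; the broker address and topic name are invented, and the console sink is used only to keep the example self-contained.

```python
# Minimal sketch only: assumes PySpark with the spark-sql-kafka connector available;
# broker and topic are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_stream").getOrCreate()

events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
         .option("subscribe", "orders")                      # hypothetical topic
         .load()
         .select(F.col("value").cast("string").alias("payload"))
)

query = (
    events.writeStream
          .format("console")        # console sink keeps the sketch self-contained
          .outputMode("append")
          .start()
)
query.awaitTermination()
```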
Job ID: 131871845