The Role
We are seeking a Data Platform Architect with an AI-first mindset to design and lead the implementation of a modern, enterprise-grade data architecture. You will be responsible for building the technical infrastructure (spanning data lakes, feature stores, and real-time pipelines) that enables our data scientists and AI engineers to move from experimentation to high-impact production environments.
Responsibilities:
- Architectural Blueprinting: Design scalable, secure data platform blueprints (e.g., Lakehouse, Data Mesh, or Data Fabric) that support diverse AI workloads, including generative AI and classical machine learning, and build cloud-native storage and processing frameworks (data lakes, lakehouses) capable of handling massive datasets for model training.
- AI Data Infrastructure Design: Develop specific architectures for AI-driven workflows, including feature stores, real-time data streaming (Kafka/Spark), and automated machine learning pipelines.
- Data Lifecycle Management: Oversee the end-to-end data lifecycle, from high-fidelity data acquisition and cleaning to preprocessing and model serving.
- Data Pipeline Automation: Create end-to-end automated pipelines for data ingestion, cleaning, and feature engineering to reduce the time from raw data to ML model input.
- Real-Time Streaming: Architect systems that support streaming data (e.g., Kafka, Kinesis) for low-latency inference in applications such as IoT, fraud detection, and customer experience.
- Data Quality & Lineage: Implement strict governance, including metadata management, data lineage (tracking data origin), and quality monitoring, to ensure clean data and prevent model failures.
- Governance & Ethics: Establish unified data governance frameworks that ensure security, privacy (GDPR/CCPA), and compliance while mitigating algorithmic bias.
- Stakeholder Collaboration: Act as the technical bridge between business leadership, data science teams, and IT infrastructure to align technology with strategic AI objectives.
- Security & Compliance: Embed zero-trust principles, role-based access control (RBAC), and regulatory compliance (GDPR, HIPAA) directly into the data architecture.
- MLOps Collaboration: Work closely with data scientists and MLOps teams to integrate feature stores, model registries, and monitoring tools for continuous retraining.
Qualifications & Experience
- Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or a related field.
- 10-16 years of experience in data warehousing and big data platforms, with at least 3-5 years focused on infrastructure supporting AI/ML.
- Deep expertise in cloud platforms like AWS, Azure, or Google Cloud, and big data technologies such as Apache Spark, ADF, Databricks, and Snowflake.
- Experience with data governance, security, and compliance standards.
- Excellent communication and stakeholder management skills.