Role Overview
The Data Architect will design and govern scalable big data and analytics platforms on any of the leading cloud platforms (AWS, Azure, or GCP), working closely with sales and delivery teams to shape solutions for prospective and existing clients. The role combines hands-on architecture, technical leadership, and presales responsibilities, with a strong focus on SQL, Python, and modern data engineering practices.
Key Responsibilities
- Design and own end-to-end data architectures, including data lakes, data warehouses, and streaming pipelines, using big data technologies (e.g., Spark, Kafka, Hive) on public cloud platforms.
- Define canonical data models, integration patterns, and governance standards to ensure data quality, security, and compliance across the organization.
- Lead presales activities: assess client requirements, run discovery workshops, define solution blueprints, size and estimate effort, and contribute to RFP/RFI responses and proposals.
- Build and review PoCs/accelerators using SQL and Python (e.g., PySpark, notebooks) to demonstrate feasibility, performance, and business value to customers; a minimal illustrative sketch follows this list.
- Collaborate with data engineers, BI/ML teams, and application architects to ensure the designed architecture is implemented as intended and is cost-efficient, scalable, and reliable.
- Establish best practices for data security, access control, and lifecycle management in alignment with regulatory and enterprise policies.
- Monitor and continuously optimize data platforms for performance, reliability, and cost, leveraging cloud-native services and observability tools.
- Provide architectural guidance and mentoring to engineering teams; review designs and code for critical data components.
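As a rough illustration of the kind of SQL-plus-PySpark PoC referenced above, the sketch below assumes a Parquet landing zone in a cloud data lake; the paths, table, and column names (events, event_date, event_type, user_id) are hypothetical placeholders, not a prescribed implementation.

```python
# Minimal PoC sketch: load raw events, run an analytical SQL aggregation,
# and capture simple row-count and runtime evidence for a feasibility demo.
# All paths and column names below are illustrative assumptions.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("poc-feasibility-demo").getOrCreate()

# Hypothetical raw landing zone in a cloud data lake (Parquet on S3/ADLS/GCS).
events = spark.read.parquet("s3://example-bucket/raw/events/")
events.createOrReplaceTempView("events")

start = time.time()
daily_summary = spark.sql("""
    SELECT event_date,
           event_type,
           COUNT(*)                AS event_count,
           COUNT(DISTINCT user_id) AS unique_users
    FROM events
    GROUP BY event_date, event_type
""")

# Persist results to a curated zone and report basic performance metrics.
daily_summary.write.mode("overwrite").parquet("s3://example-bucket/curated/daily_summary/")
print(f"Rows produced: {daily_summary.count()}, elapsed: {time.time() - start:.1f}s")

spark.stop()
```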
Required Skills & Experience
- 12-16 years of overall experience in data engineering/analytics, with 4+ years as a Data/Big Data Architect.
- Strong expertise in SQL (analytical queries, performance tuning) and Python for data processing and automation.
- Hands-on experience with big data frameworks and tools such as Spark, Kafka, the Hadoop ecosystem, distributed file systems, and modern ETL/ELT pipelines.
- Practical experience with at least one major cloud platform (AWS, Azure, or GCP), including data lake storage, warehouse services (Redshift/Snowflake/BigQuery/Synapse), and orchestration tools.
- Proven presales exposure: client workshops, solution design, RFP/RFI responses, effort estimation, and building PoCs or demos.
- Strong understanding of data modeling (OLTP, OLAP, dimensional modeling), data governance, security, and compliance.
- Ability to communicate complex data solutions clearly to both technical and business stakeholders.