About AutoZone
AutoZone is the nation's leading retailer and a leading distributor of automotive replacement parts and accessories, with more than 6,000 stores in the U.S., Puerto Rico, Mexico, and Brazil. Each store carries an extensive product line for cars, sport utility vehicles, vans, and light trucks, including new and remanufactured hard parts, maintenance items, and accessories. We also sell automotive diagnostic and repair software through ALLDATA, diagnostic and repair information through ALLDATAdiy.com, automotive accessories through AutoAnything.com, and auto and light truck parts and accessories through AutoZone.com. Since opening its first store in Forrest City, Ark., on July 4, 1979, the company has joined the New York Stock Exchange (NYSE: AZO) and earned a spot in the Fortune 500. AutoZone is committed to providing the best parts, prices, and customer service in the automotive aftermarket industry. We have a rich culture and history of going the Extra Mile for our customers and our community. At AutoZone you're not just doing a job; you're playing a crucial role in creating a better experience for our customers, while creating opportunities to DRIVE YOUR CAREER almost anywhere! We are looking for talented people who are customer focused, enjoy helping others, and have the DRIVE to excel in a fast-paced environment!
Position Summary
The Senior Data Architect is a senior technical leader responsible for building and optimizing a robust data platform in the automotive industry. In this full-time role, you will lead a team of data engineers and own the end-to-end architecture and implementation of the GCP Data Lakehouse platform. You will collaborate closely with functional leaders, domain analysts, and other stakeholders to design scalable data solutions that drive business insights. This position demands deep expertise in GCP Data Lake and the ability to build end-to-end data pipelines that handle large volumes of structured, semi-structured, and unstructured data. You will demonstrate strong leadership to ensure best practices in data engineering, performance tuning, and governance, and you will be expected to communicate complex technical concepts and data strategies to technical and non-technical audiences, including executive leadership.
Key Responsibilities
- Lead, mentor, and manage a team of data engineers, providing technical guidance, conducting code reviews, and fostering a high-performing team.
- Own the GCP Data Lake architecture and implementation, ensuring the environment is secure, scalable, and optimized for the organization's data processing needs. Design and oversee the Lakehouse architecture leveraging Delta Lake and Apache Spark.
- Implement and manage GCP Data Lake Unity Catalog for unified data governance. Ensure fine-grained access controls and data lineage tracking are in place to secure sensitive data.
- Collaborate with analytics teams to develop and optimize GCP Data Lake SQL queries and dashboards. Tune SQL workloads and caching strategies to improve performance and ensure efficient use of the query engine.
- Lead performance tuning initiatives. Profile data processing code to identify bottlenecks and refactor for improved throughput and lower latency. Implement best practices for incremental data processing with Delta Lake, and ensure compute cost efficiency (e.g., by optimizing cluster utilization and job scheduling).
- Work closely with domain analysts, data scientists and product owners to understand requirements and translate them into robust data pipelines and solutions. Ensure that data architectures support analytics, reporting, and machine learning use cases effectively.
- Integrate GCP Data Lake workflows into the CI/CD pipeline using DevOps principles and Git. Develop automated deployment processes for notebooks and jobs to promote consistent releases. Manage source control for GCP Data Lake code (using GitLab) and collaborate with DevOps engineers to implement continuous integration and delivery for data projects.
- Collaborate with security and compliance teams to uphold data governance standards. Implement data masking, encryption, and audit logging as needed, leveraging Unity Catalog and GCP security features to protect sensitive data.
- Stay up to date with the latest GCP Data Lake features and industry best practices. Proactively recommend and implement improvements (such as new performance optimization techniques or cost-saving configurations) to continuously enhance the platform's reliability and efficiency.
Requirements
- 10+ years of experience in data engineering, data architecture, or related roles, with a track record of designing and deploying data pipelines and platforms at scale.
- Significant hands-on experience with GCP Data Lake and the Apache Spark ecosystem. Proficient in building data pipelines using PySpark/Scala and managing data in Delta Lake format.
- Strong experience working with cloud data platforms (GCP preferred, or AWS/Azure). Familiarity with GCP Cloud Storage principles.
- Strong skills in vector databases and embedding models to support scalable RAG systems. Proficient in optimizing retrieval and indexing for LLM integration.
- Strong experience in managing structured, semi-structured, and unstructured data in GCP Data Lake.
- Ability to inspect existing data pipelines, discern their purpose and functionality, and re-implement them efficiently in GCP Data Lake.
- Advanced SQL skills with the ability to write and optimize complex queries. Solid understanding of data warehousing concepts and performance tuning for SQL engines.
- Proven ability to optimize ETL jobs for performance and cost efficiency. Experience tuning cluster configurations, parallelism, and caching to improve job runtimes and resource utilization.
- Demonstrated experience implementing data security and governance measures. Comfortable configuring Unity Catalog or similar data catalog tools to manage schemas, tables, and fine-grained access controls. Able to ensure compliance with data security standards and manage user/group access to data assets.
- Experience leading and mentoring engineering teams. Excellent project leadership abilities to coordinate multiple projects and priorities. Strong communication skills to effectively collaborate with cross-functional teams and present architectural plans or results to stakeholders.
- Experience working in an Agile environment.
Tools & Technologies
- GCP Data Lakehouse Platform: GCP Data Lake Workspace, Apache Spark, Delta Lake, GCP Data Lake SQL, MLflow (for model tracking), and PostgreSQL.
- Data Governance: GCP Data Lake Unity Catalog for data catalog and access control.
- Programming & Data Processing: PySpark and Python for building data pipelines and Spark Jobs; SQL for querying .
- Cloud Services: GCP Cloud Storage, GCP Pub/Sub, and vector databases.
- DevOps & CI/CD: Git for version control (GitLab), Jenkins and experience with Terraform for infrastructure-as-code is a plus.
- Other Tools: Jira and Confluence for project and workflow management; Looker Studio and Power BI for BI and reporting.
Preferred Certificates, Licenses, and Registrations
- GCP Data Lake Certified Data Engineer Professional or GCP Data Lake Certified Data Engineer Associate.
- Exposure to related big data and streaming tools such as Apache Kafka, GCP Pub/Sub, and Apache Airflow, as well as BI/analytics tools (e.g., Power BI, Looker Studio), is advantageous.