About AutoZone
AutoZone is the nation's leading retailer and a leading distributor of automotive replacement parts and accessories, with more than 6,000 stores in the US, Puerto Rico, Mexico, and Brazil. Each store carries an extensive product line for cars, sport utility vehicles, vans, and light trucks, including new and remanufactured hard parts, maintenance items, and accessories.
We also sell automotive diagnostic and repair software through ALLDATA, diagnostic and repair information through ALLDATAdiy.com, automotive accessories through AutoAnything.com and auto and light truck parts and accessories through AutoZone.com.
Since opening its first store in Forrest City, Ark., on July 4, 1979, the company has joined the New York Stock Exchange (NYSE: AZO) and earned a spot in the Fortune 500.
AutoZone has been committed to providing the best parts, prices, and customer service in the automotive aftermarket industry. We have a rich culture and history of going the Extra Mile for our customers and our community. At AutoZone, you're not just doing a job; you're playing a crucial role in creating a better experience for our customers while creating opportunities to DRIVE YOUR CAREER almost anywhere! We are looking for talented people who are customer-focused, enjoy helping others, and have the DRIVE to excel in a fast-paced environment!
Position Summary
The Senior Data Engineer is a technical expert responsible for designing, implementing, and supporting scalable data solutions using modern big data technologies. In this full-time role, you will work closely with data architects, analysts, and business stakeholders to build, optimize, and maintain data pipelines and platforms that drive analytics, reporting, and machine learning. You will ensure the reliability, performance, and security of production data systems, and play a key role in troubleshooting, monitoring, and continuous improvement.
Responsibilities
- Design, develop, and maintain robust data pipelines for ingesting, transforming, and storing large volumes of structured, semi-structured, and unstructured data.
- Implement data workflows and ETL processes using technologies such as Spark, Delta Lake, and cloud-native tools.
- Support the production use of big data platforms (e.g., Databricks, Snowflake, Google Cloud Platform), ensuring high availability, scalability, and performance.
- Monitor, troubleshoot, and resolve issues in production data systems, including job failures, performance bottlenecks, and data quality concerns.
- Collaborate with data architects to translate business requirements into technical solutions, ensuring alignment with architectural standards and best practices.
- Optimize SQL queries, Spark jobs, and data processing workloads for efficiency and cost-effectiveness.
- Implement and maintain data governance, security, and compliance measures, including access controls, data masking, and audit logging.
- Integrate data workflows into CI/CD pipelines, automate deployment processes, and manage source control using Git and related DevOps tools.
- Document data pipelines, workflows, and operational procedures to support knowledge sharing and maintainability.
- Stay current with emerging big data technologies and recommend improvements to enhance reliability, scalability, and efficiency.
Qualifications
- 7+ years of experience in data engineering or related roles, with hands-on experience building and supporting production data pipelines.
- Strong proficiency with big data platforms (e.g., Databricks, Snowflake, Google Cloud Platform) and the Spark ecosystem.
- Experience with data lakehouse and warehouse architectures, including Delta Lake or similar technologies.
- Deep expertise in Retrieval-Augmented Generation (RAG) systems and advanced chunking strategies for optimizing semantic search and LLM integration, with experience delivering scalable solutions for enterprise knowledge retrieval and contextual AI applications.
- Advanced skills in PySpark, Python, SQL, and related data processing frameworks.
- Experience with cloud storage, messaging/streaming technologies (e.g., Apache Kafka, cloud Pub/Sub), and vector databases.
- Proven ability to optimize ETL jobs, tune cluster configurations, and troubleshoot production issues.
- Familiarity with data governance tools, catalog solutions, and security best practices.
- Experience integrating data workflows into CI/CD pipelines and using DevOps tools (e.g., Git, Jenkins, Terraform).
- Strong problem-solving skills, attention to detail, and ability to work independently or as part of a team.
- Excellent communication skills to collaborate with cross-functional teams and document technical solutions.
- Experience working in an Agile environment.