
EXL

Senior Data Analytics Engineer

  • Posted 4 hours ago

Job Description

About the Role

We are looking for a curious and driven Data Engineer to join our team. In this role, you will architect and build scalable data lakehouse solutions across major cloud providers. You will move beyond simple script writing to engineer robust, production-grade pipelines using the Modern Data Stack.

You will act as both an individual contributor and a technical mentor, ensuring high code quality and leveraging AI-assisted development tools (like Cursor, Claude, or GitHub Copilot) to maximize efficiency and innovation.

Key Responsibilities

  • Big Data Engineering: Design, build, and maintain scalable ETL/ELT pipelines using PySpark and Advanced SQL to process massive datasets.
  • Platform Architecture: Architect and implement infrastructure on Databricks or Snowflake, leveraging cloud storage (S3/ADLS), serverless services, and modern data warehouses.
  • Transformation & Orchestration: Utilize DBT (Data Build Tool) for effective data transformation and manage job orchestration/scheduling.
  • Code Quality & Best Practices: Champion software engineering best practices, including version control (Git), writing comprehensive unit tests, and maintaining design/API documentation.
  • AI-Augmented Development: Actively utilize AI coding assistants (Cursor, Copilot, etc.) to accelerate development cycles and improve code efficiency.
  • Collaboration & Review: Conduct rigorous peer code reviews for PySpark logic, ETL analytics, and Machine Learning integration.
  • Mentorship: Lead and mentor junior developers, ensuring the team adheres to best coding practices and helping them grow into future leaders.
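The extract/transform/load responsibilities above can be sketched at a very small scale in plain Python. This is an illustrative sketch only — the role's actual stack is PySpark on Databricks or Snowflake — and every function and field name here (`extract_orders`, `amount`, etc.) is hypothetical, not from the posting:

```python
# Minimal extract/transform/load sketch of the pipeline pattern described
# above. Plain Python stands in for PySpark so the example is self-contained;
# all names are illustrative.

def extract_orders():
    """Extract: in production this would read from S3/ADLS or a warehouse table."""
    return [
        {"order_id": 1, "amount": "19.99", "region": "EU"},
        {"order_id": 2, "amount": "5.00", "region": "US"},
    ]

def transform_orders(rows):
    """Transform: cast types and derive fields; kept pure so it is unit-testable."""
    return [
        {**row, "amount": float(row["amount"]), "is_eu": row["region"] == "EU"}
        for row in rows
    ]

def load_orders(rows, sink):
    """Load: append transformed rows to a sink (a list here, a table in production)."""
    sink.extend(rows)
    return len(rows)

sink = []
loaded = load_orders(transform_orders(extract_orders()), sink)
print(loaded)  # 2
```

Keeping the transform step a pure function is what makes the "comprehensive unit tests" responsibility practical: it can be asserted against in isolation, without a cluster or a live warehouse.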

Qualifications

Must-Have (Core Competencies):

  • Expert PySpark Proficiency: Deep experience processing large-scale data using Spark/PySpark. You understand how distributed computing works under the hood.
  • Advanced SQL: You can write complex, performant queries and understand database optimization deeply.
  • Python Scripting: Strong ability to write clean, modular, and efficient Python code for data engineering pipelines.
  • Platform Experience: Proven track record working within Databricks or Snowflake environments.
  • Data Modeling: Strong understanding of database systems, data modeling (Star schema, Snowflake schema), and data architecture.
  • Engineering Mindset: Experience with CI/CD, unit testing, and integrating with existing codebases.
  • AI Adaptability: Proficiency with AI-enabled software development. You should be comfortable using IDEs with GenAI tools (Cursor, VS Code with Copilot, etc.) to iterate faster.
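As a concrete illustration of the data-modeling and advanced-SQL expectations above, here is a minimal star-schema sketch — one fact table joined to one dimension table — using SQLite so it runs anywhere. The table and column names are hypothetical; a production model would live in Snowflake or Databricks:

```python
import sqlite3

# A tiny star schema: one fact table keyed to one dimension table.
# All table/column names are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        amount REAL
    );
    INSERT INTO dim_customer VALUES (1, 'EU'), (2, 'US');
    INSERT INTO fact_sales VALUES (10, 1, 19.99), (11, 2, 5.00), (12, 1, 7.50);
""")

# Typical analytical query: join the fact to its dimension and aggregate.
rows = con.execute("""
    SELECT d.region, ROUND(SUM(f.amount), 2) AS revenue
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_key = f.customer_key
    GROUP BY d.region
    ORDER BY revenue DESC
""").fetchall()
print(rows)  # [('EU', 27.49), ('US', 5.0)]
```

The same fact/dimension join pattern is what candidates would be expected to reason about at scale — including when to denormalize dimensions (star) versus normalize them further (snowflake schema).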

Good-to-Have (Preferred Qualifications):

  • Experience with DLT Hub and orchestration platforms like Airflow/Prefect.
  • Experience with Modern Data Stack (Fivetran, Airbyte, DBT, etc.).
  • Familiarity with DuckDB for analytical processing.
  • Exposure to building applications using LLMs/GenAI (OpenAI SDK, Gemini, Anthropic).


Job ID: 143879921