
Impetus

Data Engineer

  • Posted 9 hours ago

Job Description

Role Overview:

We are looking for a Software Development Engineer II for a Senior Data Engineer role, with strong experience in Spark with Scala. The candidate should be able to work independently on end-to-end data engineering and data science use cases, and should be a quick learner with some lead and mentoring experience. We expect a minimum of 7-8 years of overall experience, including at least 5-6 years with Hadoop and Spark, both on-premises and on Azure Cloud. The ideal candidate will have strong expertise in Scala, Spark, and SQL, building scalable and efficient data pipelines on Azure.

Primary Skills:
  • Data Engineering: core concepts, ingestion from diverse sources and file formats, Hadoop, data warehousing; designing and implementing large-scale distributed data platforms and data lakes; building distributed platforms or services
  • SQL, Spark, query tuning, and performance optimization
  • Advanced Scala experience (e.g., functional programming, case classes, complex data structures and algorithms); Python and PySpark expertise, with Scala as the primary skill, is an added advantage
  • Experience with SOLID and DRY principles; good software architecture and design implementation experience
  • Languages: Scala, Python
  • Good experience in Big Data unit, system, integration, and regression testing
  • Build & DevOps experience with GitHub Actions/Jenkins, Maven/SBT, GitHub, Artifactory/JFrog, CI/CD
  • Big Data processing: Hadoop, Sqoop, Spark, and Spark Streaming
  • Data streaming: experience with Kafka and Spark Streaming
  • Experience in evaluation and implementation of data validation and data quality
  • Understanding of Data Lake and Medallion Architecture
  • Shell scripting and automation using Ansible or related configuration management tools
  • Agile processes and tools such as Jira and Confluence
  • Code management tools such as Git
  • File formats: ORC, Avro, Parquet, JSON, and CSV
  • Big Data orchestration: Airflow, Spark on Kubernetes, YARN, Oozie
  • Data engineering on Azure Cloud: proficiency with the Azure data platform (Data Factory, Databricks, Azure storage, Functions, etc.); strong Scala and SQL skills for data manipulation (Python experience is an added advantage); experience with ETL/ELT pipelines and data transformations in the cloud; familiarity with Azure Big Data technologies (Azure Databricks, Azure Data Factory, Delta Lake, cloud governance, security); ability to architect and implement data solutions, to industry standards, using the right tool set across Azure services such as Azure Data Factory, Azure Databricks, Azure Blob Storage, and ADLS
  • Data optimization & performance tuning: expertise in data pipeline optimization and performance tuning; experience with feature engineering and model deployment
  • Analytical & problem-solving: strong troubleshooting and problem-solving skills; experience with data quality checks and validation

Nice-to-Have Skills:
  • Familiarity with data governance, security, and compliance in hybrid environments, including data governance frameworks and compliance practices
  • Building and managing large-scale Hadoop clusters, on-premises or in the cloud
  • Experience with a wide range of Big Data tools, databases, and frameworks, e.g. Kafka Connect, Apache Arrow, Apache Iceberg
  • SQL & NoSQL databases: Hive, Cassandra, Dremio, Presto
  • Security and secrets management, e.g. Ansible Vault, Snyk, Git Secrets, HashiCorp Vault
  • Infrastructure: hands-on experience with Kubernetes, Docker, and related container technologies
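To illustrate the "data validation & data quality" work listed above, here is a minimal sketch in plain Python (no Spark dependency; all rule names and fields are hypothetical, invented for this example):

```python
# Hypothetical row-level data quality checks, of the kind run before
# promoting records between data lake layers. Fields and rules are
# invented for illustration only.

def validate_row(row: dict) -> list[str]:
    """Return the list of rule violations for one record (empty = clean)."""
    errors = []
    if not row.get("id"):
        errors.append("missing id")
    if row.get("amount") is not None and row["amount"] < 0:
        errors.append("negative amount")
    return errors

def partition_by_quality(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into clean and rejected sets, mirroring a
    validate-then-quarantine step in a pipeline."""
    clean, rejected = [], []
    for row in rows:
        (rejected if validate_row(row) else clean).append(row)
    return clean, rejected
```

In a real pipeline the same pattern would typically be expressed as Spark transformations over DataFrames, with rejected rows written to a quarantine location.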

Roles & Responsibilities

Design, build, and maintain scalable ETL/ELT data pipelines using Azure Data Factory, Databricks, and Spark.

Develop and optimize Spark data workflows using Scala for large-scale data processing and transformation.

Implement performance tuning and optimization strategies for data pipelines and Spark jobs to ensure efficient data handling.

Collaborate with data engineers to support feature engineering, model deployment, and end-to-end data engineering workflows.

Collaborate with cross-functional and stakeholder teams to understand data requirements and deliver high-quality solutions.

Ensure data quality and integrity by implementing validation, error-handling, and monitoring mechanisms.

Work with structured and unstructured data using technologies such as Delta Lake and Parquet within a Big Data ecosystem.
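The pipeline responsibilities above can be sketched at toy scale in plain Python (no Spark; field names invented for illustration) to show the bronze → silver → gold flow of a medallion-style pipeline:

```python
# Toy medallion-style flow: raw (bronze) records are cleaned into a
# silver set, then aggregated into a gold summary. A real pipeline
# would express these steps as Spark/Databricks transformations,
# as the posting describes.

def to_silver(bronze: list[dict]) -> list[dict]:
    """Clean and normalize raw records: drop rows missing the key,
    coerce amount to float."""
    silver = []
    for rec in bronze:
        if not rec.get("customer_id"):
            continue  # a real pipeline would quarantine this row
        silver.append({"customer_id": rec["customer_id"],
                       "amount": float(rec.get("amount", 0))})
    return silver

def to_gold(silver: list[dict]) -> dict:
    """Aggregate cleaned records into per-customer totals."""
    totals: dict[str, float] = {}
    for rec in silver:
        totals[rec["customer_id"]] = totals.get(rec["customer_id"], 0.0) + rec["amount"]
    return totals
```

Each layer is a pure function of the previous one, which is the property that makes such pipelines easy to test, rerun, and monitor.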

Mandatory Skills

Scala, PySpark, Azure, Databricks


Job ID: 144625139
