Google

Apache Spark/Airflow Data Engineer

Fresher

Job Description

We are looking for a skilled Data Engineer with strong experience in Apache Spark to design, build, and optimize large-scale data pipelines in a distributed environment. The ideal candidate has hands-on expertise in modern data engineering practices, cloud platforms, and scalable data processing frameworks.

Key Responsibilities

  • Design, develop, and maintain ETL/ELT pipelines using Apache Spark (batch and/or streaming).
  • Build and optimize distributed data processing workflows on Spark (PySpark/Scala/Java).
  • Work with cloud-based data ecosystems (AWS, GCP, or Azure) to develop scalable data solutions.
  • Collaborate with data scientists, analysts, and backend engineers to deliver reliable, high-quality data products.
  • Implement and maintain data quality checks, monitoring, and alerting for data pipelines.
  • Optimize Spark jobs for performance, cost efficiency, and scalability.
  • Manage and model data in data lakes, data warehouses, and/or structured storage systems.
  • Contribute to data architecture design, including schema modeling, partitioning, and data lifecycle management.
  • Automate infrastructure and pipeline deployments using CI/CD and IaC frameworks.
  • Ensure compliance with data governance, security, and privacy standards.

Required Skills & Qualifications

  • Strong hands-on experience with Apache Spark (batch or streaming).
  • Proficiency in Python, Scala, or Java for data processing.
  • Experience with at least one cloud platform (AWS, GCP, or Azure).
  • Solid understanding of distributed systems, data partitioning, and performance tuning.
  • Hands-on experience with data lake technologies (e.g., S3, GCS, Azure Data Lake).
  • Experience with relational databases and SQL.
  • Familiarity with CI/CD workflows and version control (Git).
  • Experience with Infrastructure-as-Code tools (Terraform, CloudFormation, etc.) is a plus.
  • Knowledge of workflow orchestration tools such as Airflow, Dagster, or Prefect.
  • Strong problem-solving skills and the ability to work in cross-functional teams.

Preferred Qualifications

  • Experience with Spark on Kubernetes, Databricks, EMR, or Dataproc.
  • Knowledge of streaming technologies (Kafka, Pub/Sub, Kinesis).
  • Familiarity with Delta Lake, Iceberg, or Hudi.
  • Background in data modeling (ETL/ELT design, star/snowflake schemas).
  • Experience with real-time and near-real-time data pipelines.

Job ID: 144521535