Octro Inc.

Octro - Data Engineer - Python/Apache Spark

  • Posted 13 days ago

Job Description

As a Data Engineer, the candidate will design and maintain scalable data pipelines and analytics systems. The ideal candidate will have 2-4 years of experience with Apache Spark, Scala/Python, Trino/Presto, Hadoop, Kafka, and data lake technologies such as Delta Lake.

Experience with Elasticsearch, streaming data, and modern analytics platforms is preferred.

Mandatory Skills Requirements

  • Proficient in Python and/or Scala with strong experience in developing and optimizing data processing applications using Apache Spark.
  • Extensive experience with Apache Spark Structured Streaming for near real-time and streaming data processing.
  • Strong hands-on experience with Apache Kafka, including integration with Spark for reliable real-time data ingestion and event-driven pipelines.
  • Experience working with analytical and distributed data stores such as ClickHouse, Trino/Presto, and data lake technologies (Delta Lake or equivalent).
  • Solid understanding of data modeling and metric design for large-scale analytics systems, including fact/dimension modeling and event-based schemas.
  • Proven ability to design and implement ETL/ELT pipelines for data ingestion, transformation, aggregation, and performance optimization using Spark.
  • Demonstrated experience in writing efficient, scalable, and maintainable code for large-scale data processing workloads.
  • Experience operating in on-prem or hybrid data platforms, with a working understanding of cluster resource management, performance tuning, and capacity planning.
  • Familiarity with Elasticsearch for search, observability, or analytical use cases is a plus.

Preferred Skills Requirements

  • Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
  • Strong familiarity with version control systems, particularly Git, and collaborative development workflows.
  • Working knowledge of cloud platforms such as AWS, Azure, or Google Cloud, primarily for data services, storage, or hybrid deployments.
  • Understanding of distributed data systems and database administration principles, including performance tuning, reliability, and scaling of analytical or NoSQL databases (e.g., ClickHouse, Elasticsearch, HBase, or similar).

(ref:hirist.tech)


Job ID: 142160789