Octro Inc.

Octro - Data Engineer - Python/Apache Spark

  • Posted 13 days ago

Job Description

As a Data Engineer, the candidate will design and maintain scalable data pipelines and analytics systems. The ideal candidate will have 2-4 years of experience with Apache Spark, Scala/Python, Trino/Presto, Hadoop, Kafka, and data lake technologies such as Delta Lake.

Experience with Elasticsearch, streaming data, and modern analytics platforms is preferred.

Mandatory Skills Requirements

  • Proficient in Python and/or Scala with strong experience in developing and optimizing data processing applications using Apache Spark.
  • Extensive experience with Apache Spark Structured Streaming for near real-time and streaming data processing.
  • Strong hands-on experience with Apache Kafka, including integration with Spark for reliable real-time data ingestion and event-driven pipelines.
  • Experience working with analytical and distributed data stores such as ClickHouse, Trino/Presto, and data lake technologies (Delta Lake or equivalent).
  • Solid understanding of data modeling and metric design for large-scale analytics systems, including fact/dimension modeling and event-based schemas.
  • Proven ability to design and implement ETL/ELT pipelines for data ingestion, transformation, aggregation, and performance optimization using Spark.
  • Demonstrated experience in writing efficient, scalable, and maintainable code for large-scale data processing workloads.
  • Experience operating in on-prem or hybrid data platforms, with a working understanding of cluster resource management, performance tuning, and capacity planning.
  • Familiarity with Elasticsearch for search, observability, or analytical use cases is a plus.

Preferred Skills Requirements

  • Bachelor's degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
  • Strong familiarity with version control systems, particularly Git, and collaborative development workflows.
  • Working knowledge of cloud platforms such as AWS, Azure, or Google Cloud, primarily for data services, storage, or hybrid deployments.
  • Understanding of distributed data systems and database administration principles, including performance tuning, reliability, and scaling of analytical or NoSQL databases (e.g., ClickHouse, Elasticsearch, HBase, or similar).

(ref:hirist.tech)


Job ID: 142160789