About the Role
We are looking for a highly skilled Data Engineer with strong experience in building scalable ETL/ELT pipelines, distributed data systems, and modern lakehouse architectures. The ideal candidate will work on large-scale telecom and CPaaS datasets, including Call Detail Records (CDR), enabling real-time analytics and business intelligence across the platform ecosystem.
What you'll be responsible for
- Design and implement scalable ETL/ELT pipelines for large-scale analytics, data processing, and migration workloads.
- Build modern data lakehouse platforms using Iceberg, Delta Lake, or Hudi with catalog services like Nessie, AWS Glue, or Hive Metastore.
- Develop and optimize high-performance SQL queries and distributed data processing jobs using Spark (PySpark), Hadoop, and Kafka.
- Design and manage data warehouses and analytical platforms using Snowflake, ClickHouse, Dremio, Redshift, Trino, or Presto.
- Build ingestion and transformation pipelines using object storage systems such as Amazon S3, Azure Data Lake, GCS, or Nutanix Object Storage.
- Process and transform telecom datasets and Call Detail Records (CDR) efficiently at scale.
- Implement orchestration workflows using Airflow, Kestra, or similar workflow engines.
- Ensure data quality, governance, lineage, observability, scalability, and cost optimization across distributed systems.
- Build reusable frameworks for bulk data movement, ingestion acceleration, and transformation at scale.
What you'll need
- 7–10 years of experience in data engineering or related roles.
- Strong expertise in advanced SQL, including query optimization, partitioning, indexing, and performance tuning.
- Hands-on experience with Apache Spark (PySpark), Hadoop, Kafka, and distributed data processing systems.
- Strong expertise in lakehouse technologies such as Iceberg, Delta Lake, or Hudi.
- Experience with metadata/catalog systems including Nessie, Glue, or Hive Metastore.
- Knowledge of analytical engines such as ClickHouse, Dremio, Trino, or Presto.
- Strong understanding of Parquet, ORC, and Avro data formats.
- Experience with object storage systems like S3, ADLS, GCS, or Nutanix Object Storage.
- Strong programming skills in Python, PySpark, or Scala.
- Experience with Airflow, Kestra, or similar orchestration tools.
- Hands-on exposure to AWS, Azure, or GCP cloud platforms.
- Experience in telecom data systems or CDR processing is highly preferred.
Why join us
- Impactful Work: Build large-scale data platforms and analytics systems that power real-time communication products used by millions globally.
- Tremendous Growth Opportunities: Accelerate your career by solving complex engineering challenges in a fast-growing CPaaS and product-driven environment.
- Innovative Environment: Work alongside world-class engineers building cutting-edge distributed data systems, lakehouse architectures, and cloud-native platforms.
Tanla is an equal opportunity employer. We champion diversity and are committed to creating an inclusive environment for all employees.
www.Karix.com