Essential Responsibilities:
As a Lead or Principal Data Engineer, your responsibilities will include:
- Building, refining, tuning, and maintaining our real-time and batch data infrastructure
- Working daily with technologies such as HDFS, Spark, Snowflake, Hive, HBase, Scylla, Django, and FastAPI
- Maintaining data quality and accuracy across production data systems
- Working with Data Engineers to optimize data models and workflows
- Working with Data Analysts to develop ETL processes for analysis and reporting
- Working with Product Managers to design and build data products
- Working with our DevOps team to scale and optimize our data infrastructure
- Participating in architecture discussions, influencing the roadmap, and taking ownership of and responsibility for new projects
- Participating in a 24/7 on-call rotation (being available by phone or email in case something goes wrong)
Desired Characteristics:
- Minimum 7 years of software engineering experience
- Proven long-term experience with and enthusiasm for distributed data processing at scale, and eagerness to learn new things
- Expertise in designing and architecting distributed, low-latency, scalable solutions in cloud or on-premises environments
- Exposure to the whole software development lifecycle from inception to production and monitoring
- Fluency in Python, or solid experience in Scala or Java
- Proficiency with relational databases and advanced SQL
- Expertise with technologies such as Spark, HDFS, Hive, and HBase
- Experience with a workflow scheduler such as Apache Airflow, Luigi, or Chronos
- Experience using cloud services (AWS) at scale
- Experience with agile software development processes
- Excellent interpersonal and communication skills
Nice to have:
- Experience with large-scale / multi-tenant distributed systems
- Experience with columnar / NoSQL databases such as Vertica, Snowflake, HBase, Scylla, or Couchbase
- Experience with real-time streaming frameworks such as Flink or Storm
- Experience with web frameworks such as Flask, Django