Experience with AWS cloud data platform services: EC2, EMR, RDS, Redshift, Glue.
Experience with object-oriented scripting in Python, including PySpark.
Experience with data pipeline and workflow management tools like Airflow.
Experience with relational SQL databases and NoSQL databases such as MongoDB.
Experience with stream-processing systems such as Spark Streaming and Kinesis.
Experience with open table formats such as Apache Iceberg and Delta Lake.
Responsibilities:
Experience building and optimizing ETL data pipelines to ingest, transform, and load datasets.
Advanced working knowledge of SQL, including query authoring, and familiarity with a variety of relational databases.
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Build processes supporting data transformation, data structures, metadata, data quality, and workload management.
Experience in processing and extracting value from large datasets.