Job Description
- Data Pipeline Development
  - Build and maintain ETL/ELT pipelines using Python
  - Ingest data from multiple sources (APIs, databases, files, streaming systems)
  - Optimize pipelines for performance and scalability
  - Clean, transform, and validate raw datasets
  - Handle structured and unstructured data
  - Use frameworks such as:
    - Pandas
    - PySpark
    - Dask
- Database & Data Warehousing
  - SQL (PostgreSQL, MySQL, SQL Server)
  - NoSQL (MongoDB, Cassandra)
  - Design schemas and optimize queries
  - Build data warehouses using:
    - Snowflake
    - Redshift
    - BigQuery
- Big Data Technologies
  - Apache Spark
  - Hadoop
  - Process large-scale datasets efficiently
- Workflow Orchestration
  - Apache Airflow
- Cloud Platforms
  - Work in cloud environments:
    - AWS (S3, Glue, Lambda, EMR)
    - Azure (Data Factory, Synapse)
    - GCP (Dataflow, BigQuery)
- Data Quality & Monitoring
  - Implement data validation checks
  - Monitor pipelines for failures and fix bugs
  - Ensure data reliability and integrity