
Strong hands-on experience in Python, leveraging PySpark, Scikit-learn, TensorFlow, PyTorch, Pandas, PyArrow, and related tools for large-scale data transformation and automation
Design, develop, and optimize distributed data pipelines using Apache Spark (PySpark); a brief illustrative sketch follows this list
Deploy and manage Spark workloads on AWS EMR (cluster sizing, autoscaling, performance tuning)
Implement centralized data governance and analytics solutions on AWS, including AWS Glue Data Catalog, EMR, Athena, and Glue Jobs
Develop ETL/ELT workflows using AWS Glue (Glue Jobs, Crawlers, Data Catalog)
Orchestrate data workflows using Step Functions, Airflow, or Glue Workflows
Strong understanding of business-layer modeling, data architecture, and batch/real-time data processing
Hands-on experience with AWS services: EC2, S3, CloudFront, API Gateway, Lambda, RDS/PostgreSQL, and IAM roles
Kubernetes (EKS, Helm) and containerized application management
Experience developing single-page applications (SPAs) using Angular with TypeScript, or comparable JavaScript frameworks
Optional: DevOps tooling (Git, GitLab, CloudFormation, Terraform), Docker, Kafka streaming pipelines, and Delta Lake/Iceberg/Hudi
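
To illustrate the kind of Spark pipeline work listed above, here is a minimal PySpark sketch of a batch S3-to-S3 transformation that could run on EMR or as a Glue job. The bucket paths, column names, and application name are hypothetical placeholders, not details taken from this posting.

```python
# Minimal PySpark ETL sketch: read raw CSV from S3, clean it, and write
# partitioned Parquet back to S3. Paths and column names are illustrative
# placeholders only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("example-etl")
    .getOrCreate()
)

# Read raw events (hypothetical location and schema)
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-raw-bucket/events/")
)

# Basic cleanup: deduplicate, cast the timestamp, derive a partition column
cleaned = (
    raw.dropDuplicates(["event_id"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("event_date", F.to_date("event_ts"))
)

# Write Parquet partitioned by date to the curated zone
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/events/")
)

spark.stop()
```

Writing partitioned Parquet to S3 keeps the output directly queryable through the Glue Data Catalog and Athena, which lines up with the governance and analytics stack named in the requirements.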
Job ID: 144988595