
We are seeking a highly skilled Senior Data Engineer with 9+ years of 100% hands-on experience building and maintaining enterprise-grade data pipelines. This is a pure Individual Contributor role focused on writing production-quality code, developing scalable ETL/ELT solutions using PySpark and AWS, and orchestrating workflows with Airflow. If you thrive on solving complex technical problems and shipping robust, well-tested code, this role is for you.
Key Responsibilities
- Develop and maintain robust, scalable ETL/ELT pipelines using PySpark on AWS EMR (a pipeline sketch follows this list)
- Build data ingestion and transformation workflows from diverse sources (S3, EMR, RDS, Kafka, APIs) into AWS-based data lakes and warehouses
- Write clean, modular, testable Python code following best practices and coding standards
- Implement comprehensive unit tests using pytest/unittest with mocking, fixtures, and high code coverage (see the pytest sketch after this list)
- Design and build production-grade Airflow DAGs for workflow orchestration, scheduling, and monitoring (see the DAG sketch after this list)
- Optimize Spark jobs for performance, memory efficiency, and cost reduction
- Implement CI/CD pipelines for automated testing and deployment using Jenkins, GitHub Actions, or AWS CodePipeline
- Troubleshoot and debug complex data pipeline issues in production environments
- Collaborate with Data Scientists, Analysts, and Platform Engineers to deliver data solutions
- Ensure data quality, security, and compliance standards are met
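
To make the pipeline work concrete, here is a minimal sketch of a PySpark ETL job of the shape described above. The bucket paths, module name, and orders schema are hypothetical placeholders invented for illustration, not code from an actual Iris project:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical S3 locations; real buckets/prefixes would come from job config.
RAW_PATH = "s3://example-raw-bucket/orders/"
CURATED_PATH = "s3://example-curated-bucket/orders/"

def build_spark() -> SparkSession:
    # On EMR, master/deploy settings come from the cluster; nothing extra needed here.
    return SparkSession.builder.appName("orders-etl").getOrCreate()

def transform(df):
    """Keep the transformation pure (no I/O) so it can be unit-tested without S3."""
    return (
        df.filter(F.col("status") == "COMPLETED")
          .withColumn("order_date", F.to_date("created_at"))
          .groupBy("order_date", "customer_id")
          .agg(F.sum("amount").alias("daily_total"))
    )

def main():
    spark = build_spark()
    raw = spark.read.parquet(RAW_PATH)
    transform(raw).write.mode("overwrite").partitionBy("order_date").parquet(CURATED_PATH)

if __name__ == "__main__":
    main()
```

Keeping `transform` free of I/O is what makes the unit tests below straightforward.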
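
The testing bullet, illustrated with pytest: a session-scoped fixture supplies a local SparkSession, and the test exercises the hypothetical `transform` function from the sketch above without touching AWS (the `orders_etl` module name is an assumption carried over from that sketch):

```python
import pytest
from pyspark.sql import SparkSession

from orders_etl import transform  # hypothetical module from the sketch above

@pytest.fixture(scope="session")
def spark():
    # local[2] keeps the test self-contained; no cluster or S3 required.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()

def test_transform_aggregates_completed_orders(spark):
    rows = [
        ("c1", "COMPLETED", "2024-01-01", 10.0),
        ("c1", "COMPLETED", "2024-01-01", 5.0),
        ("c1", "CANCELLED", "2024-01-01", 99.0),  # must be filtered out
    ]
    df = spark.createDataFrame(rows, ["customer_id", "status", "created_at", "amount"])
    result = transform(df).collect()
    assert len(result) == 1
    assert result[0]["daily_total"] == 15.0
```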
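
And the orchestration bullet as a small Airflow DAG; the schedule, owner, and task callable are illustrative assumptions (in production the callable might submit an EMR step rather than run in-process):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def run_orders_etl(**context):
    # Placeholder body; a real task might trigger the EMR job above.
    ...

default_args = {
    "owner": "data-engineering",  # hypothetical owner
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="orders_daily_etl",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_transform_load = PythonOperator(
        task_id="run_orders_etl",
        python_callable=run_orders_etl,
    )
```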
Required Skills & Qualifications
- 9+ years of hands-on data engineering experience (no management responsibilities required)
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
- Expert-level Python programming: OOP, design patterns, clean-code practices
- Advanced PySpark/Spark skills: partitioning strategies, shuffle optimization, memory tuning, broadcast joins (see the broadcast-join sketch after this list)
- Strong unit-testing expertise using pytest/unittest: mocking, parametrized tests, fixtures, a TDD mindset
- Hands-on Airflow experience: DAG design, custom operators, sensors, XComs, debugging failed tasks
- Deep AWS experience: S3, EMR, Glue, Redshift, Lambda, Step Functions, IAM, CloudWatch
- Solid understanding of data lake and warehouse architectures (medallion architecture, Delta Lake)
- Strong SQL skills: complex queries, window functions, query optimization (see the window-function example after this list)
- Proficiency with Git, code reviews, and collaborative development workflows
- Experience with CI/CD pipelines and automated testing frameworks
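
On the Spark-tuning point: a broadcast join hints to Spark that a small dimension table should be replicated to every executor, avoiding a shuffle of the large fact table. The table names and paths here are again hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-demo").getOrCreate()

orders = spark.read.parquet("s3://example-curated-bucket/orders/")    # large fact table
countries = spark.read.parquet("s3://example-ref-bucket/countries/")  # small dimension

# broadcast() hints that `countries` fits in executor memory, so Spark
# ships it to every executor instead of shuffling `orders` across the cluster.
enriched = orders.join(broadcast(countries), on="country_code", how="left")
```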
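
And for the SQL bullet, a window-function example run through spark.sql, assuming the `spark` session and a DataFrame with the columns shown (placeholders carried over from the sketches above): it ranks customers by spend within each day.

```python
# Assumes a DataFrame `enriched` with order_date, customer_id, daily_total columns.
enriched.createOrReplaceTempView("orders")

top_orders = spark.sql("""
    SELECT customer_id,
           order_date,
           daily_total,
           RANK() OVER (PARTITION BY order_date ORDER BY daily_total DESC) AS day_rank
    FROM orders
""")
```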
Nice to Have
- Familiarity with Docker for containerized data workloads
- Exposure to streaming data (Kafka, Spark Streaming)
- Knowledge of data quality frameworks
- Background in financial services or regulated industries
- Understanding of data security and privacy practices (GDPR)
Perks and Benefits for Irisians
Iris provides world-class benefits for a personalized employee experience, designed to support the financial, health, and well-being needs of Irisians and their holistic professional and personal growth.
A strategic partner that transformational leaders can trust to realize the full potential of technology-enabled transformation. As a trusted technology partner, we focus our highly experienced talent and right-sized teams on developing complex, mission-critical applications and solutions for leading enterprises across financial services; life sciences, including pharmaceutical companies, CROs, and medical devices; manufacturing & logistics; and educational services.
Job ID: 145790053
Skills:
Data Engineering, SQL, AWS, Python, Jenkins, Data Lake, Git, PySpark, Airflow