About the Role
We are looking for a hands-on Data Engineer with strong expertise in Python, PySpark, and cloud data services (AWS and/or Azure) to design, build, and optimize scalable data pipelines and lakehouse solutions. You'll work closely with data architects, analysts, and product teams to deliver high-quality, reliable, and secure data for analytics, AI/ML, and reporting use cases.
Responsibilities
- Design, build, and maintain batch and streaming data pipelines using PySpark/Spark and Python (see the illustrative pipeline sketch after this list).
- Develop data lake/lakehouse architectures including Delta Lake/Iceberg/Hudi where applicable.
- Orchestrate pipelines using tools like Airflow, AWS Step Functions, Azure Data Factory, or Databricks Workflows.
- Build and optimize ETL/ELT workflows for large-scale datasets with a focus on performance, reliability, and cost.
- Implement data quality (DQ) checks, observability/monitoring, and error handling.
- Collaborate on data modeling (star/snowflake), CDC, SCD, and partitioning/bucketing strategies.
- Enforce security best practices: IAM/roles, encryption, secrets management, and data governance.
- Contribute to CI/CD for data code (e.g., Git, Azure DevOps, GitHub Actions, Jenkins) and infra-as-code (Terraform/CloudFormation/Bicep).
- Partner with stakeholders to translate business needs into scalable technical solutions; document designs and runbooks.
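
To illustrate the kind of pipeline work this role involves, here is a minimal PySpark batch sketch that reads raw files from cloud storage and writes a partitioned Delta table. It is a sketch only: every path, column name, and job name below is a hypothetical placeholder, and the delta-spark package is assumed to be available on the cluster.

# Minimal sketch; all paths and names are placeholders, delta-spark assumed installed.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-daily-batch").getOrCreate()

# Read raw CSV files landed in cloud storage (s3a:// on AWS, abfss:// on Azure).
raw = (
    spark.read
    .option("header", "true")
    .csv("s3a://example-bucket/raw/orders/")
)

# Light cleansing: drop rows missing the business key, derive a partition column.
clean = (
    raw.dropna(subset=["order_id"])
    .withColumn("order_date", F.to_date("order_ts"))
)

# Persist as a partitioned Delta table for downstream analytics, AI/ML, and reporting.
(
    clean.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .save("s3a://example-bucket/curated/orders/")
)

In practice, a job like this would be parameterized and scheduled from one of the orchestrators listed above (Airflow, Step Functions, ADF, or Databricks Workflows).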
Qualifications
- Bachelor's or Master's degree in Computer Science, Information Systems, Engineering, or equivalent experience.
- Relevant certifications are a plus (e.g., AWS Data Analytics Specialty, AWS Developer/Architect, Azure Data Engineer Associate (DP-203), Databricks Data Engineer Associate/Professional).
Required Skills
- 5–8 years of professional experience as a Data Engineer or similar.
- Strong programming in Python (data processing, packaging, unit testing, typing).
- Advanced PySpark/Spark: RDD/DataFrame APIs, Spark SQL, performance tuning (joins, shuffle, partitions, broadcast, caching); a broadcast-join sketch follows this list.
- Cloud (AWS and/or Azure) experience (at least one end-to-end project):
  - AWS: S3, Glue, EMR, Lambda, Athena, Redshift (or Spectrum), Step Functions, IAM, CloudWatch/CloudTrail, Kinesis (nice to have).
  - Azure: ADLS Gen2, Databricks, Synapse (Spark/SQL), ADF, Azure Functions, Event Hub, Key Vault, Purview (nice to have).
- Databricks (or Spark on EMR/Synapse): notebooks, jobs, clusters, Delta Lake, Unity Catalog (preferred).
- Data modeling & SQL (complex queries, performance optimization).
- Orchestration: Airflow/ADF/Step Functions/Databricks Jobs.
- Version control (Git) and CI/CD for data projects.
- Solid understanding of data quality, lineage, metadata, and observability concepts.
- Experience with cost optimization and security on cloud data platforms.
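
As a small illustration of the performance-tuning expectation above, the sketch below hints Spark to broadcast a small dimension table so the large fact table is not shuffled during the join. Both table names are hypothetical.

# Minimal sketch; both table names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-demo").getOrCreate()

fact = spark.table("sales_fact")   # large fact table
dim = spark.table("country_dim")   # small lookup/dimension table

# Ship the small table to every executor instead of shuffling both sides.
joined = fact.join(broadcast(dim), on="country_code", how="left")

# Inspect the physical plan: expect BroadcastHashJoin rather than SortMergeJoin.
joined.explain()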
Preferred Skills
- Streaming: Spark Structured Streaming, Kafka/Event Hubs/Kinesis (see the streaming sketch after this list).
- Infra-as-Code: Terraform/CloudFormation/Bicep.
- Containers: Docker; basics of Kubernetes (AKS/EKS) a plus.
- Warehouse/Lakehouse: Redshift, Snowflake, Synapse SQL Pools.
- Testing: Great Expectations, dbt tests (if dbt is used), pytest.
- ML Pipelines: feature-engineering pipelines that feed ML models (MLOps exposure is beneficial).
- Compliance/Governance: GDPR/PII handling, masking, tokenization.
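
For the streaming skills above, here is a minimal Structured Streaming sketch that reads from Kafka and appends to a Delta table with checkpointing for recovery. The broker address, topic, and storage paths are hypothetical, and the spark-sql-kafka connector and delta-spark are assumed to be on the classpath.

# Minimal sketch; broker, topic, and storage paths are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-stream").getOrCreate()

# Read a stream of events from Kafka; the message value arrives as bytes.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(F.col("value").cast("string").alias("payload"))
)

# Continuously append records to a Delta table; the checkpoint lets the job
# resume with exactly-once sink semantics after a restart.
query = (
    events.writeStream
    .format("delta")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .start("s3a://example-bucket/streaming/events/")
)
query.awaitTermination()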
Location and Work Details
- Location: Domlur, Bangalore
- Work Mode: 4 days work-from-office (WFO), 1 day work-from-home (WFH), as per policy/project needs
- Interview Availability: As per the schedule shared by the Talent Acquisition (TA) team
Equal Opportunity Statement
We are an equal opportunity employer committed to diversity and inclusivity.