
Job Description for Senior Data Engineer
Experience: 4 to 8 years
Required Skills: AWS, Python, PySpark, Databricks
Notice Period: Immediate to 15 days
Databricks (Spark)
Develop scalable ETL/ELT pipelines using PySpark (RDD/DataFrame APIs), Delta Lake, Auto Loader (cloudFiles), and Structured Streaming.
Optimize jobs: partitioning, bucketing, Z-Ordering, OPTIMIZE + VACUUM, broadcast joins, AQE, checkpointing.
Manage Unity Catalog: catalogs/schemas/tables, data lineage, permissions, secrets, tokens, and cluster policies.
CI/CD for Databricks assets: notebooks, Jobs, Repos, MLflow artifacts.
Build Medallion Architecture (Bronze/Silver/Gold) with Delta Live Tables (DLT) and expectations for data quality.
Event-driven ingestion: Kafka/Kinesis into Databricks Structured Streaming (see the ingestion sketch below).
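Illustrative ingestion sketch (for context only): Auto Loader (cloudFiles) streaming JSON from S3 into a Bronze Delta table. The bucket paths, schema/checkpoint locations, and Unity Catalog table name are hypothetical placeholders, not project specifics.

    # Minimal Auto Loader ingestion sketch; all paths and table names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    bronze_stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/orders")  # hypothetical path
        .load("s3://example-bucket/landing/orders/")                                  # hypothetical path
    )

    (
        bronze_stream.writeStream
        .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders_bronze")  # hypothetical path
        .trigger(availableNow=True)   # incremental batch trigger; use processingTime for continuous streams
        .toTable("main.bronze.orders")  # Unity Catalog three-level name (placeholder)
    )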
Snowflake (DW & ELT)
Model and implement star/snowflake schemas, data marts, and secure views.
Performance tuning: clustering keys, micro-partitions, result caching, warehouse sizing, query profile analysis.
Implement Task/Stream patterns for CDC; external tables for data lakes (S3); Snowpipe for near-real-time ingestion.
Python/Snowpark for transformations and UDFs; SQL best practices (CTEs, window functions); see the Snowpark sketch after this list.
Security: Row Level Security (RLS), Column Masking, OAuth/SCIM, network policies, data sharing (reader accounts).
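Illustrative Snowpark (Python) sketch of the kind of ELT transformation expected here: a window function keeps the latest record per key. Connection parameters and table names are hypothetical assumptions.

    # Minimal Snowpark sketch; credentials and table names are placeholders.
    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col, row_number
    from snowflake.snowpark.window import Window

    session = Session.builder.configs({
        "account": "<account>", "user": "<user>", "password": "<password>",
        "warehouse": "TRANSFORM_WH", "database": "ANALYTICS", "schema": "SILVER",
    }).create()

    orders = session.table("RAW.ORDERS")  # hypothetical source table fed by Snowpipe/Streams
    latest = (
        orders.with_column(
            "RN",
            row_number().over(Window.partition_by(col("ORDER_ID")).order_by(col("UPDATED_AT").desc())),
        )
        .filter(col("RN") == 1)  # keep the most recent version of each order
        .drop("RN")
    )
    latest.write.save_as_table("SILVER.ORDERS_CURRENT", mode="overwrite")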
AWS Data Engineering
Storage & compute: S3 (lifecycle, encryption, partitioning), EMR (if needed), Lambda, Glue (ETL/Schema registry), Athena, Kinesis (Data Streams/Firehose), RDS/Aurora, Step Functions; see the event-trigger sketch after this list.
Orchestration: MWAA/Airflow or Step Functions (error handling, retries, backfills, SLA alerts).
Infra-as-code: Terraform/CloudFormation for reproducible environments (Databricks workspace, IAM, S3, networking).
Security/compliance: IAM least privilege, KMS, VPC endpoints/private links, Secrets Manager, CloudTrail/CloudWatch, GuardDuty.
Observability: CloudWatch metrics/logs, structured logging, Datadog/Prometheus (optional), cost monitoring (tags/budgets).
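Illustrative event-trigger sketch: a Lambda handler that starts a Glue job when a new object lands in S3. The Glue job name and argument keys are hypothetical placeholders.

    # Minimal Lambda-to-Glue sketch; job name and argument keys are placeholders.
    import json
    import boto3

    glue = boto3.client("glue")

    def handler(event, context):
        # S3 event notifications deliver one record per created object.
        for record in event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            response = glue.start_job_run(
                JobName="example-curation-job",               # hypothetical Glue job
                Arguments={"--input_path": f"s3://{bucket}/{key}"},
            )
            print(json.dumps({"job_run_id": response["JobRunId"], "key": key}))
        return {"status": "ok"}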
Data Quality, Governance & Security
Implement unit/integration tests for pipelines (e.g., pytest + Great Expectations + DLT expectations); see the pytest sketch after this list.
Data contracts and schema evolution; monitor SLA/SLO; DQ dashboards (missingness, drift, freshness, completeness).
PII handling: tokenization/pseudonymization, field-level encryption, adherence to KYB/KYC data-flow requirements; audit trails.
Cataloging & lineage through Unity Catalog and/or OpenLineage/Purview (if applicable).
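Illustrative pytest sketch for unit-testing a pipeline transformation against a local SparkSession; the dedupe_latest function and its test data are hypothetical examples, not an existing codebase.

    # Minimal pytest sketch; the transformation under test is a placeholder.
    import pytest
    from pyspark.sql import SparkSession, Window, functions as F

    @pytest.fixture(scope="module")
    def spark():
        return SparkSession.builder.master("local[2]").appName("dq-tests").getOrCreate()

    def dedupe_latest(df, key_col, ts_col):
        # Keep the most recent row per key (simple SCD-style dedup).
        w = Window.partitionBy(key_col).orderBy(F.col(ts_col).desc())
        return df.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")

    def test_dedupe_latest_keeps_newest_row(spark):
        df = spark.createDataFrame(
            [(1, "2024-01-01", "old"), (1, "2024-02-01", "new"), (2, "2024-01-15", "only")],
            ["id", "updated_at", "value"],
        )
        result = dedupe_latest(df, "id", "updated_at").collect()
        assert {(r["id"], r["value"]) for r in result} == {(1, "new"), (2, "only")}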
DevOps & CI/CD
Git workflows (branching, PR reviews), Databricks CLI/Terraform modules for jobs/clusters/UC, Snowflake DevOps (object versioning via schemachange or SQL-based migration).
Automated testing in pipelines; feature flags and canary releases for data jobs; rollback strategies.
Client-Facing PoCs & Delivery
Rapid PoC builds: define clear success metrics, benchmark cost/performance, and produce a transition plan to production.
Present architectural decisions, trade-offs (Spark vs. Snowflake ELT), and cost projections (Databricks DBUs, Snowflake credits, storage egress).
Produce runbooks, operational playbooks, and knowledge transfer documents for client teams.
Required Technical Skillset
Databricks: PySpark, Delta Lake, Auto Loader, DLT, Jobs, Unity Catalog, MLflow basics.
Snowflake: SQL, Snowpipe, Tasks/Streams, Snowpark (Python), warehouse sizing, performance tuning, security policies.
Python: strong command of core DE packages (pandas, pyarrow, pytest), robust error handling, typing, and packaging.
Orchestration: Airflow DAGs (Sensors, Operators, XCom), Step Functions state machines; see the DAG sketch after this list.
Streaming & CDC: Kafka/Kinesis, Debezium (nice-to-have), CDC patterns to Delta/Snowflake.
AWS: S3, Glue, Lambda, Kinesis, IAM/KMS, VPC, CloudWatch; Terraform/CloudFormation.
Data Modeling: 3NF/dimensional modeling, slowly changing dimensions (SCD Type 2), surrogate keys, and surrogate vs. natural key trade-offs.
Security & Compliance: encryption at rest/in transit, tokenization, key rotation, audit logging, governance controls.
Performance & Cost: Spark job tuning, Snowflake warehouse right-sizing, partitioning/clustering, object storage best practices.
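Illustrative Airflow DAG sketch showing a sensor, PythonOperators, and an XCom hand-off; the DAG id, bucket, key, and schedule are hypothetical assumptions (written against the Airflow 2.x API).

    # Minimal Airflow DAG sketch; identifiers and S3 locations are placeholders.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    def count_rows(**context):
        # Push a value to XCom for downstream tasks to consume.
        context["ti"].xcom_push(key="row_count", value=42)  # placeholder metric

    def report(**context):
        row_count = context["ti"].xcom_pull(task_ids="count_rows", key="row_count")
        print(f"Loaded {row_count} rows")

    with DAG(
        dag_id="example_daily_load",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
    ) as dag:
        wait_for_file = S3KeySensor(
            task_id="wait_for_file",
            bucket_name="example-bucket",                       # hypothetical bucket
            bucket_key="landing/orders/{{ ds }}/_SUCCESS",      # hypothetical key pattern
            poke_interval=300,
            timeout=60 * 60,
        )
        count_rows_task = PythonOperator(task_id="count_rows", python_callable=count_rows)
        report_task = PythonOperator(task_id="report", python_callable=report)

        wait_for_file >> count_rows_task >> report_task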
Nice-to-Have:
dbt (Snowflake) with tests & exposures; Great Expectations.
Databricks SQL Warehouses and BI connectivity; Photon engine awareness.
Lakehouse Federation (UC external locations); Delta Sharing; Iceberg experience.
Kafka Connect/Debezium, NiFi or MuleSoft (for data integrations).
Experience in financial services.
Exposure to ISO/IEC 27001 controls in data platforms.
Education & Certifications
Bachelor's/Master's in CS/IT/EE or related.
Certifications (plus): Databricks Data Engineer Associate/Professional, Snowflake SnowPro Core/Advanced, AWS Solutions Architect/Big Data/DP.
Job ID: 145057907