Dear Candidate,
We are hiring for Azure Databricks for the Hyderabad location.
Please share the details below along with your updated resume.
Note: Screen-selected resumes will be scheduled for interviews.
Name:
Contact Number:
Email ID:
Current Location:
Preferred Location:
Total Experience:
Relevant Experience:
Current Organisation:
Current CTC:
Expected CTC:
Notice Period:
Timing:
Date:
Experience: 5+ Years
Location: [Onsite/Hybrid/Remote]
Employment Type: Full-time
Role Overview
We're seeking a detail-oriented L1 Data Engineering (Databricks) Engineer to provide first-line support for data pipelines and jobs running on Databricks. You will be responsible for monitoring, incident response, job reruns, SLA adherence, and operational hygiene. The ideal candidate has solid exposure to PySpark, Delta Lake, Azure Data Lake Storage, and Databricks Workflows/Jobs, with a strong focus on stability and service continuity.
Key Responsibilities
- Monitor & Support Pipelines:
- Monitor Databricks Jobs/Workflows, ADX/ADF pipeline runs, and streaming sources.
- Proactively detect failures, lag, and SLA breaches; perform first-line triage (a minimal detection sketch follows this list).
- Incident Management:
- Acknowledge incidents, classify severity (SEV levels), and follow ITIL-based processes.
- Execute runbooks, perform safe reruns, and handle partial reprocessing.
- Escalate to L2/L3 with detailed incident documentation (logs, job IDs, inputs/outputs).
- Operational Tasks:
- Validate data availability and quality at key checkpoints (bronze/silver/gold layers).
- Manage ad hoc fixes (e.g., small Spark config changes, partition reruns).
- Maintain metadata, service accounts, and token rotations as per SOPs.
- Access & Governance:
- Basic administration in Databricks: cluster start/stop, job scheduling checks, permissions requests, workspace hygiene.
- Work with Unity Catalog permissions and Key Vault integrations under guidance.
- Documentation & Reporting:
- Update runbooks, Known Error Database (KEDB), and SOPs.
- Publish daily/weekly ops reports, SLA metrics, and post-incident summaries.
- Collaboration:
- Coordinate with Data Engineers, Platform Engineers, Security, and Product Teams.
- Participate in release readiness and operational acceptance for new pipelines.
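For hiring teams: a minimal sketch of the first-line failure detection described above, using the Databricks Jobs 2.1 REST API. The workspace URL, token handling, and the notify_l1 alert hook are illustrative assumptions, not prescribed tooling for this role.

    # Minimal L1 monitoring sketch: list recent Databricks job runs and flag failures.
    # Assumes DATABRICKS_HOST / DATABRICKS_TOKEN env vars and the `requests` library;
    # notify_l1 is a hypothetical placeholder for the team's ticketing/paging hook.
    import os
    import requests

    HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<id>.azuredatabricks.net
    TOKEN = os.environ["DATABRICKS_TOKEN"]  # PAT rotated per SOP

    def list_recent_runs(limit=25):
        """Fetch the most recent job runs via the Jobs 2.1 REST API."""
        resp = requests.get(
            f"{HOST}/api/2.1/jobs/runs/list",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"limit": limit},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("runs", [])

    def notify_l1(run):
        """Hypothetical hook: raise a ticket / page the on-call channel."""
        state = run.get("state", {})
        print(f"ALERT run_id={run['run_id']} job_id={run['job_id']} "
              f"result={state.get('result_state')} msg={state.get('state_message')}")

    if __name__ == "__main__":
        for run in list_recent_runs():
            if run.get("state", {}).get("result_state") in ("FAILED", "TIMEDOUT"):
                notify_l1(run)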
Required Skills & Qualifications
- 5+ years in data operations/support (L1/L1.5), preferably on Databricks + Azure.
- Hands-on with:
- Databricks Jobs/Workflows, Clusters, Repos, Delta Lake basics.
- Azure Data Lake Storage (ADLS), Azure Data Factory (ADF) monitoring.
- PySpark basics: reading/writing Delta/Parquet, partitioning, checkpoints (see the sketch after this section).
- Git (GitHub/Azure Repos) for config, notebooks, and runbook versioning.
- Strong grasp of observability:
- Reading Spark UI, job logs, driver/executor logs.
- Experience with Log Analytics, Azure Monitor, App Insights (preferred).
- ITIL/Service Management familiarity: Incident, Change, Problem, Knowledge.
- Scripting (Bash/PowerShell/Python) for small automation tasks.
- Excellent communication, documentation, and shift handover discipline.
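As a calibration aid for screeners, "PySpark basics" at this level roughly corresponds to the pattern below; the storage containers, paths, and the load_date column are illustrative assumptions.

    # PySpark basics expected at L1: read Parquet, write partitioned Delta,
    # and a checkpointed Structured Streaming write (checkpoints make restarts safe).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("l1-basics").getOrCreate()

    # Batch: read raw Parquet from ADLS, append as Delta partitioned by load date.
    df = spark.read.parquet("abfss://raw@account.dfs.core.windows.net/orders/")
    (df.write.format("delta")
       .mode("append")
       .partitionBy("load_date")
       .save("abfss://bronze@account.dfs.core.windows.net/orders/"))

    # Streaming: the checkpointLocation tracks progress across failures/reruns.
    stream = spark.readStream.format("delta").load(
        "abfss://bronze@account.dfs.core.windows.net/orders/")
    (stream.writeStream.format("delta")
        .option("checkpointLocation",
                "abfss://silver@account.dfs.core.windows.net/_chk/orders/")
        .outputMode("append")
        .start("abfss://silver@account.dfs.core.windows.net/orders/"))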
Nice-to-Have
- Exposure to Structured Streaming, Kafka/Event Hub monitoring.
- Basic SQL for data validation and health checks (a sample check follows this list).
- Understanding of Unity Catalog data governance, lineage, and entitlement workflows.
- Experience with Secrets/Key Vault, Managed Identity, and RBAC.
- Experience with CI/CD for Databricks (GitHub Actions/Azure DevOps) for deployments.
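To illustrate the data-validation item above (and the bronze/silver checkpoint checks under Operational Tasks), a simple availability/quality check might look like this sketch; the table names and the row-count rule are assumptions.

    # Hypothetical bronze-to-silver availability check using Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    checks = spark.sql("""
        SELECT
          (SELECT COUNT(*) FROM bronze.orders WHERE load_date = current_date()) AS bronze_rows,
          (SELECT COUNT(*) FROM silver.orders WHERE load_date = current_date()) AS silver_rows
    """).first()

    if checks.bronze_rows == 0:
        raise RuntimeError("No bronze data landed today; raise an incident and triage upstream.")
    if checks.silver_rows < checks.bronze_rows:
        print(f"Row-count drop: bronze={checks.bronze_rows}, silver={checks.silver_rows}; "
              "verify dedup/filter logic before escalating.")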
Certifications (Preferred)
- Databricks Certified Data Engineer Associate
- Microsoft Certified: Azure Data Engineer Associate (DP-203)
- ITIL Foundation (v3/v4)
Key Performance Indicators (KPIs)
- SLA Adherence: % of jobs meeting SLA; mean time to acknowledge (MTTA).
- Incident Metrics: Mean time to resolve (MTTR), incident reopen rate.
- Operational Hygiene: Runbook completeness, KEDB updates, shift handover quality.
- Quality Metrics: Error rates, number of successful reruns without escalation.
- Proactive Monitoring: Number of issues prevented via early detection/alerts.
- Change Readiness: Zero-defect deployments from an ops perspective.
Tools & Ecosystem
- Databricks (Jobs, Clusters, Repos, Workflows, Unity Catalog)
- Azure: ADLS, ADF, Key Vault, Event Hub, Log Analytics, Monitor
- Version Control: GitHub / Azure Repos
- Ticketing: ServiceNow / Jira / Azure DevOps Boards
- Observability: Azure Monitor, Log Analytics, Grafana (optional)
Sample Interview Screening Topics (for Hiring Teams)
- Ops Scenarios: How to triage a failing Databricks job (OOM, shuffle spill, auth error).
- Logs & Spark UI: Identify cause from executor logs; interpret stages/tasks/shuffles.
- Data Validation: Checkpoint integrity, bronze-to-silver load verification.
- Runbooks: Steps to safely rerun a partitioned pipeline without duplicate writes (illustrated after this list).
- Access/Governance: Handling a permission issue with Unity Catalog tables.
- SLA & Escalation: When to escalate vs. when to rerun; SEV classification.
JD Summary (Short Version for Job Portals)
Role: L1 Data Engineering (Databricks) Engineer, 5+ years
Must-have: Databricks Ops, ADF monitoring, ADLS, PySpark basics, ITIL, incident management
Nice-to-have: Unity Catalog, Azure Monitor, CI/CD, streaming
Shift: Rotational/on-call
Certs: Databricks DE Associate, DP-203, ITIL Foundation (preferred)
Durga Karunakaran
TAG Team - HCL Technologies Ltd.
[Confidential Information]
Chennai, India
Durga Karunakaran | LinkedIn