Dear Candidate,
We are hiring for Azure Databricks for the Hyderabad location.
Please share the details below along with your updated resume.
Note: Screen-selected resumes will be scheduled for interviews.
Name:
Contact Number:
Email ID:
Current Location:
Preferred Location:
Total Experience:
Relevant Experience:
Current Organisation:
Current CTC:
Expected CTC:
Notice Period:
Timing:
Date:
Experience: 5+ Years
Location: [Onsite/Hybrid/Remote]
Employment Type: Full-time
Role Overview
We're seeking a detail-oriented L1 Data Engineering (Databricks) Engineer to provide first-line support for data pipelines and jobs running on Databricks. You will be responsible for monitoring, incident response, job reruns, SLA adherence, and operational hygiene. The ideal candidate has solid exposure to PySpark, Delta Lake, Azure Data Lake Storage, and Databricks Workflows/Jobs, with a strong focus on stability and service continuity.
Key Responsibilities
- Monitor & Support Pipelines:
- Monitor Databricks Jobs/Workflows, ADX/ADF pipeline runs, and streaming sources.
- Proactively detect failures, lag, and SLA breaches; perform first-line triage (a minimal detection sketch follows this list).
- Incident Management:
- Acknowledge incidents, classify severity (SEV levels), and follow ITIL-based processes.
- Execute runbooks, perform safe reruns, and handle partial reprocessing.
- Escalate to L2/L3 with detailed incident documentation (logs, job IDs, inputs/outputs).
- Operational Tasks:
- Validate data availability and quality at key checkpoints (bronze/silver/gold layers).
- Manage ad hoc fixes (e.g., small Spark config changes, partition reruns).
- Maintain metadata, service accounts, and token rotations as per SOPs.
- Access & Governance:
- Basic administration in Databricks: cluster start/stop, job scheduling checks, permissions requests, workspace hygiene.
- Work with Unity Catalog permissions and Key Vault integrations under guidance.
- Documentation & Reporting:
- Update runbooks, Known Error Database (KEDB), and SOPs.
- Publish daily/weekly ops reports, SLA metrics, and post-incident summaries.
- Collaboration:
- Coordinate with Data Engineers, Platform Engineers, Security, and Product Teams.
- Participate in release readiness and operational acceptance for new pipelines.
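For hiring teams: a minimal sketch of the first-line failure detection described above, using the Databricks Jobs 2.1 REST API. The workspace URL, token handling, and the notify_l1 alert hook are illustrative assumptions, not prescribed tooling for this role.

    # Minimal L1 monitoring sketch: list recent Databricks job runs and flag failures.
    # Assumes DATABRICKS_HOST / DATABRICKS_TOKEN env vars and the `requests` library;
    # notify_l1 is a hypothetical placeholder for the team's ticketing/paging hook.
    import os
    import requests

    HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-<id>.azuredatabricks.net
    TOKEN = os.environ["DATABRICKS_TOKEN"]  # PAT rotated per SOP

    def list_recent_runs(limit=25):
        """Fetch the most recent job runs via the Jobs 2.1 REST API."""
        resp = requests.get(
            f"{HOST}/api/2.1/jobs/runs/list",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"limit": limit},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("runs", [])

    def notify_l1(run):
        """Hypothetical hook: raise a ticket / page the on-call channel."""
        state = run.get("state", {})
        print(f"ALERT run_id={run['run_id']} job_id={run['job_id']} "
              f"result={state.get('result_state')} msg={state.get('state_message')}")

    if __name__ == "__main__":
        for run in list_recent_runs():
            if run.get("state", {}).get("result_state") in ("FAILED", "TIMEDOUT"):
                notify_l1(run)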
Required Skills & Qualifications
- 5+ years in data operations/support (L1/L1.5), preferably on Databricks + Azure.
- Hands-on with:
- Databricks Jobs/Workflows, Clusters, Repos, Delta Lake basics.
- Azure Data Lake Storage (ADLS), Azure Data Factory (ADF) monitoring.
- PySpark basics: reading/writing Delta/Parquet, partitioning, checkpoints (see the sketch after this section).
- Git (GitHub/Azure Repos) for config, notebooks, and runbook versioning.
- Strong grasp of observability:
- Reading Spark UI, job logs, driver/executor logs.
- Experience with Log Analytics, Azure Monitor, App Insights (preferred).
- ITIL/Service Management familiarity: Incident, Change, Problem, Knowledge.
- Scripting (Bash/PowerShell/Python) for small automation tasks.
- Excellent communication, documentation, and shift handover discipline.
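As a calibration aid for screeners, "PySpark basics" at this level roughly corresponds to the pattern below; the storage containers, paths, and the load_date column are illustrative assumptions.

    # PySpark basics expected at L1: read Parquet, write partitioned Delta,
    # and a checkpointed Structured Streaming write (checkpoints make restarts safe).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("l1-basics").getOrCreate()

    # Batch: read raw Parquet from ADLS, append as Delta partitioned by load date.
    df = spark.read.parquet("abfss://raw@account.dfs.core.windows.net/orders/")
    (df.write.format("delta")
       .mode("append")
       .partitionBy("load_date")
       .save("abfss://bronze@account.dfs.core.windows.net/orders/"))

    # Streaming: the checkpointLocation tracks progress across failures/reruns.
    stream = spark.readStream.format("delta").load(
        "abfss://bronze@account.dfs.core.windows.net/orders/")
    (stream.writeStream.format("delta")
        .option("checkpointLocation",
                "abfss://silver@account.dfs.core.windows.net/_chk/orders/")
        .outputMode("append")
        .start("abfss://silver@account.dfs.core.windows.net/orders/"))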
Nice-to-Have
- Exposure to Structured Streaming, Kafka/Event Hub monitoring.
- Basic SQL for data validation and health checks (a sample check follows this list).
- Understanding of Unity Catalog data governance, lineage, and entitlement workflows.
- Experience with Secrets/Key Vault, Managed Identity, and RBAC.
- Experience with CI/CD for Databricks (GitHub Actions/Azure DevOps) for deployments.
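To illustrate the data-validation item above (and the bronze/silver checkpoint checks under Operational Tasks), a simple availability/quality check might look like this sketch; the table names and the row-count rule are assumptions.

    # Hypothetical bronze-to-silver availability check using Spark SQL.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    checks = spark.sql("""
        SELECT
          (SELECT COUNT(*) FROM bronze.orders WHERE load_date = current_date()) AS bronze_rows,
          (SELECT COUNT(*) FROM silver.orders WHERE load_date = current_date()) AS silver_rows
    """).first()

    if checks.bronze_rows == 0:
        raise RuntimeError("No bronze data landed today; raise an incident and triage upstream.")
    if checks.silver_rows < checks.bronze_rows:
        print(f"Row-count drop: bronze={checks.bronze_rows}, silver={checks.silver_rows}; "
              "verify dedup/filter logic before escalating.")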
Certifications (Preferred)
- Databricks Certified Data Engineer Associate
- Microsoft Certified: Azure Data Engineer Associate (DP-203)
- ITIL Foundation (v3/v4)
Key Performance Indicators (KPIs)
- SLA Adherence: % of jobs meeting SLA; mean time to acknowledge (MTTA).
- Incident Metrics: Mean time to resolve (MTTR), incident reopen rate.
- Operational Hygiene: Runbook completeness, KEDB updates, shift handover quality.
- Quality Metrics: Error rates, number of successful reruns without escalation.
- Proactive Monitoring: Number of issues prevented via early detection/alerts.
- Change Readiness: Zero-defect deployments from an ops perspective.
Tools & Ecosystem
- Databricks (Jobs, Clusters, Repos, Workflows, Unity Catalog)
- Azure: ADLS, ADF, Key Vault, Event Hub, Log Analytics, Monitor
- Version Control: GitHub / Azure Repos
- Ticketing: ServiceNow / Jira / Azure DevOps Boards
- Observability: Azure Monitor, Log Analytics, Grafana (optional)
Sample Interview Screening Topics (for Hiring Teams)
- Ops Scenarios: How to triage a failing Databricks job (OOM, shuffle spill, auth error).
- Logs & Spark UI: Identify cause from executor logs; interpret stages/tasks/shuffles.
- Data Validation: Checkpoint integrity, bronze-to-silver load verification.
- Runbooks: Steps to safely rerun a partitioned pipeline without duplicate writes (illustrated after this list).
- Access/Governance: Handling a permission issue with Unity Catalog tables.
- SLA & Escalation: When to escalate vs. when to rerun; SEV classification.
JD Summary (Short Version for Job Portals)
Role: L1 Data Engineering (Databricks) Engineer, 5+ years
Must-have: Databricks Ops, ADF monitoring, ADLS, PySpark basics, ITIL, incident management
Nice-to-have: Unity Catalog, Azure Monitor, CI/CD, streaming
Shift: Rotational/on-call
Certs: Databricks DE Associate, DP-203, ITIL Foundation (preferred)
Durga Karunakaran
TAG Team - HCL Technologies Ltd.
[Confidential Information]
Chennai, India
Durga Karunakaran | LinkedIn