Key Responsibilities:
Data Engineering & Architecture:
- Design, develop, and maintain high-performance data pipelines for structured and unstructured data using Azure Databricks and Apache Spark.
- Build and manage scalable data ingestion frameworks for batch and real-time data processing.
- Implement and optimize data lake architecture in Azure Data Lake to support analytics and reporting workloads.
- Develop and optimize data models and queries in Azure Synapse Analytics to power BI and analytics use cases.
Cloud-Based Data Solutions:
- Architect and implement modern data lakehouses that combine the low-cost, flexible storage of data lakes with the performance and reliability of data warehouses.
- Leverage Azure services such as Data Factory, Event Hubs, and Blob Storage for end-to-end data workflows.
- Ensure security, compliance, and governance of data through Azure Role-Based Access Control (RBAC) and Data Lake ACLs.
ETL/ELT Development:
- Develop robust ETL/ELT pipelines using Azure Data Factory, Databricks notebooks, and PySpark.
- Perform data transformations, cleansing, and validation to prepare datasets for analysis.
- Manage and monitor job orchestration, ensuring pipelines run efficiently and reliably.
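The transform-and-validate step described in the bullets above can be illustrated with a small, framework-agnostic sketch. In production this logic would typically run as PySpark transformations inside a Databricks notebook; the function names and validation rules below are hypothetical, not part of any specific pipeline.

```python
from datetime import datetime
from typing import Optional

def cleanse_record(raw: dict) -> Optional[dict]:
    """Normalize one raw record; return None if it fails validation.

    Hypothetical rules: require a customer_id, parse the event date,
    trim strings, and round monetary amounts to two decimals.
    """
    if not raw.get("customer_id"):
        return None  # validation failure: required key missing
    try:
        event_date = datetime.strptime(raw["event_date"], "%Y-%m-%d").date()
    except (KeyError, ValueError):
        return None  # validation failure: missing or unparseable date
    return {
        "customer_id": str(raw["customer_id"]).strip(),
        "event_date": event_date.isoformat(),
        "amount": round(float(raw.get("amount", 0.0)), 2),
    }

def run_batch(records: list) -> tuple:
    """Apply cleansing to a batch; return (clean rows, rejected count)."""
    clean = [r for r in (cleanse_record(x) for x in records) if r is not None]
    return clean, len(records) - len(clean)
```

Keeping the per-record rule in a plain function like this also makes it easy to unit test before wrapping it in a Spark job.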
Performance Optimization:
- Optimize Spark jobs and SQL queries for large-scale data processing.
- Implement partitioning, caching, and indexing strategies to improve performance and scalability of big data workloads.
- Conduct capacity planning and recommend infrastructure optimizations for cost-effectiveness.
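As a concrete illustration of the partitioning strategy mentioned above, the sketch below shows date-based partition layout and pruning, the same idea that makes partitioned Delta/Parquet tables cheap to query: the engine skips files by reading partition values from the path instead of scanning the data. The path layout and function names here are illustrative, not a specific Azure API.

```python
from datetime import date

def partition_path(table_root: str, event_date: date) -> str:
    """Hive-style partition layout: root/year=YYYY/month=MM/day=DD."""
    return (f"{table_root}/year={event_date.year}"
            f"/month={event_date.month:02d}/day={event_date.day:02d}")

def prune_partitions(paths, start: date, end: date):
    """Keep only partition paths whose date falls in [start, end].

    Mirrors what a query engine does during partition pruning:
    it filters on values encoded in the path, never opening the files.
    """
    keep = []
    for p in paths:
        parts = dict(kv.split("=") for kv in p.split("/") if "=" in kv)
        d = date(int(parts["year"]), int(parts["month"]), int(parts["day"]))
        if start <= d <= end:
            keep.append(p)
    return keep
```

Choosing the partition column to match the dominant filter predicate (here, event date) is what turns full scans into narrow reads.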
Collaboration & Stakeholder Management:
- Work closely with business analysts, data scientists, and product teams to understand data requirements and deliver solutions.
- Participate in cross-functional design sessions to translate business needs into technical specifications.
- Provide thought leadership on best practices in data engineering and cloud computing.
Documentation & Knowledge Sharing:
- Create detailed documentation for data workflows, pipelines, and architectural decisions.
- Mentor junior team members and promote a culture of learning and innovation.
Required Qualifications:
Experience:
- 7+ years of experience in data engineering, big data, or cloud-based data solutions.
- Proven expertise with Azure Databricks, Azure Data Lake, and Azure Synapse Analytics.
Technical Skills:
- Strong hands-on experience with Apache Spark and distributed data processing frameworks.
- Advanced proficiency in Python and SQL for data manipulation and pipeline development.
- Deep understanding of data modeling for OLAP, OLTP, and dimensional data models.
- Experience with ETL/ELT tools like Azure Data Factory or Informatica.
- Familiarity with Azure DevOps for CI/CD pipelines and version control.
Big Data Ecosystem:
- Familiarity with Delta Lake for managing big data in Azure.
- Experience with streaming data frameworks like Kafka, Event Hubs, or Spark Streaming.
Cloud Expertise:
- Strong understanding of Azure cloud architecture, including storage, compute, and networking.
- Knowledge of Azure security best practices, such as encryption and key management.
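For the streaming frameworks listed under the big-data ecosystem above, the core concept is windowed aggregation over an unbounded event stream. The sketch below implements a tumbling-window count in plain Python purely to show the semantics; Spark Structured Streaming or an Event Hubs consumer would supply the equivalent operators at scale. All names are illustrative.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds: int):
    """Group (timestamp, key) events into fixed, non-overlapping windows.

    Returns {window_start: {key: count}}. Conceptually this is what a
    groupBy-over-event-time-window aggregation does in a streaming engine.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}
```

Tumbling windows are the simplest case; sliding windows and late-arrival handling (watermarks) build on the same bucketing idea.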
Preferred Skills (Nice to Have):
- Experience with machine learning pipelines and frameworks like MLflow or Azure Machine Learning.
- Knowledge of data visualization tools such as Power BI for creating dashboards and reports.
- Familiarity with Terraform or ARM templates for infrastructure as code (IaC).
- Exposure to NoSQL databases like Cosmos DB or MongoDB.
Location:
IND:AP:Hyderabad / Argus Building, Sattva, Knowledge City