We are looking for a highly skilled Cloud Engineer with a specialization in Apache Spark and Databricks to join our dynamic team. The ideal candidate will have extensive experience working with cloud platforms such as AWS, Azure, and GCP, and a deep understanding of data engineering, ETL processes, and cloud-native tools. Your primary responsibility will be to design, develop, and maintain scalable data pipelines using Spark and Databricks, while optimizing performance and ensuring data integrity across diverse environments.
Key Responsibilities:
Design and Development:
- Architect, develop, and maintain scalable ETL pipelines using Databricks, Apache Spark (Scala, Python), and other cloud-native tools such as AWS Glue, Azure Data Factory, and GCP Dataflow.
- Design and build data lakes and data warehouses on cloud platforms (AWS, Azure, GCP).
- Implement efficient data ingestion, transformation, and processing workflows with Spark and Databricks.
- Optimize the performance of ETL processes for faster data processing and lower costs.
- Develop and manage data pipelines using additional ETL tools such as Informatica and SAP Data Intelligence as needed.
Data Integration and Management:
- Integrate structured and unstructured data sources (relational databases, APIs, ERP systems) into the cloud data infrastructure.
- Ensure data quality, validation, and integrity through rigorous testing.
- Perform data extraction and integration from SAP or ERP systems, ensuring seamless data flow.
Performance Optimization:
- Monitor, troubleshoot, and enhance the performance of Spark/Databricks pipelines.
- Implement best practices for data governance, security, and compliance across data workflows.
Collaboration and Communication:
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to define data requirements and deliver scalable solutions.
- Provide technical guidance and recommendations on cloud data engineering processes and tools.
Documentation and Maintenance:
- Document data engineering solutions, ETL pipelines, and workflows.
- Maintain and support existing data pipelines, ensuring they operate effectively and align with business goals.
Qualifications:
Education:
- Bachelor's degree in Computer Science, Information Technology, or a related field. Advanced degrees are a plus.
Experience:
- 7+ years of experience in cloud data engineering or similar roles.
- Expertise in Apache Spark and Databricks for data processing.
- Proven experience with cloud platforms such as AWS, Azure, and GCP.
- Experience with cloud-native ETL tools such as AWS Glue, Azure Data Factory, Kafka, and GCP Dataflow.
- Hands-on experience with data platforms such as Redshift, Snowflake, Azure Synapse, and BigQuery.
- Experience extracting data from SAP or ERP systems is preferred.
- Strong programming skills in Python, Scala, or Java.
- Proficient in SQL and query optimization techniques.
Skills:
- In-depth knowledge of Spark/Scala for high-performance data processing.
- Strong understanding of data modeling, ETL/ELT processes, and data warehousing concepts.
- Familiarity with data governance, security, and compliance best practices.
- Excellent problem-solving, communication, and collaboration skills.
Preferred Qualifications:
- Certifications in cloud platforms (e.g., AWS Certified Data Analytics, Google Professional Data Engineer, Azure Data Engineer Associate).
- Experience with CI/CD pipelines and DevOps practices for data engineering.
- Exposure to Apache Hadoop, Kafka, or other data frameworks is a plus.