Require an experienced Data Engineer who can design, develop, and maintain complex data engineering pipelines for structured and semi-structured data.
a. Implement and maintain distributed database systems using Citus to extend PostgreSQL for large-scale data handling (see the Citus sketch after this list).
b. Integrate and manage data services and tools using REST APIs to improve system interoperability and data exchange (a client sketch follows this list).
c. Use Azure Kubernetes Service (AKS) to orchestrate containerized data applications, ensuring scalability and reliability (an AKS sketch follows this list).
d. Apply a strong understanding of Git and CI/CD practices, which is crucial for this role.
e. Follow continuous integration and delivery best practices, with working knowledge of Azure DevOps tooling where applicable.
f. Troubleshoot and resolve data processing issues, performing root-cause analysis to prevent recurrence.
g. Collaborate with cross-functional teams to integrate systems and ensure consistent data flow.
h. Document all processes, models, and activities to ensure clarity and compliance with company standards.
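To make item a concrete, here is a minimal sketch of distributing a table with Citus. It assumes a Citus-enabled PostgreSQL coordinator reachable with the placeholder credentials shown, and a hypothetical events table sharded by tenant_id; create_distributed_table() is the Citus function that spreads a table across worker nodes.

```python
import psycopg2

# Connect to the Citus coordinator node (placeholder credentials).
conn = psycopg2.connect(host="localhost", dbname="analytics",
                        user="postgres", password="postgres")
conn.autocommit = True

with conn.cursor() as cur:
    # Create an ordinary PostgreSQL table first.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            event_id   bigserial,
            tenant_id  bigint NOT NULL,
            payload    jsonb,
            created_at timestamptz DEFAULT now()
        );
    """)
    # Ask Citus to shard the table across workers by tenant_id, so queries
    # filtered on tenant_id route to a single shard. (Errors if the table
    # is already distributed; fine for a one-off setup sketch.)
    cur.execute("SELECT create_distributed_table('events', 'tenant_id');")

conn.close()
```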
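For item b, a small REST-integration sketch. The endpoint, the page parameter, and the "records" envelope are hypothetical stand-ins for whatever data service is being integrated; only the requests calls themselves are standard library usage.

```python
import requests

BASE_URL = "https://api.example.com/v1"  # hypothetical service endpoint

def fetch_records(dataset: str, page: int = 1) -> list[dict]:
    """Pull one page of records from a REST data service."""
    resp = requests.get(
        f"{BASE_URL}/datasets/{dataset}/records",
        params={"page": page},
        timeout=30,
    )
    resp.raise_for_status()        # surface 4xx/5xx instead of silently continuing
    return resp.json()["records"]  # assumes the service wraps rows in a "records" key

if __name__ == "__main__":
    for row in fetch_records("orders"):
        print(row)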
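For item c, a sketch using the official Kubernetes Python client against an AKS cluster. It assumes your kubeconfig already points at the cluster (for example after running az aks get-credentials); the data-pipelines namespace and ingest-worker deployment are hypothetical names.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (already aimed at AKS).
config.load_kube_config()
apps = client.AppsV1Api()

NAMESPACE = "data-pipelines"

# List the deployments running containerized data applications.
for dep in apps.list_namespaced_deployment(namespace=NAMESPACE).items:
    ready = dep.status.ready_replicas or 0
    print(f"{dep.metadata.name}: {ready}/{dep.spec.replicas} replicas ready")

# Scale an ingestion worker up for a heavy backfill window.
apps.patch_namespaced_deployment_scale(
    name="ingest-worker",
    namespace=NAMESPACE,
    body={"spec": {"replicas": 5}},
)
```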
Key Responsibilities:
a. Design, build, and maintain scalable, high-performance data pipelines using Spark, Scala, and Python (a pipeline sketch follows this list).
b. Develop and manage Databricks workflows, with a focus on performance optimization and efficient cluster utilization.
c. Manage Delta tables to gain ACID transactions, schema enforcement, and versioned (time-travel) storage over plain data files.
d. Handle common Big Data file formats, including Parquet, ORC, and JSON, efficiently to ensure compatibility and performance across storage solutions (a formats sketch follows this list).
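As a sketch of responsibility a, the PySpark job below reads semi-structured JSON, aggregates it, and writes a partitioned output. Paths and column names are hypothetical; on Databricks a SparkSession is already provided as spark, so the builder line is only needed when running elsewhere.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-pipeline").getOrCreate()

raw = spark.read.json("/mnt/raw/orders/")  # semi-structured input

daily = (
    raw.filter(F.col("status") == "complete")
       .withColumn("order_date", F.to_date("created_at"))
       .groupBy("order_date", "region")
       .agg(F.sum("amount").alias("revenue"),
            F.count("*").alias("orders"))
)

# Partition the output so downstream reads can prune by date.
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "/mnt/curated/daily_revenue/"
)
```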
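For responsibilities c and d, a sketch showing that the same DataFrame API reads each of the listed file formats, and how writing with format("delta") layers a transaction log over Parquet files, which is what enables ACID upserts and time travel. It assumes a Delta-enabled runtime (e.g. Databricks, or open-source Spark with the delta-spark package configured); all paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-demo").getOrCreate()

# One DataFrame API covers all three formats; paths are placeholders.
parquet_df = spark.read.parquet("/mnt/landing/clicks_parquet/")
orc_df     = spark.read.orc("/mnt/landing/clicks_orc/")
json_df    = spark.read.json("/mnt/landing/clicks_json/")

# Write a Delta table: Parquet storage plus a transaction log.
parquet_df.write.format("delta").mode("overwrite").save(
    "/mnt/curated/clicks_delta/"
)

# Time travel: read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(
    "/mnt/curated/clicks_delta/"
)
```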