Responsibilities
- Design, architect, and develop solutions leveraging cloud big data technology to ingest, process, and analyze large, disparate data sets in order to meet and exceed business requirements.
- Design and develop data management and data persistence solutions for application use cases, leveraging relational and non-relational databases and enhancing our data processing capabilities.
- Develop proofs of concept (POCs) that help platform architects, product managers, and software engineers validate solution proposals and plan migrations.
- Develop a data lake solution to store structured and unstructured data from internal and external sources, and provide technical guidance to help colleagues migrate to the modern technology platform.
- Contribute to and adhere to CI/CD processes and development best practices, and strengthen this discipline within the Data Engineering organization.
- Develop systems that ingest, cleanse, and normalize diverse datasets; develop data pipelines from various internal and external sources; and build structure for previously unstructured data.
- Using PySpark and Spark SQL, extract, manipulate, and transform data from various sources, such as databases, data lakes, APIs, and files, to prepare it for analysis and modeling.
- Build and optimize ETL workflows using Azure Databricks and PySpark, including efficient data processing pipelines, data validation, error handling, and performance tuning (a minimal sketch follows this list).
- Perform unit testing, system integration testing, and regression testing, and assist with user acceptance testing.
- Translate business requirements into a technical solution that can be designed and engineered.
- Consult with the business to develop documentation and communication materials that ensure accurate usage and interpretation of JLL data.
- Implement data security best practices, including data encryption, access controls, and compliance with data protection regulations. Ensure data privacy, confidentiality, and integrity throughout the data engineering processes.
- Perform the data analysis required to troubleshoot data-related issues and assist in their resolution.
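To make the ETL responsibility above concrete, the following is a minimal PySpark sketch of the extract-validate-transform-load pattern described in these bullets. It is illustrative only: the storage paths, container names, and columns are hypothetical placeholders, not references to actual JLL systems.

```python
# Minimal PySpark ETL sketch for illustration only; the paths, containers,
# and column names below are hypothetical, not JLL's actual configuration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-etl").getOrCreate()

# Extract: read raw JSON landed in the data lake (hypothetical ADLS path).
raw = spark.read.json("abfss://raw@examplelake.dfs.core.windows.net/leases/")

# Validate: keep rows with a non-null key and a positive amount; write the
# rejects to a quarantine area for later inspection.
valid = raw.filter(F.col("lease_id").isNotNull() & (F.col("amount") > 0))
rejects = raw.exceptAll(valid)
rejects.write.mode("append").parquet(
    "abfss://quarantine@examplelake.dfs.core.windows.net/leases/")

# Transform: cast types and add load metadata.
curated = (valid
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .withColumn("load_ts", F.current_timestamp())
           .withColumn("load_date", F.to_date("load_ts")))

# Load: write curated output as Parquet, partitioned for downstream queries.
(curated.write
        .mode("overwrite")
        .partitionBy("load_date")
        .parquet("abfss://curated@examplelake.dfs.core.windows.net/leases/"))
```

In practice a Databricks job would parameterize these paths, log rejected records, and tune partitioning and cluster sizing for the actual data volumes.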
Experience & Education
- Minimum of 4 years of experience as a data developer using Python, PySpark, Spark SQL, and SQL Server, with a solid grounding in ETL concepts.
- Bachelor's degree in Information Science, Computer Science, Mathematics, Statistics, or another quantitative discipline in science, business, or social science.
- Experience with the Azure cloud platform, Databricks, and Azure Storage.
- Effective written and verbal communication skills, including technical writing.
- Excellent technical, analytical and organizational skills.
Technical Skills & Competencies
- Experience handling unstructured and semi-structured data, working in a data lake environment, leveraging data streaming, and developing data pipelines driven by events/queues.
- Hands-on experience and knowledge of real-time/near-real-time processing, with readiness to code.
- Hands-on experience with PySpark, Databricks, and Spark SQL.
- Knowledge of JSON, Parquet, and other file formats, and the ability to work with them effectively.
- Knowledge of NoSQL databases such as HBase, MongoDB, and Cosmos DB.
- Cloud experience on Azure or AWS preferred.
- PySpark, Spark Streaming, Azure SQL Server, Cosmos DB/MongoDB, Azure Event Hubs, Azure Data Lake Storage, Azure Search, etc. (see the streaming sketch after this list).
- A reliable, self-motivated, and self-disciplined team player capable of executing multiple projects simultaneously in a fast-paced environment while working with cross-functional teams.
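For the real-time/near-real-time skills listed above, the sketch below shows a minimal Spark Structured Streaming job, assuming ingestion from Azure Event Hubs through its Kafka-compatible endpoint (which requires the spark-sql-kafka connector on the cluster). The namespace, event hub name, schema, and storage paths are hypothetical placeholders, not actual JLL resources.

```python
# Illustrative-only Structured Streaming sketch: read events from Azure Event
# Hubs via its Kafka-compatible endpoint and land them in the data lake.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("example-stream").getOrCreate()

# Hypothetical payload schema for the incoming JSON events.
schema = (StructType()
          .add("sensor_id", StringType())
          .add("reading", DoubleType()))

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "example-ns.servicebus.windows.net:9093")
          .option("subscribe", "telemetry")  # hypothetical event hub name
          .option("kafka.security.protocol", "SASL_SSL")
          .option("kafka.sasl.mechanism", "PLAIN")
          .option("kafka.sasl.jaas.config",
                  'org.apache.kafka.common.security.plain.PlainLoginModule required '
                  'username="$ConnectionString" password="<connection-string>";')
          .load())

# Parse the JSON payload from the Kafka value column.
parsed = (events
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Continuously append parsed events to the data lake with checkpointing.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "abfss://stream@examplelake.dfs.core.windows.net/telemetry/")
         .option("checkpointLocation",
                 "abfss://stream@examplelake.dfs.core.windows.net/_checkpoints/telemetry/")
         .outputMode("append")
         .start())
```

A production pipeline would additionally manage the connection string as a secret, enforce a schema contract, and add dead-lettering for malformed events.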