6-9 years of relevant experience in data science, with a strong portfolio showcasing your expertise in machine learning, deep learning, GANs, and synthetic data algorithms.
Advanced proficiency in AWS cloud services (e.g., S3, EC2, Lambda, SageMaker, EKS) and a solid understanding of cloud architecture.
Expert-level knowledge of Python and its data science libraries (TensorFlow, PyTorch, Scikit-learn, Pandas).
Experience in building and optimizing data pipelines, architectures, and data sets, particularly in a real-time, SaaS environment.
Design, develop, and deploy data pipelines and ETL processes.
Implement data integration solutions, ensuring data flows efficiently and reliably between various data sources and destinations.
Collaborate with data architects and analysts to understand data requirements and translate them into technical specifications.
Build and maintain scalable and optimized data storage solutions.
Develop and manage data transformation and cleansing processes to ensure data quality and accuracy.
Monitor and troubleshoot data pipelines to identify and resolve issues in a timely manner.
Optimize data pipelines for performance, cost, and scalability.
Exceptional analytical, problem-solving, and project management skills.
Strong communication skills, with the ability to articulate complex technical concepts to non-technical stakeholders.
A self-starter attitude with the ability to work in a fast-paced startup environment.
Critical Skills To Possess
Proven experience as a Data Engineer, with a strong focus on cloud technologies.
Solid understanding of data modeling, database design principles, and data warehousing concepts.
Experience with data integration, ETL processes, and data transformation using tools like Azure Data Factory, Azure Logic Apps, or similar technologies.
Strong programming skills in languages and frameworks such as SQL, Python, or Spark.
Familiarity with version control systems (e.g., Git) and CI/CD pipelines for code deployment.
Knowledge of data security and compliance practices.