Senior ML Data Pipeline Engineer

Calix

Bengaluru, India

5-7 Years

Posted a day ago

Job Description

Calix provides the cloud, software platforms, systems and services required for communications service providers to simplify their businesses, excite their subscribers and grow their value.

Job Description: As a Senior Data Engineer, you will play a critical role in designing, building, and maintaining scalable data architectures that support machine learning models and analytics. You will work closely with data scientists, machine learning engineers, and other stakeholders to ensure that our data infrastructure meets the demands of our ML initiatives. Your expertise will drive the development of efficient data pipelines, enhance data quality, and optimize performance for our ML workloads.

Key Responsibilities:

Design, develop, and maintain scalable data pipelines and ETL processes to support machine learning/GenAI applications.
Architect, implement, and optimize workflow orchestration using Apache Airflow and Google Cloud Composer, ensuring robust scheduling, monitoring, and management of complex data pipelines.
Collaborate with data scientists and ML engineers to understand data requirements and ensure seamless integration of data into ML models.
Implement data governance practices to ensure data quality, consistency, and compliance with industry standards.
Optimize data storage solutions for performance and cost-effectiveness, utilizing cloud services (GCP).
Monitor and troubleshoot data pipeline performance issues, implementing improvements as necessary.
Develop and maintain documentation for data architecture, data flows, and processes to support knowledge sharing within the team.
Stay updated on the latest data engineering and machine learning trends, tools, and best practices to continuously improve our processes and technologies.
Mentor and provide guidance to junior data engineers and other team members.
Perform data ingestion, data processing and feature engineering tasks.
Operate and administer production databases: SQL, NoSQL, vector, and graph.

Qualifications:

Bachelor's or master's degree in Computer Science, Engineering, Data Science, or a related field.
5+ years of experience as a Data Engineer, with a focus on machine learning applications.
Strong programming skills in Python or Java.
Strong experience with data processing frameworks (e.g., Apache Spark, Apache Kafka) and ETL tools.
Solid understanding of databases (SQL and NoSQL) and data warehousing solutions (e.g., Amazon Redshift, Google BigQuery).
Experience with real-time data processing and streaming analytics.
Experience with containerization and orchestration tools (Docker, Kubernetes).
Experience with cloud platforms (AWS, Azure, GCP) and their data services.
Experience with version control systems (e.g., Git) and CI/CD pipelines.
Knowledge of machine learning concepts and algorithms is a plus.
Excellent problem-solving skills and the ability to work in a fast-paced, collaborative environment.
Strong communication skills, with an ability to convey complex technical concepts to non-technical stakeholders.

Preferred Qualifications: