
You'll work with engineering teams, researchers, and data scientists to fix production issues, including performance and functional issues
Diagnose and solve customer technical problems
Participate in training customers and prepare reports on customer issues
Be responsible for customer service improvements and recommend product improvements
Write support documentation
You'll design and implement zero-downtime processes and monitoring to maintain a highly
available service
As a support engineer, identify automation opportunities as part of the problem
management process and build automation that prevents issues.
Define engineering excellence for operational maturity
You'll work together with AI platform developers to provide the CI/CD model to
deploy and configure the production system automatically
Develop and follow operational standard processes for tools and automation
development, including style guides, versioning practices, source control, and
branching and merging patterns, and advise other engineers on development
standards
Deliver solutions that accelerate the activities phenomenal engineers would
perform, through automation, deep domain expertise, and knowledge sharing
Requirements
Demonstrated ability in designing, building, refactoring, and releasing software
written in Python and C++.
Good to have: experience with Ray.io, including workload management, cluster
deployment, distributed task scheduling, and troubleshooting.
Ability to use Ray Dashboard and CLI tools for monitoring, resource tracking,
debugging distributed jobs, and resolving production issues.
Knowledge of Ray ecosystem libraries such as Ray Train, Ray Tune, Ray Serve,
and Ray Data is a big plus.
Experience integrating Ray with tools such as Airflow, MLflow, Dask, and DeepSpeed
is a big plus.
Debugging and triaging skills.
Cloud technologies such as Kubernetes and Docker, plus Linux fundamentals.
Familiarity with DevOps practices and continuous testing.
DevOps pipelines and automation: app deployment and configuration, and performance
monitoring.
Test automation and Jenkins CI/CD.
Excellent communication, presentation, and leadership skills for collaborating
with partners, customers, and engineering teams.
Well organized and able to manage multiple projects in a fast-paced and demanding
environment.
Job ID: 135378721