What You'll Do:
- Lead the design and implementation of observability tools and dashboards that provide actionable insights into platform performance and health
- Leverage and fine-tune Generative AI models to enhance observability capabilities such as anomaly detection, predictive analytics, and a troubleshooting copilot
- Build and deploy well-managed core APIs and SDKs for observability of LLMs and proprietary Gen-AI Foundation Models, including training, pre-training, fine-tuning, and prompting
- Stay abreast of the latest trends in Generative AI, platform observability, and responsible AI, and drive the adoption of emerging technologies and methodologies
- Collaborate as part of a cross-functional Agile team to create and enhance software that enables state-of-the-art, next-generation Gen-AI applications
- Bring a research mindset and lead proofs of concept that showcase the capabilities of large language models in observability and governance, enabling practical production solutions that improve platform users' productivity
Basic Qualifications:
- Bachelor's or Master's degree in Computer Science or Engineering
- At least 7 years of experience in machine learning engineering, building data-intensive solutions using distributed computing
- At least 5 years of hands-on experience with Generative AI models and their application in observability or related areas
- At least 8 years of experience programming with Python, Go, or Java
- At least 5 years of experience with an industry-recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow
- At least 5 years of experience productionizing, monitoring, and maintaining models
- At least 5 years of experience with cloud platforms like AWS, Azure, or GCP
- At least 7 years of experience in developing performant, resilient, and maintainable code.
Preferred Qualifications:
- Master's or doctoral degree in data science, computer science, electrical engineering, or mathematics
- 8+ years of experience in machine learning, particularly in deploying and operationalizing ML models
- 8+ years of experience building and evaluating agentic solutions
- Familiarity with containerization and orchestration tools like Docker and Kubernetes
- Knowledge of data governance and compliance, particularly in the context of machine learning and AI systems
- Prior experience with NVIDIA GPU telemetry and CUDA
- Contributions to open-source ML software
- Authored or co-authored papers or patents on ML techniques, models, or proofs of concept
- 2+ years of experience developing applications using Generative AI, i.e., open-source or commercial LLMs