Hiring a PyTorch Automation Specialist with expertise in PyTorch workloads and Triton.
Responsibilities
- Develop PyTorch workloads that stress model‑level execution (e.g., large GEMMs, attention patterns, MoE‑like behavior, mixed precision, long‑running loops)
- Author custom Triton kernels to directly stress hardware execution units, memory hierarchies, and synchronization paths
- Build parameterized stress harnesses that can scale with problem size, number of devices, and runtime duration
- Integrate workloads with existing tooling for profiling, monitoring, and failure triage
- Collaborate with platform, firmware, and SDK teams to ensure workloads target known risk areas and emerging issues
- Document usage patterns and provide reproducible scripts for lab and CI environments
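To make the first two responsibilities concrete, here is a minimal sketch of a parameterized GEMM stress loop of the kind described above. All names (`gemm_stress`, its parameters) are illustrative, not an existing API; a real workload would add device sweeps, mixed-precision variants, and duration-based stopping. Returning a checksum of the accumulated result is one simple way to flag silent data corruption across repeated runs.

```python
import time
import torch

def gemm_stress(size: int = 256, iters: int = 10,
                dtype: torch.dtype = torch.float32,
                device: str = "cpu") -> float:
    """Run repeated size x size GEMMs and return a result checksum.

    Illustrative only: deterministic inputs mean the checksum should be
    stable run-to-run, so a drifting value hints at silent corruption.
    """
    torch.manual_seed(0)  # deterministic inputs for reproducible checksums
    a = torch.randn(size, size, dtype=dtype, device=device)
    b = torch.randn(size, size, dtype=dtype, device=device)
    acc = torch.zeros(size, size, dtype=dtype, device=device)
    start = time.perf_counter()
    for _ in range(iters):
        acc = torch.addmm(acc, a, b)  # acc += a @ b
    elapsed = time.perf_counter() - start
    checksum = float(acc.sum())
    print(f"{iters} GEMMs of {size}x{size} in {elapsed:.3f}s, "
          f"checksum={checksum:.4f}")
    return checksum
```

Scaling `size`, `iters`, and `device` is what makes the harness "parameterized": the same loop can serve as a quick CI smoke test at small sizes or a long-running saturation test at large ones.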
Expected Deliverables
- A library of reusable PyTorch stress workloads
- A set of Triton‑based micro‑ and macro‑kernels designed specifically for stress and saturation testing
- Test harnesses/scripts supporting single‑device and multi‑device execution
- Documentation describing workload intent, configuration options, and expected stress characteristics
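As a sketch of what the harness/script deliverable might expose, the snippet below defines an illustrative command-line surface covering the parameters called out above (problem size, number of devices, runtime duration). The flag names are assumptions, not a specification.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative CLI for a stress harness; flag names are hypothetical."""
    p = argparse.ArgumentParser(
        description="PyTorch/Triton stress harness (sketch)")
    p.add_argument("--size", type=int, default=1024,
                   help="problem size, e.g. GEMM dimension")
    p.add_argument("--devices", type=int, default=1,
                   help="number of devices to run on (1 = single-device)")
    p.add_argument("--duration-s", type=float, default=60.0,
                   help="target runtime duration in seconds")
    return p

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"size={args.size} devices={args.devices} "
          f"duration={args.duration_s}s")
```

Keeping the configuration surface small and documented in `--help` directly supports the documentation deliverable: the same flags can be referenced from lab runbooks and CI job definitions.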