Key Responsibilities:
- Design and optimize model serving infrastructure with a focus on low latency and cost efficiency
- Build scalable inference pipelines across different hardware acceleration options
- Implement monitoring and observability solutions for ML systems
- Collaborate with ML Engineers to define best practices for deployment
- Develop enterprise-grade, cost-efficient ML solutions
- Work closely with ML Engineers, QA, and DevOps teams in a distributed environment
- Evaluate new technologies and contribute to system architecture decisions
- Drive continuous improvements in ML infrastructure
Required Experience & Skills:
- 5+ years of experience in software engineering using Python
- Hands-on experience with ML frameworks (especially PyTorch)
- Experience optimizing ML models for hardware accelerators using toolchains such as AWS Neuron, ONNX Runtime, and TensorRT
- Familiarity with AWS ML services and hardware-accelerated compute (e.g., SageMaker, Inferentia, Trainium)
- Proven ability to build and maintain serverless architectures on AWS
- Strong understanding of event-driven patterns (SQS/SNS) and caching strategies
- Proficiency with Docker and container orchestration tools
- Solid grasp of RESTful API design and implementation
- Commitment to writing secure, high-quality code, including experience with static code analysis tools
- Strong problem-solving, algorithmic thinking, and communication skills