Job Description
Generative AI Platform Support Engineer Responsibilities:
- Provide technical support for our AI platform, focusing on cloud infrastructure integration, deployment, and ongoing maintenance.
- Work closely with cross-functional teams to troubleshoot technical issues, implement platform enhancements, monitor system performance, and ensure the platform runs efficiently and effectively.
- Apply expertise in AWS cloud administration and infrastructure management to support platform operations and maintain system performance.
Key Responsibilities:
- Assess and enhance the AI platform's cloud infrastructure and data pipeline resilience using AWS and cloud-based technologies.
- Ensure scalability and fault tolerance of AI/ML models within cloud environments.
- Identify and resolve bottlenecks in model inference and training pipelines, focusing on performance and resource optimization.
- Optimize cloud resource utilization on AWS for real-time use cases, including AI model deployment.
- Collaborate with the DevOps team on improving cloud deployment processes and managing AWS infrastructure.
- Implement automated testing that simulates failures to verify fault tolerance and high availability.
- Provide ongoing technical support for users of the Generative AI platform, troubleshooting issues and responding to queries to ensure seamless operations.
- Monitor cloud platform performance on AWS, identifying and implementing optimization strategies to improve cost efficiency and scalability.
- Work with AWS cloud services (e.g., EC2, S3, Lambda, VPC) to ensure proper configuration management and performance.
- Document key processes, issues, and solutions for knowledge sharing and future reference.
- Stay up to date with industry trends in Generative AI, cloud technologies, and AWS cloud administration.