Job Title: HPC Engineer
Location: Bengaluru, Karnataka, India
Roles and Responsibilities
- Design and implement systems for high-performance computing environments at our on-site location in Bengaluru, Karnataka, India, ensuring optimal performance, scalability, and reliability.
- Collaborate with cross-functional teams to understand computational needs and translate them into technical solutions, leveraging your expertise in systems design and software engineering.
- Manage, configure, and monitor HPC systems and clusters, ensuring efficient operation and continually improving computing power and resource utilization.
- Develop and maintain automation scripts for the administration and operation of HPC infrastructure, promoting efficiency and reducing manual workloads.
- Conduct performance tuning to enhance system efficiency, utilizing your deep understanding of operating systems and hardware architectures to identify and resolve bottlenecks.
- Provide technical support and guidance to users so they can make full use of the available HPC resources, fostering a collaborative, user-centric environment.
Required Qualifications
- Proven experience in deploying and managing HPC systems, with a strong foundation in system architecture design and implementation.
- Advanced proficiency in programming languages suitable for computational tasks and system management, with the ability to write efficient and scalable code.
- Solid experience in Linux system administration, with a track record of managing large-scale server environments and ensuring their reliability and security.
- Demonstrated ability in performance analysis and optimization, with experience in identifying and addressing system and application-level bottlenecks.
- Excellent problem-solving skills with a methodical approach to diagnosing and resolving technical issues in complex computing environments.
- Strong communication skills, both written and verbal, with the ability to convey technical concepts to technical and non-technical stakeholders alike.
Key Responsibilities
- Work directly with end-users to understand their computational workflows and advise on the best approaches to achieve their objectives using HPC resources effectively.
- Continuously evaluate new and emerging technologies to keep the HPC infrastructure current and aligned with the latest industry best practices and innovations.
- Ensure high availability and stability of HPC infrastructure by implementing robust monitoring and alerting strategies for proactive system management.
- Assist in the optimization and parallelization of software applications, guiding users on adjusting their code for improved performance on HPC systems.
- Maintain comprehensive documentation of system configurations, processes, and procedures to support knowledge transfer and operational continuity.
- Participate in incident response and post-mortem analysis to reduce the likelihood and impact of future incidents and ensure prompt recovery of services.
Important: Prolegion does not charge candidates any fee at any stage. If anyone asks for money in exchange for this opportunity, please treat it as a potential scam and report it to us immediately.