Job Description
Introduction
IBM Infrastructure is a catalyst that makes the world work better because our clients demand it. Heterogeneous environments, the explosion of data, digital automation, and cybersecurity threats require hybrid cloud infrastructure that only IBM can provide. Your ability to be creative, to think ahead, and to focus on innovation that matters is supported by our growth-minded culture as we continue to drive career development across our teams. Collaboration is key to IBM Infrastructure's success, as we bring together different business units and teams that balance their priorities in a way that best serves our clients' needs. IBM's product and technology landscape includes Research, Software, and Infrastructure. Entering this domain positions you at the heart of IBM, where growth and innovation thrive.
The mission of the IBM Storage Solutions team is to engage with strategic ISV and OEM partners that may advance the IBM Storage strategy to expand the marketability of the portfolio, create short- and long-term alignment of solutions roadmaps, and engage in collaborative GTM. The team will:
Engage with the entire IBM Storage brand (product managers, engineers, technologists, researchers, marketers, and sellers) and with their counterparts at our technology partners worldwide
Prioritize and lead strategic client engagements with solutions engineering expertise
Design, build, and operate a world-class strategically-aligned ISV program
Our primary intention is to realize measurable, material, incremental business for the Storage product team that is attributable to our joint in-market solutions with technology partners.
This group is composed of technology and business professionals who are intent on bringing to market impactful customer solutions based on the IBM Storage assets of today and the future. It will have influence over our product roadmaps and will evangelize everything we have to offer to our partners, our clients, and the worldwide market.
Your Role And Responsibilities
Storage Solutions Engineer - AI Infrastructure is a specialized role focused on architecting and deploying high-performance data environments for large-scale AI training and inference. You will design, build, and maintain high-performance storage systems for AI/ML workloads, bridging storage technology with AI needs such as GPU clusters and data pipelines. The role requires skills in infrastructure automation, performance tuning, and cloud platforms, as well as collaboration with data scientists to ensure scalable, secure, and reliable data flow for complex models.
Core Responsibilities
Architecture & Design: Lead end-to-end system design for distributed storage platforms tailored to AI/HPC workloads.
Performance Optimization: Maximize IOPS and throughput for multi-node GPU clusters, ensuring storage systems can keep up with the demands of deep learning frameworks.
Infrastructure Automation: Build CI/CD and automation pipelines for provisioning and monitoring AI infrastructure using tools like Terraform, Ansible, and Kubernetes.
Solution Selection, Validation, and Publishing: Evaluate and select next-generation storage technologies, such as NVMe-oF (IBM FlashSystem) and Ceph, to support petabyte-scale data.
AIOps Integration: Build CI/CD pipelines, monitoring and orchestration (Kubernetes) for AI/ML workflows.
Pre-Sales & Strategy: Collaborate with customers, partners, and field teams to translate customer business requirements into technical bills of materials (BOMs) and reference architectures.
Collaboration: Work with data scientists, ML engineers, and cloud teams to gather requirements and deliver solutions.
Troubleshooting: Resolve complex issues in distributed storage and AI environments, ensuring high availability.
Security & Compliance: Implement security best practices, access controls, and data encryption.
Preferred Education
Master's Degree
Required Technical And Professional Expertise
Strong expertise with NVIDIA DGX/HGX systems, GPUs, and DPUs (e.g., NVIDIA BlueField) for storage offloading.
Understanding of, and hands-on experience with, SAN, NAS, and parallel file systems (IBM Storage preferred), alongside protocols such as NFS, SMB, and S3.
Proficiency in Linux, networking, and storage systems.
Sound understanding of data modelling, governance, and security frameworks.
Experience deploying hybrid or multi-cloud AI solutions on HCI, AWS, Azure, or GCP, and with virtualization technologies such as Kubernetes, RHOS, and VMware.
Automation/Scripting: Proficiency in Python, Bash, or Go for developing custom monitoring and management tools.
Excellent problem-solving and communication skills.
Ability to apply AI-driven tools for rapid problem-solving, data-based decisions, and productivity improvement.
Experience: Typically 8-10+ years in systems engineering and/or storage architecture.
Preferred Technical And Professional Experience
Understanding of database-centric application architectures (such as SAP HANA, Oracle, MongoDB).
Experience in HA/DR design and implementations.