Description:
Position Summary
We are looking for an AI Site Reliability Engineer to manage, optimize, and scale high-performance compute (HPC) and AI platforms including NVIDIA DGX and Cisco UCS. This role blends SRE principles, AI/ML operationalization, and infrastructure automation for mission-critical environments.
Responsibilities
Manage & scale HPC platforms (NVIDIA DGX / Cisco UCS) for AI workloads. Ensure availability, latency, scalability, and efficiency across systems. Drive capacity planning, perform
Aug 12, 2025;
from:
dice.com