AI Site Reliability Engineer (Remote) - & GCEAD

Bridge Flair LLC
San Jose Full-day Temporary

Description:

Position Summary: We are looking for an AI Site Reliability Engineer to manage, optimize, and scale high-performance compute (HPC) and AI platforms including NVIDIA DGX and Cisco UCS. This role blends SRE principles, AI/ML operationalization, and infrastructure automation for mission-critical environments.

Responsibilities: Manage & scale HPC platforms (NVIDIA DGX / Cisco UCS) for AI workloads. Ensure availability, latency, scalability, and efficiency across systems. Drive capacity planning, perform
Aug 12, 2025;   from: dice.com

Similar jobs

  • Fourways Consulting Services
  • San Jose
Description: Job Title: Site Reliability Engineer (SRE) Location: Research Triangle Park, NC / San Jose, CA Duration: 12 Month Contract Job Description: We are looking for an experienced SRE with the following qualifications: Strong experience working ...
10 days ago
  • SolutionIT, Inc.
  • San Jose
Description: Solution IT Inc. is looking for an Applied AI Architect for one of its clients, Remote (PST only). Job Title: Applied AI Architect Required Skills Job Description: We are seeking an experienced AI Architect with 12-15 years of software ...
30 days ago
  • Unicorn Technologies LLC
  • San Jose
Description: Job title: Agentic AI Development Location: San Jose, CA. Hybrid: Day 1 onsite, after that can be remote. Duration: Contract - Long term. Key Responsibilities: Agentic AI Development, 5+ years of experience. Hands-on development using agentic AI ...
3 days ago
  • Alphosoft Inc
  • San Jose
Description: Location - RTP or San Jose, CA Job Description for Senior Software Engineer: Our team is seeking a software engineer with extensive experience in enterprise-level software development to join a dynamic and agile team of talented engineers ...
4 days ago