Description:
About the Team
The Platform Systems team operates at the intersection of cutting-edge AI and distributed systems. We do the engineering and research required to train our flagship models on our largest custom-built supercomputers. We build our own model training software and focus on the lower layers of the stack, including collective communication, scheduling, compute efficiency, parallelism strategies, fault tolerance, and observability. The models we train are key ingredients to the AI res
Apr 20, 2024;
from:
dice.com