Wednesday, February 14, 2024

super LLM with only 3,000 (AMD) GPUs

Frontier trained a ChatGPT-sized large language model with only 3,000 of its 37,888 Radeon GPUs — the world's fastest supercomputer blasts through one trillion parameter model with only 8 percent of its MI250X GPUs | Tom's Hardware

Researchers at Oak Ridge National Laboratory trained a large language model (LLM) the size of ChatGPT on the Frontier supercomputer and only needed 3,072 of its 37,888 GPUs to do it. The team published a research paper that details how it pulled off the feat and the challenges it faced along the way.

The Frontier supercomputer is equipped with 9,472 AMD Epyc 7A53 CPUs and 37,888 AMD Instinct MI250X GPUs. However, the team only used 3,072 GPUs to train an LLM with one trillion parameters, and 1,024 to train another LLM with 175 billion parameters.
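A rough memory estimate shows why even 3,072 GPUs is an unusually small count for a trillion-parameter model. The sketch below uses the common 16-bytes-per-parameter accounting for mixed-precision Adam training; the constants are illustrative assumptions, not figures from the paper.

# Back-of-envelope estimate of the memory footprint of a one-trillion-
# parameter model, and the minimum number of MI250X GPUs needed just to
# hold the model states. Assumes the usual mixed-precision Adam accounting
# (fp16 weights + fp16 gradients + fp32 master weights + fp32 moments
# = ~16 bytes/parameter); activation memory is ignored.

PARAMS = 1e12               # one trillion parameters
BYTES_PER_PARAM = 16        # assumed mixed-precision Adam accounting
MI250X_HBM_BYTES = 128e9    # each MI250X carries 128 GB of HBM2e

model_state_bytes = PARAMS * BYTES_PER_PARAM
min_gpus = model_state_bytes / MI250X_HBM_BYTES

print(f"Model states: {model_state_bytes / 1e12:.0f} TB")    # ~16 TB
print(f"Minimum GPUs just to hold them: {min_gpus:.0f}")     # ~125

So roughly 125 GPUs could in principle hold the model states alone; the 3,072 actually used leave headroom for activations, parallelism replicas, and communication buffers, and still amount to only about 8 percent of Frontier's GPUs.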

doubled in the last year

more than tripled in the last year
