Wednesday, February 14, 2024

super LLM with only 3,000 (AMD) GPUs

 Frontier trained a ChatGPT-sized large language model with only 3,000 of its 37,888 Radeon GPUs — the world's fastest supercomputer blasts through one trillion parameter model with only 8 percent of its MI250X GPUs | Tom's Hardware

Researchers at Oak Ridge National Laboratory trained a large language model (LLM) the size of ChatGPT on the Frontier supercomputer and only needed 3,072 of its 37,888 GPUs to do it. The team published a research paper that details how it pulled off the feat and the challenges it faced along the way.

The Frontier supercomputer is equipped with 9,472 AMD Epyc 7A53 CPUs and 37,888 AMD Radeon Instinct MI250X GPUs. However, the team used only 3,072 GPUs to train an LLM with one trillion parameters, and 1,024 to train another LLM with 175 billion parameters.
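
For a sense of scale, here is a minimal back-of-envelope sketch of why even "only" 3,072 GPUs is the right order of magnitude. It assumes the standard mixed-precision Adam accounting of roughly 16 bytes of model state per parameter (fp16 weights and gradients plus fp32 optimizer state); the ORNL paper's exact memory breakdown may differ:

```python
# Rough memory budget for a 1-trillion-parameter model.
# Assumption: mixed-precision training with Adam, i.e. ~16 bytes of
# model state per parameter (2 bytes fp16 weights + 2 bytes fp16
# gradients + 12 bytes fp32 master weights and Adam moments).
# Illustrative only; not numbers taken from the ORNL paper.
params = 1e12                        # 1 trillion parameters

bytes_weights = params * 2           # fp16 weights
bytes_grads   = params * 2           # fp16 gradients
bytes_optim   = params * 12          # fp32 master weights + Adam m and v

total_tb = (bytes_weights + bytes_grads + bytes_optim) / 1e12
print(f"model state alone: ~{total_tb:.0f} TB")                   # ~16 TB

hbm_per_gpu_tb = 128 / 1000          # each MI250X has 128 GB of HBM2e
min_gpus = total_tb / hbm_per_gpu_tb
print(f"GPUs needed just to hold model state: ~{min_gpus:.0f}")   # ~125
```

So model state alone already needs on the order of 125 MI250X cards, before counting activations, communication buffers, and batch size. That is why the model has to be sharded across thousands of GPUs with a combination of tensor, pipeline, and data parallelism, and tuning that combination is what the paper is about.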

doubled in the last year

more than tripled in the last year

AI future: "national intelligence", "biology engineering"

excellent sales / marketing: Nvidia's CEO and founder presenting use cases for AI

A Conversation with Jensen Huang of Nvidia: Who Will Shape the Future of AI? (Full Interview) - YouTube

  • “advice to countries is the necessity of owning their national intelligence and not to allow someone else to do it... The first thing that every country should do is build infrastructure”

  • good education for the future: "(digital) biology as an engineering discipline" (using AI)


interesting "family feud": Nvidia + AMD

Jensen Huang - Wikipedia (NVIDIA CEO)

Relatives: Lisa Su (first cousin), CEO of AMD, Nvidia's main competitor!



OpenAI -= Andrej Karpathy

there may be more good AI training videos coming! Andrej Karpathy - YouTube

 Andrej Karpathy is leaving OpenAI again — but he says there was no drama | TechCrunch

"nothing "happened" and it’s not a result of any particular event, issue or drama"


Karpathy served as the director of artificial intelligence and Autopilot Vision at Tesla, and before that he worked at OpenAI as well.