CIO Insider


Nvidia's New Blackwell AI Chips Face Issue With Overheating

CIO Insider Team | Monday, 18 November, 2024

According to reports, Nvidia's new Blackwell AI chips, which have already faced delays, have encountered problems with accompanying servers that overheat, causing some customers to worry they will not have enough time to get new data centers up and running.

According to the report, Blackwell graphics processing units overheat when connected together in server racks designed to hold up to 72 chips.

The chipmaker has asked its suppliers to change the rack design several times to resolve the overheating, according to reports citing Nvidia employees who have been working on the issue, as well as customers and suppliers familiar with it.

Nvidia works with leading cloud service providers as an integral part of their engineering teams and processes. Engineering iterations are normal and expected.

Nvidia unveiled the Blackwell chip in March and had previously said it would ship in the second quarter before encountering delays, potentially affecting customers such as Meta Platforms, Alphabet's Google and Microsoft.

Nvidia's Blackwell chip takes two squares of silicon, each the size of the company's previous product, and binds them together into a single component that is 30 times faster at tasks like serving up responses from chatbots.

NVIDIA also submitted large-scale results on the GPT-3 175B benchmark using 11,616 Hopper GPUs connected with NVIDIA NVLink and NVSwitch high-bandwidth GPU-to-GPU communication and NVIDIA Quantum-2 InfiniBand networking.

NVIDIA Hopper GPUs have more than tripled scale and performance on the GPT-3 175B benchmark since last year. In addition, on the Llama 2 70B LoRA fine-tuning benchmark, NVIDIA increased performance by 26 percent using the same number of Hopper GPUs, reflecting continued software enhancements.

NVIDIA’s ongoing work on optimizing its accelerated computing platforms enables continued improvements in MLPerf test results, driving performance up in containerized software, bringing more powerful computing to partners and customers on existing platforms, and delivering more return on their platform investment.


