According to a Reuters report, Nvidia's new Blackwell AI chips are overheating when installed in servers.
Problem & Repercussions
The problem could delay the setup and launch of new data centers. The Blackwell graphics processing units (GPUs) overheat when connected in server racks designed to hold up to 72 chips, Reuters reported, citing sources familiar with the matter.
Nvidia’s Response
“Nvidia is working with leading cloud service providers as an integral part of our engineering team and process. The engineering iterations are normal and expected,” a company spokesperson said in a statement to Reuters.
Nvidia has asked its suppliers to change the design of the racks several times to resolve the overheating problems, employees, customers and suppliers familiar with the issue told Reuters.
Blackwell AI Chip
Blackwell-architecture GPUs pack 208 billion transistors and are manufactured using a custom-built TSMC 4NP process. All Blackwell products feature two reticle-limited dies connected by a 10 terabytes per second (TB/s) chip-to-chip interconnect to form a single unified GPU. According to Nvidia, the chips can be up to 30 times faster at tasks such as serving up chatbot responses.