On Tuesday, Alphabet Inc's Google released new details about the supercomputers it uses to train its artificial intelligence models, claiming the systems are both faster and more power-efficient than comparable systems from Nvidia Corp.
Google uses its custom-designed Tensor Processing Unit (TPU) chips for over 90 per cent of its artificial intelligence training. The Google TPU is now in its fourth generation, and the company has published a scientific paper detailing how it has strung more than 4,000 of the chips together into a supercomputer using its custom-developed optical switches to connect individual machines.
Improving these connections has become a key point of competition among companies that build AI supercomputers, because the large language models that power technologies like Google's Bard or OpenAI's ChatGPT have exploded in size. The models are far too large to store on a single chip, so they are split across thousands of chips that then work together for weeks or more to train the model.
Google's supercomputers make it easy to reconfigure connections between chips on the fly, helping the company route around failures and tune the system for performance. In a blog post about the system, Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson wrote that "circuit switching makes it easy to route around failed components," and that "this flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of an ML (machine learning) model."
Google has been running the supercomputer internally since 2020, in a data centre in Mayes County, Oklahoma. According to Google, the startup Midjourney used the system to train its model, which generates images from short text prompts.
Google said that its chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia's A100 chip that was on the market at the same time as the fourth-generation TPU. However, Google did not compare its fourth-generation chip to Nvidia's current flagship H100 chip because the H100 came to the market after Google's chip and is made with newer technology.
Google hinted that it might be working on a new TPU that would compete with the Nvidia H100, but provided no details. Jouppi told Reuters that Google has "a healthy pipeline of future chips."