Tesla has unveiled the latest version of Dojo, its supercomputer designed for machine learning, specifically for training on video data collected from the automaker's electric vehicles.
Tesla already owns one of the most powerful NVIDIA GPU-based supercomputers, but Dojo is developed entirely in-house and uses Tesla's own chips and infrastructure.
The new supercomputer is meant to expand Tesla's capacity to train neural networks on video data, which is critical to the computer vision technology at the foundation of its autonomous driving system.
At its AI Day 2022 event, the company confirmed that it has progressed from chip and tile to system tray and full cabinet.
According to the company, a single Dojo tile can replace six GPU boxes while costing less than a single GPU box. Each system tray holds six of these tiles, and Tesla says one tray is equivalent to "3 to 4 fully-loaded supercomputer racks." Two such system trays fit in a single Dojo cabinet.
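To make the scaling concrete, the per-tile claims above can be multiplied out to the tray and cabinet (enclosure) level. The following is only a back-of-the-envelope sketch in Python: the three constants are Tesla's own figures as quoted in this article, while the function and key names are made up for illustration. The results are no more reliable than the company's claims.

```python
# Figures as quoted in the article (Tesla's claims, not independent measurements).
GPU_BOXES_PER_TILE = 6   # one Dojo tile is said to replace six GPU boxes
TILES_PER_TRAY = 6       # six tiles in each system tray
TRAYS_PER_CABINET = 2    # two system trays in one Dojo cabinet


def cabinet_summary() -> dict[str, int]:
    """Multiply the per-tile claims out to the tray and cabinet level.

    The function name and dictionary keys are illustrative only.
    """
    tiles_per_cabinet = TILES_PER_TRAY * TRAYS_PER_CABINET
    return {
        "tiles_per_cabinet": tiles_per_cabinet,
        "gpu_boxes_per_tray": GPU_BOXES_PER_TILE * TILES_PER_TRAY,
        "gpu_boxes_per_cabinet": GPU_BOXES_PER_TILE * tiles_per_cabinet,
    }


if __name__ == "__main__":
    for name, value in cabinet_summary().items():
        print(f"{name}: {value}")
```

Taken at face value, the quoted numbers imply 12 tiles per cabinet, standing in for 72 GPU boxes.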
In terms of key specifications, the Dojo ExaPOD offers 1.1 exaFLOPS of compute, 1.3 TB of SRAM and 13 TB of high-bandwidth DRAM.
Bill Chang, Tesla’s Principal System Engineer for Dojo, said:
We knew that we had to reexamine every aspect of the data center infrastructure in order to support our unprecedented cooling and power density.
Tesla had to develop its own high-capacity cooling and power systems to supply the Dojo cabinets. According to Chang, while testing that infrastructure earlier this year, Tesla tripped a substation on the local power grid:
Earlier this year, we started load testing our power and cooling infrastructure and we were able to push it over 2 MW before we tripped our substation and got a call from the city.