Tesla’s Dojo Supercomputer Breaks All Established Industry Standards – CleanTechnica Deep Dive, Part 1
On AI Day this week, Tesla broke all the rules and established industry standards for building a computer. The presentation, just like Autonomy Day, was rather technical, and the presenters once again may not have taken into account that not everyone is well versed in microprocessor design and engineering. However, AI Day was all about exciting the geeks and trying to recruit industry experts, so it was probably an intentional choice.
In this deep dive, we look at and explain everything Tesla said about its computer hardware and compare it to the competition and to the way things are normally done. Fair warning: this explanation is still quite technical, but we try to explain everything in plain English. If you still have any questions, please leave them in the comments below and we’ll try to answer what we can.
To make this easier to digest, we are also splitting it into a series of 4 or 5 articles.
The Tesla GPU stack
In case it is not clear, Tesla has built, with NVIDIA GPUs, one of the most powerful supercomputers in the world. This is what they call the GPU stack, and it is what they hope their programmers will want to turn off and never use again as soon as Dojo is up and running. During the presentation, they stated that the number of GPUs is “more than the top 5 supercomputers in the world”. I had to dig around, but what Tesla probably meant is that they have more GPUs than the 5th most powerful supercomputer in the world, which would be a supercomputer called Selene that has 4,480 NVIDIA A100 GPUs. If you add up the GPUs in the top 5 combined, however, Tesla does not beat the total GPU count. It is not even close.
However, Tesla’s GPU-based supercomputer, or at least its largest cluster, is quite possibly also the 5th most powerful supercomputer in the world. We can see that Tesla started receiving GPUs in mid-2019 for its first cluster. That date, and the fact that they mentioned ray tracing during the presentation, could mean that Tesla ordered NVIDIA Quadro RTX cards, although they may also be older NVIDIA V100 GPUs. Since NVIDIA released the A100 in November 2020, cluster number 2 is likely to consist of older hardware as well. If it uses V100 GPUs, that would put the second cluster at around 22 PetaFLOPS, which would have been right at the bottom of the top 10 list in 2020, might not even have made the list, and certainly would not make the top 10 now.
Lucky for us, Andrej Karpathy revealed in a presentation he gave in June that the largest cluster is made up of NVIDIA’s new A100 GPUs. He said it is roughly the 5th most powerful supercomputer in the world. Based on the components, the theoretical maximum would be 112.32 PetaFLOPS, which would put it in 4th place. However, because there is always some scaling inefficiency when that many GPUs work together, 5th place is most likely an accurate estimate. Alternatively, if we halve the FP16 TFLOPS figure to estimate FP32 performance, we get around 90 PetaFLOPS, a little less than the Sunway TaihuLight supercomputer in China.
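The 112.32 PetaFLOPS theoretical maximum can be reproduced with simple arithmetic. The Python sketch below is only an illustration and rests on two numbers that do not appear in this article: a GPU count of 5,760 (720 nodes with 8 A100s each, as described in that June presentation) and NVIDIA's published peak of 19.5 FP32 TFLOPS per A100. The efficiency parameter is a hypothetical knob for scaling losses, not a measured value.

```python
A100_FP32_TFLOPS = 19.5  # assumption: NVIDIA's published peak FP32 rate per A100
GPU_COUNT = 720 * 8      # assumption: 720 nodes with 8 GPUs each


def peak_petaflops(gpu_count, tflops_per_gpu, efficiency=1.0):
    """Aggregate throughput in PetaFLOPS.

    efficiency below 1.0 is a hypothetical factor for scaling losses
    when many GPUs work together.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return gpu_count * tflops_per_gpu * efficiency / 1000.0


theoretical = peak_petaflops(GPU_COUNT, A100_FP32_TFLOPS)
print(f"{theoretical:.2f} PetaFLOPS")  # 112.32 PetaFLOPS
```

Passing an efficiency below 1.0 shows how quickly a theoretical 4th place can slip once real-world scaling losses are included.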
The Dojo (where I would like to go)
At first glance, it might seem that with 1.1 ExaFLOPS Dojo would become the most powerful supercomputer in the world. However, Tesla sugarcoated the numbers a bit, and Dojo will in fact become the 6th most powerful computer in the world. At present, the most powerful supercomputer is “Fugaku” in Kobe, Japan, with a world record of 442 PetaFLOPS, roughly three times faster than the second most powerful supercomputer, “Summit” in Tennessee in the United States, which has 148.6 PetaFLOPS. Dojo, with its 68.75 PetaFLOPS (approximately), would then be in 6th place. In fact, because the next 3 supercomputers are quite close behind at 61.4 to 64.59 PetaFLOPS, it is possible that Dojo ends up in seventh, eighth, or even ninth place. Later in this series, we’ll explain this in more detail in the colorfully named section “Tesla fails the FLOPS test”.
Nevertheless, that is absolutely nothing to scoff at. In fact, when it comes to the specific tasks Tesla is creating Dojo for, it is very likely that Dojo will outperform all the other supercomputers in the world combined, and by a wide margin. The standard test for supercomputers is like peeling apples, but Tesla has a yard full of oranges and designed a tool for that. The simple fact that, besides being the best in the world at peeling oranges, Dojo can still take 6th place at peeling apples shows how amazing this system is.
Stepping away from raw computing performance, Dojo and its jaw-dropping engineering put all other supercomputers to shame in almost every other way imaginable. To explain this logically, we have to start at the beginning, at the small scale.
What is an SoC?
The way almost every computer is currently built is that you have one processor (CPU), or in some cases two in an enterprise server, and the processor(s) sit on a motherboard that also houses the RAM (fast temporary memory, usually 8 to 16GB in good laptops and desktops). The computer has a power supply that delivers power to the motherboard to power everything. Most consumer desktops have a separate graphics card (GPU), but most consumer processors also have integrated graphics.
Now, if you haven’t read it yet, you might want to read my previous post in which I analyze Tesla’s Hardware 3 chip (Elon Musk called it a good analysis on Twitter, so you can’t go wrong there). To sum it up very quickly: the Tesla HW3 chip and most consumer processors are actually an SoC, a “system on a chip”, because they include cache memory (unfortunately only a few megabytes) as well as a processor and graphics and, in the case of the Tesla HW3 chip, two NPUs, or Neural Processing Units.
Wafer & Wafer Yield
Now, before we continue, I need to explain something about how processors, graphics cards, and SoCs are normally made. Components such as transistors are not added to individual, already separated processors. They are all put in place while the processor is still part of a circular disk called a wafer, which you can see in the image above. This wafer is then cut into pieces, each of which becomes an individual processor, GPU, or SoC. Chip manufacturing does not always go well, and often some processors do not work or are only partially operational. In the industry, the common term for this problem is low wafer yield.
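To make the term concrete, wafer yield is commonly expressed as the share of dies on a wafer that work. A minimal Python sketch follows; the die counts are made up purely for illustration and are not figures from Tesla or any chip maker.

```python
def wafer_yield(total_dies, good_dies):
    """Fraction of dies on one wafer that are fully working."""
    if total_dies <= 0:
        raise ValueError("total_dies must be positive")
    if not 0 <= good_dies <= total_dies:
        raise ValueError("good_dies must be between 0 and total_dies")
    return good_dies / total_dies


# Hypothetical wafer: 600 dies cut from the disk, 450 of them fully working.
print(f"{wafer_yield(600, 450):.0%}")  # 75%
```

The lower this fraction gets, the more of the cost of each wafer has to be carried by the chips that do work.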
Even most people who don’t know much about computer hardware know that Intel offers Celeron/Pentium, i3, i5, i7, and i9 processors, and that this order goes from weakest to strongest. What most people don’t know is that, due to wafer yield issues, some of these processors come out faulty and only partially work, so Intel deactivates the broken part of the chip and sells it as a cheaper version. This is called binning. So a Celeron/Pentium is essentially a broken i3, and an i5 is a broken i7. Even within a tier there are different versions of an i5 and i7: chips that cannot reach the maximum clock speed are locked and sold as a cheaper variant of that chip. I don’t know if Intel is still doing this today with its latest chips, but it was still doing so as recently as 2017. The point is that instead of throwing away a bad wafer or the bad chips on a wafer, you can still recoup some of the return.
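The binning idea can be sketched in a few lines of Python. Everything in this example is hypothetical: the tier names, core counts, and clock thresholds are invented for illustration and are not Intel's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Die:
    working_cores: int     # cores that passed testing
    max_stable_ghz: float  # highest clock speed the die runs at reliably


# Hypothetical bins, ordered from best to worst: (name, min cores, min GHz).
BINS = [
    ("top tier, unlocked", 8, 4.5),
    ("top tier, locked", 8, 4.0),
    ("mid tier", 6, 4.0),
    ("entry tier", 4, 3.5),
]


def assign_bin(die):
    """Return the best bin whose requirements the die meets, else 'scrap'."""
    for name, min_cores, min_ghz in BINS:
        if die.working_cores >= min_cores and die.max_stable_ghz >= min_ghz:
            return name
    return "scrap"


for die in [Die(8, 4.7), Die(8, 4.2), Die(6, 4.6), Die(2, 3.0)]:
    print(die, "->", assign_bin(die))
```

A die with two dead cores still lands in a sellable bin rather than the scrap pile, which is the whole point of binning.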
Read Part 2 here now: Tesla’s Dojo Supercomputer Breaks All Set Industry Standards – CleanTechnica Deep Dive Part 2.