Tesla’s Dojo Supercomputer Breaks All Established Industry Standards – CleanTechnica Deep Dive, Part 3
If you missed the earlier parts of this series, read them first: Tesla’s Dojo Supercomputer Breaks All Established Industry Standards – CleanTechnica Deep Dive, Part 1 and Tesla’s Dojo Supercomputer Breaks All Established Industry Standards – CleanTechnica Deep Dive, Part 2.
Networking SoCs together
Normally, each SoC sends signals through pins to the motherboard, where they are then redistributed. Tesla does not cut the SoCs out of the wafer and instead connects all SoCs on the wafer with 72 network nodes for a total of 16 TB/s, or 4 TB/s per edge that can connect it to a neighboring SoC. This means that each network node on the chip is capable of roughly 222 GB/s (16 TB/s divided by 72). During the presentation, Tesla said this is twice as fast as today’s leading-edge network switching chips. At first I was skeptical of this claim, but after doing some research they are theoretically correct: the big network chip companies like Broadcom and Cisco have only been able to achieve speeds of 25.6 terabits per second per chip, which when converted is equivalent to 3.2 terabytes per second.
I can see why Tesla was surprised that the gold standard didn’t look all that great compared to what they were able to do, especially since networking isn’t the primary focus of this chip, while for networking chips it is. Tesla’s D1 chip and training tile keep impressing at every turn.
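The arithmetic behind these figures can be reproduced in a few lines. This is only a rough sketch: it reads the quoted bandwidths as bytes per second, assumes decimal units (1 TB = 1000 GB), and assumes a four-edged chip because 16 TB/s splits into 4 TB/s per edge. The variable names are mine, not Tesla’s.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the article.
# Assumptions: decimal units (1 TB = 1000 GB) and a chip with four edges.

TOTAL_BANDWIDTH_TB_S = 16   # total bandwidth quoted for the SoC
NETWORK_NODES = 72          # network nodes quoted in the article
EDGES = 4                   # assumption: 16 TB/s total / 4 TB/s per edge

per_edge_tb_s = TOTAL_BANDWIDTH_TB_S / EDGES
per_node_gb_s = TOTAL_BANDWIDTH_TB_S * 1000 / NETWORK_NODES

# Switch chips are rated in terabits per second; 8 bits make one byte.
SWITCH_TBIT_S = 25.6
switch_tb_s = SWITCH_TBIT_S / 8

print(f"Per edge:        {per_edge_tb_s:.1f} TB/s")
print(f"Per node:        {per_node_gb_s:.0f} GB/s")
print(f"Switch chip:     {switch_tb_s:.1f} TB/s")
```

The bit-to-byte conversion is the step that is easiest to get wrong when comparing networking chips (rated in bits) with Tesla’s figures (quoted in bytes).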
Networking training tiles together
Now for the next unit of measurement, here is some good basic information. A traditional hard drive with spinning disks inside, which everyone has and which can easily hold several terabytes, is unfortunately a bit slow: it has a read/write speed of between 50 and 150 MB/s. It is important to keep in mind that we are talking about sequential speeds, as in a file transfer, and not the random-access speeds relevant to RAM. A regular solid state drive (SSD) that uses NAND flash memory and is connected through a standard SATA port will have a speed between 200 and 500 MB/s. Newer NVMe SSDs connected through an M.2 slot can achieve speeds of 8 GB/s, and the latest SSDs using the new PCI-e Gen 4 connection have a theoretical limit of 64 GB/s, although the fastest product available on the market only reaches 15 GB/s.
Speaking of PCI-e Gen 4, Tesla also uses it to connect its training tiles (or wafers). But with 40 connectors and 36 TB/s of bandwidth, each connector enjoys a speed of 900 GB/s. How is that possible when I just said 64 GB/s is the limit for PCI-e Gen 4?
Well, that limit only applies to the largest connection available to consumers, which is the PCI-e Gen 4 x16 slot. In the image above you can see the difference between the connectors. As Tesla announced, they’ve made their own custom connectors, and that’s how each connector gets a speed of 900 GB/s. This essentially makes their connector, which is relatively compact all things considered, 14 times faster than the best connector a regular motherboard can offer.
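To get a feel for these speeds, here is a small sketch that works out how long it would take to move one terabyte at each of them, plus the ratio behind the "14 times faster" figure. It assumes every figure is meant as bytes per second in decimal units and uses the upper end of each quoted range; the 1 TB payload and the function name are made up for illustration.

```python
# Sequential transfer speeds quoted in the article, in GB/s.
# Assumption: every figure is read as bytes per second, decimal units.
SPEEDS_GB_S = {
    "hard drive (upper bound)": 0.15,
    "SATA SSD (upper bound)": 0.5,
    "fastest NVMe SSD on the market": 15,
    "PCI-e Gen 4 x16 (theoretical)": 64,
    "Tesla custom connector": 900,
}


def seconds_to_move(size_gb, speed_gb_s):
    """Time needed to move size_gb gigabytes at a constant sequential speed."""
    return size_gb / speed_gb_s


ONE_TERABYTE_GB = 1000  # hypothetical payload, chosen only for illustration

for name, speed in SPEEDS_GB_S.items():
    print(f"{name:32s} {seconds_to_move(ONE_TERABYTE_GB, speed):10.1f} s")

# Ratio between Tesla's connector and the best consumer slot.
speedup = (SPEEDS_GB_S["Tesla custom connector"]
           / SPEEDS_GB_S["PCI-e Gen 4 x16 (theoretical)"])
print(f"Tesla connector vs PCI-e Gen 4 x16: {speedup:.1f}x")
```

Real transfers never sustain their theoretical peak, so treat the printed times as lower bounds rather than measurements.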
The specifications of the Tesla D1 chip
According to its spec sheet, the D1 chip boasts 50 billion transistors. As far as processors go, this absolutely beats the current record held by AMD’s Epyc Rome chip, which has 39.54 billion transistors. However, among graphics cards, NVIDIA’s GA100 Ampere SoC still leads the way with 54 billion transistors. The fact that a 7nm process was used to make the chip tells us that Tesla used Samsung or TSMC to make it happen. Personally, I think Samsung is more likely, since it was also Samsung that made Tesla’s HW3 chip.
This paragraph is a bit of a tangent, but in response to someone who added the D1 chip to a Moore’s Law chart, Elon responded on Twitter saying it was “pretty wild.” I just want to make something clear: this graph is very misleading.
First of all, the data it contains is completely cherry-picked to fit this line. There are all kinds of chips with different numbers of transistors at different points in time. As mentioned earlier, NVIDIA has an SoC with 4 billion more transistors than Tesla’s D1 chip. And trying to compare top-tier supercomputers with regular desktops or something with vacuum tubes is just comparing apples to oranges. The only reason the graph even forms this line is that it uses a logarithmic scale and handpicked data, and even then most of the points are unlabeled, which obscures the truth.
Try putting all of the same-price Intel chips on a chart (or at least their top tier) and see how Moore’s Law breaks down. Moore’s Law held true at first, but as we continue die shrinking to ever smaller nanometer figures and begin to approach the point where Heisenberg’s Uncertainty Principle makes it difficult to ensure that an electron stays in the transistor, progress has slowed down considerably and no longer follows this trend line.
Cooling and power
This wasn’t made completely clear until later, in the Q&A, although I had suspected it from the start: the entire training tile is liquid cooled. Interestingly enough, they didn’t say water cooled, so I wonder what liquid they are using. Nonetheless, the real revelation here is how well they are able to cool that silicon wafer. Tesla has a lot of experience with power electronics and cooling, and they put that expertise to good use here.
Normally, a processor has on one side a piece of motherboard-grade silicon with pins that conduct signals into the motherboard, and that side is obviously impossible to cool. On the other side is the SoC, which is covered with thermal paste (usually not a very good one either) and then a metal heat spreader, which gives a processor the metal look you may have already seen. Next, a manufacturer, computer repairer, or PC enthusiast puts more thermal paste on the heat spreader and mounts the smooth metal of a cooling block on top of it. The cooling block passes the absorbed heat either directly into a heat sink with a fan or into a liquid (usually water), which then transports the heat to a larger heat sink further away from the processor, to which you can connect multiple fans.
In the case of the Tesla training tile, one side of the wafer with all the SoCs is as exposed as on a regular processor (even more exposed, since there is no heat spreader) and can be cooled directly. The other side has voltage regulators covering each SoC. There are therefore two innovations here. First of all, the voltage regulator is usually located on the motherboard right next to the processor, which means that the current has to flow through the motherboard, the socket, the pins and the motherboard-grade silicon that the SoC sits on. However, this is not all. The much bigger innovation is the last step, which makes everything described above possible. Usually the current reaches the SoC from all sides through pins. If you’ve ever seen an old basic chip with a lot of pins on all sides, it’s basically like that, but obviously much more advanced and with a lot more pins. In this case, the power moves directly into the SoC. It’s not clear exactly how they managed to do this, but it’s pretty impressive, and depending on how it’s been done, it could also produce less heat if voltage can be introduced at multiple points on the chip so that the current does not have to travel so far.
To deal with the heat that all of these voltage regulators emit, a cooling block with holes for the connectors sits all around the voltage regulators to remove as much heat as possible from that side as well. A single power supply powers all the voltage regulators simultaneously through those holes. It plugs in right at the top, and just above it is yet another cooling block for cooling the power supply, even if it looks strangely like a radiator.
Tesla fails FLOPS test
Now that we’ve gone through all the details, we’re finally in a position to really compare Dojo to the competition.
A single training tile has 9 PetaFLOPS of computing power. I kind of skipped over what a PetaFLOP even is, since you, dear reader, weren’t down the rabbit hole yet, but now that you are: a PetaFLOP is made up of two parts. Peta is the prefix that comes after Tera, Giga, Mega and Kilo, and FLOPS stands for floating point operations per second. This is different from TOPS, also known as Tera Operations Per Second, which are used to measure INT8, INT16 and INT32 performance and which we can forget about for now (although I should mention that NVIDIA unfortunately sometimes only releases performance in TOPS rather than FLOPS). I’m not going to try to be technical and explain what these performance values represent. I’m just going to make sure that you don’t accidentally end up comparing apples to oranges like Tesla sort of did.
You see, when someone gives you a FLOPS number, you have to make sure whether it’s FP64, FP32, or FP16, because each is twice as hard as the next. Since Dojo only supports FP32 and the hybrid version of FP32 and FP16, which Tesla called BFP16, I initially assumed that the 1.1 ExaFLOP figure represented FP32 performance. This would have been great news, as the most common test a supercomputer undergoes is the HPL-AI test, which gives us a bunch of FP32 PetaFLOP scores to compare against. However, on closer inspection, Tesla’s 1.1 ExaFLOP figure was for BF16 / CFP8 and not FP32. Thank god they gave the FP32 performance of a single SoC on a slide, which is 22.6 TeraFLOPS, and it just happens to be right next to the BF16 / CFP8 score, which is 362 TeraFLOPS.
Now, performance per SoC and performance of the system as a whole do not always scale in the same way, nor is each of these tasks exactly equal. The math we’ve done here is pretty straightforward, though: if you divide the supercomputer’s BF16 / CFP8 score by the BF16 / CFP8 score of an SoC and multiply the result by the SoC’s FP32 score, you get 68.674 PetaFLOPS. In reality, this number could be a little more or less. However, as I quickly mentioned in the introduction, with 5th and 6th place being so close to this number, it is possible that Dojo is somewhere between the 4th and 7th most powerful supercomputer in the world. However, my bet would be 4th place.
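For anyone who wants to check that estimate, here is the same calculation as a short sketch. It assumes, as the paragraph above does, that FP32 throughput scales linearly with the number of SoCs, which real systems rarely achieve. The function and variable names are mine and purely illustrative.

```python
# Inputs taken from the article's reading of Tesla's slides.
DOJO_BF16_EXAFLOPS = 1.1   # whole supercomputer, BF16 / CFP8
SOC_BF16_TFLOPS = 362      # single SoC, BF16 / CFP8
SOC_FP32_TFLOPS = 22.6     # single SoC, FP32


def estimate_fp32_petaflops(system_bf16_exaflops, soc_bf16_tflops, soc_fp32_tflops):
    """Scale a system-level BF16 score down to FP32 using per-SoC ratios.

    Assumption: perfect linear scaling from one SoC to the full system.
    """
    system_bf16_tflops = system_bf16_exaflops * 1_000_000   # Exa -> Tera
    implied_soc_count = system_bf16_tflops / soc_bf16_tflops
    system_fp32_tflops = implied_soc_count * soc_fp32_tflops
    return system_fp32_tflops / 1000                        # Tera -> Peta


dojo_fp32_pflops = estimate_fp32_petaflops(
    DOJO_BF16_EXAFLOPS, SOC_BF16_TFLOPS, SOC_FP32_TFLOPS
)
print(f"Estimated Dojo FP32 performance: {dojo_fp32_pflops:.3f} PetaFLOPS")
```

Because the estimate is nothing more than a ratio of slide figures, it says how Dojo might score on paper, not how it would perform on an actual benchmark run.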
Stay tuned for the last part of this series, coming shortly.