Esperanto makes 1092 RISC-V processors dance on the head of a pin, er, a chip – EEJournal



Dave Ditzel has a legendary history with computers and microprocessors. He joined AT&T Bell Labs to work on C language development. There, he developed several generations of processors designed to run optimized, compiled C programs, including Bell Labs’ CRISP, the “C-language Reduced Instruction Set Processor.”

At Bell Labs, Ditzel also co-authored the foundational RISC paper, “The Case for the Reduced Instruction Set Computer,” with Professor David Patterson of UC Berkeley. Ditzel then joined Sun Microsystems as CTO of the SPARC Technology Business, where he led the development of the SPARC RISC processor architecture and the 64-bit SPARC ISA. Oracle bought Sun Microsystems in 2010 and then discontinued SPARC development in 2016. But thanks to SPARC International, the SPARC ISA lives on as fully open, non-proprietary, royalty-free IP.

Ditzel founded microprocessor maker Transmeta in 1995 to develop VLIW (very long instruction word) processors that fully emulated the x86 ISA using dynamic binary translation software. This software wrapper converted x86 instructions into VLIW machine instructions on the fly. Transmeta launched its first product, the Crusoe processor, with an elaborate press conference at the Villa Montalvo Arts Center in the hills above tony Saratoga, California, on January 19, 2000. I was there as a member of the press.

Transmeta positioned Crusoe as a low-power alternative to Intel and AMD x86 offerings for thin-and-light laptops and other mobile devices. The company’s processor sales initially looked promising. It seemed that Transmeta had carved out a protected niche, low-power x86 processors for battery-powered devices, while AMD and Intel tangled on the high-performance, high-power x86 front.

But Crusoe’s sales and media coverage alerted Intel to the thin-and-light laptop and mobile markets’ desperate need for a low-power x86 microprocessor. Intel then moved quickly to fill the Transmeta-sized void in its x86 product line with an “almost good enough” reduced-power version of the Pentium III processor, dubbed the Pentium III-M. Intel soon followed the Pentium III-M with the Pentium M. The “M” stands for “mobile,” of course.

Transmeta’s Crusoe processor delivered poor application performance, so Transmeta doubled the width of its VLIW instructions from 128 to 256 bits for its second-generation Efficeon processor, which executed x86 code faster than Crusoe. At the same time, however, Transmeta switched foundries and subsequently ran into delivery problems. In 2007, the company changed its business model to IP licensing. Transmeta was acquired in 2009, and its operations were shut down later that same year. Exit Transmeta from the processor wars. Sic transit gloria mundi.

All of this is prologue, establishing that Dave Ditzel is no stranger to microprocessors, that he thinks big, and that he thinks outside the box. His latest company is Esperanto Technologies, and he is still working well outside the microprocessor box.

Ditzel presented details of Esperanto’s ET-SoC-1 ML (machine learning) inference chip at Hot Chips 33 in August. This chip clearly reflects Ditzel’s long, long association with RISC processors. The ET-SoC-1 ML inference engine incorporates 1092 (that’s “one thousand and ninety-two”) custom 64-bit RISC-V microprocessor cores, 160MB of on-chip SRAM, and assorted I/O ports, all on a single 7nm die.

“Ambitious” is an understatement for this chip design.

As its description suggests, the ET-SoC-1 ML inference chip is designed to deliver superior performance per watt when running ML inference workloads. It achieves this through economical circuit and logic design and specialized low-voltage power-supply design techniques.

Of the 1092 RISC-V processor cores on the ET-SoC-1 ML chip, 1088 are Esperanto’s “ET Minion” cores, an obvious reference to the adorable, loyal, yellow-skinned worker bees in the animated “Despicable Me” film franchise. ET Minion cores are based on the open 64-bit RISC-V ISA, with proprietary vector and tensor instruction extensions developed specifically for ML applications. These ISA extensions support single-clock operations on floating-point vectors and tensors as wide as 256 bits using 16- or 32-bit operands, and as wide as 512 bits using 8-bit operands.
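Those widths translate directly into operands per operation. The short Python sketch below simply does the division; the table and function names are hypothetical, and the result counts vector elements per single-clock operation only. It says nothing about how many multiply-accumulates the Minion’s tensor unit completes per cycle, which the Hot Chips material summarized here doesn’t specify.

```python
# Operands per single-clock vector/tensor operation, using the widths quoted
# for the ET Minion ISA extensions. Names here are illustrative only.

# operand width in bits -> maximum operation width in bits (from the article)
OPERATION_WIDTH_BITS = {
    32: 256,  # 32-bit floating-point operands
    16: 256,  # 16-bit floating-point operands
    8: 512,   # 8-bit operands
}


def lanes_per_op(operand_bits: int) -> int:
    """Return how many operands fit in one maximum-width operation."""
    return OPERATION_WIDTH_BITS[operand_bits] // operand_bits


if __name__ == "__main__":
    for bits in (32, 16, 8):
        print(f"{bits}-bit operands: {lanes_per_op(bits)} per operation")
```

Running it reports 8, 16, and 64 operands per operation for 32-, 16-, and 8-bit data, which is why the 8-bit case, twice as wide with quarter-size operands, packs eight times as many elements as the 32-bit case.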

Figure 1 illustrates the layout of an ET Minion core. As the figure shows, the vector/tensor unit dwarfs the full 64-bit RISC-V controller core attached to it.

Fig 1: Esperanto’s ET Minion processor core augments a RISC-V processor with a tensor/vector unit that is more than twice the size of the entire RISC-V unit.

Normally, such a vector/tensor unit would be designed as a CPU-hosted accelerator rather than integrated directly into a microprocessor IP core. The Google TPU (Tensor Processing Unit) is an example of such an accelerator.

After Esperanto developed the ET Minion core, designing the rest of the ET-SoC-1 ML inference chip became a step-and-repeat exercise. (Yes, I just oversimplified, and you’re about to see why.) The first step-and-repeat operation groups eight ET Minion cores and 32KB of shared instruction cache into a “Neighborhood,” shown in Figure 2.

Fig 2: Eight ET Minion processor cores and 32KB of shared instruction cache make up a “Neighborhood” on the ET-SoC-1 ML inference chip.

Four 8-core ET Minion Neighborhoods are then connected to four on-chip 1MB SRAM banks through a 4×4 crosspoint switch to form a “Minion Shire.” (The Esperanto Naming Committee must have had fun with these names.) Figure 3 shows a block diagram of a Minion Shire.

Fig 3: Four Neighborhoods with four 1MB SRAM banks and a NoC interface make up a Minion Shire on the ET-SoC-1 ML inference engine.

The Minion Shire’s four 1MB SRAM banks can be configured as cache or as scratchpad RAM, depending on the needs of the ML application. The four SRAM banks also connect to a “Mesh Stop,” which serves as an on-ramp from the Minion Shire to the ET-SoC-1’s internal NoC (network on chip).

The complete ET-SoC-1 ML inference engine consists of 34 Minion Shires plus a few other Shire types: one Shire that incorporates four high-performance ET Maxion processor cores (also based on the RISC-V ISA), eight Memory Shires (each containing an LPDDR4 SDRAM controller), and a PCIe Shire. All Shires incorporate Mesh Stops and communicate with each other over the NoC. Figure 4 shows the complete die layout of the ET-SoC-1 ML inference engine.

Fig 4: The complete ET-SoC-1 ML inference engine consists of 34 Minion Shires, one quad-core ET Maxion processor Shire, eight Memory Shires with LPDDR4 SDRAM controllers, and one PCIe Shire.
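The step-and-repeat hierarchy multiplies out neatly. Here’s a minimal Python tally that uses only the counts given above; the constant names are hypothetical, not Esperanto’s. Eight Minions per Neighborhood, four Neighborhoods per Shire, and 34 Minion Shires give 1088 Minion cores, and the four ET Maxion cores bring the total to 1092. The last entry assumes the chip’s 160MB on-chip SRAM figure includes the Minion Shires’ banks; the article doesn’t break down where the remainder sits.

```python
# Tally of the ET-SoC-1 hierarchy from the counts in the article.
# Constant names are hypothetical; the numbers come from the text.

MINIONS_PER_NEIGHBORHOOD = 8
NEIGHBORHOODS_PER_SHIRE = 4
SRAM_MB_PER_SHIRE = 4        # four 1MB banks per Minion Shire
MINION_SHIRES = 34
MAXION_CORES = 4             # all in the single ET Maxion Shire
OTHER_SHIRES = 1 + 8 + 1     # Maxion Shire, Memory Shires, PCIe Shire
TOTAL_ONCHIP_SRAM_MB = 160   # chip-level figure quoted in the article


def chip_totals() -> dict:
    """Roll the per-Neighborhood and per-Shire counts up to chip level."""
    minions_per_shire = MINIONS_PER_NEIGHBORHOOD * NEIGHBORHOODS_PER_SHIRE
    minion_cores = minions_per_shire * MINION_SHIRES
    shire_sram_mb = SRAM_MB_PER_SHIRE * MINION_SHIRES
    return {
        "minions_per_shire": minions_per_shire,
        "minion_cores": minion_cores,
        "riscv_cores": minion_cores + MAXION_CORES,
        "total_shires": MINION_SHIRES + OTHER_SHIRES,
        "minion_shire_sram_mb": shire_sram_mb,
        # assumes the 160MB total includes the Minion Shire banks
        "sram_elsewhere_mb": TOTAL_ONCHIP_SRAM_MB - shire_sram_mb,
    }


if __name__ == "__main__":
    for name, value in chip_totals().items():
        print(f"{name}: {value}")
```

The output shows 32 Minions per Shire, 1088 Minion cores, 1092 RISC-V cores in all, 44 Shires, and 136MB of SRAM in the Minion Shires.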

Esperanto designed the ET-SoC-1 ML inference engine with a 120W power budget in mind. That budget is based on the amount of power available to an open-source Glacier Point v2 card, which was designed for the Open Compute Project’s Yosemite V2 next-generation multi-node server platform. Esperanto has designed a Dual M.2 module that plugs into the Glacier Point v2 card and combines an ET-SoC-1 ML inference engine with 24 LPDDR4 SDRAMs. The Glacier Point v2 card accepts as many as six of these Dual M.2 modules, creating a plug-in server card with 6558 RISC-V cores (Minion and Maxion cores) and as much as 192GB of LPDDR4 SDRAM.

With six ET-SoC-1 ML inference engine chips and their associated SDRAM on the card, that 120W power budget works out to 20W per Dual M.2 module. To fit more than 6000 processor cores and their LPDDR4 SDRAM into the Glacier Point v2 card’s 120W power envelope, Esperanto needed to build low-power features hierarchically into the ET Minion cores, Neighborhoods, Shires, and NoC.
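The card-level budget is simple division and multiplication. The Python sketch below restates it with hypothetical constant names and the figures quoted above. One wrinkle falls out: six chips at 1092 cores each comes to 6552 cores, six fewer than the 6558 quoted for a full card, and the article doesn’t account for the difference.

```python
# Card-level arithmetic for a Glacier Point v2 card fully populated with
# six Esperanto Dual M.2 modules. Names are hypothetical; numbers are from the text.

CARD_POWER_W = 120
MODULES_PER_CARD = 6
CORES_PER_CHIP = 1092        # 1088 ET Minion + 4 ET Maxion
LPDDR4_PER_MODULE = 24
CARD_DRAM_GB = 192           # maximum quoted for a full card


def card_budget() -> dict:
    """Split the card's power and memory across modules and count cores."""
    return {
        "watts_per_module": CARD_POWER_W / MODULES_PER_CARD,
        "cores_per_card": CORES_PER_CHIP * MODULES_PER_CARD,
        "lpddr4_devices_per_card": LPDDR4_PER_MODULE * MODULES_PER_CARD,
        "dram_gb_per_module": CARD_DRAM_GB / MODULES_PER_CARD,
    }


if __name__ == "__main__":
    for name, value in card_budget().items():
        print(f"{name}: {value}")
```

The results are 20W and 32GB per module, 144 LPDDR4 devices per card, and 6552 cores.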

Given the ET-SoC-1’s massively parallel target architecture, Esperanto analyzed the relationships and dependencies among clock frequency, core supply voltage, and application performance to find an optimal operating point for the inference engine’s 1088 ET Minion processor cores. The resulting power/performance/voltage curve, shown in Figure 5, indicates that the ET Minion cores must operate at around 0.4V to deliver maximum performance while meeting the 20W power target of the Glacier Point Dual M.2 module. (That 20W figure includes the operating power of the LPDDR4 SDRAMs.)

Fig 5: Deep analysis showed that the ET-SoC-1 ML inference engine would hit its 20W power budget by running the 1088 ET Minion processor cores on-chip at around 0.4V.
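Why chase a supply voltage as low as 0.4V? A common first-order model for CMOS dynamic power, P ≈ αCV²f, says power falls with the square of the supply voltage. The Python sketch below applies that textbook approximation. It is not Esperanto’s model: the 0.8V reference point is purely hypothetical, and the sketch ignores leakage and the drop in achievable clock rate at lower voltage, which is precisely the tradeoff Esperanto’s curve captures. It also computes the average power available per Minion core under the 20W module budget.

```python
# First-order CMOS dynamic-power model, P ~ alpha * C * V**2 * f.
# A textbook approximation (it ignores leakage and the way maximum
# clock frequency falls with voltage), not Esperanto's own analysis.


def dynamic_power_ratio(v_new: float, v_ref: float,
                        f_new: float, f_ref: float) -> float:
    """Dynamic power at (v_new, f_new) relative to (v_ref, f_ref),
    assuming switched capacitance and activity factor stay the same."""
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


def per_core_budget_mw(module_watts: float = 20.0, cores: int = 1088) -> float:
    """Upper bound on average power per ET Minion core, in milliwatts.
    The 20W module budget also covers the LPDDR4 SDRAM, so the real
    per-core share is lower than this."""
    return module_watts / cores * 1000.0


if __name__ == "__main__":
    # Hypothetical comparison: 0.4V vs. 0.8V at the same clock frequency.
    print(dynamic_power_ratio(0.4, 0.8, 1.0, 1.0))
    print(f"{per_core_budget_mw():.1f} mW per core, at most")
```

Under these assumptions, halving the voltage at a fixed clock cuts dynamic power to a quarter (0.25), and the 20W module budget leaves at most about 18.4mW per Minion core, before subtracting the DRAM’s share.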

During his presentation at Hot Chips 33, Ditzel pointed out that a core operating voltage of 0.4V is neither subthreshold nor even near-threshold for transistors made with the target 7nm process technology. He therefore argued that a 0.4V operating voltage is quite reasonable.

Based on its analysis, Esperanto claims that the ET-SoC-1 ML inference engine achieves 123 times better performance per watt on the MLPerf Deep Learning Recommendation Model benchmark than an Intel Xeon Platinum 8380H processor, and 25.7 times better performance per watt on the ResNet-50 benchmark than an Intel Xeon Platinum 9282. Intel announced these particular Xeon processors in Q2 2020 and Q2 2019, respectively. They don’t represent Intel’s latest Xeon silicon, so get out the salt shaker for Esperanto’s claims.

Ditzel’s benchmark results were all simulated at the time of his Hot Chips 33 presentation, because Esperanto had only just received first silicon for the ET-SoC-1 ML inference engine from the foundry and had not yet had time to bring up the chip. Such is the nature of marketing benchmarks that pit simulated processors against real silicon. It will be extremely interesting to see if and how reality catches up with Esperanto’s simulated numbers.

It will also be interesting to see whether history repeats itself for Dave Ditzel. He once tweaked Intel’s nose with the low-power Transmeta Crusoe x86 processor. Intel responded vigorously to that clear and present danger to its processor market and, as a result, Transmeta is no more. The Esperanto ET-SoC-1 ML inference engine makes another low-power play in a different segment of Intel’s market, the data center, which is also currently occupied by competing teams from AMD, Nvidia, Xilinx, Google Cloud, Amazon Web Services, and several other ML contenders besides Intel.

It will be quite a battle. We will watch to see what happens.

