What is AVX-512 and why is Intel killing it?

Your device’s processor performs billions of calculations every second and is responsible for the operation of your computer. Inside the processor is the arithmetic logic unit (ALU), which carries out mathematical and logical operations as directed by the instructions the processor supports.

The set of instructions a CPU supports is not fixed and can be extended over time, and one such extension is Intel’s AVX-512 instruction set. However, Intel has started disabling AVX-512 on its consumer processors. But why is Intel killing AVX-512?


How does an ALU work?

Before becoming familiar with the AVX-512 instruction set, it is essential to understand how an ALU works.

As the name suggests, the arithmetic logic unit is used to perform mathematical tasks. These tasks include operations such as addition, multiplication, and floating-point calculations. To accomplish these tasks, the ALU uses dedicated digital circuits, which are driven by the CPU clock signal.

Therefore, the clock speed of a CPU sets the pace at which the ALU works. If your CPU runs at a clock frequency of 5 GHz, it goes through 5 billion clock cycles per second, and the number of instructions it completes in that time depends on how many instructions it can finish per cycle. Because of this, processor performance generally improves as clock speed increases.
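As a rough sketch of this arithmetic, the snippet below converts a clock frequency into cycles per second and then into a peak instruction rate. The instructions_per_cycle parameter is an assumption added for illustration; real values vary by processor and workload.

```python
def cycles_per_second(clock_ghz: float) -> float:
    """Convert a clock frequency in GHz into clock cycles per second."""
    return clock_ghz * 1e9


def peak_instructions_per_second(clock_ghz: float,
                                 instructions_per_cycle: float = 1.0) -> float:
    """Upper bound on instructions per second.

    instructions_per_cycle is a hypothetical figure used for illustration;
    it is not a measured property of any particular CPU.
    """
    return cycles_per_second(clock_ghz) * instructions_per_cycle
```

With one instruction per cycle, a 5 GHz clock gives a ceiling of 5 billion instructions per second. Raising either the clock or the work done per cycle raises that ceiling.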


That said, as the clock speed of a processor increases, so does the amount of heat it generates. This is why extreme overclockers resort to liquid nitrogen cooling. This heat output at high frequencies prevents CPU manufacturers from raising clock frequencies beyond a certain threshold.

So how does a next-gen CPU deliver better performance over older iterations? Well, processor manufacturers use the concept of parallelism to improve performance. This parallelism can be achieved by using a multi-core architecture where several different processing cores are used to improve the computing power of the CPU.

Another way to improve performance is to use a SIMD instruction set. Simply put, a Single Instruction, Multiple Data (SIMD) instruction allows the processor to apply the same operation to several data points at once. This type of parallelism improves the performance of a processor, and AVX-512 is a SIMD instruction set used to speed up specific tasks.
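The idea can be modelled in a few lines of Python. This is only a conceptual sketch that counts "instructions"; it is not real vector code, and the function names and the lanes parameter are made up for illustration. A lane count of 16 corresponds to sixteen 32-bit values in one 512-bit register.

```python
def scalar_add(a, b):
    """Add two equal-length lists one element per 'instruction'."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    instructions = 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1  # one add instruction per element
    return out, instructions


def simd_add(a, b, lanes=16):
    """Add two equal-length lists, modelling one 'instruction' per group of lanes."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    instructions = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # A real SIMD unit would add all lanes of the chunk in one instruction.
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return out, instructions
```

For 64 elements, the scalar version counts 64 instructions while the 16-lane version counts 4, and both produce the same result.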

How does the data get to the ALU?

Now that we have a basic understanding of how an ALU works, we need to understand how data reaches the ALU.

To reach the ALU, data must pass through different storage systems. This data path is based on the memory hierarchy of a computer system. A brief overview of this hierarchy is given below:

  • Secondary memory: The secondary memory of a computing device consists of a permanent storage device, such as a hard drive or SSD. This device can store data permanently but is far slower than the CPU. For this reason, the CPU does not work on data directly from the secondary storage system.
  • Primary memory: The primary storage system consists of random access memory (RAM). This storage system is faster than the secondary storage system but cannot store data permanently. Therefore, when you open a file on your system, it is copied from the storage drive into RAM. That said, even RAM is not fast enough for the CPU.
  • Cache memory: Cache memory is built into the processor and is the fastest memory system on a computer. This memory system is divided into three levels, namely L1, L2 and L3 cache. All data to be processed by the ALU is transferred from the storage drive to RAM and then to cache memory. That said, the ALU does not operate directly on data in the cache.
  • CPU registers: CPU registers are very small, and depending on the architecture of the computer, a general-purpose register holds either 32 or 64 bits of data. Once the data has moved into these registers, the ALU can access it and perform the task at hand.


What is AVX-512 and how does it work?

The AVX-512 instruction set is a successor to AVX and AVX2 that Intel announced in 2013. Short for Advanced Vector Extensions, AVX in its 512-bit form was first introduced with Intel’s Xeon Phi (Knights Landing) architecture and later arrived in Intel’s server processors and in Skylake-X processors.

Additionally, the AVX-512 instruction set made its way to mainstream systems with the Cannon Lake architecture and was later supported by the Ice Lake and Tiger Lake architectures.

The primary purpose of this instruction set was to speed up tasks involving data compression, image processing, and cryptographic calculations. By handling twice as much data per instruction as the previous iteration, AVX-512 can offer substantial performance gains in these workloads.

So how does AVX-512 double the amount of data processed per instruction?

Well, as explained earlier, the ALU can only access data present in a CPU’s register. The Advanced Vector Extensions instruction set increases the size of these registers.

Due to this increase in size, the ALU can process multiple data points in a single instruction, thereby increasing system performance.

In terms of register size, the AVX-512 instruction set offers thirty-two 512-bit registers, twice the number and twice the width of the sixteen 256-bit registers offered by the older AVX instruction set.
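To make the register arithmetic concrete, the short sketch below works out how many values fit in one register and how large the whole register file is. The helper names are made up for illustration; the register counts and widths are the architectural figures for AVX/AVX2 (sixteen 256-bit registers) and AVX-512 (thirty-two 512-bit registers) in 64-bit mode.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    return register_bits // element_bits


def register_file_bytes(register_count: int, register_bits: int) -> int:
    """Total size in bytes of all vector registers combined."""
    return register_count * register_bits // 8


# 32-bit floats per register
avx_float_lanes = lanes(256, 32)      # 256-bit AVX/AVX2 register
avx512_float_lanes = lanes(512, 32)   # 512-bit AVX-512 register

# Total vector register storage
avx_file = register_file_bytes(16, 256)
avx512_file = register_file_bytes(32, 512)
```

A 512-bit register holds sixteen 32-bit floats or eight 64-bit doubles, against eight and four for a 256-bit register. Doubling both the width and the count makes the AVX-512 register file four times larger.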

Why is Intel ending AVX-512?

As explained earlier, the AVX-512 instruction set offers several computational advantages. In fact, popular libraries such as TensorFlow use it to speed up computations on processors that support it.

So why is Intel disabling AVX-512 on its recent Alder Lake processors?

Well, Alder Lake CPUs are unlike Intel’s earlier designs. While older processors used cores built on a single microarchitecture, Alder Lake processors combine two different kinds of core. These are known as P (performance) cores and E (efficiency) cores and are built on different microarchitectures.

While the P cores use the Golden Cove microarchitecture, the E cores use the Gracemont microarchitecture. This difference in architecture prevents the scheduler from working properly when particular instructions can run on one architecture but not on the other.

In the case of Alder Lake processors, the AVX-512 instruction set is an example of this, as the P cores have the hardware to process the instruction, but the E cores do not.


For this reason, Alder Lake processors do not officially support the AVX-512 instruction set.

That said, AVX-512 instructions can still run on some Alder Lake CPUs where Intel has not physically fused the feature off. To do so, users must disable the E cores in the BIOS.
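The scheduling problem can be sketched as a set intersection: because the operating system may move a thread to any active core, only the features shared by every active core are safe to expose. The feature sets below are heavily abbreviated and purely illustrative, and the function name is hypothetical.

```python
# Abbreviated, illustrative feature sets (not complete lists).
P_CORE_FEATURES = {"sse2", "avx2", "avx512f"}  # Golden Cove has AVX-512 hardware
E_CORE_FEATURES = {"sse2", "avx2"}             # Gracemont does not


def safe_features(active_core_feature_sets):
    """Features every active core supports, so a thread can run on any of them."""
    sets = list(active_core_feature_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


# P and E cores both enabled: AVX-512 is not common to all cores.
hybrid = safe_features([P_CORE_FEATURES, E_CORE_FEATURES])

# E cores disabled: only P cores remain, so AVX-512 is common to all of them.
p_only = safe_features([P_CORE_FEATURES])
```

In this model the hybrid configuration drops avx512f from the shared set, while the configuration with only P cores keeps it.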

Is AVX-512 needed on consumer processors?

The AVX-512 instruction set increases the register size of a processor to improve its performance. This lets processors work through more data per instruction, so tasks such as video and audio compression can finish sooner.

That said, this performance improvement is only seen when a program’s code has been written or compiled to use AVX-512 instructions.
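In practice, software that wants to benefit usually checks at run time whether the CPU reports AVX-512 and falls back to a generic code path otherwise. The sketch below assumes a Linux system, where /proc/cpuinfo lists feature flags such as avx512f (the AVX-512 Foundation subset). The function names and the "avx512"/"generic" labels are made up for illustration.

```python
from pathlib import Path
from typing import Optional, Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the feature flags of the first CPU listed in cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx512f(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag is reported."""
    return "avx512f" in cpu_flags(cpuinfo_text)


def pick_kernel(cpuinfo_text: str) -> str:
    """Choose a (hypothetical) code path based on the reported flags."""
    return "avx512" if supports_avx512f(cpuinfo_text) else "generic"


def read_cpuinfo(path: str = "/proc/cpuinfo") -> Optional[str]:
    """Read the cpuinfo file if it exists (Linux only); otherwise return None."""
    p = Path(path)
    return p.read_text() if p.exists() else None
```

A program without an AVX-512 code path takes the generic branch on every CPU, which is why the hardware alone brings no speed-up.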

For this reason, instruction set extensions like AVX-512 are better suited to server workloads, and consumer processors can get by without them.
