Arm challenges Intel and AMD for laptop lead with Cortex-X2 processor


Arm is seeking to challenge Intel and AMD’s leadership in the personal computer (PC) market with its latest Cortex-X2 processor core, designed to deliver the performance levels required by laptops.

The company said the X2 core, part of its Cortex-X processor family, is its latest flagship processor for high-end smartphones and laptops. The core offers a 16% gain in instructions per clock (IPC) over the previous Cortex-X1 core at the same process node and frequency, though with double the cache, Arm said. The company called the X2 “its most powerful processor yet” for consumer devices.

The X2 is the first in its family of client processors based on the latest Armv9 architecture, which Arm says improves performance, energy efficiency and security. Arm said the X2 supports only 64-bit code, part of a long-term strategy to remove 32-bit support from its mobile cores by 2023. The server-grade Neoverse N2 core, which uses the same underlying Armv9 architecture, was introduced in April.

Arm said the X2 also supports its second-generation Scalable Vector Extension (SVE2) technology. The core contains 128-bit SIMD processing pipelines based on SVE2, which allow it to double machine-learning performance compared to the X1. The performance gains also apply to other workloads, including 5G. As part of the Armv9 architecture, the X2 supports the INT8 and BFloat16 data formats to speed up AI tasks.
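To make the data formats concrete, here is a minimal Python sketch of how many INT8 or BFloat16 values fit in one 128-bit vector and of what BFloat16 keeps from a 32-bit float. The function names are hypothetical, and the conversion simply truncates the low 16 bits, which is a simplification: real hardware typically rounds. It illustrates the formats only, not Arm’s implementation.

```python
import struct

VECTOR_BITS = 128  # SIMD pipeline width quoted in the article


def lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one vector."""
    return VECTOR_BITS // element_bits


def float32_to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit BFloat16 pattern by truncating a float32.

    BFloat16 keeps the sign bit, all 8 exponent bits and the top 7
    mantissa bits of a float32 (truncation is an assumption made here
    for simplicity; hardware usually rounds).
    """
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16: int) -> float:
    """Expand a BFloat16 bit pattern back to a Python float."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return value
```

With these definitions, `lanes(8)` gives 16 INT8 lanes and `lanes(16)` gives 8 BFloat16 lanes per vector. The narrower formats trade precision for throughput: more values are processed per instruction.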

Arm’s latest product launch comes as the company looks to capitalize on the growing momentum of laptops with its chips inside. Apple last year replaced Intel processors in its Mac laptop line in favor of its in-house-designed M1 system on chip (SoC), which it also began shipping in its desktop Mac line. The move came after a decade of Apple’s use of its custom-designed A-series SoCs in the iPhone.

Apple has a world-class chip engineering department that has spent years building the M1’s processor cores from the ground up, giving it 16 compute cores made on TSMC’s 5nm node. According to Apple, the M1 – which is also used in the latest iPad Pro tablet – performs better than the Intel processors that have powered its Mac laptops and desktops for more than a decade and a half.

Apple does not license a pre-designed Arm core to create the processors at the heart of the M1 and the A14 chip in its latest iPhone. Instead, it relies on a so-called “architecture license” to design its own cores. In recent years, Apple has been able to deploy chips that can compete with Intel and AMD on single-threaded performance in PCs, pushing the boundaries in a way that has not been possible for vendors using off-the-shelf Arm cores.

Arm is trying to grab more market share in personal computers with the X2 processor, which it hopes will allow more suppliers to match or even surpass what Apple has achieved with its A-series and M-series chips.

Arm said the X2 core targets the world’s most advanced 5nm and smaller nodes from TSMC and Samsung. When combined with the right system-level components in the SoC, the X2 delivers up to 30% higher single-thread performance than the chips used in the latest flagship Android smartphones, said Paul Williamson, senior vice president and general manager of the client business at Arm, in a blog post.

The CPU upgrades will put more pressure on Intel in personal computers. Qualcomm has deployed a family of Arm-based PC chips as part of a long-term play to challenge Intel’s lead in laptops running the Windows operating system. Microsoft has placed Arm-based processors it co-developed with Qualcomm in its Surface Pro X laptop, including the “SQ1,” which debuted in 2019, and the “SQ2” last year.

When introduced to complement the A78 last year, the X1 represented a whole new class of Arm processors based on a philosophy of performance at all costs. While the Cortex-A series of processors used in most of the world’s smartphones pursued Arm’s strategy of finding the best balance between performance and power in a tight area, the X1 sacrifices some of the area and energy efficiency to achieve faster speeds.

The X1 core was previously used by Qualcomm in its Snapdragon 888 chip for high-end smartphones and Samsung Electronics in the Exynos 2100 processor at the heart of its Galaxy S21 5G smartphone.

While it does not compete directly with Intel and AMD, Arm said the X2 core will allow its customers to create more advanced SoCs for smartphones and laptops. Last year, the company launched its Cortex-X Custom program, in which its engineering department works closely with silicon partners such as Qualcomm to create a processor based on Cortex-X cores tailored to their specific needs.

Most smartphones with Arm processors inside today have cores arranged in a big.LITTLE architecture. They contain clusters of large, high-performance (but power-hungry) processor cores and clusters of smaller, less powerful (but more power-efficient) cores to extend battery life. The operating system engages the right core in the cluster to run user applications, balancing the need for computing power with long battery life.
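The trade-off described above can be sketched in a few lines of Python. This is a toy model, not how any real operating system scheduler works: the `Cluster` type, the capacity and power numbers, and the `pick_cluster` rule are all hypothetical, chosen only to show the idea of running a task on the cheapest cluster that can handle it.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    capacity: int  # relative peak performance (made-up units)
    power: int     # relative power cost when active (made-up units)


def pick_cluster(demand: int, clusters: list) -> Cluster:
    """Choose the lowest-power cluster whose capacity covers the demand.

    If no cluster is large enough, fall back to the most capable one.
    """
    fitting = [c for c in clusters if c.capacity >= demand]
    if fitting:
        return min(fitting, key=lambda c: c.power)
    return max(clusters, key=lambda c: c.capacity)


# Hypothetical figures for illustration only.
BIG = Cluster("big", capacity=100, power=10)
LITTLE = Cluster("LITTLE", capacity=30, power=2)
```

Under these assumed numbers, a light background task with a demand of 20 lands on the LITTLE cluster, while a demanding task with a demand of 80 is sent to the big cluster.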

Most smartphone chips currently in production use Arm’s Cortex-A78 as the “big” cores and the Cortex-A55 as the “small” cores. But at the end of last month, Arm introduced the Cortex-X2 along with the Cortex-A710 to replace the A78 and the Cortex-A510 to replace the A55. The CoreLink CI-700 coherent interconnect and CoreLink NI-700 network-on-chip interconnect, also launched last month, tie them all together on the silicon chip.

The A710 and A510 are based on the Armv9 architecture, giving them the same security enhancements as the X2 processor core, including built-in cryptography acceleration and the Memory Tagging Extension (MTE).

Akash Jani, semiconductor analyst at The Linley Group, said the Cortex-X2, Cortex-A710, and Cortex-A510 deliver “impressive double-digit performance gains” at the expense of additional power consumption and die area. He said Arm started providing the blueprints to lead customers at the end of last year, and he expects chips based on the processor cores to go into production in early 2022.

Even though its latest generation of processors, codenamed “Matterhorn,” delivers faster speeds and higher power efficiency, Arm could not achieve these gains without also improving the interconnects that link the cores on a silicon chip. Using its latest DynamIQ Shared Unit (DSU), the DSU-110, and its CoreLink interconnects, Arm customers can deploy different configurations of Armv9-A processors for different markets.

According to Arm, its interconnect suite allows its customers to build processors with up to eight Cortex-X2 cores, 1MB of L2 cache, and 16MB of L3 cache on a single silicon die. This arrangement closes the gap with other chips used in personal computers, promising up to 40% more single-thread performance than Intel’s 3.5GHz i5-1135G7, which is used in laptops released in 2020.

The DSU-110 supports a wide range of CPU cluster configurations for different end markets. Other possible combinations include four X2 cores and four A710s for laptops; a single X2, three A710s and four A510s for premium 5G smartphones; two A710s and six A510s for use in speakers and smart TVs; and four A510 cores for chips used in smartwatches and other wearable devices.
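The cluster configurations listed above can be written down as a small table and checked against the eight-core ceiling Arm quotes. The sketch below is only a bookkeeping aid: the dictionary keys and the `total_cores` helper are hypothetical names, and the core counts are the ones given in the article.

```python
MAX_CORES_PER_CLUSTER = 8  # ceiling quoted by Arm in the article

# Core counts per end market, as described in the text above.
CONFIGS = {
    "all-X2": {"X2": 8},
    "laptop": {"X2": 4, "A710": 4},
    "premium 5G smartphone": {"X2": 1, "A710": 3, "A510": 4},
    "speakers and smart TVs": {"A710": 2, "A510": 6},
    "wearables": {"A510": 4},
}


def total_cores(config: dict) -> int:
    """Sum the cores of every type in one cluster configuration."""
    return sum(config.values())


def within_limit(config: dict) -> bool:
    """True when the configuration fits in a single cluster."""
    return total_cores(config) <= MAX_CORES_PER_CLUSTER
```

Every configuration named in the article fills the cluster to exactly eight cores except the wearable one, which uses four.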

“The Cortex-X series is designed to maximize performance on single-threaded and ‘burst’ workloads,” said Aditya Bedi, director of product management at Arm. “The microarchitecture pipeline is structured and provisioned to drive improvements in IPC performance.” He added, “The Cortex-A700 series is a priority for sustained processor workloads, with the best balance of efficiency and performance.”

Arm has strengthened the branch prediction unit in the X2, one of the fundamental building blocks of modern processors. These functional blocks predict in advance which path a program is most likely to take, so the processor can start work on it early. The company has also decoupled the branch prediction unit from the instruction fetch unit in the CPU so that it can run ahead. This means that chips based on the X2 core are less likely to guess wrong, providing better performance over a wide range of workloads.
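As an illustration of the general idea of branch prediction, here is a minimal Python sketch of a classic textbook scheme, the two-bit saturating counter. It is not Arm’s design, which the article does not describe in detail; the class name, table size and starting state are assumptions made for the example.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating-counter branch predictor.

    Counter values 0-1 predict "not taken", 2-3 predict "taken".
    """

    def __init__(self, table_size: int = 16):
        self.table_size = table_size
        # Start every entry at 1 (weakly not taken); an arbitrary choice.
        self.counters = [1] * table_size

    def predict(self, pc: int) -> bool:
        return self.counters[pc % self.table_size] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc % self.table_size
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor: TwoBitPredictor, trace) -> float:
    """Fraction of branches in the trace that were predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)  # learn from the real outcome
    return correct / len(trace)


# A loop branch at a made-up address: taken nine times, then falls through.
LOOP_TRACE = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
```

Running `accuracy(TwoBitPredictor(), LOOP_TRACE)` shows the predictor getting most loop iterations right and missing mainly at the loop exit. Each miss forces the processor to discard speculative work, which is why better prediction translates into better performance.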

Arm also upgraded the instruction pipeline in the processor, allowing more instructions to be executed in a shorter time, resulting in better performance and power efficiency. Arm said it has reduced the number of clock cycles the processor needs to execute instructions from 11 to 10. The saving on a single instruction adds up over the millions of operations the processor performs every second.

Additionally, Arm said it improved the X2’s prefetch technology, which loads instructions and other data into the caches before they are needed. The company said it has also extended the out-of-order execution window, allowing the processor to execute instructions as soon as they are ready in order to reduce stalls that undermine performance. The reorder buffer in the CPU has also been enlarged by 30%.

The X2 core can support 512KB or 1MB L2 memory cache depending on specific vendor requirements. Arm said it can be scaled to clusters of eight processor cores and up to 16MB of L3 cache.
