Not Just Another Core
It’s hard to recall a more anticipated CPU release than the Core i7, the desktop processor based on the microarchitecture codenamed Nehalem, which is being launched today; the chips themselves will be available in a few weeks. As soon as Intel began releasing bits of information last year and showing off the architecture at conferences, the early performance figures exceeded expectations, and the buzz began.
And from there, the buzz grew, along with expectations. The desktop processor came to be known as Core i7 (with no explanation of where the number 7 came from, but it’s easy to see why Intel kept the highly successful Core name), and we’re finally able to share full information and benchmarks with you, a full two weeks ahead of the official launch of the retail desktop CPU.
Sometimes Great Isn’t Good Enough
Intel’s previous microarchitecture, codenamed “Core”, was born of sheer necessity after the failure of the “Netburst” microarchitecture. After the debacle that was the Prescott core, Intel abandoned Netburst and set about adapting its Pentium M microarchitecture for desktop and server applications. (Pentium M is itself a hybridized, modernized version of the P6 design, built for mobile because there was no way Intel could make Netburst work for mobile computing.) The result came to be known as Core and, because Intel alternates between refreshing a microarchitecture one year and introducing a new one the next, it was refreshed as Penryn last year.
Core and Penryn had little trouble dominating anything AMD had to offer during those years. Just last week, we took a look at AMD’s fastest desktop processor, the Phenom X4 9950, and it could barely keep up with Intel’s low-end and mid-range quad-core processors. However, at least one aspect of their design holds Core and Penryn back from their true potential: the Frontside Bus.
Although Intel made vast improvements going from Netburst to Core to Penryn, the Frontside Bus remained. Intel was able to work around it to a degree by offering large amounts of L2 cache on its higher-end models, and eventually had the FSB running at 1600 MHz. However, the limits of the FSB were quickly being reached, and there was no way Intel could take Penryn much further without a huge change in architecture. That’s why Nehalem introduces an integrated memory controller, along with a point-to-point interface known as QuickPath that lets the CPU communicate with the chipset and with other CPUs far more efficiently.
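To put a rough number on that ceiling, here is a minimal back-of-the-envelope sketch in Python. It assumes a 64-bit FSB data path and treats "1600 MHz" as 1600 million transfers per second; the function name and the four-busy-cores scenario are illustrative assumptions of ours, not anything Intel publishes in this form.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s, with 1 GB taken as 10**9 bytes."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Assumption: a 64-bit FSB data path; "1600 MHz" is treated as 1600 MT/s.
FSB_WIDTH_BITS = 64
fsb_peak = peak_bandwidth_gb_s(1600, FSB_WIDTH_BITS)

# Hypothetical worst case: four cores all streaming from memory at once,
# splitting the one shared bus evenly.
cores = 4
per_core_share = fsb_peak / cores

print(f"FSB peak: {fsb_peak:.1f} GB/s")
print(f"Per-core share with {cores} busy cores: {per_core_share:.1f} GB/s")
```

Under those assumptions the bus tops out at about 12.8 GB/s in theory, and that figure has to cover memory traffic and I/O for every core on the package, which is why adding cores or clock speed yields diminishing returns.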
If all this sounds familiar, it should: AMD saw the need to move to a similar design many years ago, and has been using an integrated memory controller with the HyperTransport bus since 2003. This tells us that it isn’t the design of AMD’s processors that fails, but the implementation (at least on the desktop side).
Core and Penryn spawned many processors; on the desktop there were about 10 different core codenames, including the “XE” variants. The mobile platform also saw 10 of its own, and Intel offered 11 variants intended for the server market. This sounds like a lot of different cores, but many of them overlap between markets. For the most part, these cores are all very similar; the main differences are usually cache size, FSB speed, and a few other details.
That explains one of the main philosophies of Nehalem: have a core that can be adapted and manipulated as Intel sees fit, depending on the target user of the processor. Nehalem will supply Intel with desktop, server, and mobile processors, and the differences between them will go well beyond cache size and bus speed.
This is what a quad-core Core i7, built on the Bloomfield core, looks like under the hood. As you can see, it’s quite a bit different from previous Intel processors. Intel now has a native quad-core CPU for the desktop, with each core getting its own L1 and L2 cache. Below them is a shared 8MB L3 cache. For the first time, we see a memory controller on the die (in this case, it’s a triple-channel controller, supporting DDR3 only), as well as the QPI controller.
This die will look very different from a Nehalem die made for servers. For instance, the die above has a single QPI link, which runs at up to 6.4 GT/s. A server CPU that is meant for multi-socket systems may have four QPI links, allowing the CPUs to ‘talk’ to one another directly. It may also have more cores, a larger L3 cache, more memory channels, and a different memory controller that supports FB-DIMMs.
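For a sense of scale, the sketch below estimates the theoretical peak bandwidth of one QPI link and of the triple-channel memory controller. Several inputs are our assumptions rather than figures from this article: a QPI link is taken to carry 2 bytes of payload per transfer in each direction, each DDR3 channel is taken to be 64 bits wide, and DDR3-1066 is used as the memory speed. The helper name is hypothetical.

```python
def link_bandwidth_gb_s(giga_transfers_per_s: float,
                        bytes_per_transfer: int,
                        channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s for one or more identical channels."""
    return giga_transfers_per_s * bytes_per_transfer * channels


# QPI: 6.4 GT/s is the figure quoted in the text.
# Assumption: 2 bytes of payload per transfer, per direction.
QPI_GT_S = 6.4
QPI_PAYLOAD_BYTES = 2
qpi_one_way = link_bandwidth_gb_s(QPI_GT_S, QPI_PAYLOAD_BYTES)
qpi_both_ways = 2 * qpi_one_way  # the link is bidirectional

# Memory: three DDR3 channels, as described for Bloomfield.
# Assumptions: DDR3-1066 (1.066 GT/s) and a 64-bit (8-byte) channel.
DDR3_GT_S = 1.066
DDR3_CHANNEL_BYTES = 8
mem_peak = link_bandwidth_gb_s(DDR3_GT_S, DDR3_CHANNEL_BYTES, channels=3)

print(f"QPI, one direction:   {qpi_one_way:.1f} GB/s")
print(f"QPI, both directions: {qpi_both_ways:.1f} GB/s")
print(f"Triple-channel DDR3:  {mem_peak:.1f} GB/s")
```

These are ceilings, not measured throughput. The point is that memory traffic no longer has to share a single bus with everything else, since the memory controller and the QPI link each have their own path.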
A more mainstream part may have fewer cores, less L3 cache, fewer memory channels, and fewer QPI links (or even no QPI at all, since it could reach the chipset over a PCI-E-based DMI link instead). Eventually, we’ll even be seeing integrated graphics controllers, right there on the CPU die.
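One way to picture this mix-and-match approach is as a set of configuration records, one per die. The Python sketch below is purely illustrative: the `DieConfig` class is our own invention, only the Bloomfield entry uses figures from this article, and the server and mainstream entries are hypothetical placeholders rather than specifications of any announced product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DieConfig:
    """Illustrative description of the building blocks on one die."""
    name: str
    cores: int
    l3_cache_mb: int
    memory_channels: int
    memory_type: str
    qpi_links: int
    integrated_graphics: bool = False

    @property
    def chipset_link(self) -> str:
        # A die with no QPI links would fall back to DMI, per the text.
        return "QPI" if self.qpi_links > 0 else "DMI"

    def summary(self) -> str:
        gfx = ", integrated graphics" if self.integrated_graphics else ""
        return (f"{self.name}: {self.cores} cores, {self.l3_cache_mb}MB L3, "
                f"{self.memory_channels}x {self.memory_type}, "
                f"{self.qpi_links} QPI link(s), chipset via "
                f"{self.chipset_link}{gfx}")


# Figures for Bloomfield are taken from the article.
bloomfield = DieConfig("Bloomfield", cores=4, l3_cache_mb=8,
                       memory_channels=3, memory_type="DDR3", qpi_links=1)

# Hypothetical placeholders: NOT real product specifications.
server = DieConfig("Hypothetical server die", cores=8, l3_cache_mb=24,
                   memory_channels=4, memory_type="FB-DIMM", qpi_links=4)
mainstream = DieConfig("Hypothetical mainstream die", cores=2, l3_cache_mb=4,
                       memory_channels=2, memory_type="DDR3", qpi_links=0,
                       integrated_graphics=True)

for die in (bloomfield, server, mainstream):
    print(die.summary())
```

The CPU cores are the constant across all three records; everything else is a knob Intel can turn for a given market.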
In short, Intel can build exactly the processor it wants by assembling a die from components suited to different tasks. The overall architecture of the CPU cores will remain the same, so let’s take a deeper look at the Nehalem CPU itself.