Intel Silverthorne - A New Generation of Processors
In a matter of months Intel will officially release what may end up as its furthest reaching microprocessor architecture of the next decade, yet hardly anyone is talking about it and it's rarely characterized properly. Silverthorne is the processor; it implements the x86 instruction set at the Merom level (no SSE4). Poulsbo is the chipset. The combination of the two is referred to as the Menlow platform.
I've often referred to Silverthorne as the processor Apple wanted to use in the iPhone but couldn't. In spirit there's truth in that statement, but practically it couldn't happen. Silverthorne won't be able to fit in something the size of an iPhone: it's not cool enough, it's not integrated enough, and it's just not ready for that market. Intel believes it will be ready in about 3 years, and I tend to agree.
But Silverthorne is launching this year, soon in fact, and there are still many unanswered questions. We know from CES about the types of devices we'll see it used in. Intel calls these Mobile Internet Devices, or MIDs: small devices that can be used to browse the web, check email, use chat clients, play music, view photos, etc. These MIDs will run either Vista or Linux, the majority being Linux due to lower system requirements and cost.
Who is Silverthorne?
In order to eventually compete in the ARM-space, Silverthorne has to be small and very cheap. The CPU itself is incredibly small thanks to its paltry 47M transistor count contributing to a die that's only 25 mm^2. Intel kept Silverthorne's die size small by greatly simplifying its architecture.
Silverthorne Wafer - Image taken from Intel Website
Silverthorne features a 16-stage in-order pipeline, meaning that it can only execute instructions in actual program order. This is different from Core 2, which can dynamically rearrange instructions in order to extract maximum parallelism from the code being worked on. By going with an in-order core, Intel reduces the complexity of the chip's design as well as reducing thermal output. Intel must balance the lack of instruction level parallelism that Silverthorne can extract with clever hardware and software tricks.
Remember Hyper Threading? Silverthorne will be the first Intel CPU since the Pentium 4 to allow the execution of two simultaneous threads on a single core. Silverthorne itself is too power hungry to be used in a dual-core design, and enabling two threads of execution yields a very efficient performance boost. Since the CPU itself can't reorder instructions to help fill all available execution units, by allowing another thread of instructions to be dispatched simultaneously Intel increases the chances of fully utilizing all of Silverthorne's execution resources. Intel also trimmed the number of individual execution units on Silverthorne: SIMD and scalar integer multiplies are handled by the same unit, as are FP and integer divides. Silverthorne can only issue two instructions per cycle; with the average IPC for most x86 applications at around 1 instruction per cycle, the two-thread design makes a lot of sense. Thankfully, Silverthorne is being released into a world that is far more multithreaded than it was just a few years ago; otherwise such a design would hardly be sensible.
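To make the issue-slot argument concrete, here is a minimal toy model in Python. It is a sketch, not a description of Silverthorne's real scheduler: the stall counts, the fixed thread priority, and the function name cycles_to_issue are all assumptions made for illustration. Only the two-wide issue limit comes from the text above.

```python
ISSUE_WIDTH = 2  # from the article: two instructions issued per cycle


def cycles_to_issue(threads):
    """Cycles a toy in-order core needs to issue every instruction.

    Each thread is a list of stall counts: entry s means the instruction
    can issue no sooner than s cycles after the previous instruction of
    the same thread (0 = it may issue in the same cycle).
    """
    pos = [0] * len(threads)                     # next instruction per thread
    wait = [t[0] if t else 0 for t in threads]   # stall cycles left per thread
    cycle = 0
    while any(pos[i] < len(t) for i, t in enumerate(threads)):
        cycle += 1
        slots = ISSUE_WIDTH
        for i, t in enumerate(threads):          # assumed fixed priority: thread 0 first
            # Strictly in program order: stop at the first instruction that is not ready.
            while slots > 0 and pos[i] < len(t) and wait[i] == 0:
                pos[i] += 1
                slots -= 1
                wait[i] = t[pos[i]] if pos[i] < len(t) else 0
        wait = [max(0, w - 1) for w in wait]     # one cycle passes
    return cycle


# Hypothetical workload: every instruction waits 2 cycles on the one before it.
stalled = [0, 2, 2, 2]
# Hypothetical workload with no stalls at all.
dense = [0, 0, 0, 0]

one_thread = cycles_to_issue([stalled])            # 7 cycles for 4 instructions
two_threads = cycles_to_issue([stalled, stalled])  # 7 cycles for 8 instructions
dense_pair = cycles_to_issue([dense, dense])       # 4 cycles for 8 instructions
print(one_thread, two_threads, dense_pair)
```

In this toy case the second stalled thread rides entirely in issue slots the first one leaves empty, so the pair finishes in the same 7 cycles as one thread alone. Two stall-free threads gain nothing, because a single one already fills both slots.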
On the software side, Intel does do a lot in the compiler world which can come in handy, especially with Silverthorne. In-order processors shift the complexity of scheduling from the CPU to the compiler, meaning that while Silverthorne can run all x86 binaries, applications should be compiled for it in order to attain optimal performance.
Silverthorne is in many ways equipped similarly to the old 90nm Pentium, but it still falls short overall: performance shouldn't be better, but it should at least be within the realm of competitive.
The Memory Subsystem
Block Diagram of Silverthorne
Silverthorne's in-order architecture also means that it is susceptible to high memory latencies. If a dependent instruction's data isn't available in cache on an out-of-order processor, it can simply re-order instructions around the dependency. With an in-order core however, if an instruction needs data that isn't yet available, it can't continue executing other independent instructions further down the instruction queue until that data becomes available. Making matters worse, Silverthorne has no on-die memory controller - this is something it will get in 2009/2010 with the Moorestown platform.
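The stall behavior described above can be sketched with a small single-issue model. Everything here is a simplifying assumption for illustration: the Instr type, the 10-cycle miss latency, and an idealized out-of-order core with an unlimited window are hypothetical and are not taken from Intel's design.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    latency: int       # cycles until the result is available
    deps: tuple = ()   # indices of earlier instructions this one needs


def in_order_cycles(prog):
    """Single-issue, in-order: an instruction waiting on data blocks everything behind it."""
    issue, done = [], []
    for ins in prog:
        earliest = issue[-1] + 1 if issue else 0
        ready = max((done[d] for d in ins.deps), default=0)
        t = max(earliest, ready)
        issue.append(t)
        done.append(t + ins.latency)
    return max(done, default=0)


def out_of_order_cycles(prog):
    """Single-issue, idealized out-of-order: issue the oldest instruction whose inputs are ready."""
    done = [None] * len(prog)
    remaining = list(range(len(prog)))
    cycle = 0
    while remaining:
        for i in remaining:
            if all(done[d] is not None and done[d] <= cycle for d in prog[i].deps):
                done[i] = cycle + prog[i].latency
                remaining.remove(i)
                break  # at most one instruction issues per cycle
        cycle += 1
    return max(done, default=0)


# Hypothetical program: a load that misses the cache, one dependent
# instruction, then three independent instructions.
program = [
    Instr(latency=10),            # load that misses (assumed 10-cycle latency)
    Instr(latency=1, deps=(0,)),  # needs the loaded value
    Instr(latency=1),
    Instr(latency=1),
    Instr(latency=1),
]

print(in_order_cycles(program), out_of_order_cycles(program))
```

With these made-up numbers the in-order core finishes in 14 cycles and the idealized out-of-order core in 11, because the three independent instructions are hidden under the miss. The longer the memory latency, the wider that gap becomes, which is why the missing on-die memory controller hurts.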
Silverthorne lacks an integrated memory controller and graphics core simply because Intel's 45nm memory controller and graphics core designs weren't finalized in time for Silverthorne's launch. Moorestown, which will also be built on Intel's 45nm process, will add an on-die memory controller and on-die graphics as well.
Thankfully, Intel has outfitted Silverthorne with fairly large caches. The L1 cache is unusually asymmetric with a 32KB instruction and 24KB data cache, a decision made to optimize for performance, die size, and cost. The L2 cache is an 8-way 512KB design, very similar to what was used in the Core architecture.
While Silverthorne is built entirely on Intel's high-k/metal gate 45nm process, there is one major difference: SRAM cell size. Intel uses a 0.382 um^2 SRAM cell in Silverthorne compared to 0.346 um^2 in Core 2. Each SRAM cell is an 8 transistor design compared to 6 transistors in Core 2. The larger cell size increases the die size of Silverthorne but it draws less power and runs at a lower voltage.
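As a rough sense of scale for the cell-size difference, the sketch below multiplies the quoted cell areas by the number of data bits in a 512KB array. It assumes, as a simplification, that the quoted cell is used for every L2 data bit, and it ignores tags, ECC, redundancy, and peripheral circuitry, so real array area would be larger.

```python
L2_BYTES = 512 * 1024      # 512KB L2 from the article
DATA_BITS = L2_BYTES * 8   # data bits only; tags and ECC are ignored here

CELL_SILVERTHORNE_UM2 = 0.382  # quoted cell size in Silverthorne
CELL_CORE2_UM2 = 0.346         # quoted cell size in Core 2


def cell_area_mm2(bits: int, cell_um2: float) -> float:
    """Raw bit-cell area in mm^2 (1 mm^2 = 1e6 um^2)."""
    return bits * cell_um2 / 1e6


silverthorne_mm2 = cell_area_mm2(DATA_BITS, CELL_SILVERTHORNE_UM2)
core2_mm2 = cell_area_mm2(DATA_BITS, CELL_CORE2_UM2)
ratio = CELL_SILVERTHORNE_UM2 / CELL_CORE2_UM2

print(f"Silverthorne cell: {silverthorne_mm2:.3f} mm^2")
print(f"Core 2 cell:       {core2_mm2:.3f} mm^2")
print(f"Ratio:             {ratio:.3f}")
```

Under these assumptions the larger cell costs roughly 0.15 mm^2 of raw bit-cell area for 512KB, about 10% more than the Core 2 cell would need.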
FSB, Performance, Clock Speeds, and Transistors
Silverthorne is connected to the outside world by a quad-pumped FSB similar to what Intel uses in its other processors, with some significant power tweaks. The FSB can operate at either 533MHz or 400MHz depending on power state/performance demands.
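For reference, the arithmetic behind a quad-pumped bus can be sketched as follows. The 64-bit (8-byte) data width is an assumption based on Intel's other FSB designs and is not stated above; the two effective rates are the ones quoted in the text.

```python
PUMP_FACTOR = 4   # quad-pumped: four transfers per base clock
BUS_BYTES = 8     # ASSUMPTION: 64-bit data bus, as on Intel's other FSBs


def base_clock_mhz(effective_mhz: float) -> float:
    """Underlying bus clock for a given effective (quad-pumped) rate."""
    return effective_mhz / PUMP_FACTOR


def peak_bandwidth_gb_s(effective_mhz: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return effective_mhz * 1e6 * BUS_BYTES / 1e9


for rate in (533, 400):
    print(f"{rate}MHz FSB: base clock {base_clock_mhz(rate):.2f}MHz, "
          f"peak {peak_bandwidth_gb_s(rate):.3f} GB/s")
```

These are theoretical peaks under the stated assumption; sustained bandwidth would be lower.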
Dual Mode FSB
Ever since the P6 Intel has used a Gunning Transceiver Logic (GTL) based FSB, while Silverthorne's FSB can work in either a GTL or CMOS based mode. In CMOS mode power consumption is reduced significantly by turning off on-die termination and operating at half the voltage of GTL mode. Unfortunately, Intel couldn't give us more details as to what tradeoffs are made in order to achieve lower power operation in CMOS mode.
Silverthorne's pipeline depth is a bit on the long side, especially considering that it is an in-order core (which generally have shorter pipelines than out-of-order designs). Even the Core 2 architecture features a shorter 14-stage design, so we suspect that Intel needed a longer pipe to reach Silverthorne's clock and power targets.
Intel has stated publicly that Silverthorne is going to offer performance competitive with the first Pentium M processors, from both a clock speed and application performance standpoint. We'll touch on the application performance side of that momentarily, but the clock speed claims are reasonable. Thanks to a fairly deep pipeline, a very simple in-order core, and a very clockable 45nm manufacturing process Intel should have no problems hitting clock speeds in the 1 - 2GHz range. Intel's ISSCC paper states that performance is similar to mainstream Ultra-Mobile PCs, meaning that we should expect these things to perform at the level of a low 1GHz Core Solo processor.
The decision to go in-order made a lot of sense to Intel. Intel took its 20W TDP mobile Core 2 processors, scaled them down to 3W, and discovered that the best it could manage was a paltry 1GHz clock speed. The power consumption needed to be lower and clock speeds/performance needed to be higher, and thus a new design using an in-order architecture was necessary. This is especially true once you start looking at average and idle power, both of which need to be in the 10s - 100s of mW; in-order is necessary given the performance targets. While eventually we may see an out-of-order derivative (much as desktop microprocessors were in-order until out-of-order became feasible), Intel stated that for the next 5 years we're looking at in-order.
Silverthorne is built far more modularly than Core 2 or any of Intel's previous mobile microprocessors; it is honestly built more like a GPU than a CPU. Only 9% of the chip uses custom logic, and the rest is built using standard Intel circuit libraries. The L2 cache, PLLs, data I/O, addressing I/O, and a few other elements are standard logic for two reasons: time to market and flexibility.
Intel has been working on Silverthorne for 4 years now, so time to market was clearly not an issue for this first design - but for subsequent incarnations, it is. A standard design that's very modular will allow Intel to quickly integrate custom logic on a demand basis for specific markets. Intel could conceivably have a slightly different version of Silverthorne, with quick turnaround, for CE markets or for embedded applications.
Lower Power than Centrino
With mobile Penryn Intel introduced a new power state it calls C6, and Silverthorne supports it as well. In the C6 power state the CPU is in a virtual reset state, and core voltage is very close to zero. The core clock, all of the PLLs, and caches are completely turned off. All of the state data is saved in a 10.5KB storage area, similar to mobile Penryn (but smaller since there's not as much state to save). Upon exiting C6 the processor's previous state is restored from this memory, called the C6 array. It takes around 100us to get out of C6, but the power savings are more than worth the effort. It's the same kind of trade, giving up a little performance for lower power, that we saw in the design of the original Pentium M processor.
Clock gating (sending the clock signal through a logic gate that can disable it on the fly, thus shutting off whatever the clock connects to) is an obvious aspect of Silverthorne's design. All Intel processors use clock gating; Silverthorne simply uses it more aggressively - the clock going to every "power zone" is gated, something that isn't the case in mobile Core 2. Each logic cluster (205 total) in Silverthorne is referred to as a Functional Unit Block (FUB) and the entire chip uses what Intel calls a sea of FUB design. Each FUB is clock gated and can be disabled independently to optimize for power consumption. The cache in Silverthorne is in its own FUB, which apparently isn't the case in mobile Core 2.
21-Pin Architecture
Silverthorne uses a split power plane; in its deepest sleep state (C6) the chip can shut off all but 21 pins which are driven by the 1.05V VRM. By having two separate power planes the chip can manage power on a more granular level. While it can't disable individual pins, it can disable large groups of them leaving only 21 active when things like the L2 cache and bus interface are powered down.
Intel mentioned that Silverthorne will remain in its C6 sleep state 90% of the time. However, that figure is slightly misleading because it's only possible to remain in C6 when the CPU is completely idle. The 90% figure comes from taking into account a mobile device sitting in your pocket doing nothing most of the time. When in use, Silverthorne won't be able to spend nearly that much time in C6.
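A quick way to see why the residency figure matters so much is a weighted average of active and sleep power. The sketch below is only that: the 2.0W active figure is the top of the TDP range Intel is targeting, while the 0.05W C6 figure and the 50% residency are hypothetical placeholders, and the energy cost of entering and leaving C6 is ignored.

```python
def average_power_w(c6_residency: float, active_w: float, c6_w: float) -> float:
    """Time-weighted average power for a CPU that is either active or in C6."""
    if not 0.0 <= c6_residency <= 1.0:
        raise ValueError("c6_residency must be between 0 and 1")
    return c6_residency * c6_w + (1.0 - c6_residency) * active_w


ACTIVE_W = 2.0   # top of the TDP range Intel is targeting
C6_W = 0.05      # HYPOTHETICAL sleep power, chosen only for illustration

in_pocket = average_power_w(0.90, ACTIVE_W, C6_W)  # Intel's 90% residency case
in_use = average_power_w(0.50, ACTIVE_W, C6_W)     # hypothetical heavier use

print(f"90% in C6: {in_pocket:.3f} W")
print(f"50% in C6: {in_use:.3f} W")
```

With these placeholder numbers the average goes from about a quarter of a watt at 90% residency to about one watt at 50%, which is why a figure measured with the device idling in a pocket says little about power while it is in use.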
Despite the implementation of a C6 power state, Silverthorne will still lose to ARM based processors in both active and idle power. The active power disadvantage will be erased over the coming years as the microarchitecture evolves (and smaller manufacturing processes are implemented), while the idle power requires more of a platform approach. As we reported in our first Menlow/Silverthorne article:
"The idle power reduction will come through highly integrated platforms, like what we're describing with Moorestown. By getting rid of the PCI bus and replacing it with Intel's own custom low-power interface, Intel hopes to get idle power under control. The idea is that I/O ports will only be woken up when needed (similar to how the data lines on the Centrino FSB function), and what will result are platforms with multiple days of battery life when playing back music."
The final aspect of unique power management in Silverthorne falls upon its gridless clock distribution. Every single element within a microprocessor operates on the same clock, and making sure the same clock signal arrives at the exact same time across the many-mm^2 die is a difficult task. In conventional microprocessors, clock distribution is handled using a complex tree - which ends up eating over 30% of total microprocessor power. In-order designs allow the easier use of gridless clock distribution, which is a flat distribution using multiple clocks at the same time (instead of a tree/gridded design that is hierarchical).
Intel is targeting 0.6W - 2.0W TDPs with Silverthorne - obviously depending on clock speeds. At 2.0GHz, running at 1.0V, Silverthorne runs at 90C and dissipates 2W. The CPU temperature alone should be an indication that this is too hot for an ultra small iPhone-like form factor, so expect to see devices that are closer to a PlayStation Portable in size and at lower clock speeds for most of them.
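To get a feel for how clock speed and voltage move those numbers, the common first-order model for dynamic power, P = C * V^2 * f, can be fitted to the 2W / 2.0GHz / 1.0V data point. This is a rough sketch: it treats all 2W as dynamic power, ignores leakage, and the 0.8V operating point is a hypothetical value, not an Intel specification.

```python
def effective_capacitance_f(power_w: float, volts: float, freq_hz: float) -> float:
    """Solve P = C * V^2 * f for the effective switched capacitance C."""
    return power_w / (volts ** 2 * freq_hz)


def dynamic_power_w(c_eff_f: float, volts: float, freq_hz: float) -> float:
    """First-order dynamic power estimate: P = C * V^2 * f."""
    return c_eff_f * volts ** 2 * freq_hz


# Data point from the article: 2W at 2.0GHz and 1.0V.
C_EFF = effective_capacitance_f(2.0, 1.0, 2.0e9)

same_voltage = dynamic_power_w(C_EFF, 1.0, 1.0e9)   # halve the clock only
lower_voltage = dynamic_power_w(C_EFF, 0.8, 1.0e9)  # HYPOTHETICAL 0.8V at 1.0GHz

print(f"C_eff: {C_EFF:.2e} F")
print(f"1.0GHz at 1.0V: {same_voltage:.2f} W")
print(f"1.0GHz at 0.8V: {lower_voltage:.2f} W")
```

Under this simplified model, halving the clock halves the power, and a modest voltage drop on top of that lands near the low end of the 0.6W - 2.0W range.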
Final Words
With Itanium, Intel felt that it was being limited by the x86 ISA (Instruction Set Architecture) and set off to develop the perfect instruction set and CPU architecture for its target markets. That was a very different time; if you remember back to AMD's introduction of x86-64, one thing AMD made a point of saying was that ISA no longer mattered - the penalty for going x86 was so small in comparison to the total chip complexity that it was a far better move to maintain software compatibility than get slightly better efficiency but break x86 support. With Silverthorne Intel has come full circle and embraced the same ideal; the goal is to have x86 from top to bottom and the range now includes ultra mobile devices once reserved for ARM processors.
In every market segment Intel has entered with x86, it has managed to dominate based on two principles: manufacturing prowess and x86 compatibility.
The manufacturing advantages Intel has held over the years are evident. Intel's incredible investment in fab plants around the world requires that it keeps them running as close to full capacity as possible. The market for low cost, low power silicon like Silverthorne is huge and should give Intel the ability to continue to hold onto a manufacturing advantage in the future. Intel won the server market by being able to subsidize the cost of manufacturing server CPUs by making even more desktop chips that needed the same fabs. The market for ARM-based processors is even larger than Intel's desktop CPU demand, so Silverthorne has the ability to have dramatic effects on Intel's overall CPU business if demand grows as expected.
The x86 compatibility aspect of Silverthorne is huge, and once again we turn to the iPhone example. The iPhone is Apple's only non-x86 product in its lineup, a lineup that runs in some shape or form a version of OS X. Maintaining separate software stacks for the iPhone vs. the rest of the product line isn't ideal; it increases the overhead and cost of software development and debugging. The 32nm successor of Silverthorne will be the chip that Apple will be able to stick in its 4th generation iPhone, giving it a single x86 software stack across all products.
There's significant competition to Silverthorne, and I don't expect it to be very successful in its first incarnation. ARM-based devices are simply smaller and used in very different ways than where Silverthorne is going to find itself. At first Silverthorne is simply going to compete with Intel's own ultra low voltage processors used in UMPCs, but it will slowly enable a new class of devices.
Inevitably the comparison to VIA's recently announced Isaiah CPU will be made, but Silverthorne is really aimed at a different market. Isaiah is a higher performance out-of-order core, while Silverthorne is eventually designed to make its way into highly integrated CE devices. We expect VIA's latest creation to outperform Silverthorne, but we don't expect the two to actually compete in the same space.
Moorestown's higher level of integration will bring about smaller devices, but still nothing iPhone-class. In our opinion, it won't be until after 2010 that we will really see the Intel/ARM race heat up. If you're expecting Silverthorne to revolutionize the ultra mobile world the way Centrino did upon its launch, you will be disappointed. What we are seeing however is the beginnings of what could be revolutionary. Intel's vision is clear; it's just up to Moore's Law to make the technology small, cool and powerful enough.
References -
Images Taken from Intel Website - http://www.intel.com/pressroom/kits/events/idffall_2007/mobility_photos.htm
AnandTech Hardware Forum - http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3230&p=2