Original Link: https://www.anandtech.com/show/6529/busting-the-x86-power-myth-indepth-clover-trail-power-analysis
The x86 Power Myth Busted: In-Depth Clover Trail Power Analysis
by Anand Lal Shimpi on December 24, 2012 5:00 PM EST
The untold story of Intel's desktop (and notebook) CPU dominance after 2006 has nothing to do with novel approaches to chip design or spending billions on keeping its army of fabs up to date. While both of those are critical components to the formula, it's Intel's internal performance modeling team that plays a major role in providing targets for both the architects and fab engineers to hit. After losing face (and sales) to AMD's Athlon 64 in the early 2000s, Intel adopted a "no more surprises" policy. Intel would never again be caught off guard by a performance upset.
Over the past few years, however, the focus of meaningful performance has shifted. Power consumption is now just as important as absolute performance. Intel has been going through a slow waking up process as it adapts to the new ultra mobile world. One of the first things to change, however, was the scope and focus of its internal performance modeling. User experience (quantified through high speed cameras mapping frame rates to user survey data) and power efficiency are now both incorporated into all architecture targets going forward. Building its next-generation CPU cores no longer means picking a SPECCPU performance target and working towards it, but delivering a certain user experience as well.
Intel's role in the industry has started to change. It worked very closely with Acer on bringing the W510, W700 and S7 to market. With Haswell, Intel will work even closer with its partners - going as far as to specify other, non-Intel components on the motherboard in pursuit of ultimate battery life. The pieces are beginning to fall into place, and if all goes according to Intel's plan we should start to see the fruits of its labor next year. The goal is to bring Core down to very low power levels, and to take Atom even lower. Don't underestimate the significance of Intel's 10W Ivy Bridge announcement. Although desktop and mobile Haswell will appear in mid to late Q2-2013, the exciting ultra mobile parts won't arrive until Q3. Intel's 10W Ivy Bridge will be responsible for at least bringing some more exciting form factors to market between now and then. While we're not exactly at Core-in-an-iPad level of integration, we are getting very close.
To kick off what is bound to be an exciting year, Intel made a couple of stops around the country showing off that even its existing architectures are quite power efficient. Intel carried around a pair of Windows tablets, wired up to measure power consumption at both the device and component level, to demonstrate what many of you will find obvious at this point: that Intel's 32nm Clover Trail is more power efficient than NVIDIA's Tegra 3.
We've demonstrated this in our battery life tests already. Samsung's ATIV Smart PC uses an Atom Z2760 and features a 30Wh battery with an 11.6-inch 1366x768 display. Microsoft's Surface RT uses NVIDIA's Tegra 3 powered by a 31Wh battery with a 10.6-inch, 1366x768 display. In our 2013 wireless web browsing battery life test we showed Samsung with a 17% battery life advantage, despite the 3% smaller battery. Our video playback battery life test showed a smaller advantage of 3%.
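To put the battery size difference in perspective, the runtime advantage can be normalized per watt-hour of capacity. The snippet below is only a rough sketch: the function name is made up for illustration, it uses the 17% and 30Wh/31Wh figures quoted above, and it assumes the rest of the two platforms (display, radios, and so on) are comparable, which is not strictly true given the different panel sizes.

```python
def per_wh_advantage(runtime_ratio, battery_a_wh, battery_b_wh):
    """Runtime per watt-hour of device A relative to device B.

    runtime_ratio -- runtime of A divided by runtime of B
    battery_a_wh  -- battery capacity of A, in watt-hours
    battery_b_wh  -- battery capacity of B, in watt-hours
    """
    if battery_a_wh <= 0 or battery_b_wh <= 0:
        raise ValueError("battery capacities must be positive")
    return runtime_ratio * battery_b_wh / battery_a_wh


# A = Samsung ATIV Smart PC (30 Wh), B = Surface RT (31 Wh), 17% longer runtime.
ratio = per_wh_advantage(runtime_ratio=1.17, battery_a_wh=30.0, battery_b_wh=31.0)
print(f"{(ratio - 1) * 100:.1f}% more runtime per Wh")
```

Normalized this way, the web browsing advantage grows from 17% to roughly 21% per watt-hour.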
For us, the power advantage made a lot of sense. We've already proven that Intel's Atom core is faster than ARM's Cortex A9 (even four of them under Windows RT). Combine that with the fact that NVIDIA's Tegra 3 features four Cortex A9s on TSMC's 40nm G process and you get a recipe for worse battery life, all else being equal.
Intel's method of hammering this point home isn't all that unique in the industry. Rather than measuring power consumption at the application level, Intel chose to do so at the component level. This is commonly done by taking the device apart and either replacing the battery with an external power supply that you can measure, or by measuring current delivered by the battery itself. Clip the voltage input leads coming from the battery to the PCB, toss a resistor inline and measure voltage drop across the resistor to calculate power (good ol' Ohm's law).
Where Intel's power modeling gets a little more aggressive is what happens next. Measuring power at the battery gives you an idea of total platform power consumption including display, SoC, memory, network stack and everything else on the motherboard. This approach is useful for understanding how long a device will last on a single charge, but if you're a component vendor you typically care a little more about the specific power consumption of your competitors' components.
What follows is a good mixture of art and science. Intel's power engineers will take apart a competing device and probe whatever looks to be a power delivery or filtering circuit while running various workloads on the device itself. By correlating the type of workload to spikes in voltage in these circuits, you can figure out what components on a smartphone or tablet motherboard are likely responsible for delivering power to individual blocks of an SoC. Despite the high level of integration in modern mobile SoCs, the major players on the chip (e.g. CPU and GPU) tend to operate on their own independent voltage planes.
A basic LC filter
What usually happens is you'll find a standard LC filter (inductor + capacitor) supplying power to a block on the SoC. Once the right LC filter has been identified, all you need to do is lift the inductor, insert a very small resistor (2 - 20 mΩ) and measure the voltage drop across the resistor. With voltage and resistance values known, you can determine current and power. Using good external instruments you can plot power over time and now get a good idea of the power consumption of individual IP blocks within an SoC.
Basic LC filter modified with an inline resistor
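The arithmetic behind the inline resistor trick is simple enough to sketch. This is an illustration rather than Intel's actual tooling: the function name, the 10 mΩ shunt, the 1.0 V rail and the 5 mV reading are all assumed values, and only the 2 - 20 mΩ range comes from the description above. It also assumes you know (or separately measure) the rail voltage on the load side of the resistor.

```python
def rail_power_watts(v_drop, r_shunt, v_rail):
    """Estimate power delivered to a block fed through a shunt resistor.

    v_drop  -- voltage measured across the shunt, in volts
    r_shunt -- shunt resistance, in ohms
    v_rail  -- rail voltage on the load side of the shunt, in volts
    """
    if r_shunt <= 0:
        raise ValueError("shunt resistance must be positive")
    current = v_drop / r_shunt   # Ohm's law: I = V / R
    return v_rail * current      # P = V * I


# Hypothetical reading: 5 mV across a 10 mOhm shunt on a 1.0 V rail.
power = rail_power_watts(v_drop=0.005, r_shunt=0.010, v_rail=1.0)
print(f"{power:.3f} W")
```

The shunt is kept tiny so that the voltage it drops (and the power it burns) is negligible next to the rail itself, which is why a sensitive data acquisition box is needed to read it.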
Intel brought one of its best power engineers along with a couple of tablets and a National Instruments USB-6289 data acquisition box to demonstrate its findings. Intel brought along Microsoft's Surface RT using NVIDIA's Tegra 3, and Acer's W510 using Intel's own Atom Z2760 (Clover Trail). Both of these were retail samples running the latest software/drivers available as of 12/21/12. The Acer unit in particular featured the latest driver update from Acer (version 1.01, released on 12/18/12) which improves battery life on the tablet. (Remember me pointing out that the W510 seemed to have a problem that caused it to underperform in the battery life department compared to Samsung's ATIV Smart PC? It seems like this driver update fixes that problem.)
I personally calibrated both displays to our usual 200 nits setting and ensured the software and configurations were as close to equal as possible. Both tablets were purchased by Intel, but I verified their performance against my own review samples and noticed no meaningful deviation. I've also attached diagrams of where Intel is measuring CPU and GPU power on the two tablets:
Microsoft Surface RT: The yellow block is where Intel measures GPU power, the orange block is where it measures CPU power
Acer's W510: The purple block is a resistor from Intel's reference design used for measuring power at the battery. Yellow and orange are inductors for GPU and CPU power delivery, respectively.
The complete setup is surprisingly mobile, even relying on a notebook to run SignalExpress for recording output from the NI data acquisition box:
Wiring up the tablets is a bit of a mess. Intel wired up far more than just CPU and GPU; depending on the device and what was easily exposed, you could get power readings on the memory subsystem and things like NAND as well.
Intel only supplied the test setup. For everything you're about to see, I picked and ran whatever I wanted, however I wanted. Comparing Clover Trail to Tegra 3 is nothing new, but the data I gathered is at least interesting to look at. We typically don't get to break out CPU and GPU power consumption in our tests, making this experiment a bit more illuminating.
Keep in mind that we are looking at power delivery on voltage rails that spike with CPU or GPU activity. It's not uncommon to run multiple things off of the same voltage rail. In particular, I'm not super confident in what's going on with Tegra 3's GPU rail although the CPU rails are likely fairly comparable. One last note: unlike under Android, NVIDIA doesn't use its 5th/companion core under Windows RT. Microsoft still doesn't support heterogeneous computing environments, so NVIDIA had to disable its companion core under Windows RT.
Idle Power
In all of these tests you're going to see three charts. The first will show you total platform power, measured at the battery, taking into account everything from SoC to display. The next shows you power measured at the CPU power delivery circuit, and the third shows you power measured at the GPU power delivery circuit. All values are measured in watts, and are reported in 15ms intervals (although I sampled at 1kHz, then averaged down to 15ms).
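For anyone wanting to reproduce the averaging step, here is one way it could be done. This is a sketch under my own assumptions, not the SignalExpress configuration used for the charts: the function name is hypothetical, the trace is synthetic, and any trailing partial interval is simply dropped.

```python
def average_down(samples, sample_rate_hz=1000, interval_ms=15):
    """Average consecutive power samples into fixed-length intervals."""
    per_bin = sample_rate_hz * interval_ms // 1000   # 15 samples per bin at 1 kHz
    if per_bin < 1:
        raise ValueError("interval is shorter than one sample period")
    bins = []
    # Step through whole bins only; a trailing partial bin is dropped.
    for start in range(0, len(samples) - per_bin + 1, per_bin):
        chunk = samples[start:start + per_bin]
        bins.append(sum(chunk) / per_bin)
    return bins


# Synthetic trace: 30 ms at 3.0 W followed by 15 ms at 4.5 W (made-up values).
trace = [3.0] * 30 + [4.5] * 15
print(average_down(trace))   # [3.0, 3.0, 4.5]
```

Averaging like this smooths out sample-to-sample noise while still leaving enough resolution to see short bursts of activity.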
For our first set of tests I simply wanted to get a feel for idle power. Both systems had all background syncing suspended, WiFi was connected, and we're just sitting at the Windows RT/8 Start Screen until the tablets reached a truly idle state. Note that idle under Windows RT/8 technically doesn't happen until the live tiles stop updating, which you'll see denoted by a drop in the idle power consumption in the graphs below.
First up is total platform power consumption:
Surface RT has higher idle power, around 28% on average, compared to Acer's W510. The last half of the graph shows the tablets hitting true idle when the live tiles stop animating.
A look at the CPU chart gives us some more granularity, with Tegra 3 ramping up to higher peak power consumption during all of the periods of activity. Here the Atom Z2760 cores average 36.4mW at idle compared to 70.2mW for Tegra 3.
The GPU specific data is pretty interesting - the GPU power rail shows much higher power consumption than on Intel's Z2760. As I didn't design Tegra 3, I don't know what else is powered by this rail - although you'd assume that anything else not in use would be power gated. Imagination Technologies' PowerVR SGX 545 does appear to be quite power efficient here, on average using 155mW while rendering the Start Screen.
I wasn't happy with the peaks we were seeing when nothing was happening on the systems, so to confirm that nothing funny was going on I threw both tablets into airplane mode and waited for full idle. Check out the tail end of the platform power diagram:
That's much better. Without the AP talking to each tablet's WiFi radio constantly, idle becomes truly idle. If you're curious, the power savings are around 47.8mW (average) for the W510 in airplane mode when fully idle.
The GPU rail feeding the Atom Z2760 appears to hit a lower idle power when compared to NVIDIA's Tegra 3. Advantages in idle power consumption are key to delivering good battery life overall.
Power During Boot
For the next test I measured power during a cold boot process. Here we're looking at power consumption from device off to hitting the Windows Start Screen:
Now we get our first glimpse of active power and there's a definite advantage here for Intel. Peak power consumption for the entire tablet tops out at just over 5W compared to 8W for Surface RT. Let's dig deeper to find what is responsible for the added power consumption:
The difference in average CPU power consumption is significant. Tegra 3 pulls around 1.29W on average compared to 0.48W for Atom. Atom also finishes the boot process quicker, which helps it get to sleep quicker and also contributes to improved power consumption.
GPU power is a big contributor as well with Tegra 3 averaging 0.80W and Atom pulling down 0.22W.
Launching Word 2013
As another simple test, I looked at power consumption while launching Microsoft Word 2013 on both platforms:
Here both tablets seemed to finish in about the same time but if you look at the power consumption graph you'll see that the W510 actually took a little bit longer. The difference wasn't great enough to really change the power profile: NVIDIA consumed 0.60W on average for its CPUs, while Intel pulled 0.48W on average:
Once again, there's a pretty stark difference on the GPU rail which makes me wonder if we're not looking at more than just GPU power here. Either that or Tegra 3's GPU implementation isn't all that power efficient compared to Imagination's. For the raw averages you're looking at 0.73W for NVIDIA compared to 0.23W for Intel.
SunSpider 0.9.1
Now the fun stuff. Doing power profiles of our standard benchmarks gives us good insight into how well each vendor was able to balance peak performance and average power. In general it's ok to burn more power for a short amount of time as long as it means you'll get to sleep quicker. This was one of the fundamentals of the first transition to mobile from the early 2000s.
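The race to sleep idea is easy to show with a little arithmetic. The numbers below are entirely made up for illustration (they are not measurements from either tablet), and the function name is hypothetical; the point is only that energy over a fixed window, not peak power, is what drains the battery.

```python
def window_energy_joules(active_w, active_s, idle_w, window_s):
    """Energy used over a fixed window: an active burst, then idle for the rest."""
    if active_s > window_s:
        raise ValueError("active time exceeds the window")
    return active_w * active_s + idle_w * (window_s - active_s)


# Made-up figures for two chips doing the same job inside a 10-second window.
fast = window_energy_joules(active_w=5.0, active_s=2.0, idle_w=2.5, window_s=10.0)
slow = window_energy_joules(active_w=4.0, active_s=4.0, idle_w=2.5, window_s=10.0)
print(fast, slow)   # 30.0 31.0
```

In this example the faster chip draws 25% more power while active but still uses less energy overall, because it spends more of the window at idle.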
We already know that Intel completes SunSpider quicker thanks to its improved memory subsystem over the Cortex A9, but it also does so with much better average power (3.70W vs. 4.77W for this chart). A big part of the average power savings comes courtesy of what happens at the very tail end of this graph where the W510 is able to race to sleep quicker, and thus saves a good deal of power.
JavaScript Performance
Time in ms (Lower is Better) | Kraken | SunSpider | RIA Bench Focus
Acer W510 (Atom Z2760 1.8GHz) | 33220.9ms | 730.8ms | 3959ms
Microsoft Surface (Tegra 3 1.3GHz) | 49595.5ms | 981.1ms | 5880ms
Samsung ATIV Smart PC (Atom Z2760 1.8GHz) | 33406.0ms | 721.3ms | 3752ms
Apple iPad 4 (A6X) | 19086.9ms | 834.7ms | -
Google Nexus 10 (Exynos 5 Dual) | 11146.0ms | 1384.1ms | -
I also used SunSpider as an opportunity to validate the results from Intel's tablets with my own review samples. To generate this chart I measured power, every second, at the wall with both devices plugged in and with a fully charged battery. The resulting power consumption numbers include the efficiency loss at the AC adapter but the general curve should mimic the results above:
Note that the results do generally line up, although measuring at the battery gives more accurate results for the device and using the NI DAQ I was able to get better granularity on the power measurements.
Looking at CPU level power consumption we see a very even match between Atom and Tegra 3. Intel's advantage really comes from being able to complete the workload quicker (0.52W compared to 0.72W on average).
Once again we see a pretty significant difference in power consumption on the GPU rail between these two platforms.
Kraken
Mozilla's Kraken benchmark is a new addition to our js performance suite, and it's a beast. The test runs for much longer than SunSpider, but largely tells a similar story:
RIABench
RIABench's Focus Tests are on the other end of the spectrum, and take a matter of seconds to complete. What we get in turn is a more granular look at power consumption:
WebXPRT
I also included Principled Technologies' new HTML5/js web test suite WebXPRT in our power analysis. Like the rest of the tests, Intel already outperforms NVIDIA here but does so with lower power consumption. A big part of the advantage continues to be lower power consumption on the GPU rail, surprisingly enough.
TouchXPRT
As our first native client test, we turned to PT's TouchXPRT 2013. As there is no "run-all" functionality in the TouchXPRT benchmark, we had to present individual power curves for each benchmark. The story told here is really more of the same. On the CPU side, Intel is able to deliver better performance at lower power consumption. On the GPU side, performance is good enough for these tasks but once again, is delivered at lower power consumption.
GPU Workload
NVIDIA's only performance advantage on the SoC side compared to Clover Trail at this point is in its GPU. Tegra 3's GPU is faster than the high clocked PowerVR SGX 545 in Clover Trail. While we don't yet have final GPU benchmarks under Windows RT/8 that we can share numbers from, the charts below show power consumption in the same DX title running through roughly the same play path.
NVIDIA's GPU power consumption is more than double the PowerVR SGX 545's here, while its performance advantage isn't anywhere near double. I have heard that Imagination has been building the most power efficient GPUs on the market for quite a while now; this might be the first argument in favor of that hearsay.
Wireless Web Browsing Battery Life Test
For our final test I wanted to provide a snippet of our 2013 web browsing battery life test to show what its power profile looked like. Remember the point of this test was to simulate periods of increased CPU and network activity that could correspond to more than just browsing the web but interacting with your device in general.
Those bursts of power consumption are the direct result of our battery life test doing its job. The tasks should take roughly the same time to complete on both devices, making this a good battery life test by not penalizing a faster SoC with more work.
Note that the W510's curve ends up lagging behind Surface RT's curve a bit by the end of the chart. This is purely because of the W510's garbage WiFi implementation. I understand that a fix from Acer is on the way, but it's neat to see something as simple as poorly implemented WiFi showing up in these power consumption graphs.
I always think about GPU power consumption while playing a game, but going through this experiment gave me a newfound appreciation for non-gaming GPU power efficiency. Simply changing what's displayed on screen does burn an appreciable amount of power.
Final Words
Ultimately I don't know that this data really changes what we already knew about Clover Trail: it is a more power efficient platform than NVIDIA's Tegra 3. I summed up the power consumption advantage in the table below (I left out the GPU numbers since I'm not totally clear on what NVIDIA attaches to the GPU power rail on Tegra 3):
Power Consumption Comparison
Test | Surface RT (Platform) | W510 (Platform) | Surface RT (CPU) | W510 (CPU)
Idle | 3.182W | 2.474W | 70.2mW | 36.4mW
Cold Boot | 5.358W | 3.280W | 800mW | 216mW
SunSpider 0.9.1 | 4.775W | 3.704W | 722mW | 520mW
Kraken | 4.738W | 3.582W | 829mW | 564mW
RIABench | 3.962W | 3.294W | 379mW | 261mW
WebXPRT | 4.617W | 3.225W | 663mW | 412mW
TouchXPRT (Photo Enhance) | 4.789W | 3.793W | 913mW | 378mW
GPU Workload | 5.395W | 3.656W | 1432mW | 488mW
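To turn the table into relative terms, the platform columns can be run through a quick calculation. This is just a convenience sketch using the first two numeric columns of the table above; the dictionary and function names are my own.

```python
# Average platform power in watts, copied from the table above: (Surface RT, W510).
platform_w = {
    "Idle": (3.182, 2.474),
    "Cold Boot": (5.358, 3.280),
    "SunSpider 0.9.1": (4.775, 3.704),
    "Kraken": (4.738, 3.582),
    "RIABench": (3.962, 3.294),
    "WebXPRT": (4.617, 3.225),
    "TouchXPRT (Photo Enhance)": (4.789, 3.793),
    "GPU Workload": (5.395, 3.656),
}


def extra_draw_percent(surface_w, w510_w):
    """How much more power Surface RT drew, as a percentage of the W510 figure."""
    return (surface_w / w510_w - 1.0) * 100.0


for test, (surface_w, w510_w) in platform_w.items():
    print(f"{test}: {extra_draw_percent(surface_w, w510_w):.1f}%")
```

The idle row works out to about 28.6%, which lines up with the roughly 28% idle gap noted earlier.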
Across the board Intel manages a huge advantage over NVIDIA's Tegra 3. Again, this shouldn't be a surprise. Intel's 32nm SoC process offers a big advantage over TSMC's 40nm G used for NVIDIA's Cortex A9 cores (the rest of the SoC is built on LP, the whole chip uses TSMC's 40nm LPG), and there are also the architectural advantages that Atom offers over ARM's Cortex A9. As we've mentioned in both our Medfield and Clover Trail reviews: the x86 power myth has been busted. I think it's very telling that Intel didn't show up with an iPad for this comparison, although I will be trying to replicate this setup on my own with an iPad 4 to see if I can't make it happen without breaking too many devices. We've also just now received the first Qualcomm Krait based Windows RT tablets, which will make another interesting comparison point going forward.
Keeping in mind that this isn't Intel's best foot forward either, the coming years should provide for some entertaining competition. In less than a year Intel will be shipping its first 22nm Atom in tablets, while NVIDIA will quickly toss Tegra 3 aside in favor of the Cortex A15 based 28nm Wayne (Tegra 4?) SoC in the first half of next year. Beating up on Surface RT today may be fun for Intel, but next year won't be quite as easy. The big unknown in all of this is of course what happens when Core gets below 10W. Intel already demonstrated Haswell at 8W - it wouldn't be too far fetched to assume that Intel is gunning for Swift/Cortex A15 with a Core based SoC next year.
Here's where it really gets tricky: Intel built the better SoC, but Microsoft built the better device - and that device happens to use Tegra 3. The days of Intel simply building a chip and putting it out in the world are long gone. As it first discovered with Apple, only through a close relationship with the OEM can Intel really deliver a compelling product. When left to their own devices, the OEMs don't always seem to build competitive devices. Despite Intel's significant involvement in Acer's W510, the tablet showed up with an unusable trackpad, underperforming WiFi and stability issues. Clover Trail has the CPU performance I want from a tablet today, but I want Apple, Google or Microsoft to use it. I do have hope that the other players will wake up and get better, but for next year I feel like the tune won't be any different. Intel needs design wins among the big three to really make an impact in the tablet space.
The good news is Microsoft is already engaged with Surface Pro. It's safe to bet that there will be a Haswell version coming as well. Now Intel just needs an iPad and a Nexus win.