28 Comments
dylan522p - Friday, October 4, 2013 - link
Why don't you guys make your own benchmark that is actually good? You guys did it in the SSD space and the benchmarking scene there was not nearly as big nor bad.
dishayu - Friday, October 4, 2013 - link
I don't think it's even possible to cheat in SSD benchmarks. I believe they made the SSD benchmark to simulate real-world performance rather than purely synthetic numbers. And I think I read Anand's tweet/comment somewhere that they do plan to make their secret, undisclosed (to phone manufacturers) benchmark in the future.
Anand Lal Shimpi - Friday, October 4, 2013 - link
It absolutely is possible, which is why I've never given out our tests (or even low level details of our tests) to manufacturers even though they've asked over the years.
The same is true for our smartphone/tablet battery life tests.
Take care,
Anand
Anand Lal Shimpi - Friday, October 4, 2013 - link
It's extremely expensive :)
What we've done in the SSD space is ideally what we'd want to do here, but it's not quite that simple.
This isn't a no, just means it's something that's going to require a lot more thought.
Take care,
Anand
Kevin G - Friday, October 4, 2013 - link
Have you considered the open source route for a mobile benchmark? While having the source open would let manufacturers know exactly what the tests are and how they're done, it would enable peer review of the code and allow other sites to do a custom build that'd evade basic application detection.
dylan522p - Friday, October 4, 2013 - link
They would certainly detect the code and almost certainly cheat.
Kevin G - Friday, October 4, 2013 - link
The ability to modify/build it yourself can be used to make this increasingly difficult.
dylan522p - Friday, October 4, 2013 - link
Hope you guys can afford one. Maybe kickstarter it and offer nothing in return. Just use the money to build the benchmark. In either case I just hope you don't let anyone touch it so they can't cheat it.
Callitrax - Friday, October 4, 2013 - link
First, when discussing how devices feel in hand, I expect analyses of polar moment of inertia from now on.
I think one of the big problems is that you need unique benchmark suites for phones, or browsers. The PC used to have some of these issues, but not so much anymore because most of the programs used to test computer CPUs are the same programs the user actually uses. When you tested Haswell it was with Excel, 7zip, Visual Studio, Photoshop and so on. Those are real programs that people use, so if a vendor "cheats" at those, well it just makes those programs actually faster for users. When benchmarking video cards at this point, it's pretty much exclusively with video games (if there are synthetic benches in a video card review I just skip it). At this point it's just understood that AMD and NVidia tweak their drivers to improve performance in big name games, but you know what - that means those games are actually performing faster, so it's actually a good thing. (as long as they aren't degrading video quality like the old nvidia quake problems)
So really the problem appears to come down to a lack of commonly used apps and games on phones? Because then "cheating" on those would just make life better for everyone, well except that the current Android cheating method kills battery life.
teiglin - Friday, October 4, 2013 - link
I think the problem is twofold. First, there is a distinct lack of automation tools. Windows automation is a mature, well-developed market, but this is much less true in the mobile/ARM space. Not to mention any such benchmark could well require root privileges in order to launch other apps and profile them. Is it even possible to write something like fraps as a non-root app for Android or iOS?
The other problem is basically what you mention--a lack of obvious benchmarking targets. Desktop benchmarks hit obvious, common tasks that people do with computers, like file compression, video encoding/decoding, rendering, and code compilation. The only tasks besides gaming that I can see being remotely applicable to mobile from a desktop benchmark are photo editing and maybe spreadsheet calculations. And mobile game developers don't include benchmarking tools, so I have to refer you back to my first point, which is that I at least don't know how you would go about benchmarking mobile games.
I'd love to see Anand's and Brian's approach to a good mobile benchmark, but it's definitely not a simple problem.
teiglin - Friday, October 4, 2013 - link
I'm glad you guys mentioned the argument that plugging in all cores and locking them to their max frequencies is just mitigation for the joke of a DVFS implementation that they have (because why would you want to check frequency more often than every ten million cycles?). Have you guys had any conversations with hardware vendors about this? Have Qualcomm, ARM, or Apple even tried to implement any kind of turbo?
It seems that there's a huge competitive advantage here. Admittedly Intel and AMD have been working on their Turbo Boost/Core for a long time now, so I don't really imagine it's a simple problem, but why aren't they even trying?
It also makes me think of the crappy advertising of mobile core frequencies--when Intel sells you a 1.6GHz Core i5, you expect it to run at 1.6GHz most of the time, and hit up to 2.6GHz or whatever when there's thermal headroom available. But when you get a 1.9GHz APQ8064AB, you know it will mostly run at 1-1.2GHz because the phone just can't dissipate the heat generated by 1.2V at 1.9GHz. I would love to see the SoC designers and integrators held accountable for their TDPs and give realistic estimates of operating frequencies.
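To put rough numbers on the heat point above, here is a minimal sketch using the standard first-order approximation that dynamic power scales with V^2 * f. The 1.2V at 1.9GHz point is the one quoted above; the 0.95V at 1.2GHz point is purely an assumed value for illustration, not a measurement, and leakage power is ignored entirely.

```python
def relative_dynamic_power(voltage_v: float, freq_ghz: float) -> float:
    """First-order dynamic power in arbitrary units: P ~ V^2 * f.

    Switched capacitance and activity factor are treated as constant,
    so only the ratio between two operating points is meaningful.
    """
    return voltage_v ** 2 * freq_ghz


# Peak operating point quoted in the comment above.
peak = relative_dynamic_power(voltage_v=1.2, freq_ghz=1.9)

# Sustained operating point. The 0.95 V figure is an ASSUMPTION made up
# for illustration; real voltage/frequency tables differ per chip.
sustained = relative_dynamic_power(voltage_v=0.95, freq_ghz=1.2)

ratio = peak / sustained
print(f"peak draws roughly {ratio:.1f}x the dynamic power of sustained")
```

With those assumed numbers the advertised clock is about 58% higher than the sustained one but costs roughly 2.5 times the dynamic power, which is why a headline frequency says little about what a phone can actually hold.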
dylan522p - Friday, October 4, 2013 - link
You know Intel is even advertising the boost on Atom as the actual clock because all the ARM guys are.
Wade_Jensen - Friday, October 4, 2013 - link
I'm sorry you copped so much crap for the Note 3 review, Brian, especially given how hard you must've worked to push out all three. People should've trusted that you and Anand had something in the works for benchmarkgate.
Thanks for your hard work and I really look forward to battery life and "lapability" tests for Surface Pro 2 from you.
OzedStarfish - Friday, October 4, 2013 - link
Yeah I'm also interested in Brian's thoughts on the Surface Pro, as he was an old school tablet user. I'll be waiting for Broadwell to get one as my Nexus 7 is somewhat filling the gap between desktop and phone, but it doesn't serve my need for a stylus.
BMNify - Friday, October 4, 2013 - link
Brian is a Windows hater, so not expecting much from him, heck he didn't even review the Lumia 920 and even the 1020 review seems forgotten. And he has himself professed that he hardly used Surface 1st gen products, so it's quite clear that he doesn't care much about those products.
I hope Anand reviews both Surface 2 and Surface Pro 2. Anand can objectively review even the platforms which he doesn't use, ignoring even his emotional feelings, a true Zen master :)
Peanutsrevenge - Friday, October 4, 2013 - link
Still no ads at either the start or the end?
Just to say, I'm quite happy to have an ad or two if you'd like to put them in guys. Much prefer that to the most important tech site on the net starting to struggle financially.
Anand Lal Shimpi - Friday, October 4, 2013 - link
I appreciate the vote of support :)
BMNify - Friday, October 4, 2013 - link
One or two ads will be fine, Anand. We're listening for 1-2 hours anyway; an ad of a few seconds won't do any harm :)
Krysto - Friday, October 4, 2013 - link
Is it possible Apple optimized for all of your benchmarks, too? How did they achieve almost a 2x increase in CPU performance WITHOUT increasing the clock speed from last year (well no more than 100 MHz)?
Seems pretty unlikely. Could there be a similar thing they're doing, where they activate a "turbo-boost" of the chip only for benchmarks?
Anand Lal Shimpi - Friday, October 4, 2013 - link
The same way Intel was able to get a > 2x increase in performance from Clover Trail+ to Bay Trail: increase IPC. Frequency scaling isn't the only option.
A few things:
1) I know for a fact that we run some things that Apple doesn't.
2) A lot of what I used to characterize A7's performance isn't publicly available.
3) I do believe there is some optimization going on with the browser tests, at least the common ones (this also applies to Chrome/Android). WebXPRT I don't believe has been an optimization target, and this speaks to what I wrote about earlier where we need to keep all of our benchmarks a moving target.
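On the IPC point above: performance is, to a first approximation, IPC times clock frequency, so a big jump at nearly the same clock has to come from IPC. A minimal sketch of that arithmetic follows. The 1.3GHz and 1.4GHz clocks are assumed placeholders for "no more than 100 MHz" apart, and the 2x speedup is the figure from the question, not a measurement.

```python
def implied_ipc_gain(speedup: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Return the IPC ratio needed to explain a given speedup.

    Assumes performance ~ IPC * frequency; memory, thermal, and
    software effects are ignored in this rough model.
    """
    clock_gain = new_clock_ghz / old_clock_ghz
    return speedup / clock_gain


# ASSUMED values for illustration only: a 2x overall speedup and
# clocks that differ by 100 MHz (1.3 GHz -> 1.4 GHz).
gain = implied_ipc_gain(speedup=2.0, old_clock_ghz=1.3, new_clock_ghz=1.4)
print(f"implied IPC gain: {gain:.2f}x")
```

Even after crediting the small clock bump, nearly all of the assumed speedup is left for IPC to explain, with no benchmark-only boost required.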
jasonelmore - Saturday, October 5, 2013 - link
It's not all hardware improvements on that 2X; most of the gains were from moving to the more efficient ARMv8 instruction set. I'd almost say 90% of the gains came from that, because Apple used most of the die area gained from 28nm on SRAM for the fingerprint sensor.
Arbee - Friday, October 4, 2013 - link
Going from Prescott P4 to Conroe Core 2 Duo was in many of my test cases a better than 2x increase in CPU performance at the same clock. It's not unprecedented :)
The A7 demonstrates markedly better performance in heavy real-world apps, like AudioBus chains with lots of effects or the partial Dreamcast emulation necessary to play .DSF music in Modizer. The A6 could play DSFs at 22.05 kHz output without breaking up, but 44.1 kHz or visualizers killed it. The A7 can play DSF at 44.1 kHz output *and* have multiple visualizers working.
In other words, I can confirm with real apps doing heavy stuff that the A7 is terrific. On Android, something like MAME4Droid Reloaded would be a great real-world CPU test (MAME can be a beast even on desktop PC CPUs), but Anand and Brian probably wouldn't want to admit to having ROM files.
Yofa - Friday, October 4, 2013 - link
just listening to this right now. can't believe the galaxy gear doesn't have an ambient light sensor... it would appear to me that they just revived a dead product and rushed it to market.
my sony smartwatch doesn't get enough props or love. maybe its pricepoint makes it appear to be a discardable device. that's one of its attractive points for me. no camera, no speaker, no mic, but it DOES have an ambient light sensor. good battery life and acceptable price. galaxy gear is just embarrassing for smartwatch adopters...
jasonelmore - Friday, October 4, 2013 - link
Please stop spending so much podcast time on the freakin boost clocks discussion. If you're only gonna do a podcast once a month, please diversify your talking points. I'm still waiting for a good discussion on nvidia shield.
Razorbak86 - Saturday, October 5, 2013 - link
Would you like a tissue?
theCuriousTask - Friday, October 4, 2013 - link
There is an established market that smart watches can replace. I currently wear a large, bulky gps watch for running and splits. If a smart watch can integrate the health functions of a fitbit with the functionality of a smart watch, that would already be a great value proposition. Many runners will pay north of $200 for a watch. If extra functionality is added, such as integration with a smartphone, companies like Apple and Samsung can support a much higher price than the current consensus by cannibalizing an existing market.
LarsBars - Monday, October 7, 2013 - link
Any perceived lack of love for the Surface 2 from Brian in this podcast is probably Microsoft's own fault. They asked him to attend a press event then didn't have enough devices?? That's terrible.
Sabresiberian - Saturday, October 12, 2013 - link
I love Anandtech podcasts! Thank you for bringing the quality to them that you maintain in your articles.