I just picked up an ASUS MeMO Pad HD7 for my wife, and it has a quad-core A7 @ 1.2GHz. I have been pretty impressed with how well it runs. Great little device. Looks like the A53 will be quite a bit more awesome.
I do wish that table above included some high-performance comparisons as well. It's nice to see the A9, but how about a Krait variant or two, and perhaps the A15?
Between this press release, Qualcomm's CMO Anand Chandrasekher's comments, and no sign of 64-bit Krait, Qualcomm has quite the mixed marketing message regarding 64 bits.
Qualcomm really can't claim any benefit to 64 bits without exposing a glaringly obvious hole in their higher-end lineup. Another announcement pending? I wouldn't think so, since they just announced the 805.
The 805 will ship in 1H 2014, the 410 in 2H 2014. It stands to reason that the 805 is a stopgap until the 64-bit follow-up to Krait is finished. If they keep executing as well as they have since Krait's introduction, they'd have this 'Krait2_64' ready either for the 2014 holiday season (unlikely, since that doesn't line up well with hardware releases) or for the new top devices of H1 2015 (S6, HTC One something). It could also be that they make it for LG's next top device, or perhaps a Nexus 10 for 2H 2014.
Awesome! If this is the Snapdragon 410, and on the high end we only have an 805, I'll be looking forward to the Snapdragon 810 soon. Qualcomm's naming structure has improved lately but it's still a little bit complicated sometimes.
This would have been the perfect opportunity to use the odd-numbered hundreds.
As in, the Snapdragon 500s would be the 64-bit versions of the 400s. The S700s would be the 64-bit versions of the S600s. And the S900s would be the 64-bit versions of the S800s.
However, considering how overloaded S4 Pro has become lately, I don't expect to see anything logical come out of Qualcomm's naming.
You're wrong there, because the 64-bit processors also bring improved performance. Forget the 64-bitness and concentrate instead on the generation: the old generation of processors is slower than the next generation, as per the norm; it just happens that the next generation of processors is also 64-bit. You get more registers, better IPC, wider execution units, and lower power all mixed together. 64-bit is just frosting at that point.
"more registers" Does this mean you will buy a car with 8 tyres v/s 4 tyres? I am sick of this MORE IS BETEER!!!! mentality. Most modern games don't even use 64 bit registers.
"Lower power" Just because your car is wider than your motorbike doesn't mean it's more fuel efficient. Its the same with processors.
More registers are better. Period. At least until you get to ridiculous numbers.
Registers can be thought of as a very fast cache. The more registers a CPU has, the less the compiler has to move data back and forth to/from memory.
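A rough C sketch of that effect (purely illustrative; the function and values are made up): when more values are live at once than the ISA has general-purpose registers (roughly 14 usable on 32-bit ARM versus 31 on AArch64), the compiler has to spill some of them to the stack and reload them later.

```c
#include <stdint.h>

/* Illustrative only: all sixteen loaded values are needed for both the
 * sum and the xor, so each one must either stay live in a register for
 * the whole function or be spilled to the stack and reloaded. A
 * 31-register ISA can usually keep them all; a 14-register one cannot. */
uint32_t mix16(const uint32_t *p)
{
    uint32_t a = p[0],  b = p[1],  c = p[2],  d = p[3];
    uint32_t e = p[4],  f = p[5],  g = p[6],  h = p[7];
    uint32_t i = p[8],  j = p[9],  k = p[10], l = p[11];
    uint32_t m = p[12], n = p[13], o = p[14], q = p[15];

    uint32_t sum = a + b + c + d + e + f + g + h
                 + i + j + k + l + m + n + o + q;
    uint32_t x   = a ^ b ^ c ^ d ^ e ^ f ^ g ^ h
                 ^ i ^ j ^ k ^ l ^ m ^ n ^ o ^ q;
    return sum * x;
}
```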
You seem to have confused quantity of registers with width of registers, and then conflated the power savings proposition with the 64 bitness. In fact, the power saving comes (in part) because the 32 bit ARM ISA evolved over time and has numerous tweaks for backwards compatibility, requiring more transistors and more power. The 64 bit ISA wipes that slate clean and implements only what is required. It's more efficient because it ditches those tweaks *and* is designed with learnings from the past decade in mind.
64 bit isn't better in some abstract sense. 64 bit ARM is both higher performing and lower power than 32 bit ARM. And, as luck would have it, that's what we're talking about here.
64 bit can perform better in some cases (when actually using more than 32 bits) but it uses more power because there is twice the logic involved when doing a basic addition such as 1+1.
No, the amount of logic involved in doing the actual addition is a small proportion of the total involved in executing a single instruction. So a 64-bit addition might use maybe 5% more power than a 32-bit addition, not twice as much.
It's moot anyway (of course you're well aware of this, just explaining for everyone else): AArch64 has 32-bit arithmetic operations, and most code is limited to 32-bit integers outside of pointer arithmetic.
Dan, most CPU logic is not in math but in a lot of other components: the CPU cache (which is sometimes more than half of the entire CPU), branch prediction, the memory addressing unit, etc. Also, even on 64-bit CPUs the code is still mostly using 32-bit integers, keeping the transistor count about the same. Without knowing the full specifics, most 64-bit integer operations can be implemented using 32-bit integer math, so the extra logic can be reduced even further (as an uncommon path).
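To make that last point concrete, here is a minimal sketch (my own illustration) of doing a 64-bit add with nothing but 32-bit arithmetic; 32-bit ARM compilers do essentially this with an add / add-with-carry pair.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves. */
typedef struct { uint32_t lo, hi; } u64_parts;

/* Add the low halves, detect the carry via unsigned wrap-around,
 * then fold the carry into the sum of the high halves. */
static u64_parts add64(u64_parts a, u64_parts b)
{
    u64_parts r;
    r.lo = a.lo + b.lo;
    uint32_t carry = (r.lo < a.lo);  /* 1 if the low add wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```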
Are more registers faster? Oh, yeah, by a large amount, because the registers run something like 4 times faster than the L1 cache (or even more), like 10-20 times faster than the L2 cache, and the L2 cache is typically 10x faster than a memory access. A compiler that has 2x more registers on the target CPU won't produce code that is 4 times faster, but a 30-50% speedup is doable in a lot of real code. The LLVM developers (LLVM being the main backend optimizer) stated that improving register allocation by 10% gave a speedup of up to 20%: http://bit.ly/1d7B3aw
That would sound nice, but you miss the point that ARMv8 has a 32-bit mode that is compatible with ARMv7 (and transitively with older ARM ISAs). So they cannot "wipe that slate clean" at all; everything has to be there.
More registers are generally better indeed; however, the gain from 14 to 31 is not that large: studies indicated around 20-24 is optimal. Note there are drawbacks to having more registers as well, such as slower context switches.
The A53 includes all 32-bit instructions, so it can run all existing binaries. So nothing has been ditched at all. The power savings are not due to being 64-bit, and not due to the new ISA either. The efficiency improvements are simply due to it being newer and better than its predecessors (if it had been 32-bit, any gains would have been the same).
64-bit code will often run a little faster than 32-bit but not hugely so. While the 64-bit ISA allows for power savings in decoding, 64-bit pointers and registers increase power slightly, so which effect is larger will depend on each particular application.
Do you have links for any studies outside of the ones AMD did when evaluating x86-64? Those are good for a single data point, but they're a bit limited given that they were specific to x86-64 and a relatively wide OoO uarch. In-order uarchs, in comparison, benefit from code that's more aggressively scheduled to hide latency, which increases register pressure.
I was thinking of the original RISC studies for MIPS and SPARC. They are old now but I confirmed those results for ARM - basically the benefit of each extra register goes down exponentially. If you have a good pressure-aware scheduler (few compilers get it right...) then you only need a few extra registers.
Thanks for the clarification. I'd say that even a sweet spot of 20-24 justifies 31 GPRs (plus SP). I'd also argue that research done with the original MIPS and SPARC isn't perfectly representative of something like Cortex-A53 either. In my experience, going from hand-coding ARM9 to Cortex-A8 assembly presented a lot of new challenges in scheduling which absolutely increased register pressure. Dual issue means you have to hide more instructions in a similar latency, and generally more latencies were added, like for address generation or shifts. The original 5-stage RISC CPUs like the first MIPS uarchs would be a lot closer to ARM9 than Cortex-A8. Cortex-A53 probably doesn't have as many interlock conditions as A8, but it should still be substantially worse than ARM9.
One particular application I know I'd appreciate having 31 GPRs for is emulating another ISA with 16 GPRs, like x86-64..
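A hedged C sketch of that scheduling point (my own example, not from the thread): unrolling with independent accumulators hides add latency on an in-order core, but every extra accumulator is another register that stays live for the whole loop.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add depends on the previous one, so an
 * in-order core stalls for the full add latency each iteration. */
uint32_t sum_simple(const uint32_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Four independent accumulators: the core can keep issuing adds while
 * earlier ones complete, at the cost of more live registers (four
 * partial sums plus the four values loaded each iteration). */
uint32_t sum_unrolled(const uint32_t *p, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += p[i + 0];
        s1 += p[i + 1];
        s2 += p[i + 2];
        s3 += p[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += p[i];
    return s0 + s1 + s2 + s3;
}
```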
Do you know how computers work? Registers actively store the work in progress of the CPU. Adding two numbers takes three registers. Adding 12 sets of numbers in parallel takes 36 registers.
Increasing your register count 10-fold allows a 10x improvement in performance, assuming 10 available execution units to do the work. The way register files work, you can also work on more bits at a time too! Instead of adding 2 ints you can add 20 ints, 10 doubles, or 5 floats at a time.
I'm pretty sure it's you who does not know how registers work.
You don't get 10x performance from 10x registers. You can maybe, if you are very, very lucky, get 1/10th the usage of main memory. That is faster, but even if your program has 50% memory operations (unrealistically high) and 50% other, then you go from 50%+50% to 5%+50% execution time with the 10x registers; that is a 1.8x speedup, all things being super optimal. In exchange, the registers themselves use more power.
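For anyone who wants the arithmetic behind that 1.8x figure spelled out, it is just the standard speedup formula applied to the assumed 50/50 split, with the memory portion sped up 10x:

```latex
\text{speedup} = \frac{0.5 + 0.5}{0.5/10 + 0.5} = \frac{1}{0.55} \approx 1.8
```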
You get 10x performance from 10x execution units. That will also cost you *more than* 10x the power (due to how pipelines work). Even Haswell has only 7 execution units.
I explained it perfectly in my post: 10x registers and 10 available execution units = 10x improvement in performance. I apologize if my hyperbole threw you for a loop; I was trying to explain a concept.
10 execution units with only 3 registers = 1 add per clock. 10 execution units with 30 registers = 10 adds per clock. I also hinted at parallel processing; if you have 5 floats that need to be added to 5 floats (or multiplied, or accumulated, or whatever), you can do that in one clock cycle now.
AES saw a huge improvement because AArch64 has instructions specifically to assist AES acceleration, which Geekbench is leveraging. If DGEMM uses double precision then it'd have seen a big improvement due to AArch64 adding support for double-precision SIMD. The smaller improvements (and one notable regression) in the integer tests could be from the increased register count, but possibly also from other factors, for example if Cyclone is more efficient with conditional select in AArch64 than with predication in AArch32.
As for register counts, AArch64 actually has 31 64-bit general purpose registers + 1 64-bit stack pointer and 32 128-bit SIMD registers.
More registers = the compiler can generate better, more efficient code. This is why some software runs up to 20% faster on x64 vs. x86 with just a recompile.
As for lower power, listen to some AT podcasts about the "race to sleep" concept. All other things being equal, a phone that finishes a task faster can use less battery.
Actually "race to sleep" uses more power because you are running the CPU at a higher frequency and voltage. It's always better to spread tasks across multiple CPUs and run at a lower frequency and voltage, even if that means it takes longer to complete.
Um, no. Not in the slightest. Race to sleep is the best fit we've yet come up with given our current technologies (i.e., constant running power and fixed performance-to-sleep transition requirements). See www.cs.berkeley.edu/~krioukov/realityCheck.pdf and www.usenix.org/event/atc11/tech/final_files/LeSueur.pdf for some examples.
LOL. Not all CPUs are 130W extreme edition i7's on an extremely leaky process!!!
A <5W mobile core on a modern low-power process has very little leakage (unlike the i7), so it is always better to scale the clock and voltage down as much as possible to reduce power consumption. big.LITTLE takes that one step further by moving onto a slower, even more efficient core. Running as fast as possible on the fastest core is the only sure way to run your batteries down fast.
Anand has been talking about 'race to sleep' as it applies to mobile CPUs since 2010 now. So, no, it isn't always better to scale the clock down, or it hasn't been in practice.
That link doesn't say anything about "race to idle". Basically, if you look at the first graph, it shows that the 3 devices have different idle consumption simply due to using different hardware (the newest hardware wins, as you'd expect). Anand concludes the device with the lowest idle consumption uses less energy over a long enough timeframe even though it may use far more power when active. True of course, but that has nothing to do with "race to idle".
Look at the right side of the graph "Heterogeneous CPU operation". That shows performance vs. the amount of power consumed. As you can see it is not linear at all, and the more performance you require, the more the graph curves to the right (which means less efficient). To paraphrase Anand: "Based on this graph, it looks like it takes more than 3x the power to get 2x the performance of the A7 cluster using the Cortex A15s." So if you did "race to idle" on the A15, you'd use at least 50% more energy than executing the same task on the A7. Of course the A7 runs slower and so returns to idle later than the A15, but it still uses less energy overall.
This graph ignores the benefit of letting non-core resources sleep: fabrics, IOs, etc. If your internal datapaths are scaled to meet higher-performance cores, low-powered cores running longer may lead to inefficiencies.
It's obviously about balance in SoC design, and efficiency is not as simple as running small cores at low frequencies if you still want to allow scaling to higher performance.
Mobile devices are never really fully sleeping, so while you could power off the screen, you still need to check the touchscreen, keep in contact with the base station, check for incoming calls etc.
Yes, you definitely want to be able to scale to high performance rather than only using slow cores or low frequencies, but doing so increases power consumption and total energy to perform a given task.
Yes it does: If we extended the timeline for the iPhone 4 significantly beyond the end of its benchmark run we'd see the 4S eventually come out ahead in battery life as it was able to race to sleep quicker.
No. The 4S uses significantly more power than the iPhone 4 when actually running. You can clearly see that the power consumption above the idle level for the 4S is exactly twice that of the 4, but it is only 75% faster. That means it used about 15% more energy to complete the benchmark. So clearly "race to idle" uses more energy than running a bit slower. If you lowered the maximum clock frequency of the 4S then it would become as efficient as the 4.
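Taking those claimed ratios at face value (2x the active power, 1.75x the speed), the energy comparison is just power times time:

```latex
\frac{E_{4S}}{E_{4}} = \frac{2P \times (t / 1.75)}{P \times t} = \frac{2}{1.75} \approx 1.14
```

which is where the roughly 15% extra energy for the 4S comes from.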
If race to sleep were always the best solution CPUs wouldn't have DVFS at all. You'd run it at the highest clock always then power gate when idle. But that's not how things are done at all. Dynamic clocking schemes actively try to minimize the amount of time the CPU spends idle, at least until it hits some minimum clock speed.
More registers are better because they're the fastest memory available to the CPU. The fewer you have, the more time is spent waiting on slower memory. Also, a RISC architecture has no concept of, say, "add RAM_location_1 to RAM_location_2"; RISC can only move data from RAM into registers and perform operations on registers. Therefore, a generous register set is pretty much vital on a RISC system.
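A small sketch of what that load/store restriction means in practice (my own illustration; the register names in the comment are generic, not a claim about exact code generation):

```c
/* On a load/store (RISC-style) machine, a statement such as
 *     *c = *a + *b;
 * cannot be a single memory-to-memory instruction. Conceptually the
 * compiler emits something like:
 *     load  r1, [a]      ; fetch the first operand into a register
 *     load  r2, [b]      ; fetch the second operand
 *     add   r3, r1, r2   ; arithmetic only ever happens on registers
 *     store r3, [c]      ; write the result back to memory
 * If the values can be kept in registers across many statements,
 * those loads and stores disappear entirely. */
void add_memory_operands(const int *a, const int *b, int *c)
{
    *c = *a + *b;
}
```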
There are also specific security and ease-of-programming benefits to 64-bit. The Linux kernel used in Android supports address space randomization; it's much easier to pick an unexpected address range when you have 64 bits to play with. And the POSIX mmap() functionality can work on much larger files with a 64-bit address space, making it easier to write performant apps that work with large data sets (high-MP photos, for instance).
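A hedged sketch of that mmap() point (the file name is hypothetical and error handling is minimal): with 64-bit pointers a multi-gigabyte file can be mapped in a single call, something a 32-bit address space simply cannot hold.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("big_photo_library.bin", O_RDONLY);   /* hypothetical file */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* With 64-bit pointers, mapping several GB is one call; a 32-bit
     * process would have to window through the file in smaller chunks. */
    const unsigned char *data =
        mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first byte: %u, size: %lld bytes\n",
           (unsigned)data[0], (long long)st.st_size);

    munmap((void *)data, st.st_size);
    close(fd);
    return 0;
}
```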
Having 64-bit applications is even a disadvantage when you have little memory (say 1GB or less). As mentioned here on AnandTech (http://www.anandtech.com/show/7460/apple-ipad-air-...), 64-bit apps use 20-30% more memory, so 1GB of memory on a 64-bit phone/tablet equals roughly 0.75GB on a 32-bit device, meaning you see browser tabs and apps being reloaded sooner.
Yes. Go read the review of the iPhone 5S and the Apple A7. There's a pretty significant performance increase from going 64-bit. Even when running 32-bit code.
Estimates are that going to 64 bit can bring a 15% increase in performance by itself. There is a multiplication factor to that when you include the other improvements made.
But for a number of mathematical calculations, such as those used in encryption, estimates are that there could easily be a 10x improvement. The touch sensor would benefit from that. Perhaps that's why it's so fast.
Yes and no. Yes in terms of testing the demand for low-end 64-bit chips to allow for easier and faster adoption later, then hitting the market with fast high-end chips once demand mushrooms. Secondly, it puts a stake in the ground that Qualcomm has a production 64-bit ARM chip and can do amazing 64-bit chips later when demand shoots up in the market. Qualcomm has been able to move fast, and that is a real advantage over competitors like Nvidia, who announce while OEMs wait and wait. And then wait some more...
I think that to a first approximation you're right: using a 64-bit CPU to run 32-bit code is substantially useless. It will go a bit faster in some cases and consume a bit more, all things being equal. I think, though, that the point of a 64-bit CPU is mid/long-term preparation. I think it is inevitable that we'll see mobile systems with 4GB or more of memory: already the tablets, especially 10" tablets, could (should?) be used to multitask and to run some relatively intense computational programs. I think what Qualcomm is trying to avoid is what happened in the PC market, where the need for 4GB+ of RAM arrived while the vast majority of PCs still used 32-bit CPUs. For cellphones, especially the ones in the ranges targeted by the 410, I really don't see any significant benefit, other than making the architecture future-proof and having 64-bit-only devices across the board.
In terms of the iPhone 5S, the 64-bit ARM cores actually are a step up in performance, due to operating system optimisations for the hardware, as well as instruction set changes that improve performance, and larger/more registers (the obvious 64-bit benefit). I believe there was an article on Ars explaining the operating system enhancements that allow iOS7 to take advantage of the 64-bit architecture more than a straight 64-bit port of an OS normally would.
A couple of months ago there was an interesting discussion on realworldtech.com about the benefits of 64-bit. Linus Torvalds (if I remember correctly) mentioned that 64-bit was already beneficial when physical memory exceeds 896 MB. The reason was that above 896 MB, it is not possible to address all physical memory from 32-bit kernel virtual memory space (which is limited to 1 GB). At that point managing memory becomes significantly less efficient because of the need to frequently remap virtual memory space to different chunks of physical memory. Unfortunately I was not able to locate the thread anymore.
Anand, if you would truly "personally much rather see two higher clocked A53s", and if that applies to quad core versus dual core in general, you should start using your influence through this site and with direct industry connections to put that out there. I don't recall reading anything significant talking down quad cores versus dual cores in your articles when it may have been applicable (not Apple because they don't have a quad core anyway), and if it was even mentioned it was so minor that it was easy to overlook.
I've been doing this for the past year (in fact I literally just did this last week). I am going to start campaigning for this more aggressively though. It'll take a while before we see any impact given how long it takes to see these things come to fruition though.
Why not campaign for better dynamic clock control/turbo/etc.? To me, that seems like the best solution going forward. For example, back in the day of the Core 2 Duo vs. Core 2 Quad, you had to make a compromise: max clock (dual) or max multi-threaded performance (quad). Nowadays, with turbo mode and whatnot, you can essentially have the best of both worlds: a quad-core chip that shuts down 2 cores and can run as a fast dual-core if needed.
For a lot of reasons I think it will be very difficult convincing people, oems, etc that moving from 4 cores to 2 cores is a good thing, and in some ways it really IS moving backwards. (You are still making that same compromise)
Also another part of what I (and a few others) have been campaigning for over the past year as well. Power management and opportunistic turbo is still largely a mess in mobile. Thankfully there are improvements coming along this vector.
Extending the A53 pipeline length to reach higher frequencies would seem to go against the big.LITTLE scheme. ARM probably gets non-trivial benefits by keeping the pipeline on the short side.
I'd like to know if A57 is going to allow sustained usage in <=5" phones without having to kick over to the A53. Perhaps only for 1-2 thread usages?
Agreed, a longer pipeline will reduce power efficiency considerably, making it less suitable for big.LITTLE. However further frequency gains are likely, the ARM website says 2GHz is expected (no mention of process, so I guess on 16nm in 2015).
We already have 1.8GHz quad-core A15 phones today. ARM claims that A57 is actually more power efficient, so I don't see why there would be an issue with using A57 in a phone besides the fact that 64-bit seems a bit unnecessary. The 20nm process will be used next year as well, improving power and performance further.
Dual vs quad is a lost cause already, especially since we're moving towards 8 cores (4+4 in big.LITTLE). The die-size cost is low enough that the performance gain in the cases where you can use all CPUs is worth it.
Very true, I don't know the die size of a 28nm A53 core (and the L2 cache it will need), but on a modern ~60mm^2 "economy SoC" there's probably not much difference between two and four cores (<5 mm^2).
The difference between 2 and 4 Cortex-A7 cores is less than 1mm^2 on 28nm, so why go for dual core when quad core is almost free? A53 will be a bit larger than A7 of course, but the same will happen.
I don't understand why Anand is against quad cores. If you only use 1 or 2 cores, then having a quad core doesn't cost you anything (not in power nor frequency). If you can use more than 2 cores you get the benefit of having a quad core. So you never lose out.
The problem is China. If you want to eventually sell your phone there it needs to be quad core for some reason. The marketing push for quad core chips was too effective and the market was too big. If Qualcomm wants to eventually compete in asia at least.
I think I read about it at Techpinions, or maybe I was listening to Ben Thompson.
Certainly it appears that current Android apps rarely use more than two (meaty) threads, so it is common to see two cores out of four sleeping most of the time in quad-core SoCs.
It would be nice to see some form of turbo in these quad-cores, e.g. 4xA53 at 1.2GHz, two cores at a higher clock, or a single core at an even higher clock. It would certainly help in those single-threaded JavaScript benchmarks that everyone uses to test single-threaded performance, as if that's even meaningful unless the systems are running the same OS and browser.
Just because Anand is calling it out doesn't mean that Qualcomm disagrees. The market for this processor (the Asian market) demands more cores because the OEMs want to market as many cores as possible. It doesn't matter if they are slow or not (I think I saw that someone wants to make an 8-core Cortex-A7 chip); that's what the OEMs want.
Being able to market the first 64-bit Android devices will be a huge selling point even though the devices won't have more than 1 GB of memory and the users will never use encryption, because that's how the market works over there. Nobody realizes that Intel makes dual-core parts that are several times faster than these.
Yeah, Asians are weird. I recently read that many Koreans believe you die if you run a fan in a closed room. This includes people like physics students. WTF? Back on topic: quad-cores make no sense in phones. I admit, mine has a quad too. For once I have to applaud Apple (I actually "hate" Apple). Their A6 and A7 are very well thought-out designs. However, that's not enough to justify charging 2-3 times the price of comparable phones.
Agreed, however the Apple tax is much greater on their tablets. For example, a decent, usable iPad mini with Retina will set you back $630 + tax (32GB LTE model), and this is a small 8" tablet.
We need these tablets to come down quite a bit; even $500 for the above-mentioned specs is expensive.
Why would you go backwards unless there is a clear sign of disadvantages? Would you say the same about memory, storage, or screen resolution? It makes no sense when these days web browsers can make use of as many cores as available. (and they have been doing so for some time)
Try scrolling the HOME PAGES on iPad mini (2013) with a few Safari tabs open. Tell me what it looks and feels like. I thought I was dealing with Touchwiz.
Web page rendering isn't multi-threaded at all, not even in desktop browsers. I would say most stuff on smartphones isn't. But that does not matter. Even on a desktop, 2 fast cores will always be better than 4 slow ones. However, with Intel's turbo in quads that is not an issue anymore, but it still is in phones (read: ARM).
How far off is 64-bit Android? I mean, KitKat was just released, so are we looking at a year at minimum? Or will Google announce some uber-dark "Project x64" they've been working on?
Makes no sense to want higher clocks for the A53; we would be better off with 2+2 A57 and A53, or even 1+2. It would be nice to see some very thin devices, below 5mm, with 4xA53 clocked low. Of course this one is on 28nm, so we can hope for more soon when 20nm parts show up.
I'm fine with Qualcomm adopting this, as I thought they would since they were already using the A7 in the S400, but it is rather embarrassing for Qualcomm.
I mean how the hell is Qualcomm's first ARMv8 chip an ARM one?! I thought the whole point of licensing the architecture and building the core yourself was that you got to release it EARLIER than ARM themselves. I expected Nvidia to not release an ARMv8 chip in 2014, but Qualcomm seems to have handled their transition to ARMv8 just as poorly.
In the end, this is good, though. It just means Qualcomm gets less of a monopoly, and perhaps for once Samsung will do something with their chips and try to steal customers away from Qualcomm by offering their chips to others. I don't know what the hell Samsung's chip business is thinking. They had so many opportunities to push Exynos chips as competitors to Qualcomm's, especially since Apple wants to give up on having them make its chips, and they never took advantage of them. They even have their own fabs. Heck, they aren't even using them for their own chips, let alone selling them to others. If Samsung will be the first with a 64-bit chip in 2014, this would be a good opportunity for them to start doing that.
I think that Qualcomm had a good reason to develop its own core with Krait (the Cortex-A15 having problematic power efficiency), but now with the A53/A57, I wouldn't be surprised if we never actually see a successor to the Krait core.
And using stock ARM won't make them any less competitive. Qualcomm is the market leader in overall SoC design, and that has always been their greatest strength. The CPU architecture is just a small part of their puzzle.
Because no one expected the Spanish Inquisition, or Apple sending the market into a 64 bit frenzy in 2013.
Qualcomm no doubt has a 64-bit Krait roadmap, but Apple destroyed that, and now they are playing catch-up as best they can, and that apparently means they have to rely on an ARM design for now.
You're expecting too much. Had Apple not released their 64 bit A7 then neither NVIDIA nor Qualcomm (or Samsung) would have been 'late'.
You also overestimate Samsung. Samsung's initial chip designs were via Intrinsity, which Apple bought out from under them. Without that they have to rely on ARM and their own in-house talent, which has demonstrably been experiencing growing pains as they try to DIY. Likewise, Apple has never 'given up on them', though they may be shopping around. Apple still purchases the bulk of their SoCs from Samsung, which deprives Samsung of the capacity necessary to fulfill their own orders, much less anyone else's. In that light, this is why Samsung has used and continues to use third-party chips like Snapdragon and Tegra... and third-party fabs like TSMC!
So relax; it doesn't matter who is second with a 64 bit chip, it only matters that you have sufficient competition to get a phone/tablet you like.
It does matter who is #2, because it influences the Android market, and thus competition. Also, both Qualcomm and Samsung are late. That's just a fact.
Krysto is right that Qualcomm missed the 64-bit train, which is why they demoted their CMO when he started to trash Apple's A7. If they are forced to use ARM designs for their first 64-bit chip, it is essentially an admission of failure to predict where the market would be. (Whether you actually need 64-bit SoCs in phones for 2014 is up for debate; I think it's highly debatable.)
But the fact is that the market has moved here in preparation for future phones and tablets, and Qualcomm is late. Simple as that. But it's not going to be very serious. If they can get their 64-bit Krait out before the year is over or very early next year, not much is lost. But still: embarrassing to be caught so flat-footed, and kudos to Apple for driving the ecosystem (once again).
No, Apple was unexpectedly early. Everyone had a published or announced roadmap indicating 64 bit would happen by the end of 2014; none of that has changed.
Maybe it's my Google skills, but I can't find anything resembling a Qualcomm or Samsung roadmap talking about 64-bit dates prior to the A7 announcement, and as you know Qualcomm's CMO immediately wrote 64-bit off as a "gimmick", which does imply a thing or two about Qualcomm's priorities at the time of the announcement.
ARM announced the 64-bit CPUs in 10/12 and was to deliver its base designs by about 6/13, with the expectation of production silicon by spring of 2014. AMD had announced its product for 2014.
The 410 is simply a low-end reference design if I'm not mistaken, which they can get to market much faster than a 64-bit Krait, and there's no Qualcomm-published roadmap with a 64-bit Krait in it as far as I'm aware.
That's the irony isn't it? Apple beat ARM to market with a custom design, while Qualcomm is relying on a reference design to be first to market. Yes, Qualcomm never published a roadmap, but ARM did. I just assumed that everyone would release their parts after ARM did.
I am sure Qualcomm has a 64-bit Krait design, but it might be late, hence the use of the A53 reference design as the way to "beat the 2013" timeframe and show the market it has some 64-bit cred. This buys time for a real 64-bit Krait sometime in Q2 or later next year, when things will begin to heat up as the other ARM players show their iterations of the reference A57 designs. Here is where Krait can still work its magic and capture the bulk of the market the way it did with the S600.
There is no pressure in the Android world to switch to 64-bit earlier (Android would still run using the ARMv7 ISA). Apple, on the other hand, does have advantages in switching to 64-bit on its own terms, in addition to the speedups. For instance, I expect Apple's 64-bit transition for their ARM-powered devices to be very smooth sailing: by the time their devices hit the 4 GB barrier, the whole iOS ecosystem will have moved to 64-bit long ago.
In the Android world, having a »64-bit CPU« is just a marketing checkbox at this point. However, the advantages of the A53 (low power consumption, higher performance) remain even if it is run in 32-bit mode.
Well, thank goodness that everyone who "thinks" they know what's going on said that 64-bit on a phone is a waste, right??? The last thing we want is for the fandroid community to once again be proven wrong, right?
"I'm really excited to see what ARM's Cortex A53 can do." Then we have a marketing table from Qualcomm with projected numbers presented as fact. I'm holding my breathe all next year in anticipation.
If the Cortex A53 is 43 percent faster than the Cortex A7 at the same clock speed, then Qualcomm's Snapdragon 410 at 1.4GHz should be about 65 percent faster than the Snapdragon 400. So a Moto G2 could be that much faster than the original. Not a bad improvement in 12 months' time.
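Working that estimate out explicitly (assuming the Snapdragon 400's Cortex-A7 cores run at 1.2GHz, as in the original Moto G):

```latex
1.43 \times \frac{1.4\ \text{GHz}}{1.2\ \text{GHz}} \approx 1.67
```

i.e. roughly the 65 percent figure quoted above.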
The GPU, however, seems a little disappointing? I'm not sure how much faster it is compared to the one in Snapdragon 400, but overall it should be less than Adreno 225/Adreno 305? I hope I'm wrong. I'd expect to see Adreno 320 level performance in such devices in late 2014/early 2015.
extide - Monday, December 9, 2013 - link
I just picked up a ASUS Memo pad HD7 for my wife, and it has a quad A7 @ 1.2Ghz. I have been pretty impressed with how well it runs. Great little device. Looks like the A53 will be quite a bit more awesome.I do wish that table above included some high performance comparisons as well. It's nice to see the A9, but how about a Krait variant or two, and perhaps A15?
FwFred - Monday, December 9, 2013 - link
Between this press release, Qualcomm's CMO Anand Chandrasekher's comments, and no sign of 64-bit Krait, Qualcomm has quite the mixed marketing message regarding 64 bits.Qualcomm really can't claim any benefit to 64 bits without making a glaringly obvious hole in their higher end lineup. Another announcement pending? Wouldn't think so since they just announced the 805.
Mil0 - Tuesday, December 10, 2013 - link
805 will ship in 1H 2014, 410 in 2H 2014. It stands to reason that 805 is a stopgap before the 64 bit followup to krait is finished. If they're executing as well as they have been since krait's introduction, they would either have this 'krait2_64' ready for either the holiday season 2014 (unlikely, since that doesn't line up well with HW releases) or the new top devices of H1 2015 (S6, HTC one something). It could be that they make it for LG's next top device, perhaps the nexus 10 for 2H 2014.Drumsticks - Monday, December 9, 2013 - link
Awesome! If this is the Snapdragon 410, and on the high end we only have an 805, I'll be looking forward to the Snapdragon 810 soon. Qualcomm's naming structure has improved lately but it's still a little bit complicated sometimes.phoenix_rizzen - Monday, December 9, 2013 - link
This would have been the perfect opportunity to use the odd-numbered hundreds.As in, the Snapdragon 500s would be the 64-bit versions of the 400s. The S700s would be the 64-bit version of the S600s. And the S900s would be the 64-bit versions of the S800s.
However, considering how overloaded S4 Pro has become lately, I don't expect to see anything logical come out of Qualcomm's naming.
mtalinm - Monday, December 9, 2013 - link
I keep telling my macolyte friends that 64-bit processors don't mean much unless you have >3GB of memory. Am I wrong on that?michael2k - Monday, December 9, 2013 - link
You are here, because the 64 bit processors also have improved performance. Forget the 64 bitness and concentrate instead on the generation; the old generation of processors is slower than the next generation of processors, as per the norm, it just happens to be that the next generation of processors also happen to be 64 bit. You get more registers, better IPC, wider execution units, and lower power all mixed together. 64 bit is just frosting at that point.blanarahul - Monday, December 9, 2013 - link
"more registers"Does this mean you will buy a car with 8 tyres v/s 4 tyres? I am sick of this MORE IS BETEER!!!! mentality. Most modern games don't even use 64 bit registers.
"Lower power"
Just because your car is wider than your motorbike doesn't mean it's more fuel efficient. Its the same with processors.
Rest all points are vaild.
BrooksT - Monday, December 9, 2013 - link
More registers are better. Period. At least until ridiculous numbers.Registers can be thought of as a very fast cache. The more registers a CPU has, the less the compiler has to move data back and forth to/from memory.
You seem to have confused quantity of registers with width of registers, and then conflated the power savings proposition with the 64 bitness. In fact, the power saving comes (in part) because the 32 bit ARM ISA evolved over time and has numerous tweaks for backwards compatibility, requiring more transistors and more power. The 64 bit ISA wipes that slate clean and implements only what is required. It's more efficient because it ditches those tweaks *and* is designed with learnings from the past decade in mind.
64 bit isn't better in some abstract sense. 64 bit ARM is both higher performing and lower power than 32 bit ARM. And, as luck would have it, that's what we're talking about here.
danbob999 - Monday, December 9, 2013 - link
64 bit can perform better in some cases (when actually using more than 32 bits) but it uses more power because there is twice the logic involved when doing a basic addition such as 1+1.Wilco1 - Monday, December 9, 2013 - link
No, the amount of logic involved doing the actual addition is a small proportion of the total involved in the execution of a single instruction. So a 64-bit addition might use maybe 5% more power than a 32-bit addition, not twice as much.Exophase - Tuesday, December 10, 2013 - link
It's moot anyway (of course you're well aware of this, just explaining for everyone else), AArch64 has 32-bit arithmetic operations and most code is limited to 32-bit integers outside of pointer arithmetic.ciplogic - Wednesday, December 11, 2013 - link
Dan, most CPU logic is not in math, but in a lot other components like: CPU cache (which is sometimes more than 1/2 of the entire CPU), branch prediction, memory addressing unit, etc. Also, when you use 64 bit CPUs, the code is using still 32 bit integers, making the transistor count the same. Without knowing the full specifics, most of 64 bit integers can be implemented by using 32 bit integer math, so the extra added logic can be reduced even further (as an uncommon path).Are more registers faster? Oh, yeah. By a large amount because the registers run like 4 times faster than L1 cache (or even more), like 10-20 times faster than L2 cache, and the L2 cache is typically 10x faster than the memory access. A compiler that can have 2x more registers on the target CPU will likely give a code that is not 4 times faster, but 30-50% speedup is doable in a lot of real code. LLVM (the main backend optimizer) stated that when improving by 10% the register allocation got a speedup up-to 20% http://bit.ly/1d7B3aw
xdrol - Monday, December 9, 2013 - link
That would sound nice, but you miss the point that ARM8 has a 32 bit mode that is compatible with ARM7 (and transitively with older ARM ISAs). So they cannot "wipe that slate clean" at all, everything has to be there.Wilco1 - Monday, December 9, 2013 - link
More registers are generally better indeed, however the gain from 14 to 31 is not that large - studies indicated around 20-24 is optimal. Note there are drawbacks as well to having more registers such as a slower process switch.The A53 includes all 32-bit instructions, so can run all existing binaries. So nothing has been ditched at all. The power savings are not due being 64-bit and not due to the new ISA either. The efficiency improvements are simply due to it being newer and better than its predecessors (if it had been 32-bit then any gains would be the same).
64-bit code will often run a little faster than 32-bit but not hugely so. While the 64-bit ISA allows for power savings in decoding, 64-bit pointers and registers increase power slightly, so which effect is larger will depend on each particular application.
Exophase - Tuesday, December 10, 2013 - link
Do you links for any studies outside of the ones AMD did when evaluating x86-64? Those are good for a single data point, but they're a bit limited given that they were specific to x86-64 and a relatively wide OoO uarch. In-order uarchs, in comparison, benefit from code that's more aggressively scheduled to hide latency which increases register pressure.Wilco1 - Tuesday, December 10, 2013 - link
I was thinking of the original RISC studies for MIPS and SPARC. They are old now but I confirmed those results for ARM - basically the benefit of each extra register goes down exponentially. If you have a good pressure-aware scheduler (few compilers get it right...) then you only need a few extra registers.Exophase - Tuesday, December 10, 2013 - link
Thanks for the clarification. I'd say that even a sweet spot is 20-24 justifies 31 GPRs (plus SP). I'd also argue that research done with the original MIPS and SPARC aren't perfectly representative of something like Cortex-A53 either. In my experience, going from hand coding ARM9 to Cortex-A8 assembly presented a lot of new challenges in scheduling which absolutely increased register pressure. Dual issue means you have to hide more instructions in a similar latency, and generally more latencies were added, like for address generation or shifts. The original 5 stage RISC CPUs like the first MIPS uarchs would be a lot closer to ARM9 than Cortex-A8. Cortex-A53 probably doesn't have as many interlock conditions as A8 but it should still be substantially worse than ARM9.One particular application I know I'd appreciate having 31 GPRs for is emulating another ISA with 16 GPRs, like x86-64..
blanarahul - Thursday, December 12, 2013 - link
1) Yes. I got confused in the quantity v/s width argument.2) Sorry. I was trying to comment on a topic about which I have little to no knowledge.
ChipNano - Saturday, February 8, 2014 - link
I think 64 bit processor was not relly required, but its just competing with APPLE !!!michael2k - Monday, December 9, 2013 - link
Do you know how computers work?Registers actively store the work in progress of the CPU. Adding two numbers takes three registers. Adding 12 sets of numbers in parallel takes 36 registers.
Increasing your register count 10 fold allows you 10x improvement in performance, assuming 10 available execution units to do work. The way register files work, you can also work on more bits at a time too! Instead of adding 2 ints you can add 20 ints, 10 doubles, or 5 floats at a time.
Your car analogy is entirely baseless here.
xdrol - Monday, December 9, 2013 - link
I'm pretty sure it's you who does not know how do registers work..You don't get 10x performance from 10x registers. You can maybe, if you are very-very lucky, get 1/10th usage of the main memory. That is faster, but even if your program has 50% memory operations (unrealistically high) and 50% other, then you get from 50%+50% -> 5%+50% execution time from the 10x registers, that is 1.8x speedup, all things being super optimal. In exchange, the registers themselves use more power.
You get 10x performance from 10x execution units. That will give you *more than* 10x power (due to how pipelines work) too. Even Haswell has only 7 execution units..
michael2k - Tuesday, December 10, 2013 - link
I explained my post perfectly. 10x registers and 10 available execution units = 10x improvement in performance. I apologize if my hyperbole threw you for a loop, I was trying to explain a concept.10 execution units with only 3 registers = 1 add per clock. 10 execution units with 30 registers = 30 adds per clock. I also hinted at parallel processing; if you have 5 floats that need to be added to 5 floats (or multiplied, or accumulated, or whatever), you can do that in one clock cycle now.
That's all theoretical, to be sure; the reality is that ARMv8 has 32 128bit registers, and is useful for SIMD (single instruction, multiple data) operations: http://www.anandtech.com/show/7335/the-iphone-5s-r...
Anand already covered this. AES saw an 8x improvement, for example, and DGEMM nearly 2x.
Exophase - Tuesday, December 10, 2013 - link
AES saw a huge improvement because AArch64 has instructions specifically to assist AES acceleration which Geekbench is leveraging. If DGEMM uses double precision then it'd have seen a big improvement due to AArch64 adding support for double precision SIMD. The smaller improvements (and one notable regression) in the integer tests could be from the increased register count but possibly also from other factors, like for example if Cyclone is more efficient with conditional select in AArch64 than predication in AArch32.As for register counts, AArch64 actually has 31 64-bit general purpose registers + 1 64-bit stack pointer and 32 128-bit SIMD registers.
Arbee - Tuesday, December 10, 2013 - link
More registers = the compiler can generate better, more efficient code. This is why some software runs up to 20% faster on x64 vs. x86 with just a recompile.As for lower power, listen to some AT podcasts about the "race to sleep" concept. All other things being equal, a phone that finishes a task faster can use less battery.
Wilco1 - Tuesday, December 10, 2013 - link
Actually "race to sleep" uses more power because you are running the CPU at a higher frequency and voltage. It's always better to spread tasks across multiple CPUs and run at a lower frequency and voltage, even if that means it takes longer to complete.WeaselITB - Tuesday, December 10, 2013 - link
Um, no. Not in the slightest. Race to sleep is the best fit we've yet come up with given our current technologies (i.e., constant running power and fixed performance-to-sleep transition requirements).See:
www.cs.berkeley.edu/~krioukov/realityCheck.pdf
www.usenix.org/event/atc11/tech/final_files/LeSueur.pdf
for some examples.
Wilco1 - Tuesday, December 10, 2013 - link
LOL. Not all CPUs are 130W extreme edition i7's on an extremely leaky process!!!A < 5W mobile core on a modern low power process has very little leakage (unlike the i7), so it is always better to scale the clock and voltage down as much as possible to reduce power consumption. Big.LITTLE takes that one step further by moving onto a slower, even more efficient core. Running as fast as possible on the fastest core is only sure way to run your batteries down fast.
michael2k - Tuesday, December 10, 2013 - link
http://www.anandtech.com/show/6330/the-iphone-5-re...Anand has been talking about 'race to sleep' as it applies to mobile CPUs since 2010 now. So, no, it isn't always better to scale the clock down, or it hasn't been in practice.
Wilco1 - Tuesday, December 10, 2013 - link
That link doesn't say anything about "race to idle". Basically if you look at the first graph it shows that the 3 devices have different idle consumption simply due to using different hardware (the newest hardware wins as you'd expect). Anand concludes the device with the lowest idle consumption uses less energy over a long enough timeframe eventhough it may use far more power when being active. True of course, but that has nothing to do with "race to idle".Let me show you a link that explains why running slower uses less energy: http://www.anandtech.com/show/6768/samsung-details...
Look at the right side of the graph "Heterogeneous CPU operation". That shows performance vs the amount of power consumed. As you can see it is not linear at all, and the more performance you require, the more the graph curves to the right (which means less efficient). To paraphrase Anand: "Based on this graph, it looks like it takes more than 3x the power to get 2x the performance of the A7 cluster using the Cortex A15s." So if you did "run to idle" on the A15, you'd use at least 50% more energy to execute the same task on the A7. Of course the A7 runs slower and so returns to idle later than the A15, but it still uses less energy overall.
FwFred - Wednesday, December 11, 2013 - link
This graph ignores the benefit to sleeping on non-core resources... fabrics, IOs, etc. If your internal datapaths are scaled to meet higher performance cores, low powered cores running longer may lead to inefficiencies.It's obviously about balance in SoC design, and efficiency is not as simply as running small cores at low frequencies if you still want to allow scaling to higher performance..
Wilco1 - Wednesday, December 11, 2013 - link
Mobile devices are never really fully sleeping, so while you could power off the screen, you still need to check the touchscreen, keep in contact with the base station, check for incoming calls etc.Yes you definitely want to scale to high performance rather than only using slow cores or low frequencies. That increases power consumption and total energy to perform a given task.
michael2k - Wednesday, December 11, 2013 - link
Yes it does:If we extended the timeline for the iPhone 4 significantly beyond the end of its benchmark run we'd see the 4S eventually come out ahead in battery life as it was able to race to sleep quicker.
Wilco1 - Thursday, December 12, 2013 - link
No. The 4S uses significantly more power than the iPhone 4 when actually running. You can clearly see that the power consumption above the idle level for the 4S is exactly twice that of the 4, but it is only 75% faster. That means it used about 15% more energy to complete the benchmark. So clearly "race to idle" uses more energy than running a bit slower. If you lowered the maximum clock frequency of the 4S then it would become as efficient as the 4.Exophase - Tuesday, December 10, 2013 - link
If race to sleep were always the best solution CPUs wouldn't have DVFS at all. You'd run it at the highest clock always then power gate when idle. But that's not how things are done at all. Dynamic clocking schemes actively try to minimize the amount of time the CPU spends idle, at least until it hits some minimum clock speed.michael2k - Tuesday, December 10, 2013 - link
Who said it was always the best solution? It is just one solution, as is ramping clock, increasing cores, increasing execution units, etc.Wilco1 - Wednesday, December 11, 2013 - link
The point is that "run to idle" is not a solution if you are trying to prolong battery life.Exophase - Wednesday, December 11, 2013 - link
Was said outright, "Race to sleep is the best fit we've yet come up with given our current technologies "melgross - Wednesday, December 18, 2013 - link
That's not even close to being true.xenol - Tuesday, December 10, 2013 - link
More registers are better because its the fastest memory available to the CPU. The less you have, the more it'll be spent waiting on slower memory. Also, RISC architecture has no concept of say "add RAM_location_1 to RAM_location_2", RISC can only move data from RAM to registers and perform operations on registers. Therefore, more registers is pretty much vital on a RISC system.Arbee - Tuesday, December 10, 2013 - link
There are also specific security and ease-of-programming benefits to 64 bit. The Linux kernel used in Android supports address space randomization; it's much easier to pick an unexpected address range when you have 64 bits to play with. And the POSIX mmap() functionality can work on much later files with a 64-bit address space, making it easier to write performant apps that work with large data sets (high MP photos, for instance).janderk - Monday, December 9, 2013 - link
Having 64 bit applications is even a disadvantage when you have little (say 1GB or less) of memory. As mentioned http://www.anandtech.com/show/7460/apple-ipad-air-...">here on Anandtech 64bit apps use 20-30% more memory so 1GB of memory on a 64bit phone/tablet equals to ~0.75 GB memory on a 32bit device. Meaning that you see reloading tabs in browsers and apps occur sooner.janderk - Monday, December 9, 2013 - link
Hmmm seems like I just broke Anandtech. url tags are obviously not allowed. I tried to link to this reviewhttp://www.anandtech.com/show/7460/apple-ipad-air-...
where the 20/30% extra memory usage was mentioned.
phoenix_rizzen - Monday, December 9, 2013 - link
Yes. Go read the review of the iPhone 5S and the Apple A7. There's a pretty significant performance increase from going 64-bit. Even when running 32-bit code.danbob999 - Monday, December 9, 2013 - link
The A7 is much faster than the A6. But most of the improvement is NOT because it's 64 bit.melgross - Wednesday, December 18, 2013 - link
Estimates are that going to 64 bit can bring a 15% increase in performance by itself. There is a multiplication factor to that when you include the other improvements made.But for a number of mathematical calculations, such as those made using encryption, estimates are that there could easily be a 10x improvement. The touch sensor would be benefiting. Perhaps that's why it's so fast.
ws3 - Monday, December 9, 2013 - link
You're not wrong at all. The Snapdragon 410 isn't anything more than a marketing gimmick.fteoath64 - Wednesday, December 11, 2013 - link
Yes and no. Yes in terms of testing the demand for low-end 4-bit chips to allow for easier and faster adoption later. Then hit the market with high-end fast chips when it mushroomed. Secondly to stake in the ground that Qualcomm has a production 64bit Arm chip that it can do amazing 64bit chips later when the demand shoots up in the market. Qualcomm has been able to move fast, so that is a real advantage to competitors like Nvdia who announces and OEMs wait and wait. Then waited somemore ...Krysto - Tuesday, December 10, 2013 - link
Yes.yankeeDDL - Tuesday, December 10, 2013 - link
I think that in first approximation you're right: using 64bit CPU to run 32bit code is substantially useless. It will go a bit faster in some cases, consume a bit more, all things being equal.I think though that the point of 64bit CPU is a mid/long term preparation.
I think it is inevitable to see mobile systems with 4GB or more of memory: already the tablets, especially 10" tablets, could (should?) be used to multitask and to run some relatively intense computational programs. I think what Qualcomm is trying to avoid is what happened int the PC market, where the need for 4GB+ of RAM was there and at that point the vast majority of PCs used 32bit CPUs.
For cellphones, especially the ones in the ranges targeted by the 410, I really don't see any significant benefit, other than making the architecture future-proof and having 64bit-only devices across the board.
psychobriggsy - Tuesday, December 10, 2013 - link
In terms of the iPhone 5S, the 64-bit ARM cores actually are a step up in performance, due to operating system optimisations for the hardware, as well as instruction set changes that improve performance, and larger/more registers (the obvious 64-bit benefit). I believe there was an article on Ars explaining the operating system enhancements that allow iOS7 to take advantage of the 64-bit architecture more than a straight 64-bit port of an OS normally would.Ronald Maas - Wednesday, December 18, 2013 - link
A couple of months ago there was an interesting discussion on realworldtech.com about the benefits of 64-bit. Linus Torvalds (if I remember correctly) mentioned that 64-bit was already beneficial when physical memory exceeds 896 MB. Reason was that above 896 MB, it is not possible to address all physical memory from 32-bit kernel virtual memory space (which is limited to 1 GB). At that point managing memory becomes significantly less efficient because of the need to frequently remap virtual memory space to different chunks of physical memory.Unfortunately was not able to locate the thread anymore.
MadMan007 - Monday, December 9, 2013 - link
Anand, if you would truly "personally much rather see two higher clocked A53s", and if that applies to quad core versus dual core in general, you should start using your influence through this site and with direct industry connections to put that out there. I don't recall reading anything significant talking down quad cores versus dual cores in your articles when it may have been applicable (not Apple because they don't have a quad core anyway), and if it was even mentioned it was so minor that it was easy to overlook.Anand Lal Shimpi - Monday, December 9, 2013 - link
I've been doing this for the past year (in fact I literally just did this last week). I am going to start campaigning for this more aggressively though. It'll take a while before we see any impact given how long it takes to see these things come to fruition though.Take care,
Anand
extide - Monday, December 9, 2013 - link
Why not campaign for better dynamic clock control/turbo/etc. To me, that seems like the best solution going forward. For example, back in the day of the Core 2 Duo vs Core 2 Quad, you had to make the compromise, max clock (Dual) or Max Multi Thread Perf (Quad). Nowadays with turbo mode and whatnot you can essentially have the best of both worlds. A quadcore chip that shuts down 2 cores and can run as a fast dualcore if needed.For a lot of reasons I think it will be very difficult convincing people, oems, etc that moving from 4 cores to 2 cores is a good thing, and in some ways it really IS moving backwards. (You are still making that same compromise)
Anand Lal Shimpi - Monday, December 9, 2013 - link
Also another part of what I (and a few others) have been campaigning for over the past year. Power management and opportunistic turbo are still largely a mess in mobile. Thankfully there are improvements coming along this vector.
FwFred - Monday, December 9, 2013 - link
Extending the A53 pipeline to reach higher frequencies would seem to go against the big.LITTLE scheme; ARM probably gets non-trivial benefits from keeping the pipeline on the short side. I'd like to know if the A57 is going to allow sustained usage in <=5" phones without having to kick over to the A53. Perhaps only for 1-2 thread usages?
Wilco1 - Monday, December 9, 2013 - link
Agreed, a longer pipeline would reduce power efficiency considerably, making it less suitable for big.LITTLE. However, further frequency gains are likely; the ARM website says 2GHz is expected (no mention of process, so I guess on 16nm in 2015). We already have 1.8GHz quad-core A15 phones today. ARM claims that A57 is actually more power efficient, so I don't see why there would be an issue with using A57 in a phone, besides the fact that 64-bit seems a bit unnecessary. The 20nm process will also be used next year, improving power and performance further.
Wilco1 - Monday, December 9, 2013 - link
Dual vs. quad is a lost cause already, especially since we're moving towards 8 cores (4+4 in big.LITTLE). The die-size cost is low enough that the performance gain in the cases where you can use all CPUs is worth it.
psychobriggsy - Tuesday, December 10, 2013 - link
Very true. I don't know the die size of a 28nm A53 core (and the L2 cache it will need), but on a modern ~60mm^2 "economy SoC" there's probably not much difference between two and four cores (<5 mm^2).
Wilco1 - Tuesday, December 10, 2013 - link
The difference between 2 and 4 Cortex-A7 cores is less than 1mm^2 on 28nm, so why go for dual core when quad core is almost free? A53 will be a bit larger than A7 of course, but the same will apply. I don't understand why Anand is against quad cores. If you only use 1 or 2 cores, then having a quad core doesn't cost you anything (not in power nor in frequency). If you can use more than 2 cores, you get the benefit of having a quad core. So you never lose out.
errorr - Tuesday, December 10, 2013 - link
The problem is China. If you eventually want to sell your phone there, it needs to be quad core for some reason: the marketing push for quad-core chips was too effective, and the market is too big to ignore if Qualcomm wants to compete in Asia. I think I read about it at techpinions, or maybe I was listening to Ben Thompson.
psychobriggsy - Tuesday, December 10, 2013 - link
Certainly it appears that current Android apps rarely use more than two (meaty) threads, so it is common to see two cores out of four sleeping most of the time in quad-core SoCs. It would be nice to see some form of turbo in these quad cores, e.g. all four A53s at 1.2GHz, two cores at a higher clock, or a single core at a higher clock still. It would certainly help in those single-threaded JavaScript benchmarks everyone uses to test single-threaded performance, as if that's even meaningful unless the systems are running the same OS and browser.
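As a purely hypothetical illustration of that "fewer busy cores, higher allowed clock" idea (the frequency ladder below is invented, and nothing is claimed about how any Snapdragon governor actually works), a crude userspace sketch using the standard Linux cpufreq sysfs interface might look like this:

```c
/*
 * Hypothetical sketch of a "fewer busy cores => higher allowed clock"
 * ladder, as suggested above. The frequencies are made-up placeholders;
 * the sysfs path is the standard Linux cpufreq interface (root needed
 * to write). Real turbo logic lives in firmware/kernel governors.
 */
#include <stdio.h>

/* invented ladder: index = number of busy cores - 1 */
static const long max_khz[4] = {
	1600000,	/* 1 busy core  -> allow the highest clock */
	1500000,	/* 2 busy cores */
	1300000,	/* 3 busy cores */
	1200000,	/* 4 busy cores -> the nominal 1.2 GHz */
};

static void cap_cpu(int cpu, long khz)
{
	char path[96];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_max_freq", cpu);
	f = fopen(path, "w");
	if (!f)
		return;
	fprintf(f, "%ld\n", khz);
	fclose(f);
}

int main(void)
{
	int busy = 2;	/* pretend a load tracker reported two busy cores */
	long cap = max_khz[busy - 1];
	int cpu;

	for (cpu = 0; cpu < 4; cpu++)
		cap_cpu(cpu, cap);
	return 0;
}
```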
Someguyperson - Monday, December 9, 2013 - link
Just because Anand is calling it out doesn't mean that Qualcomm disagrees. The market for this processor (the Asian market) demands more cores because the OEMs want to market as many cores as possible. It doesn't matter whether they are slow or not (I think I saw someone wants to make an 8-core Cortex-A7 chip); that's what the OEMs want. Being able to market the first 64-bit Android devices will be a huge selling point, even though the devices won't have more than 1 GB of memory and the users will never use encryption, because that's how the market works over there. Nobody realizes that Intel makes dual-core parts that are several times faster than these.
beginner99 - Tuesday, December 10, 2013 - link
Yeah, Asians are weird. I recently read that many Koreans believe you die if you run a fan in a closed room. This includes people like physics students. WTF? Back on topic: quad cores make no sense in phones. I admit, mine has a quad too. For once I have to applaud Apple (I actually "hate" Apple). Their A6 and A7 are very well thought-out designs. However, that's not enough to justify charging 2-3 times the price of comparable phones.
jasonelmore - Tuesday, December 10, 2013 - link
Agreed, however the Apple tax is much greater on their tablets. E.g. a decent, usable iPad mini w/ Retina will set you back $630 + tax (32GB LTE model), and this is a small 8" tablet. We need these tablets to come down quite a bit; even $500 for the above-mentioned specs is expensive.
extide - Tuesday, December 10, 2013 - link
Heh, wouldn't everyone be so happy if we could get a flagship Android phone with an Apple A7 SoC in it? I would!
blanarahul - Monday, December 9, 2013 - link
"although I'd personally much rather see two higher clocked A53s."This is partially the reason I bought the Xperia L with it's dual core krait v/s the 1 trillion mediatek based phones.
The main reason was Adreno 305 >>> SGX 544MP1 btw.
Klug4Pres - Tuesday, December 10, 2013 - link
Yup, and why I bought my Mum the Xperia SP, which has a dual-core Krait and Adreno 320.
PC Perv - Monday, December 9, 2013 - link
Why would you go backwards unless there is a clear sign of disadvantages? Would you say the same about memory, storage, or screen resolution? It makes no sense when these days web browsers can make use of as many cores as are available (and they have been doing so for some time). Try scrolling the HOME PAGES on an iPad mini (2013) with a few Safari tabs open. Tell me what it looks and feels like. I thought I was dealing with TouchWiz.
beginner99 - Wednesday, December 11, 2013 - link
Web page rendering isn't multi-threaded at all, not even in desktop browsers. I would say most stuff on smartphones isn't. But that does not matter: even on a desktop, 2 fast cores will always be better than 4 slow ones. However, with Intel's turbo in quads that is not an issue anymore, whereas it still is in phones (i.e. ARM).
kris. - Monday, December 9, 2013 - link
Lol... MediaTek octa-core is nowhere.
jasonelmore - Monday, December 9, 2013 - link
How far off is 64-bit Android? I mean, KitKat was just released, so are we looking at a year at minimum? Or will Google announce some uber dark project x64 they've been working on?
jjj - Monday, December 9, 2013 - link
Makes no sense to want higher clocks for the A53; we would be better off with 2+2 A57 and A53, or even 1+2. Would be nice to see some very thin devices, below 5mm, with 4xA53 clocked low. Of course this one is on 28nm, so we can hope for more soon when 20nm parts show up.
iwod - Monday, December 9, 2013 - link
Does anyone know if Qualcomm licenses out their 9xx5 core to other companies for integration?
Krysto - Tuesday, December 10, 2013 - link
I'm fine with Qualcomm adopting this, as I thought they would, since they were already using the A7 in the S400, but this is rather embarrassing for Qualcomm. I mean, how the hell is Qualcomm's first ARMv8 chip an ARM-designed one?! I thought the whole point of licensing the architecture and building the core yourself was that you got to release it EARLIER than ARM themselves. I expected Nvidia not to release an ARMv8 chip in 2014, but Qualcomm seems to have handled their transition to ARMv8 just as poorly.
In the end, this is good, though. It just means Qualcomm gets less of a monopoly, and perhaps for once Samsung will do something with their chips and try to steal customers away from Qualcomm by offering their chips to others. I don't know what the hell Samsung's chip business is thinking. They had so many opportunities to push Exynos chips as competitors to Qualcomm's, especially since Apple wants to stop having them make its chips, and they never took advantage of them. They even have their own fab. Heck, they aren't even using them for their own chips, let alone selling them to others. If Samsung will be first with a 64-bit chip in 2014, this would be a good opportunity for them to start doing that.
darkich - Tuesday, December 10, 2013 - link
I think that Qualcomm had a good reason to develop its own core with Krait (the Cortex A15 having problematic power efficiency), but now with the A53/A57 I wouldn't be surprised if we never actually see a successor to the Krait core. And using stock ARM cores won't make them any less competitive. Qualcomm is the market leader in overall SoC design, and that has always been their greatest strength; the CPU architecture is just a small part of their puzzle.
Kvaern - Tuesday, December 10, 2013 - link
Because no one expected the Spanish Inquisition, or Apple sending the market into a 64-bit frenzy in 2013. Qualcomm no doubt had a 64-bit Krait roadmap, but Apple destroyed that, and now they are playing catch-up as best they can, which apparently means relying on an ARM design for now.
michael2k - Tuesday, December 10, 2013 - link
You're expecting too much. Had Apple not released their 64-bit A7, then neither NVIDIA nor Qualcomm (nor Samsung) would have been 'late'. You also overestimate Samsung. Samsung's initial chip designs were via Intrinsity, which Apple bought out from under them. Without that they have to rely on ARM and their own in-house talent, which has demonstrably been experiencing growing pains as they try to DIY. Likewise, Apple has never 'given up on them', though they may be shopping around. Apple still purchases the bulk of their SoCs from Samsung, which deprives Samsung of the capacity necessary to fulfill their own orders, much less anyone else's. In that light, this is why Samsung has used and continues to use third-party chips like Snapdragon and Tegra... and third-party fabs like TSMC!
So relax; it doesn't matter who is second with a 64 bit chip, it only matters that you have sufficient competition to get a phone/tablet you like.
Mondozai - Tuesday, December 10, 2013 - link
It does matter who is #2, because it influences the Android market, and thus competition. Also, both Qualcomm and Samsung are late. That's just a fact. Krysto is right that Qualcomm missed the 64-bit train, which is why they demoted their CMO when he started to trash Apple's A7. If they are forced to use ARM designs for their first 64-bit chip, it is essentially an admission of failure to predict where the market would be. (Whether you actually need 64-bit SoCs in phones in 2014 is up for debate; I think it's highly debatable.)
But the fact is that the market has moved here in preparation for future phones and tablets, and Qualcomm is late. Simple as that. But it's not going to be very serious: if they can get their 64-bit Krait out before the year is over, or very early next year, not much is lost. Still, it's embarrassing to be caught so flat-footed, and kudos to Apple for driving the ecosystem (once again).
michael2k - Tuesday, December 10, 2013 - link
No, Apple was unexpectedly early. Everyone had a published or announced roadmap indicating 64-bit would happen by the end of 2014; none of that has changed.
Kvaern - Tuesday, December 10, 2013 - link
Maybe it's my Google skills, but I can't find anything resembling a Qualcomm or Samsung roadmap giving 64-bit dates prior to the A7 announcement, and as you know Qualcomm's CMO immediately wrote 64-bit off as a "gimmick", which does imply a thing or two about Qualcomm's priorities at the time of the announcement.
michael2k - Tuesday, December 10, 2013 - link
http://www.anandtech.com/show/6420/arms-cortex-a57...
ARM announced the 64-bit CPUs in 10/12 and would deliver the base designs by about 6/13, with the expectation of production silicon by spring of 2014. AMD had announced its product for 2014.
ARM's press release: http://www.arm.com/about/newsroom/arm-launches-cor...
ARM itself expected the first Cortex-A50-based chips in 2014.
Again, Apple shipping in 2013 was unexpectedly early.
Kvaern - Tuesday, December 10, 2013 - link
That's only ARM itself, not "everyone". The 410 simply uses a low-end reference design, if I'm not mistaken, which they can get to market much faster than a 64-bit Krait, and there's no Qualcomm-published roadmap with a 64-bit Krait in it as far as I'm aware.
michael2k - Wednesday, December 11, 2013 - link
That's the irony, isn't it? Apple beat ARM to market with a custom design, while Qualcomm is relying on a reference design to be first to market. Yes, Qualcomm never published a roadmap, but ARM did. I just assumed that everyone would release their parts after ARM did.
fteoath64 - Thursday, December 12, 2013 - link
I am sure Qualcomm has a 64-bit Krait design, but it might be late, hence the use of the A53 reference design as the way to beat the 2013 time frame and show the market it has some 64-bit cred. This buys time for a real 64-bit Krait sometime in Q2 or later next year, when things begin to heat up as the other ARM players show their iterations of the reference A57 design. Here is where Krait can still work its magic and capture the bulk of the market the way it did with the S600.
OreoCookie - Wednesday, December 11, 2013 - link
There is no pressure in the Android world to switch to 64-bit earlier (Android would still run using the ARMv7 ISA). Apple, on the other hand, does have advantages in switching to 64-bit on its own terms, in addition to the speedups. For instance, I expect Apple's 64-bit transition for their ARM-powered devices to be very smooth sailing: by the time their devices hit the 4 GB barrier, the whole iOS ecosystem will have moved to 64-bit long ago. In the Android world, having a "64-bit CPU" is just a marketing checkbox at this point. However, the advantages of the A53 (low power consumption, higher performance) remain even if it is run in 32-bit mode.
SydneyBlue120d - Tuesday, December 10, 2013 - link
Can't wait to preorder the Google Nexus 6 running a 64-bit Krait evolution with 4K HDR HEVC 60FPS OIS video support and LTE Advanced :P
tigmd99 - Tuesday, December 10, 2013 - link
Anand, how does A53 compare to Cyclone (from Apple's A7 chip)?
ws3 - Tuesday, December 10, 2013 - link
http://www.sadtrombone.com/
endinyal - Tuesday, December 17, 2013 - link
Well, thank goodness that everyone who "thinks" they know what's going on said that 64-bit on a phone is a waste, right??? The last thing we want is for the fandroid community to once again be proven wrong, right?
melgross - Wednesday, December 18, 2013 - link
Never pay attention to people saying that something isn't necessary. They're always wrong.
dynamited - Thursday, December 19, 2013 - link
"I'm really excited to see what ARM's Cortex A53 can do." Then we have a marketing table from Qualcomm with projected numbers presented as fact. I'm holding my breathe all next year in anticipation.ChipNano - Saturday, February 8, 2014 - link
Designing and testing this 64-bit Snapdragon 410 would be quite challenging, since it is Qualcomm's first 64-bit processor, and an A53-based one at that.
Krysto - Friday, May 30, 2014 - link
If the Cortex A53 is 43 percent faster than the Cortex A7 at the same clock speed, then Qualcomm's Snapdragon 410 at 1.4GHz should be around 65 percent faster than the Snapdragon 400. So a Moto G2 could be that much faster than the original. Not a bad improvement in 12 months' time. The GPU, however, seems a little disappointing? I'm not sure how much faster it is compared to the one in the Snapdragon 400, but overall the improvement should be smaller than the Adreno 225 to Adreno 305 one? I hope I'm wrong. I'd expect to see Adreno 320-level performance in such devices in late 2014/early 2015.
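For what it's worth, the arithmetic behind that estimate roughly checks out, assuming the baseline is the 1.2GHz quad-A7 Snapdragon 400 used in the original Moto G: 1.43 × (1.4 GHz / 1.2 GHz) ≈ 1.67, i.e. roughly 65-67 percent faster under those assumptions, before any memory or GPU differences.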