41 Comments
name99 - Wednesday, May 1, 2024 - link
"increase its datacenter market share with its EPYC Turin processors"Or increase its x86 datacenter market share?
That's the question, isn't it...
AMZ, Google, Meta, MS, nVidia Grace, Ampere, ...
Dante Verizon - Wednesday, May 1, 2024 - link
ARM is of limited use. Even Nvidia's gigantic chip suffers against EPYC.
autarchprinceps - Thursday, May 2, 2024 - link
Not really true. It is growing quicker than x86 and therefore gaining market share, no matter what you think about it. It always depends, of course, but if your use case is not highly reliant on the absolute newest and fastest vector extensions, most Neoverse-based ARM chips are more efficient at giving you performance, which is ultimately what matters for most cloud/ecommerce/webserver-like use cases. I doubt you've had a chance to actually test Nvidia Grace, since nobody outside Nvidia has, so we'll have to see there. It is of course possible to design an ARM chip with even more vector focus; that's probably something the Japanese providers would be doing for HPC, though. As Apple has shown, even maximum single-core performance is possible; it's just not what scale-out-optimized Linux webserver-style deployments actually need.
ET - Thursday, May 2, 2024 - link
Just wanted to say that quite a few people outside of NVIDIA have tested Grace. I tested Grace Hopper, and I don't have any special connections.
Dante Verizon - Thursday, May 2, 2024 - link
I'll tell you a secret: these companies making ARM chips trying to venture into the server market aren't even making a profit to pay back their investors. ARM has its uses, but it can't compete with x86 in the general market.
https://www.phoronix.com/review/nvidia-gh200-amd-t...
deltaFx2 - Thursday, May 2, 2024 - link
It's easy to grow faster than x86 when you are 0% of the market. Almost all of that growth has been driven by AWS, which has committed to ARM more than anyone else. Depending on whom you ask, it's around 15%-20% of AWS's server fleet. The rest are just testing the waters. It does fill a need for I/O-bound workloads, networking/storage: the 'good enough performance' sweet spot at decent power. For stronger single-threaded performance, x86 still wins. AMD Bergamo is an attempt at addressing that sweet spot of high throughput at lower TDP. Intel Sierra Forest is coming soon to attack that market. Ultimately, TCO matters: if the x86 vendors deliver better TCO, they win. If not, ARM wins.
Samus - Tuesday, May 7, 2024 - link
The advantage nVidia has over AMD is footprint. You can have far more CUDA-capable SoCs in the same amount of space as EPYC offers, while the difference in TDP is negligible (though still a relevant consideration at scale, regardless of any density advantage).
What I don't get is how investors are only blowing up nVidia stock while AMD has a competitive product.
Blastdoor - Thursday, May 2, 2024 - link
Or maybe the question is whether the world needs a fabless x86 chip design company at all? Intel always made x86 competitive by virtue of superior manufacturing. When Intel badly stumbled, allowing TSMC to take the manufacturing lead, there was a window for AMD to beat everyone with high-performance designs on TSMC's process.
But now we are in a world where Apple has shown high-performance ARM designs are possible, at lower power, and the big firms are all designing their own. Meanwhile, Intel might actually regain the process lead. So why buy x86 from AMD when you could design your own ARM chip on the same process or buy a better x86 from Intel? Or pay Intel to fab your ARM design?
Maybe real men own fabs after all.
ET - Thursday, May 2, 2024 - link
An actual CPU you can use is infinitely more performant than one on paper.
Dante Verizon - Thursday, May 2, 2024 - link
Do you have the nerve to say that with Intel sinking into debt? It can't be profitable or competitive with TSMC. lol
FreckledTrout - Monday, May 6, 2024 - link
That isn't completely true. Intel's ability to be competitive with TSMC will improve massively. Part of the problem is that the fabs were an internal division, so they really didn't have to operate in a lean way. I'm certain Intel will sort all of that out and the fight will be more a technological one.
GeoffreyA - Thursday, May 2, 2024 - link
So, owning fabs is some sort of p***s-measuring contest?
FreckledTrout - Monday, May 6, 2024 - link
Yes. Considering there are exactly three fab companies in the world that can make these cutting-edge chips.
Dante Verizon - Tuesday, May 7, 2024 - link
One*
deltaFx2 - Thursday, May 2, 2024 - link
No, Intel was superior because it had volume. Volume in the client (PC) market recovered the NRE cost of CPU design, allowing the same design to be sold in servers at far lower cost than server-only players (Sun, DEC, IBM, etc.) could manage. Binning for yield and performance requires volume. This is still true today. A mask set for a design on a sub-3nm node costs double-digit millions of dollars. Developing the IP that turns into that mask is an order of magnitude more expensive. Then there are fab/packaging/test costs that come down with volume.
Like Intel, AMD has volume. ARM design efforts are distributed across several vendors, each of whom is paying this NRE cost independently. For cloud vendors, the additional cost may be offset by not having to pay Intel/AMD profit margins, and may help negotiate better deals for the x86 parts they do procure. But even for them, volume matters, or these efforts will get killed, especially considering the new focus on efficiency at big tech.
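(A rough back-of-the-envelope sketch of the amortization argument above, in Python. The NRE and unit-cost figures are purely hypothetical placeholders, not numbers from the comment or from industry data.)

```python
# Rough illustration of how per-chip design (NRE) cost falls with volume.
# All dollar figures are hypothetical placeholders.
def per_chip_cost(nre: float, unit_cost: float, volume: int) -> float:
    """Total cost per chip = amortized NRE + per-unit fab/packaging/test cost."""
    return nre / volume + unit_cost

NRE = 500e6    # assumed design + mask-set cost
UNIT = 150.0   # assumed per-unit manufacturing cost

for volume in (100_000, 1_000_000, 10_000_000):
    print(f"{volume:>10,} units -> ${per_chip_cost(NRE, UNIT, volume):,.0f} per chip")
```

At low volume the design cost dominates the price of every chip; at PC-scale volume it nearly vanishes, which is the point being made about server-only vendors versus Intel and AMD.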
Blastdoor - Tuesday, May 7, 2024 - link
Recovering the cost of the design is *trivial* compared to recovering the cost of developing the manufacturing process and building/equipping the fabs. So I agree that volume was key, but that volume was key because it enabled the economies of scale necessary to support the R&D and capital costs of manufacturing. Intel has rarely had the best designs, but up until the mid-to-late 2010s, they had the best manufacturing process, and that almost always helped them overcome design deficiencies.
When it comes to the volume needed to sustain manufacturing R&D and capital costs, AMD falls far short. That's why AMD doesn't own fabs anymore. That worked out OK for AMD for a while because they paired a better x86 design with TSMC's manufacturing to beat weak Intel designs on weak manufacturing. But Intel has improved its designs and is on the cusp of pulling ahead in manufacturing. If Intel actually pulls it off, I think AMD is toast.
TSMC has the volume in spades, thanks largely to Apple.
Samsung and Intel are barely hanging in there. Part of the reason Intel is hanging in there is US government support. But if Intel can parlay the combo of traditional x86, AI, and foundry business into sufficient volume, then they can become self-sustaining and stay in the lead on process once again.
Drivebyguy - Thursday, May 2, 2024 - link
AMD has had ARM designs for servers before and could do so again. They're not married to x86.
Also, part (by no means all) of the Apple success with high-performance ARM designs depends on integrating the memory into the same package as the CPU, something which won't be a mainstream answer in servers.
Don't get me wrong, I've seen what even the mid-range M3s can do and been slightly terrified. (Running on a gaming notebook with an NVidia GPU and a top-end AMD CPU vs. an M3 on a colleague's work laptop, he got probably 8x the performance running an LLM locally? Anecdotal, and I'm hazy on whether my machine was making full use of the GPU for the LLM, but still.)
But AMD has plenty of other assets besides the x86 license and will follow the market where it goes.
GeoffreyA - Friday, May 3, 2024 - link
The M3 has a Neural Engine, from which the gains are likely coming.
meacupla - Friday, May 3, 2024 - link
The M3 NPU is like 18 TOPS across all the variants. It's on the slower side, and even beaten by Apple's own A17, which does 35 TOPS.
GeoffreyA - Friday, May 3, 2024 - link
The memory bandwidth then?
meacupla - Friday, May 3, 2024 - link
I think M3 has bad NPU performance because Apple did the most Apple thing possible with emerging technology.
That and Tim Cook has poor foresight.
GeoffreyA - Thursday, May 2, 2024 - link
I'm not too clued up on the ARM power-performance metric at present, but the fact that everyone is doing it doesn't mean it's the way to go. If one had to go outside x86, RISC-V seems a better path than ARM.
EasyListening - Thursday, May 2, 2024 - link
I'm putting my money on Jim Keller over at Tenstorrent, who is developing RISC-V products.
GeoffreyA - Friday, May 3, 2024 - link
Exactly. And an open ISA too. I don't fully understand this whole ARM fixation except bandwagon thinking.
grant3 - Monday, May 6, 2024 - link
ARM is a very mature and widely adopted ISA. It has dominated mobile sales while other architectures have failed. Apple has proven for years that the architecture can outperform all the way up to the workstation level. That doesn't make it "bandwagon thinking"; it makes it "using the tool evaluated to be best for the job".
Will that carry up into the server space? We'll see.
GeoffreyA - Tuesday, May 7, 2024 - link
Good points. It is an excellent tool for the job and the most widely used, thanks to mobile. But is it the best for the job, available right now? It seems in the public discourse that x86 is finished, but is it really that certain and clear, from a performance and power point of view? (The cost of decoding isn't unique to x86.) And there's something to be said about an open standard like RISC-V.
https://chipsandcheese.com/2021/07/13/arm-or-x86-i...
https://chipsandcheese.com/2024/03/27/why-x86-does...
grant3 - Monday, May 6, 2024 - link
Better path for whom?
Are you saying you expect RISC-V processors to automatically have better power-performance than ARM processors? Why?
GeoffreyA - Tuesday, May 7, 2024 - link
Being an open standard is its advantage, not performance and power.
ballsystemlord - Thursday, May 2, 2024 - link
So when AMD's first EPYC CPU launched, they talked about a path to 10% market share compared to Intel. What are they at now?
meacupla - Thursday, May 2, 2024 - link
Currently AMD server market share is over 20%, not quite at 25%, but also growing rapidly.
ballsystemlord - Thursday, May 2, 2024 - link
Thanks!
Dante Verizon - Thursday, May 2, 2024 - link
25-30%
del42sa - Thursday, May 2, 2024 - link
"Looking ahead, we are very excited about our next-gen"they being excited every each generation product launch include RDNA3 ....
Makaveli - Thursday, May 2, 2024 - link
What company isn't excited about their next gen products?
SanX - Thursday, May 2, 2024 - link
A couple of months ago I heard about 196 cores in Turin. Any news here?
SanX - Thursday, May 2, 2024 - link
Sorry, 192
nandnandnand - Thursday, May 2, 2024 - link
Should be 128 Zen 5 cores for Turin Classic, or 192 Zen 5c cores for Turin Dense.
For me, Zen 5 is interesting mostly for the consumer mobile variants from Strix Halo down to Sonoma Valley, and Zen 6 is worth a look everywhere if that's where the fundamental changes end up landing.
SanX - Friday, May 3, 2024 - link
A pity that node shrinking has become so tricky and gives less and less advantage... Going from the current 96 cores to 128 for Zen 5 is only about a 33% increase. In an ideal world of scaling laws, that sounds like a switch from 5nm to 4nm rather than to 3nm, because (5 / 3)^2 = 2.78 would allow for more than doubling the core count, while 4nm gives only a factor of (5 / 4)^2 = 1.56.
nandnandnand - Friday, May 3, 2024 - link
All of Zen 4 Epyc (using both Zen 4 and Zen 4c chiplets) is on TSMC N5. The 128-core Zen 5 Epyc is probably using N4X, which is part of the same 5nm family of nodes. The 192-core Zen 5c version is probably using N3.
https://www.anandtech.com/show/18875/tsmc-details-...
Based on this, there is barely any area scaling between N5 and N4X, but a decent jump when moving to N3.
Sure, it's sad that the free lunch is running low for the moment, but I think 3D will revitalize everything once the industry figures out how to stack hundreds of layers of cores and memory without melting the chips in 10-20 years. And in the more immediate future, stacked cache is addressing the SRAM scaling problem very well. Before everything goes 3D, we'll probably see at least 3 "normal" node shrinks, the introduction of backside power delivery, etc.
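(For reference, a quick sketch of the ideal-scaling arithmetic being traded above. It treats node names as literal linear feature sizes, which they have not been for years, so this is the best case, not a prediction.)

```python
# Idealized density gain if "5nm", "4nm", "3nm" were literal linear feature sizes (they are not).
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Transistor density scales with the square of the linear shrink."""
    return (old_nm / new_nm) ** 2

print(f"5nm -> 3nm ideal: {ideal_density_gain(5, 3):.2f}x")   # ~2.78x, would allow >2x the cores
print(f"5nm -> 4nm ideal: {ideal_density_gain(5, 4):.2f}x")   # ~1.56x
print(f"Actual Zen 5 core-count jump, 96 -> 128: {128 / 96:.2f}x")  # ~1.33x
```

As the comment above notes, N4X brings almost no area scaling over N5 in practice, and even N3 falls well short of the ideal figure, so modest core-count jumps on the "classic" parts are to be expected.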
SanX - Friday, May 3, 2024 - link
According to that article, even in 2022 N3 was expected to reduce power by 25-30% and almost double areal density (with a slight increase in performance per core, which is fine even if that increase were zero). Given the 360W power consumption of the 96-core N5P part, doubling the core count should bring AMD to roughly a 500W TDP for an N3P 192-core chip. It is reasonable to expect that current motherboards will sustain the extra power with extra forced cooling. My current one from Gigabyte handles such power fine, and water cooling keeps the processor below 60C when run at peak power. You just need to install 3-4 small 1-2W blowers directed exactly at the VRM heatsinks (for some reason they are inexcusably small). I would not complain if AMD delivered a 1kW TDP and a decent core count, like 500. We will get there anyway.
nandnandnand - Friday, May 3, 2024 - link
"C" cores use some silicon tricks and lower clock speeds to keep their efficiency up. At least, the current dense "C" cores for servers do, there's talk of new dense-but-fast-at-any-cost cores possibly coming to client but that's a story for another day.You get the stated power reduction at the same performance (clocks), not both at once. Which is fine. And of course the analysis can get more complicated than that because there is an entire power curve and other factors.
AMD is already doing 128 Zen 4c cores (Bergamo) at a 360W TDP. Just plugging in the numbers like you did: 360 × 1.5 × (0.70 to 0.75) ≈ 380-405 W for 192 cores on N3. That would be a more reasonable 6-13% increase, and not far off from what is already being done on the SP5 socket (the 9684X is 400W).
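(The two estimates side by side, using only the numbers already quoted in this thread; a rough sketch, not an official spec.)

```python
# Back-of-the-envelope TDP estimates for a 192-core N3 Epyc, from figures quoted above.
N3_POWER_SCALE = (0.70, 0.75)   # TSMC's claimed 25-30% power cut vs N5 at the same performance

# Estimate 1: scale the 96-core, 360 W Zen 4 part (Genoa) to 192 cores, then apply N3 scaling.
genoa_based = [360 * (192 / 96) * s for s in N3_POWER_SCALE]

# Estimate 2: scale the 128-core, 360 W Zen 4c part (Bergamo) to 192 cores instead.
bergamo_based = [360 * (192 / 128) * s for s in N3_POWER_SCALE]

print(f"From Genoa (96c):    {genoa_based[0]:.0f}-{genoa_based[1]:.0f} W")     # ~504-540 W
print(f"From Bergamo (128c): {bergamo_based[0]:.0f}-{bergamo_based[1]:.0f} W")  # ~378-405 W
```

Since Turin Dense's 192 cores are the dense "c" variant, the Bergamo-based estimate is the more relevant one, which is why it lands close to TDPs already shipping on SP5.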
Epyc is not for me. I'd like to see AMD make AM6 physically larger and support quad-channel memory, for higher core counts, bigger APUs, more heat dissipation, etc. Basically delete Threadripper and split the difference by offering more prosumer capabilities and flexibility on the consumer platform, and pushing others to use Epyc. Just the move to quad-channel memory would be great for APU-only systems.