21 Comments
dwillmore - Friday, March 10, 2023 - link
I just googled for the price of those HPE Apollo 80 HPC boxes. Yeah, family car price range.

Will the new chip get us down to the cheap car/nice motorcycle range?
Threska - Friday, March 10, 2023 - link
I think the point of these is to show that ARM is competitive with the x86 architecture. Now all we need is some PowerPC and RISC-V.

Cooe - Saturday, March 11, 2023 - link
PowerPC literally doesn't exist anymore and hasn't for like over a decade... You're thinking of IBM POWER, which is already decently competitive for Big Iron.

dotjaz - Saturday, March 25, 2023 - link
You either don't know how to use "literally" or you live in the wrong decade.

AFAIK, PowerPC e6500 only entered production in 2012, e6501 in 2013. And they are still available. There are still MCUs using PowerPC.
PowerPC literally still exists in embedded systems.
mode_13h - Saturday, March 11, 2023 - link
I don't see how they'll be able to repeat the performance or efficiency leaps they achieved with A64FX. It was the first out of the gate with SVE, but now others are doing it. And their hard-wired AI accelerator will be challenging industry players with several generations under their belts.

They have one possible advantage on efficiency, which is that if you have a sufficiently large budget, you can add more nodes and clock them lower. If GPU-based supercomputers wanted better efficiency numbers, that's all they'd have to do. However, budget constraints mean they have to run a smaller number of GPUs well outside their optimal efficiency range.
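To put rough numbers on the add-more-nodes-and-clock-them-lower point, here's a back-of-the-envelope sketch. It assumes dynamic power scales with C*V^2*f and that voltage tracks frequency roughly linearly near the nominal operating point, so per-node power goes as ~f^3 while per-node throughput goes as ~f. The figures are purely illustrative, not vendor data.

```c
#include <stdio.h>

/* Back-of-the-envelope only: per-node power ~ f^3, per-node throughput ~ f.
 * Holding total throughput fixed, lowering clocks and adding nodes cuts
 * total power dramatically. */
int main(void) {
    const double base_nodes = 1000.0;
    for (double clock = 1.0; clock > 0.55; clock -= 0.1) {
        double nodes = base_nodes / clock;        /* keep total FLOPS fixed */
        double power = nodes * clock * clock * clock;
        printf("clock %.1f -> %6.0f nodes, relative power %.2f\n",
               clock, nodes, power / base_nodes);
    }
    return 0;
}
```

At 0.6x clock, you need ~1.67x the nodes for the same throughput, but total power lands around 0.36x — hence the budget-vs-efficiency tradeoff.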
Since Fugaku was designed in the post-Fukushima era, energy was probably very expensive. That could've pushed them to budget more on equipment, for the benefit of lower operating costs.
brucethemoose - Sunday, March 12, 2023 - link
HPC customers like really wide SIMD, if they go for that again. Other SVE2 implementations (other than SiPearl's mysterious and delayed(?) design) are 128-bit or 256-bit, and sometimes a bunch of wide cores are a better fit than a GPU.
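For a concrete picture of why implementation width matters, here's a minimal vector-length-agnostic SAXPY in SVE intrinsics (a sketch, assuming an SVE-enabled toolchain): the identical code uses 128-bit vectors on a narrow implementation and 512-bit vectors on an A64FX-class chip, so per-core throughput scales directly with the hardware's width.

```c
#include <arm_sve.h>
#include <stdint.h>

/* Vector-length-agnostic SAXPY: the same binary uses whatever vector
 * width the hardware provides, with no recompile. */
void saxpy(float a, const float *x, float *y, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {    /* elements per vector */
        svbool_t pg = svwhilelt_b32(i, n);         /* predicate masks the tail */
        svfloat32_t vx = svld1(pg, x + i);
        svfloat32_t vy = svld1(pg, y + i);
        svst1(pg, y + i, svmla_x(pg, vy, vx, svdup_f32(a)));  /* y = a*x + y */
    }
}
```

brucethemoose - Monday, March 13, 2023 - link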
Actually, I think this is incorrect; it may be a stock ARM core.

mode_13h - Monday, March 13, 2023 - link
> sometimes a bunch of wide cores are a better fit than a GPU.

GPUs have a very weak memory model, and that really helps with scaling.
If you're running intrinsically branchy code, then GPUs' SIMD-oriented programming model might indeed be a poor fit. But, you're going to have more overhead from running it on a cache-coherent CPU with lots of cores.
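To make "intrinsically branchy" concrete, here's a contrived sketch of the kind of pattern I mean: each step's branch and next load depend on the previous load, so adjacent GPU lanes diverge immediately.

```c
#include <stddef.h>

/* Data-dependent control flow: walking a binary search tree. Neighboring
 * work-items take different paths at each level, which serializes a GPU's
 * lockstep SIMD lanes but costs a wide out-of-order CPU core very little. */
struct node { int key; const struct node *left, *right; };

const struct node *find(const struct node *n, int key) {
    while (n != NULL && n->key != key)
        n = (key < n->key) ? n->left : n->right;
    return n;
}
```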
Where I think A64FX did so well on the efficiency front is that their cores were wide, relatively simple, and clocked rather conservatively. Scaling the performance of such a CPU will necessarily come at the expense of efficiency. Especially since, for the more general kinds of server workloads they want to address, you're going to need more complex cores.
Silver5urfer - Saturday, March 11, 2023 - link
Specialized use cases with custom IP blocks for acceleration of specific workloads. That's what ARM is best at. But for the people who want power and performance, the only solution is x86. ARM cannot replace that even if Fujitsu's A64FX successor is 100x faster.

Look at TR Pro: most home users who want a simple server can't buy one, due to the extremely high cost and the lack of other users across various forums to help troubleshoot. Now look at used Xeon, Opteron, and other x86 CPUs. Abundance of resources: you can buy any mobo, get a chip, and start your own Proxmox or VMware instances, run any piece of code, or add HW like PCIe expansion cards, SAS cards, you name it.
That's the beauty of x86. I look forward to owning a Xeon / EPYC system, hopefully soon.
Dolda2000 - Sunday, March 12, 2023 - link
This isn't so much a collection of custom accelerators, as much as it is a GPU with a CPU-like memory model and the ability to run an operating system and handle page faults. Kind of like Xeon Phi, but hopefully working better.

That's a wonderful thing, and something that I hope we'll see a lot more of in the future. I've been hoping since A64FX that it works out well for them, and the fact that they're making a second generation of it is perhaps a positive indication.
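As a sketch of what the CPU-like memory model buys you (my own illustration, error handling omitted): code on such a device can compute directly on demand-paged memory, like an mmap'd file, with none of the explicit staging copies a discrete GPU without page-fault handling would need.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Summing a file by computing straight on mmap'd memory. Pages fault in
 * on demand, so this only works where the compute device can take and
 * service page faults -- trivial for a CPU, historically not for GPUs. */
double sum_file(const char *path) {
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    size_t n = st.st_size / sizeof(double);
    const double *v = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)   /* each access may fault a page in */
        s += v[i];
    munmap((void *)v, st.st_size);
    close(fd);
    return s;
}
```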
brucethemoose - Monday, March 13, 2023 - link
We will see.

The slides make it sound more like a server CPU with some accelerators, not something with a weird core/memory config like the A64FX, but I too hope it stays weird.
mode_13h - Monday, March 13, 2023 - link
In significant ways, Intel's Xeon Max (Sapphire Rapids w/ HBM) follows in the footsteps of A64FX. It has dual 512-bit SIMD units per core + HBM and an advanced interconnect (CXL). Core count is also similar. So, you could say we're seeing a convergence between HPC and more mainstream server CPUs.

One distinct advantage Xeon Max has is its AMX engines.
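For a sense of what the AMX engines do, here's a rough sketch of a single tile operation (my illustration, not Intel sample code; assumes -mamx-tile -mamx-bf16 and that the OS has already granted the process AMX permission): one instruction multiplies BF16 tiles and accumulates into a 16x16 FP32 tile.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Sketch: C(16x16, fp32) += A(16x32, bf16) * B(bf16, in the paired
 * "VNNI" layout the TDPBF16PS instruction expects). */
struct tile_config {
    uint8_t  palette_id;
    uint8_t  start_row;
    uint8_t  reserved[14];
    uint16_t colsb[16];   /* bytes per row, per tile */
    uint8_t  rows[16];    /* rows per tile */
};

void amx_matmul_tile(float *c, const uint16_t *a, const uint16_t *b) {
    struct tile_config cfg;
    memset(&cfg, 0, sizeof(cfg));
    cfg.palette_id = 1;
    cfg.rows[0] = 16; cfg.colsb[0] = 64;  /* tmm0: C, 16x16 fp32 */
    cfg.rows[1] = 16; cfg.colsb[1] = 64;  /* tmm1: A, 16x32 bf16 */
    cfg.rows[2] = 16; cfg.colsb[2] = 64;  /* tmm2: B, paired bf16 */
    _tile_loadconfig(&cfg);

    _tile_loadd(0, c, 64);                /* strides are in bytes */
    _tile_loadd(1, a, 64);
    _tile_loadd(2, b, 64);
    _tile_dpbf16ps(0, 1, 2);              /* tmm0 += tmm1 * tmm2 */
    _tile_stored(0, c, 64);
    _tile_release();
}
```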
brucethemoose - Tuesday, March 14, 2023 - link
The Xeon Max uses humongous server cores, and the core count is similar b/c of the newer process.

If they were AVX512 Gracemont cores (which would triple the core count, maybe?), then it would resemble the A64FX much more closely.
mode_13h - Tuesday, March 14, 2023 - link
Point taken about the big cores.

In a way, you might see the A64FX as taking a page from the Xeon Phi's playbook. However, I'm sure A64FX's cores were more competent. I wonder how much better KNL would've been, had it used Gracemont cores instead of Silvermont. Not enough to save it, but perhaps enough to count?
BushLin - Sunday, March 12, 2023 - link
I could be misinterpreting your post but it looks like you're confusing requirements for a cheap home lab with the requirements of an organization with compute work that scales to many racks or even buildings.

mode_13h - Monday, March 13, 2023 - link
> Specialized use cases with custom IP blocks for acceleration of specific workloads.
> That's what ARM is best at.
Ampere Altra and Amazon Graviton 2 & 3 are both competent server CPUs with no workload-specific IP blocks. I think you're confusing mobile SoCs with server CPUs, here. Take a look at Nvidia's Grace CPU, as well.
> That's the beauty of x86, I look forward to own a Xeon / EPYC system hopefully soon.
Cool, but this isn't being made for home users like you. It's being made for datacenter and HPC customers.
Most of these CPUs will be deployed and used in the cloud. The necessary ecosystem support for ARM servers already exists. Last year, Microsoft deployed Altra nodes on Azure, and Google is rumored to be developing its own ARM-based server CPU.
brucethemoose - Tuesday, March 14, 2023 - link
Not sure about Ampere, but I suspect Amazon will *gravitate* towards accelerators soon enough.mode_13h - Tuesday, March 14, 2023 - link
Amazon already has their own AI chips. Maybe they'll join AMD, Nvidia, and Intel in making an MCM which combines them with CPU dies, or maybe not.

I think we should note that Nvidia's approach is the loosest coupling, with their GPU Compute dies in a physically distinct package that merely has the ability to be paired with a CPU on the same daughter card. So, it's not as if they've all embraced the concept of pairing both in the same package.
Dizoja86 - Monday, March 13, 2023 - link
Anyone else read the headline and wonder why any company would still be using Athlon 64 FX's?

mode_13h - Tuesday, March 14, 2023 - link
LOL, not if you were paying attention when the original A64FX launched. Most of us got that sorted out, back then.

mode_13h - Monday, March 27, 2023 - link
Just read a rumor that NEC is discontinuing its line of Vector Engine accelerators (source: NextPlatform.com). This leaves Fujitsu holding the bag, I suppose.