Original Link: https://www.anandtech.com/show/10905/amd-announces-radeon-instinct-deep-learning-2017
AMD Announces Radeon Instinct: GPU Accelerators for Deep Learning, Coming In 2017
by Ryan Smith on December 12, 2016 9:00 AM EST- Posted in
- AMD
- Radeon
- GPUs
- Fiji
- Machine Learning
- Polaris
- Vega
- Neural Networks
- AMD Instinct
With the launch of their Polaris family of GPUs earlier this year, much of AMD’s public focus in this space has been on the consumer side of matters. However now with the consumer launch behind them, AMD’s attention has been freed to focus on what comes next for their GPU families both present and future, and that is on the high-performance computing market. To that end, today AMD is taking the wraps off of their latest combined hardware and software initiative for the server market: Radeon Instinct. Aimed directly at the young-but-quickly-growing deep learning/machine learning/neural networking market, AMD is looking to grab a significant piece of what is potentially a very large and profitable market for the GPU vendors.
Broadly speaking, while the market for HPC GPU products has been slower to evolve than first anticipated, it has at last started to arrive in the last couple of years. Early efforts to port neural networking models to GPUs combined with their significant year-after-year performance growth have not only created a viable GPU market for deep learning, but one that is rapidly growing as well. It’s not quite what anyone had in mind a decade ago when the earliest work with GPU computing began, but then rarely are the killer apps for new technologies immediately obvious.
At present time the market for deep learning is still young (and its uses somewhat poorly defined), but as the software around it matures, there is increasing consensus in the utility of being able to apply neural networks to analyze large amounts of data. Be it visual tasks like facial recognition in photos or speech recognition in videos, GPUs are finally making neural networks dense enough – and therefore powerful enough – to do meaningful work. Which is why major technology industry players from Google to Baidu are either investigating or making use of deep learning technologies, with many smaller players using it to address more mundane needs.
Getting to the meat of AMD’s announcement today then, deep learning has the potential to be a very profitable market for the GPU manufacturer, and as a result the company has put together a plan for the next year to break into that market. That plan is the Radeon Instinct initiative, a combination of hardware (Instinct) and an optimized software stack to serve the deep learning market. Rival NVIDIA is of course already heavily vested in the deep learning market – all but staking the Tesla P100 on it – and it has paid off handsomely for the company as their datacenter revenues have exploded.
As is too often the case for AMD, they approach the deep learning market as the outsider looking in. AMD has struggled for years in the HPC GPU market, and their fortunes have only very recently improved as the Radeon Open Compute Platform (ROCm) has started paying dividends in the last few months. The good news for AMD here is that by starting on ROCm in 2015, they’ve already laid a good part of the groundwork needed to tackle the deep learning market, and while reaching parity with NVIDIA in the wider HPC market is still some time off, the deep learning market is far newer, and NVIDIA is much less entrenched. If AMD plays their cards right and isn’t caught-off guard by NVIDIA, they believe they could capture a significant portion of the market within a generation.
(As an aside, this is the kind of quick, agile action that AMD hasn’t even been able to plan for in the past. If AMD is successful here, I think it will prove the value in creating the semi-autonomous Radeon Technologies Group under Raja Koduri).
Radeon Instinct Hardware: Polaris, Fiji, Vega
Diving deeper into matters, let’s talk about the Radeon Instinct cards themselves. The Instinct cards are for all practical purposes a successor (or spin-off) to AMD’s current FirePro S series cards, so if you are familiar with AMD’s hardware there, then you know what to expect. Passively cooled cards geared for large scale server installations, offered across a range of power and performance options.
As this is a new product line the Instinct cards don’t have any immediate predecessors in AMD’s FirePro S lineup, but unsurprisingly, AMD has structured their new family of server cards similar to how NVIDIA has structured their P4/P40/P100 lineup of deep learning cards. All told, AMD is announcing 3 cards today, all 3 which tap different AMD GPUs, and are (roughly) named after their expected performance levels.
AMD Radeon Instinct | |||||
Instinct MI6 | Instinct MI8 | Instinct MI25 | |||
Memory Type | 16GB GDDR5 | 4GB HBM | "High Bandwidth Cache and Controller" | ||
Memory Bandwidth | 224GB/sec | 512GB/sec | ? | ||
Single Precision (FP32) | 5.7 TFLOPS | 8.2 TFLOPS | 12.5 TFLOPS | ||
Half Precision (FP16) | 5.7 TFLOPS | 8.2 TFLOPS | 25 TFLOPS | ||
TDP | <150W | <175W | <300W | ||
Cooling | Passive | Passive (SFF) |
Passive | ||
GPU | Polaris 10 | Fiji | Vega | ||
Manufacturing Process | GloFo 14nm | TSMC 28nm | ? |
Starting things off, we have the Radeon Instinct MI6. This is a Polaris 10 card analogous to the consumer RX 480. As Polaris doesn’t have much in the way of special capabilities for deep learning (more on this in a second), AMD is pitching the card as their baseline card for neural network inference (execution). At 5.7 TFLOPS (FP16 or FP32) it will draw under 150W, and while pricing for the family hasn’t been announced, I believe it’s a safe bet that as the baseline card the MI6 will offer the best performance per dollar across the Instinct family.
Meanwhile in an unexpected move, AMD will be keeping their 2015 Fiji GPU around for the second card, the Instinct MI8. This card is for all intents and purposes a rebranded Radeon R9 Nano, AMD’s power tuned Fiji card that has proven quite popular with their server customers. Within the Instinct lineup, it is essentially an unusual variant to the MI6, offering higher throughput and greatly increased memory bandwidth for only a small increase in power consumption, with the drawback of Fiji’s 4GB VRAM limitation. Since it offers better performance than the MI6 and is smaller to boot, I expect we’ll see AMD pitch the MI8 as a premium alterative for inference.
The MI6 and MI8 will be going up against NVIDIA’s P4 and P40 accelerators. AMD’s cards don’t directly line-up against the NVIDIA cards in power consumption or expected performance, so the competitive landscape is somewhat broad, but those are the cards AMD will need to dethrone in the inference landscape. One potential issue here that I’m waiting to see if and how AMD addresses closer to the launch of the Instinct family will be the lack of high-speeds modes for lower precision operations. The competing Tesla cards can process 8-bit integer (INT8) operations at up to 4x speed, something the MI6 and MI8 Instinct cards can’t do. INT8 is something of a special case, but if NVIDIA’s expectations for inferencing with INT8 come to pass, then it means AMD has to compete more strongly on price than performance.
Last, but certainly not least in the Instinct family is the most powerful card of them all, and arguably the cornerstone of what the family is meant to become: the MI25. This is based on AMD’s forthcoming Vega GPU family, and while AMD is not sharing much in the way of new details on Vega today, they are leaving no doubts that this is going to be a high performance card. The passively cooled card is rated for sub-300W operation, and based on AMD performance projections elsewhere, AMD makes it clear that they’re targeting 25 TFLOPS FP16 (12.5 TFLOPS FP32) performance.
Significantly, of the few things AMD is saying about Vega right now, is that they’re confirming that it supports packed math formats for FP16 operations. This is something that first appears in Sony’s Playstation 4 Pro, with a strong hint that it was a feature of a future AMD architecture, and now this has been confirmed.
With AMD pitching the MI25 as a training accelerator, offering a packed math mode for FP16 is critical to the product. Neural network training very rarely requires higher precision FP32 math, which is otherwise the default for GPUs. Instead, FP16 is suitably precise for a process that is inherently imprecise, and as a result offering a fast FP16 mode makes the card significantly faster at its intended task. Coupled with the already high throughput rates of GPUs due to their wide arrays of ALUs, and this is what makes GPUs so potent at neural network training.
As AMD’s sole training card, the MI25 will be going up against NVIDIA’s flagship accelerator, the Tesla P100. And as opposed to the inference cards, this has the potential to be a much closer fight. AMD has parity on packed instructions, with performance that on paper would exceed the P100. AMD has yet to fully unveil what Vega can do – we have no idea what “NCU” stands for or what AMD’s “high bandwidth cache and controller” are all about – but on the surface there’s the potential for the kind of knock-down fight at the top that makes for an interesting spectacle. And for AMD the stakes are huge; even if they can’t necessarily win, being able to price the MI25 even remotely close to the P100 would give them huge margins. More practically speaking, it means they could afford to significantly undercut NVIDIA in this space to capture market share while still making a tidy profit.
On a final note, while AMD isn’t commenting on the future of FirePro S or other server GPU products – so it’s not clear if Instinct will be their entire server GPU backbone or only part of it – it’s interesting to note that they are pointing out that one of the ways they intend to stand out from NVIDIA is to not restrict their virtualization support to certain cards.
In other words, if Instinct does end up being AMD’s sole line of server cards, then these cards will be fully capable of serving the virtualization market just as well as the deep learning markets.
MIOpen: The Radeon Instinct Software Stack
While solid hardware is the necessary starting point for building a deep learning product platform, as AMD has learned the hard way over the years, it takes more than good hardware to break into the HPC market. Software is just as important as hardware (if not more so), as software developers want to get to as close to plug-and-play as possible. For this reason, frameworks and libraries that do a lot of the heavy lifting for developers are critical. Case in point, along with the Tesla hardware, the other secret ingredient in NVIDIA’s deep learning stable has been cuDNN and their other libraries, which have moved most of the effort of implementing deep learning systems off of software developers and on to NVIDIA. This is the kind of ecosystem AMD needs to be able to build for Radeon Instinct to crack the market.
The good news for AMD is that they’re already partially here with ROCm, which lays the groundwork for their software stack. They now have the low-level tools such as stable programming languages and good compilers to build further libraries and frameworks off of that. The Radeon Instinct software stack, then, is all about building on top of ROCm.
The cornerstone of AMD’s efforts here (and their answer to cuDNN) is MIOpen, a high performance deep learning library for Radeon Instinct. AMD’s performance slides should be taken with a suitably large grain of salt when it comes to competitive comparisons, but they none the less hammer the point home that the company has been focused on putting together a powerful library to support their cards. The library will be responsible for providing optimized routines for basic neural network functions such as convolution, normalization, and activation functions.
Meanwhile built on top of MIOpen will be updated versions of the major deep learning frameworks, including Caffe, Torch 7, and TensorFlow. It’s these common frameworks that deep learning applications are actually built against, and as a result AMD has been lending their support to the developers of these frameworks to get MIOpen/AMD optimized paths added to them. All of this can sound a bit mundane to outsiders, but its importance cannot be overstated; it’s the low-level work that is necessary for AMD to turn the Instinct hardware into a complete ecosystem.
Alongside their direct library and framework support, when it comes to the Instinct software stack, expect to see AMD once again hammer the benefits of being open source. AMD has staked the entire ROCm platform on this philosophy, so it’s to be expected. None the less it’s an interesting point of contrast to NVIDA’s largely closed ecosystem. AMD believes that deep learning developers are looking for a more open software stack than what NVIDIA has provided – that being closed has limited developers’ ability to make full use of NVIDIA’s platform – so this will be AMD’s opportunity to put that to the test.
Instinct Servers
Finally, along with creating a full hardware and software ecosystem for the Instinct product family, AMD also has their eye on the bigger picture, going beyond individual cards and out to servers and whole racks. As a manufacturer of both GPUs and CPUs, AMD is in a rare position to be able to offer a complete hardware platform, and as a result the company is keen to take advantage of the synergy between CPU and GPU to offer something that their rivals cannot.
The basis for this effort is AMD’s upcoming Naples platform, the server platform based on Zen. Besides offering a potentially massive performance increase over AMD’s now well-outdated Bulldozer server platform, Naples lets AMD flex their muscle in heterogeneous applications, tapping into their previous experience with HSA. This is a bit more forward looking – Naples doesn’t have an official launch date yet – but AMD is optimistic about their ability to own large-scale deployments, providing both the CPU and the GPUs in large deep learning installations.
Going a bit off the beaten path here, perhaps the most interesting aspect of Naples as it intersects with Radeon Instinct comes down to PCIe lanes. All signs point to Naples offering at least 64 PCIe lanes per CPU; this is an important metric because it means there are enough lanes to give up to 4 Instinct cards a full, dedicated PCIe x16 link to the host CPU and the other cards. Intel’s Xeon platform only offers 40 PCIe lanes, which means a quad-card configuration has to either sacrifice on bandwidth or latency, trading off between a mix of x8 and x16 slots, using two Xeon CPUs, or building in a high-end PCIe switch to route together 4 x16 slots. Ultimately for installations focusing on GPU-heavy workloads, this gives AMD a distinct advantage since it means they can drive 4 Instinct cards off of a single CPU, making it a cheaper option than the aforementioned Xeon configurations.
In any case, AMD has already lined up partners to show off Naples server configurations for the Radeon Instinct. SuperMicro and Inventec are both showcasing server/rack designs for anywhere between 3 and 120 Instinct MI25 cards. The largest systems will of course involve off-system networking and more complex networking fabrics, and while AMD isn’t saying too much on the subject at this time, it’s clear that they’re being mindful of what they need to support truly massive clusters of cards.
Closing Thoughts
Wrapping things up with today’s Radeon Instinct announcement, while today’s revelations are closer to a teaser than a fully fleshed out product announcement, it’s none the less clear that AMD is about to embark on their most aggressive move in the GPU server space since their first FireStream cards almost a decade ago. Making gains in the server space has long been one of the keys to AMD’s success, both for CPUs and GPUs, and with the Radeon Instinct hardware and overarching initiative, AMD has laid out a critical plan for how to get there.
Not that any of this will come easy for AMD. Breaking back into the server market is a recurring theme for them, and their struggles there are why it’s recurring. NVIDIA is already several steps ahead of AMD in the deep learning GPU market, so AMD needs to be quick to catch up. The good news for AMD here is that unlike the broader GPU server market, the deep learning market is still young, so AMD has the opportunity to act before anyone gets too entrenched. It’s still early enough in the game that AMD believes that if they can flip just a few large customers – the Googles and Facebooks of the world – that they can make up for lost time and capture a significant chunk of the deep learning market.
With that said, as the Radeon Instinct products are not set to ship until H1 of 2017, a lot can change, both inside and outside of AMD. The company has laid down what looks to be a solid plan, but now they need to show that they can follow-through on it, executing on both hardware and software on schedule, and handling the inevitable curveball. If they can do that, then the deep learning market may very well be that server GPU success that the company has spent much of the past decade looking for.