Original Link: https://www.anandtech.com/show/12409/intel-launches-xeon-d-2100-series-socs-edge



For certain groups of users, Intel’s Xeon D product line has been a boon in performance per watt metrics. The goal of offering a fully integrated enterprise-class chip, with additional IO features, with lots of cores and at low power, was a draw to many industries: storage, networking, communications, compute, and particularly for ‘Edge’ computing. We reviewed the first generation Xeon D-1500 series back in June 2015, and today Intel is launching the second generation, the Xeon D-2100 series.

Fourteen Processors, Focusing on Networking and QuickAssist

Before discussing the platform as a whole, we’ll dive straight into the launch processor list.

Intel Xeon D-2100 Series
AnandTech | Cores | Base Freq | All-Core Freq | Turbo Freq | TDP | DDR4 | Price
Edge Server and Cloud SKUs
D-2191 18 1.6 GHz 2.2 GHz 3.0 GHz 86 W 2400 $2407
D-2161I 12 2.2 GHz 2.8 GHz 3.0 GHz 90 W 2133 $962
D-2141I 8 2.2 GHz 2.7 GHz 3.0 GHz 65 W 2133 $555
Network Edge and Storage SKUs
D-2183T 16 2.2 GHz 2.8 GHz 3.0 GHz 100 W 2400 $1764
D-2173IT 14 1.7 GHz 2.3 GHz 3.0 GHz 70 W 2133 $1229
D-2163IT 12 2.1 GHz 2.6 GHz 3.0 GHz 75 W 2133 $930
D-2143IT 8 2.2 GHz 2.7 GHz 3.0 GHz 65 W 2133 $566
D-2142IT 8 1.9 GHz 2.5 GHz 3.0 GHz 65 W 2133 $438
D-2123IT 4 2.2 GHz 2.7 GHz 3.0 GHz 60 W 2400 $213
Integrated Intel QuickAssist Technology SKUs
D-2187NT 16 2.0 GHz 2.4 GHz 3.0 GHz 110 W 2666 $1989
D-2177NT 14 1.9 GHz 2.3 GHz 3.0 GHz 105 W 2666 $1443
D-2166NT 12 2.0 GHz 2.3 GHz 3.0 GHz 85 W 2133 $1005
D-2146NT 8 2.3 GHz 2.5 GHz 3.0 GHz 80 W 2133 $641
D-2145NT 8 1.9 GHz 2.5 GHz 3.0 GHz 65 W 2133 $502
Intel Xeon D-1500 Series
D-1581 16 1.8 GHz - 2.4 GHz 65 W 2133 -
D-1571 16 1.3 GHz - 2.1 GHz 45 W 2133 $1222
D-1553N 8 2.3 GHz - 2.7 GHz 65 W 2400 $855
D-1531 6 2.2 GHz - 2.7 GHz 45 W 2133 $348

Intel has defined the new Xeon D-2100 series in three areas: Edge Server and Cloud, Edge Network and Storage, and finally the QuickAssist variants. The first segment, Edge Server and Cloud, contains the three processors that were placed onto Intel’s January price list, published before today’s official launch. We reported on that last week, noting several key design changes compared to the D-1500 series, such as naming, core configurations, and power consumption.

For naming, Intel has the following scheme:

  • I (such as D-2161I): Stands for Integrated Intel Ethernet
  • T (such as D-2183T): Stands for high temperature support or extended reliability offerings
  • N (such as D-2187NT): Stands for Intel Ethernet and Intel QuickAssist Technology
Intel Xeon D-2100 Series Feature Support
AnandTech | Cores | 10GbE | QAT (Gbps) | High Temp | Price
Edge Server and Cloud SKUs
D-2191 18 - - - $2407
D-2161I 12 4x10 - - $962
D-2141I 8 4x10 - - $555
Network Edge and Storage SKUs
D-2183T** 16 4x10 - Yes $1764
D-2173IT 14 4x10 - Yes $1229
D-2163IT 12 4x10 - Yes $930
D-2143IT 8 4x10 - Yes $566
D-2142IT 8 4x10 - Yes $438
D-2123IT 4 4x10 - Yes $213
Integrated Intel QuickAssist Technology SKUs
D-2187NT 16 4x10 100 Yes $1989
D-2177NT 14 4x10 100 Yes $1443
D-2166NT 12 4x10 40 Yes $1005
D-2146NT 8 4x10 40 Yes $641
D-2145NT 8 4x10 20 Yes $502

**The D-2183T does not follow the naming convention above, as it has 10G support but no 'I' in its name.

For the processor list, there are some key patterns to identify. Almost every processor supports four integrated 10 gigabit Ethernet ports, except the top 18-core processor, the Xeon D-2191. This processor will have to use external MAC/PHY combinations through its PCIe lanes to get Ethernet ports, a trade-off for the peak number of cores. That trade-off does show up in the rated TDP: without the integrated Ethernet ports, the 18-core D-2191 is rated lower (86 W) than the 16-core parts of similar frequency that do include them.

Memory support for the processor line extends to 512 GB of DDR4 ECC memory, and includes both RDIMM and LRDIMM support. This comes through supporting four channels of DDR4, at two modules per channel. What is inconsistent, however, is the rated memory speed: most of the cheaper processors only support DDR4-2133, except the quad-core Xeon D-2123IT, which supports DDR4-2400. Above this, customers will need to spend at least ~$1400 on a part with fourteen cores or more to get either DDR4-2400 or DDR4-2666. The higher speed is limited to the top two QuickAssist-enabled processors, which also support 100 Gbps of QAT. These processors also have the highest TDP, having almost the ‘best’ of everything.

All the processors across the range support Turbo Boost 2.0 technology, with a single-core boost frequency of 3.0 GHz. There is, however, a wide range of all-core turbo frequencies (from 2.2 GHz to 2.8 GHz) as well as a range of base frequencies (1.6 GHz to 2.2 GHz). It is worth noting that Intel quotes TDP at the all-core base frequency, not in any turbo mode, so the 100W TDP models might draw substantially more than 100W at the all-core turbo frequency (I’ve had an article on this brewing for some time, I just need to get my thoughts down on paper). Ultimately the motherboard manufacturer can enforce a power limit at the BIOS level if a customer has a specific request.
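To illustrate why the all-core turbo can exceed the rated TDP, here is a rough back-of-the-envelope sketch. Dynamic power scales approximately with frequency times voltage squared, and voltage itself rises with frequency; the voltage figures below are hypothetical placeholders, not Intel specifications, and the function name is ours.

```python
# Rough illustration of why all-core turbo power can exceed rated TDP.
# Dynamic power scales roughly with frequency * voltage^2. The voltages
# here are hypothetical; real V/F curves are part-specific and not public.

def scaled_power(tdp_w, base_ghz, turbo_ghz, v_base=0.85, v_turbo=0.95):
    """Estimate power at an all-core turbo, assuming TDP reflects the base clock."""
    return tdp_w * (turbo_ghz / base_ghz) * (v_turbo / v_base) ** 2

# D-2183T: 100 W TDP at a 2.2 GHz base, 2.8 GHz all-core turbo
print(f"Estimated all-core turbo power: {scaled_power(100, 2.2, 2.8):.0f} W")
```

Even with conservative voltage assumptions, the estimate lands well above the 100 W rating, which is why BIOS-level power limits matter for thermally constrained edge deployments.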

Another element to the stack is the extreme pricing of the 18-core Xeon D-2191. Despite not having integrated Ethernet nor QuickAssist, its $2407 rated tray price is over $400 higher than the Xeon D-2187NT which has two fewer cores but a higher base frequency, a higher all-core turbo, higher DRAM support, and the best options on integrated Ethernet and QuickAssist (it is 24W higher on TDP as well). From a pure price perspective, this jump from the top core count part down to the one just below it is sizeable, although Intel does have a history with this, such as the E3-1200 Xeon line where the top processor, with a 100 MHz higher frequency than the second best, was 30%+ higher in cost.


One Xeon D-2100 platform already spotted

We’ll revisit some of these parts, but during our briefing Intel went into some of the newer features of the Xeon D-2100 platform.

Examining Per-Core Turbo Frequencies

As part of every press call we have with Intel, we always ask for more information, especially when it comes to die sizes, transistor counts, and per-core turbos. Some of this information is readily available from the processor, for example by examining P-states or looking into register values for turbo modes. Rather than acquiring every unit and probing them ourselves, we typically request this information en masse so it is accurate as a baseline, knowing that in some circumstances OEMs/motherboard manufacturers can implement their own settings on top.

Thankfully, Intel agreed to share the per-core turbo ratios for the new Xeon D-2100.
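For the curious, the "register values for turbo modes" mentioned above live in model-specific registers. On Skylake-SP-derived parts, MSR_TURBO_RATIO_LIMIT (0x1AD) holds eight ratio bytes and MSR_TURBO_RATIO_LIMIT_CORES (0x1AE) the matching active-core-count thresholds. A minimal decode sketch, using hypothetical register values rather than anything read from real hardware:

```python
# Sketch: decoding Skylake-SP turbo ratio MSRs. MSR 0x1AD holds eight
# ratio bytes; MSR 0x1AE holds the core-count threshold for each ratio.
# The sample values below are hypothetical, not from a real Xeon D-2100.

def decode_turbo_ratios(msr_1ad, msr_1ae, bus_mhz=100):
    """Return a list of (max_active_cores, turbo_mhz) pairs."""
    pairs = []
    for i in range(8):
        ratio = (msr_1ad >> (8 * i)) & 0xFF  # turbo multiplier
        cores = (msr_1ae >> (8 * i)) & 0xFF  # applies up to this many cores
        if ratio and cores:
            pairs.append((cores, ratio * bus_mhz))
    return pairs

# Hypothetical: 3.0 GHz up to 2 cores, 2.7 GHz up to 4, 2.4 GHz up to 8
print(decode_turbo_ratios(0x181B1E, 0x080402))
# [(2, 3000), (4, 2700), (8, 2400)]
```

On a live Linux system the raw values would come from /dev/cpu/0/msr (root only), but the vendor-supplied tables below save us the probing.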



For the non-AVX turbo frequencies, Intel keeps the 3.0 GHz turbo mode up to two cores of load. This reduces by 200-300 MHz for three and four core loads, then decreases again every four cores. Two processors stand out here: the D-2183T maintains a 2.8 GHz frequency from 3-16 cores, pointing to an interesting trade-off compared to the D-2187NT, which comes down to 2.4 GHz but has QuickAssist. The D-2161I also holds a 2.8 GHz frequency from 3-12 cores, which along with the increase in base frequency contributes to its +15W TDP over the D-2163IT.





When we move to the AVX2/AVX-512 turbo modes, most processors still drop frequencies in core-groups of four, after the first pair of cores, but the drops are more significant. The lowest frequency observed is 1.70 GHz, on the D-2191 and D-2173IT when all cores are running AVX-512. For most of the processors, the all-core AVX-512 frequency is still above the base frequency, although a few do go below.

Many thanks to Intel for providing this information, as recently it has been hard to come by.



Ethernet Support

One of the benefits to the previous generation of Xeon D was its support for dual 10 gigabit Ethernet controllers integrated into the processor, thus only requiring PHY (physical layer) chips for support. For almost all of the processors in the stack, that moves up to quad 10GbE support, either through copper or fiber. These ports also support accelerated Remote Direct Memory Access (RDMA) and native Software Fault Isolation (SFI).


Taken from our Skylake-SP workshop slide decks

We confirmed with Intel that any third party PHY can be used, not just Intel. Some vendors that have already listed Xeon D-2100 motherboards have included 25 GbE support in their technical specifications – we clarified with Intel that there is no 25 GbE support included on the chip, so those system integrations must be using external MAC/PHYs or PCIe cards in order to include that support.

Intel QuickAssist Technology

On Intel’s QuickAssist Technology: this is still relatively new to most users. QAT is essentially an offload engine for a variety of cryptographic functions, including symmetric/asymmetric encryption, authentication, digital signatures, RSA, ECC, and so on. The idea is that this sort of task would normally be handled by the CPU, taking up time and power that could be better spent elsewhere, and that a dedicated engine (QAT) would be faster and more power efficient. Intel has so far deployed QAT in its Denverton Atom C3000 series processors, Xeon Scalable chipsets, and add-in PCIe cards, but it now comes to the Xeon D-2100 series as well. It was not present on the original D-1500 series processors, however Intel did have mid-cycle updates (such as the D-1553N) which had 40 Gbps QAT support.

Taken from our Skylake-SP workshop slide decks

The implementation of QAT on the Xeon D-2100 series uses its own PCIe 3.0-like interconnect inside the chip, although Intel did not elaborate as to what this entails. We do know that for the Xeon Scalable chipset implementations, the chipset requires up to 16 lanes of PCIe 3.0 to accelerate QAT, so we expect that Intel is using the equivalent in bandwidth. We have asked for more information as to whether the QAT engine is actually on the CPU die, or whether a separate piece of silicon sits on the package. I would expect the former, although the latter is easier: the CPU die could be identical to the 18-core Xeon-SP silicon found in the enterprise processors, which already has 16 on-package lanes set aside for Omni-Path fabric.

It is worth noting that 100 Gbps QAT support is limited to the top two Xeon D-2100 processors, with the other QAT-enabled parts rated for 40 Gbps or 20 Gbps.



Migrating from Broadwell to Skylake-SP:

More Cores, More Memory, Cache, AVX-512

The way that Intel is approaching the Xeon D product family is changing. Previously, the Broadwell-based Xeon parts were compact, with HEDT-level core counts and a nice feature set. For this generation, Intel has decided to migrate to the server Skylake-SP core, rather than the standard Skylake-S core. Along with the generational enhancements over Broadwell, this means an adjusted cache hierarchy, use of Intel’s new core-to-core mesh technology, and the addition of AVX-512 units. This means that the new Xeon D-2100 series is, in silicon, a big 18-core Skylake-SP behemoth. We have reached out to Intel to confirm whether this is the case for all the processors in the stack, as well as die sizes and transistor counts. Stay tuned for that information.

As we reported in depth in our analysis of the Skylake-SP core, the implementation of the cache, mesh, and AVX-512 changes has a significant impact on how software has to consider memory accesses and core-to-core communication when running on these cores. Here’s a brief primer on the changes:

Migrating from Broadwell to Skylake-SP
  • Cache: the L2 grows from 256 KB (D-1500) to 1 MB per core, while the L3 shrinks from 1.5 MB to 1.375 MB per core and changes from an inclusive cache to a victim cache. The rule of thumb is roughly a 2x hit rate in the L2, with the L3 overall not as useful.
  • Mesh / Uncore: the ring bus is replaced by a mesh in which each node (PCIe / core / DRAM) has a crossbar partition for x/y routing. This gives better scaling and better average node-to-node latency, at the cost of more complexity and more power at the same frequency.
  • AVX-512: on top of AVX and AVX2, the new core adds AVX512F, AVX512CD, AVX512BW, AVX512DQ, and AVX512VL, giving a substantial vector performance improvement at the cost of an increase in peak TDP and silicon die area.

A note on the AVX-512 support: Intel’s current consumer and enterprise-based Skylake-SP processors offer two different variants for this. Some processors, most notably the cheaper ones, only have one 512-bit FMA (fused multiply-add) execution unit for AVX-512 support, while the bigger and more expensive processors have two FMA units. The benefit of having two FMA units in action allows the AVX-512 silicon to be fed, and increases throughput, at the potential expense of power and longevity. For Xeon D-2100, Intel has stated that all of the processors only have a single 512-bit FMA unit.
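The FMA unit count feeds directly into peak floating-point throughput. A 512-bit unit processes eight FP64 lanes per cycle, and a fused multiply-add counts as two operations, so a single-FMA core peaks at 16 FP64 FLOPs per cycle (a dual-FMA Xeon-SP core doubles that). A small worked calculation, using the ~1.7 GHz all-core AVX-512 frequency quoted earlier for the D-2191 (function name and structure are ours):

```python
def peak_gflops(cores, ghz, fma_units, simd_bits=512, fp_bits=64):
    """Theoretical peak GFLOPS: lanes * 2 (FMA = mul + add) * units * cores * GHz."""
    lanes = simd_bits // fp_bits
    flops_per_cycle = lanes * 2 * fma_units
    return cores * ghz * flops_per_cycle

# D-2191: 18 cores, ~1.7 GHz all-core AVX-512, one FMA unit per core
print(f"{peak_gflops(18, 1.7, 1):.1f} GFLOPS FP64 peak")
```

This works out to roughly 490 GFLOPS FP64 at the quoted clocks, about half what an equivalently clocked dual-FMA Xeon Gold would manage.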

Also on the mesh: compared to the old ring bus methodology, from our tests on the consumer line, it is clear that Intel is running the mesh (or the ‘uncore’) at a lower frequency overall, which may cause a drop in core-to-core bandwidth in the newer processors. We have asked Intel to confirm what mesh frequency is being used in the D-2100 series, and we are waiting to see if that information will be disclosed. It is worth noting that when we get access to these parts, we can probe for the frequency very easily, so it would help if Intel officially disclosed the value.

With the core migration comes other feature changes. The new Xeon D-2100 will be rated for quadruple the memory capacity of the Xeon D-1500, as the number of memory channels doubles from two to four, and RDIMM/LRDIMM modules up to 64 GB each are supported. This means a single processor can now support 512GB using eight 64GB RDIMMs. The previous generation only supported 32 GB RDIMMs.
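The capacity math above works out simply, using the figures from the paragraph:

```python
# Worked example of the new memory ceiling (values from the article)
channels = 4            # doubled from two channels on Xeon D-1500
dimms_per_channel = 2
max_dimm_gb = 64        # largest supported RDIMM/LRDIMM
total = channels * dimms_per_channel * max_dimm_gb
print(total)  # 512 (GB)
```

By the same arithmetic, the previous generation's two channels and 32 GB modules capped out at 128 GB.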

 

Up to 32 PCIe 3.0 Lanes, 20 HSIO Lanes, Storage and VROC

For this generation, Intel has kept the number of publicly available PCIe lanes for add-in controllers at 32, which means we are likely to see implementations with x16/x8 PCIe lanes, but also opens up opportunities in cold/warm storage for more RAID cards, or in communications for additional transceivers or accelerators. When we say ‘publicly’ available, it is clear that the chip has more lanes than it exposes: the presence of QuickAssist means there are likely at least 48 lanes in the design (or 64 if the silicon is identical to the Xeon SP XCC die), but due to product segmentation and features like QAT, the number of lanes for other controllers is kept constant. To a certain extent, this allows Intel to offer the D-2100 series almost as a drop-in replacement for those that want to upgrade to Skylake cores.


The Lewisburg Chipset with 26 HSIO lanes, found in Xeon SP

As Xeon D is marketed as a system-on-chip, the traditional chipset is integrated into the platform. Intel has integrated one of its latest series of chipsets, and is offering 20 PCIe 3.0 High-Speed IO (HSIO) lanes for this. As with the chipsets, there will be limits as to where the lanes can go: these are typically limited to a PCIe 3.0 x4 connection at most, and some network controllers are limited to certain HSIO slots, but it does allow for intricate systems to be built.

One of the benefits of the number of PCIe lanes, as well as PCIe switch support, is for storage. Intel is targeting both long-term backup (cold storage) and content delivery networks (warm storage) with this product line, and so is keen to promote its PCIe storage and NVMe support. We confirmed that the Xeon D-2100 will support Intel’s Virtual RAID on CPU (VROC), which means that hardware-based RAID 0 and RAID 1 configurations will have additional support benefits, but will be limited to specific NVMe drives and require a hardware VROC key provided by the OEM. Intel also states that the Xeon D-2100 has fourteen integrated SATA ports, an increase from six on the previous generation, although Intel has not disclosed how many AHCI controllers this covers, or the SATA RAID support for those controllers on the platform. The platform also supports some legacy IO: eSPI, LPC, and SMBus.

However we slice it, Xeon D-2100 comes across as a Skylake-SP HCC die and a Lewisburg chipset, either melded into one piece of silicon or as two chips on the same package. Lewisburg has options available for different levels of QuickAssist and 10GbE, just as the Xeon D-2100 series does. In order to get QAT and 10GbE, the Xeon SP platform has to provide 16 PCIe lanes from the CPU to the chipset for bandwidth; we know the Xeon SP HCC die has 64 PCIe lanes in total, so if 16 each are used for QAT and 10GbE, that leaves 32 in play, which is exactly what Xeon D-2100 offers. Lewisburg also supports the same IO: 14 SATA ports. It wouldn’t make sense for Intel to create a completely new silicon die just for Xeon D, right? If not, that makes Xeon D a multi-chip package with Xeon Gold and Lewisburg.



Sub-NUMA Clustering

When platforms like Xeon D come into existence, focusing on markets that aren’t consumer focused, it can sometimes be difficult to determine which of the consumer or enterprise features are placed into that product. For example, Intel’s Sub-NUMA Clustering (SNC, an upgraded version of Cluster-On-Die) is used in the Xeon Scalable enterprise processors but not on the consumer focused Core-X processors, despite being the same silicon underneath.

SNC is a technology that is drawn from the processor design: within an 18-core processor design, there is actually an x/y arrangement of nodes, in this case we think a 5x5 arrangement. A node can be a core, it can be for memory controllers, for a PCIe root complex, for other IO, and so on. When data needs to be transferred from one node to another, it goes through the mesh topology in what should be the quickest way possible, depending on other node-to-node traffic. Some of the nodes are duplicated, for example, the PCIe x16 root complex nodes, or the memory controllers: for four memory controllers, they are split into pairs, each pair in a separate node, and each of the nodes are at opposite ends of the silicon design. For example, here is the Skylake-SP 18-core layout:

When a system needs main memory, where that memory is held is considered a unified space: the latency to get to all the data is the same. However, due to the physical design of the core, if the data was held in the memory closest to that core in the mesh grid, it would be quicker to access that memory (on average). What SNC does is divide the silicon at a firmware level into two ‘clusters’, with each cluster having a preference for working with the cores, nodes, and memory controllers within its own cluster. There is nothing stopping it going outside its own cluster, but to offer the best latency (sometimes at the expense of peak bandwidth), it is best for each core/node to be limited in this way. Xeon D customers can typically enable SNC in the BIOS of their system, or arrange with their OEM to have it enabled by default.
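The "arrange with their OEM" note above usually translates into software pinning work. A loudly hypothetical sketch of what SNC-aware placement looks like: with SNC enabled on an 18-core part, the OS sees two clusters, each with its own pair of memory controllers, and keeping a worker's threads within one cluster keeps its memory traffic local. The core numbering below is our assumption for illustration, not Intel's actual enumeration:

```python
# Hypothetical SNC-aware thread placement. Assumes a contiguous core
# numbering where cores 0-8 sit in cluster 0 and 9-17 in cluster 1;
# real enumeration depends on firmware and should be read from the OS.
import os

CORES = 18
CLUSTER = {c: 0 if c < CORES // 2 else 1 for c in range(CORES)}

def cores_in_cluster(cluster_id):
    """Set of logical cores belonging to one SNC cluster."""
    return {c for c, cl in CLUSTER.items() if cl == cluster_id}

def pin_to_cluster(cluster_id):
    """Restrict the current process to one cluster (Linux only)."""
    os.sched_setaffinity(0, cores_in_cluster(cluster_id))

print(sorted(cores_in_cluster(0)))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

In practice tools like numactl, or libnuma-aware allocators, do the same job; the point is that SNC only pays off when software respects the cluster boundary.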

The reason why SNC is not available in consumer platforms? The benefits/drawbacks of SNC have very little effect on consumer workloads. In most cases users are not striving to minimise their 99th percentile latency figures, while server environments do need to. Also, to get the best out of SNC, software typically has to be written for it, similar to a multi-socket environment.

Intel SpeedShift

The other feature we were interested in seeing make the jump was Intel’s Speed Shift. This technology allows the processor to respond quicker to turbo mode requests, either while in its high-power state or from idle. The standard way a processor works is that when a high-performance power state is requested, the software will send instructions which the operating system will interpret, then the operating system double checks with the firmware for the power state it can ask for, then it will request that power state from the processor. Speed Shift hands control back to the processor, allowing the processor to interpret the frequency and density of the instructions coming into the core, and implement a turbo frequency much quicker.

In previous presentations, Intel has stated that this technology drops the time that the processor moves out of idle into peak turbo from 100 milliseconds down to around 25-30 milliseconds. We confirmed that for suitable OS and hypervisor technologies, SpeedShift will also be enabled on the Xeon D-2100 series platform.
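The "suitable OS" caveat matters because Speed Shift is exposed to software as Hardware-Controlled Performance States (HWP), advertised as a CPU feature flag. A small sketch of checking for it on Linux; the sample string below stands in for an actual read of /proc/cpuinfo:

```python
# Sketch: detecting HWP (the mechanism behind Intel Speed Shift) from
# CPU feature flags. On a real Linux box, read /proc/cpuinfo instead of
# using the sample string below.

def has_hwp(cpuinfo_text):
    """True if the 'hwp' feature flag appears in a cpuinfo flags line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "hwp" in line.split(":", 1)[1].split()
    return False

sample = "flags\t\t: fpu sse2 avx512f hwp hwp_notify hwp_act_window"
print(has_hwp(sample))  # True
```

On kernels with intel_pstate, HWP being active also shows up in the driver's sysfs status; either check tells you whether the OS has actually handed frequency control to the processor.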

You can read our analysis of Speed Shift on Skylake here:

https://www.anandtech.com/show/9751/examining-intel-skylake-speed-shift-more-responsive-processors

Virtualization

Some of the key ‘edge’ markets that Intel is targeting with the Xeon D-2100 series require virtualization. In our briefing, Intel did not spend much time discussing this part of the product, but did confirm that the latest implementation of VT-x and hardware virtualization technologies is in play. We were told that due to the upgrades over the previous generation of Xeon D, the new platform ‘enables greater VM density for VNF functions, such as Virtual Evolved Packet Core (vEPC), Virtual Content Delivery Network (vCDN), Virtual GiLAN (vGiLAN), Virtualized Radio Access Network (vRAN), and Virtual Broadband Base Unit (vBBU)’.

We were able to confirm that, similar to the enterprise platforms, each core can adjust its frequency independently of the other cores, so in multi-user environments if one user is blasting AVX-512 instructions, the frequency of the other cores can still be maintained. This likely extends to L3 cache management as well, so that ‘noisy neighbors’ cannot crowd out L3 use. This situation is less of a problem now that the L3 is a victim cache, but for some customers it can still be an issue.

Availability

Intel stated that it has over a dozen partners, both OEMs and large-scale system integrators, already working with the new D-2100 series ready for product roll-out over 2018. Certain early end-point customers (think large-scale cloud providers and CDNs) have already had silicon for some time, while it will be rolled out to everyone else in due course through Intel’s partners.

Intel did confirm that it has a sampling program in play for press like AnandTech, so I’m pushing for Johan and Ganesh to get some hands on as we did with the previous generation.

Naming

The last generation, the Xeon D-1500 series, was tentatively given the code name ‘Broadwell-DE’. By that token, this generation of Xeon D-2100 is based on Skylake, so it should be ‘Skylake-DE’. However, references to ‘Skylake-D’ have shown up online as an alternative, perhaps to keep these code names down to one letter. This isn’t to be confused with Skylake for consumer desktop use, which is usually called Skylake-S. Nice and simple.


Xeon D-2100 Motherboards Appearing

ASRock Rack D2100D8UM

Jumping the gun just a little, we were sent a link to ASRock Rack who has already put some of its Xeon D-2100 products up on the website. Specifically, the D2100D8UM shows a motherboard with a fixed embedded socket, eight memory slots, two PCIe slots, a pair of SFF-8643 breakout ports, and an integrated IPMI with a dedicated network port. In order to take advantage of the integrated 10 GbE ports, customers will have to use a mezzanine card with the appropriate PHY.

We’re not entirely sure how long ASRock Rack has been preparing for this platform, although this looks like one of its more integrated products, probably designed with a specific customer in mind. The webpage states in the main headline that it supports up to 512 GB of DDR4, but the specification table says it only supports 128 GB. Both segments, however, do clarify RDIMM and LRDIMM support, which is a positive.

The PCIe slots are physically an x16 slot and an x8 slot, however there are only 16 lanes between the two, acting in a switching capacity for x16/x0 or x8/x8 operation. For an SoC that has access to 32 PCIe lanes, it is not obvious from the specification sheet where the other 16 lanes have gone. It would seem that in the interest of cost (or the specific customer), they are simply not used.

Elsewhere on the board is almost a full set of SATA ports. Three are standard SATA ports, one is a SATA DOM, and eight come from the two SFF-8643 breakout connectors. This totals twelve ports, although the platform supports 14. Similarly, there is a single USB 3.0 port, a single USB 2.0 port, and one header each for USB 2.0 and USB 3.0. On the rear panel there is an Ethernet port for the integrated management chip, an Aspeed AST2500, along with a VGA port for management as well. Other networking and USB ports have to be added in by the customer. There are also five fan headers.

ASRock Rack does not list the exact processors that it will put into this motherboard, stating that it supports up to 110W, which would cover the full list. It is likely that interested parties will have to inquire as to exact pricing depending on the D-2100 series needed.

Many thanks to SH SOTN for the link.


