Original Link: https://www.anandtech.com/show/18732/asrock-industrial-nucs-box1360pd4-review-raptor-lakep-ecc



Low-power processors have traditionally been geared towards notebooks and other portable platforms. However, the continued popularity of ultra-compact form-factor desktop systems has resulted in UCFF PCs also serving as lead vehicles for the latest mobile processors. Such is the case with Intel's Raptor Lake-P - the processor SKUs were announced earlier this month at the 2023 CES, and end-products using the processor were slated to appear in a few weeks. Intel is officially allowing its partners to start selling their products into the channel today, and also allowing third-party evaluation results of products based on Raptor Lake-P to be published.

ASRock Industrial introduced their Raptor Lake-P-based NUC clones as soon as Intel made the parts public. With the new platform, the company decided to trifurcate their offerings - a slim version (sans 2.5" drive support) with DDR4 SODIMM slots in the NUCS BOX-13x0P/D4, a regular height version with 2.5" drive support in the NUC BOX-13x0P/D4, and a slightly tweaked version of the latter with DDR5 SODIMM slots in the NUC BOX-13x0P/D5. The NUCS BOX-1360P is the company's flagship in the first category, and the relative maturity of DDR4-based platforms has allowed them to start pushing the product into the channel early.

ASRock Industrial sampled us with a NUCS BOX-1360P/D4 from their first production run. We expected a run-of-the-mill upgrade with improvements in performance and power efficiency. In the course of the review process, we found that the system allowed control over a new / key Raptor Lake-P feature that Intel hadn't even bothered to bring out during their CES announcement - in-band ECC. This review provides a comprehensive look at Raptor Lake-P's feature set for desktop platforms along with detailed performance and power efficiency analysis for SFF PC workloads.

Introduction and Product Impressions

Intel's Raptor Lake-P builds upon Alder Lake-P and its heterogeneous processor architecture by moving to a more efficient manufacturing process. Unlike the desktop version with improved cache size per performance core and a doubled efficiency core count, the -P series derives its improvements essentially from the updated V-F curves. This has allowed Intel to increase the turbo clocks for both core types to deliver better performance and power efficiency - all while retaining the same 28W nominal TDP of Alder Lake-P. There are also a number of I/O improvements such as additional Thunderbolt 4 ports and USB 3.2 Gen 2x2 support on them, but their adoption is dependent on other board component choices by the system manufacturer.

ASRock is a well-known vendor in the consumer PC market. In 2011, the company set up the ASRock Industrial business unit to focus on industrial motherboards. The division branched out in 2018 as an independent vendor with exclusive focus on B2B products. The company has products for deployment in small businesses (offices), automation, robotics, security, and other industrial / IoT applications. Primarily, the company develops motherboards, and sells them to various system integrators who can do their own value additions. Additionally, the company also sells mini-PCs based on the developed motherboards into the retail channel. We have taken a close look at the performance profile of various ASRock Industrial UCFF PCs before, including that of the NUC BOX-1260P based on the Core i7-1260P Alder Lake-P processor.

The company provided us with a sample of their first Raptor Lake-P mini-PC - the NUCS BOX-1360P/D4. This is essentially a follow-up product to the NUC BOX-1200 series, but not exactly a drop-in replacement. While the NUC BOX-1200 series came with dual LAN capabilities, and a choice of both HDMI and Display Port outputs, the NUCS BOX-1300 series replaces the Display Port output with another HDMI port and does away with one of the LAN ports.

While the NUC BOX series uses a chassis tracing its roots back to ASRock's now defunct Beebox product line, the NUCS BOX is a first for ASRock Industrial. Without the need to support a 2.5" drive, the chassis height has been cut down from 48mm to 38mm. The original fingerprint magnet of a chassis top has also been replaced with matte plastic. The relative distance between the motherboard and the chassis top has not been altered, though (the height reduction is completely on the underside where the 2.5" drive caddy used to be placed). The cooling solution for the processor is time-tested within that case design, and all that ASRock Industrial has done for the NUCS BOX-1300 series is to alter the I/O cut-outs slightly to match the new board.

ASRock Industrial's main focus is on B2B customers. It is no surprise that their systems are packaged in a nondescript manner. However, within the package, the company includes everything that an end-user would need - a VESA mount and associated screws, M.2 SSD installation aids, a geo-specific power cord and a 90W power adapter.

ASRock Industrial markets their mini-PCs in a barebones configuration, with the choice of RAM and SSD left to the end user. Installing these components involves removing four screws from the underside of the unit, slotting in the SODIMMs, and affixing the M.2 SSD with a screw. It must be noted here that the M.2 SSD installation in the NUCS BOX-1360P is much easier compared to the NUC BOX-1200 series as the screw slot is directly on the board and not on a separate plastic tab. The sides of the chassis are perforated for air intake and the rear has the air vent that allows the laptop-style blower fan to exhaust air after passing it through the heat spreader.

In order to make an apples-to-apples comparison, we opted to utilize the same set of components used in our review of the NUC BOX-1260P. The NUCS BOX-1360P/D4 was equipped with an ADATA XPG GAMMIX S50 Lite and 2x 32GB of the Kingston FURY Impact DDR4-3200 SODIMMs. The full specifications of our review sample (as tested) are summarized in the table below.

ASRock NUCS BOX-1360P/D4 Specifications (as tested)

Processor: Intel Core i7-1360P
    Raptor Lake 4P + 8E / 16T, up to 5.0 GHz (P) up to 3.7 GHz (E)
    Intel 7, 18MB L2, Min / Max / Base TDP: 20W / 64W / 28W
    PL1 = 28W, PL2 = 64W
Memory: Kingston FURY Impact KHX3200C20S4/32GX DDR4-3200 SODIMM
    20-22-22-48 @ 3200 MHz
    2x32 GB
Graphics: Intel Iris Xe Graphics
    (96EU @ 1.50 GHz)
Disk Drive(s): ADATA XPG GAMMIX S50 Lite
    (2 TB; M.2 2280 PCIe 4.0 x4 NVMe;)
    (Micron 96L 3D TLC; Silicon Motion SM2267 Controller)
Networking: 1x 2.5 GbE RJ-45 (Intel I226-LM)
    Intel Wi-Fi 6E AX210 (2x2 802.11ax - 2.4 Gbps)
Audio: Realtek ALC233 (3.5mm Audio Jack in Front)
    Digital Audio with Bitstreaming Support over HDMI and Display Port
Video: 2x HDMI 2.0b (Rear)
    2x Display Port 1.4 over Type-C Alt-Mode
Miscellaneous I/O Ports: 1x USB4 Type-C (Front, up to 40 Gbps)
    1x USB 3.2 Gen 2 Type-C (Front, with DP Alt Mode)
    2x USB 3.2 Gen 2 Type-A (Front)
    2x USB 3.2 Gen 2 Type-A (Rear)
Operating System: Windows 11 Enterprise (22000.1455)
Pricing: (Street Pricing on February 7th, 2022)
    US $691 (barebones)
    US $1050 (as configured, no OS)
Full Specifications: ASRock Industrial NUCS BOX-1360P/D4 Specifications

In the next section, we take a look at the BIOS options along with an analysis of the motherboard platform. It also includes an overview of the in-band ECC feature that Intel had played down at launch. Following that, we have a number of sections focusing on various performance aspects with and without in-band ECC before concluding with an analysis of the value proposition of the system.



Setup Notes: The In-Band ECC Option

Upon completion of the hardware configuration of the NUCS BOX-1360P/D4 and freshly installing the OS, we took some time to look into its BIOS interface. The video below presents the entire gamut of available options for the system.

ASRock Industrial primarily plays in the B2B space, where a functional feature-rich BIOS is valued over a fancy GUI-based one with fewer knobs. In that context, it is not hugely surprising that the BIOS interface is spartan in nature. Since the system is primarily meant for business deployment, the control knobs mainly relate to the activation of specific CPU and chipset features, along with the peripherals. The BIOS home screen provides a quick overview of the system configuration - processor details, along with the DRAM configuration (including the memory controller speeds).

Under the Advanced > Chipset option, we found an entry that was not present in the BIOS of the NUC BOX-1260P. By default, this 'In-Band ECC Support' was disabled. The only constraint on enabling it was that both SODIMM slots needed to have similar memory modules installed.

Since we had configured the SODIMM slots in a symmetric manner, toggling this option and booting into the OS was a straightforward affair. The screenshots below show the impact of this option on the memory information as reported by Windows Task Manager - the default settings on the left, and the ECC enabled version on the right.

It can be noted that the amount of hardware reserved memory in the 'In-Band ECC'-enabled case is 2GB higher than the default case. This points to 1/32 of the total memory capacity being reserved for ECC storage. At this juncture, an overview of in-band ECC is warranted.
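The 1/32 figure can be checked with a quick calculation. The sketch below assumes that the reserved region scales linearly with installed capacity and that the 2x 32GB kit is the only memory in the system; the function name is ours and purely illustrative.

```python
from fractions import Fraction

# Assumption: in-band ECC reserves a fixed fraction of the installed DRAM.
ECC_FRACTION = Fraction(1, 32)


def reserved_for_ecc_gb(installed_gb: int, fraction: Fraction = ECC_FRACTION) -> Fraction:
    """Return the capacity (in GB) set aside for ECC storage."""
    return installed_gb * fraction


installed_gb = 2 * 32  # two 32GB SODIMMs, as in the review sample
reserved_gb = reserved_for_ecc_gb(installed_gb)
print(f"{installed_gb} GB installed -> {reserved_gb} GB reserved for ECC")
```

With 64GB installed, this yields the 2GB increase in hardware reserved memory seen in Task Manager.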

In-Band ECC

Error-correction code (ECC) memory is typically used in applications where memory integrity is of paramount importance. Typical ECC memory associates a bit pattern with every block of data that can be used to ensure that the block did not experience any corruption when it was resident in external memory. In general, this feature has been restricted to high-end workstation and server systems. There are many ways to implement this type of memory protection. The most commonly used scheme involves the external memory having additional pins transmitting the protection bit pattern (ECC) along with the main data. Thus, instead of a 64-bit memory interface, the SoC / processor and the memory chips would have a (64 + 8)-bit interface, with the 8-bit interface for the ECC. The cited example of a 72-bit word with 64 data bits (block size) and 8 check bits (ECC) is most commonly used for single-bit error correction / double-bit error detection (SECDED), but other variants are also possible.
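The 64 + 8 split can be sanity-checked with the textbook bound for an extended Hamming code. This is only a sketch under that assumption: it is one common way to build SECDED, and it is not necessarily the code used by any particular memory controller. The function name is illustrative.

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum check bits for SECDED using an extended Hamming code.

    Single-error correction needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit adds double-error detection.
    """
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


print(secded_check_bits(64))  # 8, matching the 72-bit sideband word
```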

Going into the details of how ECC works and helps in SECDED is beyond the scope of this review (interested readers can start going down the rabbit hole here, in case one is not already familiar with them). If performance is of paramount importance, the sideband (extra bits in the bus between the processor and external memory) scheme is helpful because the ECC computation and checks can be carried out in parallel with the main data transfer. However, using sideband signals is not only wasteful of board real estate and component sizes, but also imposes power and cost penalties. In space-constrained systems and other applications where sacrificing a bit of performance is an acceptable trade-off for memory protection, architects have come up with the concept of in-band ECC.

'In-Band' refers to the fact that the ECC is stored within the same memory space as the main data (by reserving an address range, and disallowing its use by the memory clients inside the SoC / processor). In simple terms, whenever data is written out to the external memory, the ECC corresponding to it is also written out to a corresponding reserved address. Whenever data is read from the external memory, the corresponding ECC is also read back, and the memory controller inside the SoC / processor does the data integrity check as required.

This scheme is not performance-friendly if operated with the same 64-bit data / 8-bit ECC granularity used in sideband configurations, as the effective memory bandwidth would get cut down by more than a factor of two. Instead, most in-band ECC schemes operate with data block sizes equivalent to the burst size of the external memory. DRAM is accessed in sets of cycles (termed as the burst length - BL). With BL8, and a 64-bit memory bus, each access set would be to 64 bytes (512 bits). I am hugely oversimplifying things here, but readers should be able to catch the drift. Now, SECDED for 512 bits can be achieved with 16 ECC bits.

There is an additional complication here because accessing the external memory for reading and writing 2 bytes is highly wasteful (remember the BL8 / 64-byte access set). To improve memory bandwidth utilization for ECC accesses, the memory controller includes an 'ECC cache' where these ECC values are stored (and preferably flushed out only if they can be bunched together in a single write burst). Similar to any caching scheme, this can improve bandwidth utilization but can't always be guaranteed to avoid inefficiencies. Sometimes, it may be necessary to perform read-modify-writes to the external memory, and this can bring down overall memory bandwidth utilization. Intel's 2019 patent filing provides more detailed technical insights into the likely architecture of the in-band ECC block in the memory controller.
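The burst arithmetic above, and the reason an ECC cache helps, can be put into numbers. The sketch below is an idealized model of our own making, not a description of Intel's implementation: it assumes a 64-bit bus, BL8, 16 ECC bits per 64-byte block, and counts only whole bursts on the bus. All names are illustrative.

```python
from fractions import Fraction

BUS_BITS = 64            # width of the memory bus
BURST_LENGTH = 8         # BL8
ECC_BITS_PER_BLOCK = 16  # ECC stored per data burst, as quoted in the text

burst_bytes = BUS_BITS // 8 * BURST_LENGTH            # 64 bytes per access set
ecc_bytes = ECC_BITS_PER_BLOCK // 8                   # 2 bytes of ECC per block
blocks_per_ecc_burst = burst_bytes // ecc_bytes       # data blocks covered by one ECC burst
capacity_overhead = Fraction(ecc_bytes, burst_bytes)  # share of DRAM holding ECC


def bus_efficiency(data_bursts: int, ecc_bursts: int) -> Fraction:
    """Fraction of bursts on the bus that carry useful data."""
    return Fraction(data_bursts, data_bursts + ecc_bursts)


# Worst case (assumed): every data burst triggers its own ECC burst.
no_cache = bus_efficiency(1, 1)
# Best case (assumed): the ECC cache merges 32 neighbouring ECC values into one burst.
ideal_cache = bus_efficiency(blocks_per_ecc_burst, 1)

print(burst_bytes, blocks_per_ecc_burst, capacity_overhead)
print(float(no_cache), round(float(ideal_cache), 3))
```

Under these assumptions, one 64-byte ECC burst covers 32 data bursts, which lines up with the 1/32 capacity overhead. Real workloads fall somewhere between the two efficiency extremes depending on how well their accesses cluster, which is consistent with some benchmarks barely noticing ECC while others suffer.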

In-band ECC started appearing in Intel's processors recently in specific embedded Tiger Lake-U and Elkhart Lake parts. In fact, we did review Supermicro's SYS-E100-12T-H that included a TGL-UE processor with the capabilities, but the BIOS didn't allow control over this in-band ECC setting. Seeing the feature re-appear unannounced in Raptor Lake-P was a pleasant surprise, as it finally provided us with the opportunity to evaluate the performance impact of in-band ECC (something we were unable to do in the SYS-E100-12T-H review). Since Intel hadn't talked about this feature in their CES launch, we reached out to them regarding the official line on in-band ECC support for Raptor Lake-P. The official response, quoted verbatim:

In-band ECC is supported for Chrome designs but not Windows designs. Windows designs use in-band ECC for debug purposes only to identify failures in memory.

In order to figure out the impact of activating in-band ECC, we processed our evaluation routine on the NUCS BOX-1360P/D4 twice - once with in-band ECC disabled, and once with it activated. The next few sections detail the comparative benchmarks for the two configurations (and also include a host of other UCFF systems to provide additional insights). Prior to that, a brief analysis of the platform is warranted.

Platform Analysis

The block diagram below presents the overall high-speed I/O distribution of the motherboard in the NUCS BOX-1360P/D4.

The architecture is similar to that of the NUC BOX-1260P despite the dropping of the second LAN port and the replacement of one of the Display Port outputs with HDMI. It must be noted that the retimer used in the Thunderbolt port path is still the same Burnside Bridge used in the NUC BOX-1200 series - this means that we don't get USB 3.2 Gen 2x2 support that could have provided 20 Gbps support in addition to the regular 40 Gbps Thunderbolt 4 support.

Comparative PC Configurations

Aspect: ASRock NUCS BOX-1360P-D4
CPU: Intel Core i7-1360P
    Raptor Lake 4P + 8E / 16T, up to 5.0 GHz (P) up to 3.7 GHz (E)
    Intel 7, 18MB L2, Min / Max / Base TDP: 20W / 64W / 28W
    PL1 = 28W, PL2 = 64W
GPU: Intel Iris Xe Graphics
    (96EU @ 1.50 GHz)
RAM: Kingston FURY Impact KHX3200C20S4/32GX DDR4-3200 SODIMM
    20-22-22-48 @ 3200 MHz
    2x32 GB
Storage: ADATA XPG GAMMIX S50 Lite
    (2 TB; M.2 2280 PCIe 4.0 x4 NVMe;)
    (Micron 96L 3D TLC; Silicon Motion SM2267 Controller)
Wi-Fi: 1x 2.5 GbE RJ-45 (Intel I226-LM)
    Intel Wi-Fi 6E AX210 (2x2 802.11ax - 2.4 Gbps)
Price (in USD, when built): (Street Pricing on January 25th, 2022)
    US $700 (barebones)
    US $1050 (as configured, no OS)

The rest of this review deals with the comparative benchmark numbers for the UCFF systems outlined in the table above. All of the systems are based on 4x4 motherboards, though the PL1 and PL2 configurations vary.



System Performance: UL and BAPCo Benchmarks

Our 2022 Q4 update to the test suite for Windows 11-based systems carries over some of the standard benchmarks we have been using over the last several years, including UL's PCMark. New additions include BAPCo's CrossMark multi-platform benchmarking tool, as well as UL's Procyon benchmark suite. BAPCo recently updated their SYSmark benchmark suite - while operational at a basic level, it is missing key features such as energy consumption measurement. We will start including SYSmark 30 once the open issues are resolved.

UL PCMark 10

UL's PCMark 10 evaluates computing systems for various usage scenarios (generic / essential tasks such as web browsing and starting up applications, productivity tasks such as editing spreadsheets and documents, gaming, and digital content creation). We benchmarked select PCs with the PCMark 10 Extended profile and recorded the scores for various scenarios. These scores are heavily influenced by the CPU and GPU in the system, though the RAM and storage device also play a part. The power plan was set to Balanced for all the PCs while processing the PCMark 10 benchmark. The scores for each contributing component / use-case environment are also graphed below.

UL PCMark 10 - Performance Scores

The Productivity workload benefits from the 8 high-performance Zen 3 cores in the 4X4 BOX-5800U, but the Intel systems wrest the lead in other components. Overall, the RPL-P system at default comes out on top. However, with in-band ECC enabled, the gaming workload suffers greatly. This pulls the overall score for the ECC-enabled configuration down to the middle of the pack.

UL Procyon v2.1.544

PCMark 10 utilizes open-source software such as Libre Office and GIMP to evaluate system performance. However, many of UL's professional benchmark customers have been requesting evaluation with commonly-used commercial software such as Microsoft Office and Adobe applications. In order to serve their needs, UL introduced the Procyon benchmark in late 2020. There are five benchmark categories currently - Office Productivity, AI Inference, Battery Life, Photo Editing, and Video Editing. AI Inference benchmarks are available only for Android devices, while the battery life benchmark is applicable to Windows devices such as notebooks and tablets. We present results from our processing of the other three benchmarks.

UL Procyon - Office Productivity Scores

Enabling in-band ECC results in some penalty, but the scores across all MS Office workloads for both NUCS BOX-1360P/D4 configurations handily surpass the other systems.

The NUC 12 Pro Wall Street Canyon and the 4x4 BOX-5800U come out as the best bet for energy efficiency with respect to the MS Office workloads. The NUCS BOX-1360P/D4 gets a much higher score, but is let down on the energy consumption side. While not an entirely scientific metric, the Wall Street Canyon configuration delivers 733 pts / Wh, while the NUCS BOX does only 699 pts / Wh in this workload. However, we have historically seen that ASRock Industrial's BIOS for the NUC BOX lineup is rarely optimized for power consumption.
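The points-per-Wh metric is simply the benchmark score divided by the energy consumed during the run. The short sketch below shows that definition and uses the two efficiency figures quoted above to express the gap as a percentage; the function name is ours, and no raw scores or energy numbers beyond those in the text are assumed.

```python
def points_per_wh(score: float, energy_wh: float) -> float:
    """Benchmark score normalized by the energy consumed during the run."""
    if energy_wh <= 0:
        raise ValueError("energy_wh must be positive")
    return score / energy_wh


# Efficiency figures quoted in the text (pts / Wh).
wall_street_canyon = 733.0
nucs_box_1360p = 699.0

gap_pct = (wall_street_canyon / nucs_box_1360p - 1) * 100
print(f"Wall Street Canyon efficiency advantage: {gap_pct:.1f}%")
```

The gap works out to a little under 5%, so the NUCS BOX's deficit in this metric is modest.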

Moving on to the evaluation of Adobe Photoshop and Adobe Lightroom, we find the same pattern. Despite the 40W PL1 of the NUC 12 Pro, the 28W PL1 Core i7-1360P in the NUCS BOX is able to match it and also consume less energy for the workload. Enabling ECC pushes the system down towards the middle of the pack in both metrics.

UL Procyon - Photo Editing

UL Procyon evaluates performance for video editing using Adobe Premiere Pro.

UL Procyon - Video Editing

The workload takes advantage of the iGPU, which is problematic for the ECC-enabled configuration. However, the default configuration is able to come out on top by a huge margin. Being able to complete the workload faster also keeps the energy consumption low.

BAPCo CrossMark 1.0.1.86

BAPCo's CrossMark aims to simplify benchmark processing while still delivering scores that roughly tally with SYSmark. The main advantage is the cross-platform nature of the tool - allowing it to be run on smartphones and tablets as well.

BAPCo CrossMark 1.0.1.86 - Sub-Category Scores

CrossMark shows the NUCS BOX-1360P/D4 emerging on top, with the ECC-enabled configuration performing roughly on par with the NUC BOX-1260P.



System Performance: Miscellaneous Workloads

Standardized benchmarks such as UL's PCMark 10 and BAPCo's SYSmark take a holistic view of the system and process a wide range of workloads to arrive at a single score. Some systems are required to excel at specific tasks - so it is often helpful to see how a computer performs in specific scenarios such as rendering, transcoding, JavaScript execution (web browsing), etc. This section presents focused benchmark numbers for specific application scenarios.

3D Rendering - CINEBENCH R23

We use CINEBENCH R23 for 3D rendering evaluation. R23 provides two benchmark modes - single threaded and multi-threaded. Evaluation of different PC configurations in both supported modes provided us the following results.

3D Rendering - CINEBENCH R23 - Single Thread

3D Rendering - CINEBENCH R23 - Multiple Threads

It appears that enabling ECC has negligible effect on rendering performance. In the single-threaded case, the 28W PL1 Core i7-1360P in the new system performs roughly on par with the 40W PL1 Core i7-1260P in the Wall Street Canyon NUC. However, the lagging PL1 is a liability in the multi-threaded case, allowing both the Wall Street Canyon NUC and the 4x4 BOX-5800U to take a huge lead.

Transcoding: Handbrake 1.5.1

Handbrake is one of the most user-friendly open source transcoding front-ends in the market. It allows users to opt for either software-based higher quality processing or hardware-based fast processing in their transcoding jobs. Our new test suite uses the 'Tears of Steel' 4K AVC video as input and transcodes it with a quality setting of 19 to create a 720p AVC stream and a 1080p HEVC stream.

Transcoding - x264

Transcoding - x265_10bit

The relative ordering seen in the Cinebench multi-threading case is also seen in the case of x264 and x265 encoding for the same reason. The 28W PL1 is a downer for long-running tasks in the NUCS BOX-1360P/D4.

Transcoding - QuickSync H.264

Transcoding - QuickSync H.265 10bit

In the QuickSync case, which is purely a reflection of the iGPU clock speeds, the Wall Street Canyon NUC with higher PL1 is able to maintain faster clocks compared to the NUCS BOX, despite the latter having higher iGPU speeds on paper. Enabling ECC causes the frame rate to slip, which is only to be expected based on previous results for GPU-heavy benchmarks.

Archiving: 7-Zip 21.7

The 7-Zip benchmark is carried over from our previous test suite with an update to the latest version of the open source compression / decompression software.

7-Zip Compression Rate

7-Zip Decompression Rate

Higher power budgets and core counts matter in this test - so, the trend observed in the rendering and transcoding tests holds true here. ECC seems to negatively impact the compression rate, possibly due to the triggering of a large number of unaligned accesses to the external memory.

Web Browsing: JetStream, Speedometer, and Principled Technologies WebXPRT4

Web browser-based workloads have emerged as a major component of the typical home and business PC usage scenarios. For headless systems, many applications based on JavaScript are becoming relevant too. In order to evaluate systems for their JavaScript execution efficiency, we are carrying over the browser-focused benchmarks from the WebKit developers used in our notebook reviews. Hosted at BrowserBench, JetStream 2.0 benchmarks JavaScript and WebAssembly performance, while Speedometer measures web application responsiveness.

BrowserBench - JetStream 2.0

BrowserBench - Speedometer 2.0

From a real-life workload perspective, we also process WebXPRT4 from Principled Technologies. WebXPRT4 benchmarks the performance of some popular JavaScript libraries that are widely used in websites.

Principled Technologies WebXPRT4

Single-threaded performance matters heavily in browser benchmarks. Here, the improvements in Raptor Lake-P come to the fore. Even with ECC enabled, the NUCS BOX system is able to surpass the performance of the Wall Street Canyon with a higher PL1.

Application Startup: GIMP 2.10.30

A new addition to our systems test suite is AppTimer - a benchmark that loads up a program and determines how long it takes for it to accept user inputs. We use GIMP 2.10.30 with a 50MB multi-layered xcf file as input. What we test here is the first run as well as the cached run - normally on the first time a user loads the GIMP package from a fresh install, the system has to configure a few dozen files that remain optimized on subsequent opening. For our test we delete those configured optimized files in order to force a fresh load every second time the software is run.

AppTimer: GIMP 2.10.30 Startup

The 'cached start' situation is a win for the NUCS BOX, but the system suffers in the 'cold start' scenario. Based on the relative ordering of the systems, the processor architecture generation and PL1 configuration appear to be the likely affecting factors.



GPU Performance: Synthetic Benchmarks

Intel did not make significant changes in the integrated GPU when moving from Alder Lake to Raptor Lake. Process maturity has allowed it to clock the iGPU a bit higher, but the number of EUs remains the same as in the previous generation. GPU performance evaluation typically involves gaming workloads, and for select PCs, GPU compute. Prior to that, we wanted to take a look at the capabilities of the iGPU in the Core i7-1360P. Unfortunately, GPU-Z doesn't yet recognize the 'new' GPU, but HWiNFO has more helpful information.


We have seen earlier that the performance of the Intel Iris Xe Graphics is miles ahead of previous iGPUs from both Intel and AMD. The benchmarks processed on the NUCS BOX-1360P/D4 back up that aspect.

GFXBench

The DirectX 12-based GFXBench tests from Kishonti are cross-platform, and available all the way down to smartphones. As such, they are not very taxing for discrete GPUs and modern integrated GPUs. We processed the offscreen versions of the 'Aztec Ruins' benchmark.

GFXBench 5.0: Aztec Ruins Normal 1080p Offscreen

GFXBench 5.0: Aztec Ruins High 1440p Offscreen

Raptor Lake-P's higher iGPU clocks enable it to come out on top, but enabling ECC makes the performance suffer badly.

UL 3DMark

Four different workload sets were processed in 3DMark - Fire Strike, Time Spy, Night Raid, and Wild Life.

3DMark Fire Strike

The Fire Strike benchmark has three workloads. The base version is meant for high-performance gaming PCs. It uses DirectX 11 (feature level 11) to render frames at 1920 x 1080. The Extreme version targets 1440p gaming requirements, while the Ultra version targets 4K gaming systems, and renders at 3840 x 2160. The graph below presents the overall score for the Fire Strike Extreme and Fire Strike Ultra benchmarks across all the systems that are being compared.

UL 3DMark - Fire Strike Workloads

3DMark Time Spy

The Time Spy workload has two levels with different complexities. Both use DirectX 12 (feature level 11). However, the plain version targets high-performance gaming PCs with a 2560 x 1440 render resolution, while the Extreme version renders at 3840 x 2160 resolution. The graphs below present both numbers for all the systems that are being compared in this review.

UL 3DMark - Time Spy Workloads

3DMark Wild Life

The Wild Life workload was initially introduced as a cross-platform GPU benchmark in 2020. It renders at a 2560 x 1440 resolution using Vulkan 1.1 APIs on Windows. It is a relatively short-running test, reflective of mobile GPU usage. In mid-2021, UL released the Wild Life Extreme workload, a more demanding version that renders at 3840 x 2160 and runs for a much longer duration, reflective of typical desktop gaming usage.

UL 3DMark - Wild Life Workloads

3DMark Night Raid

The Night Raid workload is a DirectX 12 benchmark test. It is less demanding than Time Spy, and is optimized for integrated graphics. The graph below presents the overall score in this workload for different system configurations.

UL 3DMark - Night Raid Score

The Wall Street Canyon NUC and the NUCS BOX-1360P/D4 are pretty much neck and neck in the 3DMark workloads. In these benchmarks that run relatively longer than GFXBench, the PL1 also starts coming into the picture. The Wall Street Canyon NUC has an edge in that aspect. Other than that, it is no surprise that the pattern of external memory accesses generated in the 3DMark workloads is detrimental to performance when ECC is enabled.



System Performance: Multi-Tasking

One of the key drivers of advancements in computing systems is multi-tasking. On mobile devices, this is quite lightweight - cases such as background email checks while the user is playing a mobile game are quite common. Towards optimizing user experience in those types of scenarios, mobile SoC manufacturers started integrating heterogeneous CPU cores - some with high performance for demanding workloads, while others were frugal in terms of both power consumption / die area and performance. This trend is now slowly making its way into the desktop PC space.

Multi-tasking in typical PC usage is much more demanding compared to phones and tablets. Desktop OSes allow users to launch and utilize a large number of demanding programs simultaneously. Responsiveness is dictated largely by the OS scheduler allowing different tasks to move to the background. Intel's Alder Lake processors work closely with the Windows 11 thread scheduler to optimize performance in these cases. Keeping these aspects in mind, the evaluation of multi-tasking performance is an interesting subject to tackle.

We have augmented our systems benchmarking suite to quantitatively analyze the multi-tasking performance of various platforms. The evaluation involves triggering an ffmpeg transcoding task to transform 1716 3840x1714 frames encoded as a 24fps AVC video (Blender Project's 'Tears of Steel' 4K version) into a 1080p HEVC version in a loop. The transcoding rate is monitored continuously. One complete transcoding pass is allowed to complete before starting the first multi-tasking workload - the PCMark 10 Extended bench suite. A comparative view of the PCMark 10 scores for various scenarios is presented in the graphs below. Also available for concurrent viewing are scores in the normal case where the benchmark was processed without any concurrent load, and a graph presenting the loss in performance.

UL PCMark 10 Load Testing - Digital Content Creation Scores

UL PCMark 10 Load Testing - Productivity Scores

UL PCMark 10 Load Testing - Essentials Scores

UL PCMark 10 Load Testing - Gaming Scores

UL PCMark 10 Load Testing - Overall Scores

Following the completion of the PCMark 10 benchmark, a short delay is introduced prior to the processing of Principled Technologies WebXPRT4 on MS Edge. Similar to the PCMark 10 results presentation, the graph below shows the scores recorded with the transcoding load active. Available for comparison are the dedicated CPU power scores and a measure of the performance loss.

Principled Technologies WebXPRT4 Load Testing Scores (MS Edge)

The final workload tested as part of the multitasking evaluation routine is CINEBENCH R23.

3D Rendering - CINEBENCH R23 Load Testing - Single Thread Score

3D Rendering - CINEBENCH R23 Load Testing - Multiple Thread Score

After the completion of all the workloads, we let the transcoding routine run to completion. The monitored transcoding rate throughout the above evaluation routine (in terms of frames per second) is graphed below.

ffmpeg Transcoding Rate and Processor Usage

Across all the different workloads, we actually find the ASRock Industrial NUC(S) BOX systems showing a significant drop in performance compared to similar UCFF systems. It leads one to suspect that Thread Director is simply not able to do the appropriate thread allocation in these systems. Whether this is related to any BIOS configuration is something for the company to look into.

ASRock NUCS BOX-1360P/D4 ffmpeg Transcoding Rate (Multi-Tasking Test)
Task Segment | Minimum (FPS) | Average (FPS) | Maximum (FPS)
Transcode Start Pass | 2 | 9.6 | 43.5
PCMark 10 | 0 | 8.37 | 31.5
WebXPRT 4 | 2.5 | 9.18 | 18
Cinebench R23 | 0.5 | 8.4 | 29.5
Transcode End Pass | 2 | 9.51 | 30.5

ASRock NUCS BOX-1360P/D4 (In-Band ECC) ffmpeg Transcoding Rate (Multi-Tasking Test)
Task Segment | Minimum (FPS) | Average (FPS) | Maximum (FPS)
Transcode Start Pass | 1.5 | 9.28 | 39.5
PCMark 10 | 0 | 8.03 | 27.5
WebXPRT 4 | 2 | 8.9 | 17.5
Cinebench R23 | 0.5 | 8.16 | 27
Transcode End Pass | 1.5 | 9.21 | 29

On the positive side, the drop in transcoding frame rate for the NUCS BOX configurations is not as heavy as what was seen for other systems.
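One way to quantify the drop is to compare each workload segment's average transcoding rate against the average of the start pass, and to compare the in-band ECC averages against the default configuration. The sketch below uses only the average FPS values from the two tables above; the choice of the start pass as the baseline and the function names are assumptions, not the method used for the review's graphs.

```python
from typing import Dict


def percent_drop(baseline: float, value: float) -> float:
    """Relative drop of value with respect to baseline, in percent."""
    return (baseline - value) / baseline * 100.0


# Average transcoding rates (FPS) copied from the tables above.
AVERAGE_FPS: Dict[str, Dict[str, float]] = {
    "ECC off": {
        "Transcode Start Pass": 9.6,
        "PCMark 10": 8.37,
        "WebXPRT 4": 9.18,
        "Cinebench R23": 8.4,
        "Transcode End Pass": 9.51,
    },
    "In-Band ECC": {
        "Transcode Start Pass": 9.28,
        "PCMark 10": 8.03,
        "WebXPRT 4": 8.9,
        "Cinebench R23": 8.16,
        "Transcode End Pass": 9.21,
    },
}


def drops_vs_start(config: str) -> Dict[str, float]:
    """Drop of each later segment relative to the start pass of the same configuration."""
    rates = AVERAGE_FPS[config]
    start = rates["Transcode Start Pass"]
    return {
        segment: percent_drop(start, fps)
        for segment, fps in rates.items()
        if segment != "Transcode Start Pass"
    }


def ecc_cost() -> Dict[str, float]:
    """Drop of the in-band ECC average relative to the default configuration, per segment."""
    off, ecc = AVERAGE_FPS["ECC off"], AVERAGE_FPS["In-Band ECC"]
    return {segment: percent_drop(off[segment], ecc[segment]) for segment in off}


for name in AVERAGE_FPS:
    for segment, drop in drops_vs_start(name).items():
        print(f"{name:12s} {segment:20s} {drop:5.2f}% below start pass")
for segment, cost in ecc_cost().items():
    print(f"In-band ECC cost, {segment:20s} {cost:5.2f}%")
```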



HTPC Credentials

The 2022 Q4 update to our system reviews brings an updated HTPC evaluation suite for systems. After doing away with the evaluation of display refresh rate stability and Netflix streaming evaluation, the local media playback configurations have also seen a revamp. This section details each of the workloads processed on the ASRock NUCS BOX-1360P-D4 as part of the HTPC suite.

YouTube Streaming Efficiency

YouTube continues to remain one of the top OTT platforms, primarily due to its free ad-supported tier. Our HTPC test suite update retains YouTube streaming efficiency evaluation as a metric of OTT support in different systems. Mystery Box's Peru 8K HDR 60FPS video is the chosen test sample. On PCs running Windows, it is recommended that HDR streaming videos be viewed using the Microsoft Edge browser after putting the desktop in HDR mode.

YouTube Streaming Statistics

The GPU in the ASRock NUCS BOX-1360P-D4 supports hardware decoding of VP9 Profile 2, and we see the stream encoded with that codec being played back. The streaming is perfect, thanks to the powerful GPU and hardware decoding support - the couple of dropped frames observed in the statistics below are due to mouse clicks involved in bringing up the overlay.

The streaming efficiency-related aspects such as GPU usage and at-wall power consumption are also graphed below.

YouTube Streaming Efficiency

 

Interestingly, we see both decoder usage and D3D usage going up after enabling ECC. There were no visible dropped frames in the ECC case except during the activation of the OSD overlays. The higher power consumption numbers also contribute to the dismal energy efficiency of the ECC configuration.

Hardware-Accelerated Encoding and Decoding

The transcoding benchmarks in the systems performance section presented results from evaluating the QuickSync encoder within Handbrake's framework. The capabilities of the decoder engine are brought out by DXVAChecker.


Video Decoding Hardware Acceleration in ASRock NUCS BOX-1360P-D4

The iGPU in the Raptor Lake-P system supports hardware decode for a variety of codecs including AVC, JPEG, HEVC (8b and 10b, 4:2:0 and 4:4:4), and VP9 (8b and 10b, 4:2:0 and 4:4:4). AV1 decode support is also present. This is currently the most comprehensive codec support seen in the PC space.

Local Media Playback

Evaluation of local media playback and video processing is done by playing back files encompassing a range of relevant codecs, containers, resolutions, and frame rates. A note of the efficiency is also made by tracking GPU usage and power consumption of the system at the wall. Users have their own preference for the playback software / decoder / renderer, and our aim is to have numbers representative of commonly encountered scenarios. Our Q4 2022 test suite update replaces MPC-HC (in LAV filters / madVR modes) with mpv. In addition to being cross-platform and open-source, the player allows easy control via the command-line to enable different shader-based post-processing algorithms. From a benchmarking perspective, the more attractive aspect is the real-time reporting of dropped frames in an easily parseable manner. The players / configurations considered in this subsection include:

  • VLC 3.0.18
  • Kodi 20.0b1
  • mpv 0.35 (hwdec auto, vo=gpu-next)
  • mpv 0.35 (hwdec auto, vo=gpu-next, profile=gpu-hq)
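The two mpv configurations listed above map onto command-line options, and mpv's terminal status line can be customized to print dropped-frame counters. The sketch below builds such command lines; the status message format and the helper name mpv_command are assumptions for illustration, and the review does not state which properties its scripts actually read.

```python
from typing import List

# Property expansion is done by mpv itself; this is a plain Python string.
# frame-drop-count counts frames dropped by the video output,
# decoder-frame-drop-count counts frames dropped by the decoder.
STATUS_MSG = "vo_dropped=${frame-drop-count} decoder_dropped=${decoder-frame-drop-count}"


def mpv_command(media_path: str, high_quality: bool = False) -> List[str]:
    """Build an mpv argument list for the default or the gpu-hq configuration."""
    cmd = ["mpv", "--hwdec=auto", "--vo=gpu-next"]
    if high_quality:
        cmd.append("--profile=gpu-hq")  # shader-based post-processing profile
    cmd.append("--term-status-msg=" + STATUS_MSG)
    cmd.append(media_path)
    return cmd


print(" ".join(mpv_command("clip.mkv")))
print(" ".join(mpv_command("clip.mkv", high_quality=True)))
```

Keeping the two counters separate makes it possible to distinguish frames dropped at the video output stage from frames dropped by the decoder.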

Fourteen test streams (each of 90s duration) were played back from the local disk with an interval of 30 seconds in-between. Various metrics including GPU usage, at-wall power consumption, and total energy consumption were recorded during the course of this playback.
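The review does not describe how the energy figures are derived from the recorded power. A common approach, assumed here, is to integrate the at-wall power samples over the length of the run. The sample format (time in seconds, power in watts) and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def playlist_duration_s(streams: int, stream_s: float, gap_s: float) -> float:
    """Total run time: every stream plus the gaps between consecutive streams."""
    if streams <= 0:
        return 0.0
    return streams * stream_s + (streams - 1) * gap_s


def energy_wh(samples: Sequence[Tuple[float, float]]) -> float:
    """Integrate (time_s, watts) samples with the trapezoidal rule and return watt-hours."""
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)
    return joules / 3600.0
```

For the playlist described above, playlist_duration_s(14, 90, 30) gives 1650 seconds (27.5 minutes), which is the window over which the power samples would be integrated.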

All our playback tests were done with the desktop HDR setting turned on. It is possible for certain system configurations to automatically turn on/off the HDR capabilities prior to the playback of an HDR video, but we didn't take advantage of that in our testing.

VLC Playback Efficiency

While playback was perfect for all codecs except AV1 (the CPU is not strong enough for software-only 8Kp60 decoding), the power consumption numbers are off a relatively high idle base. This results in the workload energy consumption being in the lower half of the pack for both configurations.

Kodi Playback Efficiency

The scenario seen with VLC is replicated in Kodi also, with the high idle power consumption base driving up the energy numbers even though the delta is quite reasonable.

mpv (Default) Playback Efficiency

mpv playback with the gpu-next video output driver is the most energy efficient of the lot. We also have hardware accelerated decode for AV1. However, the playback for that clip still has issues, with approximately 60% of the frames getting dropped in the video output (the decoder itself doesn't drop any frames).

This may warrant investigation by the mpv / gpu-next developers and/or Intel's driver team. It does appear to be a software issue that can be resolved in the long run.

mpv (GPU-HQ) Playback Efficiency

Activating the GPU shaders for video post processing does result in increased energy consumption, but there are no dropped frames. The 8Kp60 AV1 decode video output issue remains the same irrespective of the profile used.



Power Consumption and Thermal Characteristics

The power consumption at the wall was measured with a 4K display being driven through the HDMI port of the system. In the graph below, we compare the idle and load power of the ASRock NUCS BOX-1360P/D4 with other systems evaluated before. For load power consumption, we ran the AIDA64 System Stability Test with various stress components, as well as our custom stress test with Prime95 / Furmark, and noted the peak as well as idling power consumption at the wall.

Power Consumption

The numbers are consistent with the TDP and suggested PL1 / PL2 values for the processors in the systems, and do not come as any surprise. The load number is affected by the PL2 value (64W for the Core i7-1360P in the NUCS BOX-1360P/D4). The idling numbers are very disappointing across all of the ASRock Industrial systems in the above graph.

Stress Testing

Our thermal stress routine is a combination of Prime95, Furmark, and FinalWire's AIDA64 System Stability Test. The following 9-step sequence is followed, starting with the system at idle:

  • Start with the Prime95 stress test configured for maximum power consumption
  • After 30 minutes, add Furmark GPU stress workload
  • After 30 minutes, terminate the Prime95 workload
  • After 30 minutes, terminate the Furmark workload and let the system idle
  • After 30 minutes of idling, start the AIDA64 System Stress Test (SST) with CPU, caches, and RAM activated
  • After 30 minutes, terminate the previous AIDA64 SST and start a new one with the GPU, CPU, caches, and RAM activated
  • After 30 minutes, terminate the previous AIDA64 SST and start a new one with only the GPU activated
  • After 30 minutes, terminate the previous AIDA64 SST and start a new one with the CPU, GPU, caches, RAM, and SSD activated
  • After 30 minutes, terminate the AIDA64 SST and let the system idle for 30 minutes
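The sequence above amounts to nine consecutive 30-minute phases. A minimal sketch of that timeline follows; the phase labels paraphrase the list, and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

PHASE_MINUTES = 30

# One entry per step of the sequence, in order.
PHASES = [
    "Prime95 (maximum power consumption)",
    "Prime95 + Furmark",
    "Furmark only",
    "Idle",
    "AIDA64 SST: CPU, caches, RAM",
    "AIDA64 SST: GPU, CPU, caches, RAM",
    "AIDA64 SST: GPU only",
    "AIDA64 SST: CPU, GPU, caches, RAM, SSD",
    "Idle",
]

Timeline = List[Tuple[int, int, str]]


def build_timeline(phases: List[str], minutes: int = PHASE_MINUTES) -> Timeline:
    """Return (start_minute, end_minute, label) for each phase."""
    return [(i * minutes, (i + 1) * minutes, label) for i, label in enumerate(phases)]


def phase_at(timeline: Timeline, minute: float) -> Optional[str]:
    """Label of the phase active at the given minute, or None outside the run."""
    for start, end, label in timeline:
        if start <= minute < end:
            return label
    return None


for start, end, label in build_timeline(PHASES):
    print(f"{start:3d}-{end:3d} min  {label}")
```

The full routine therefore spans 270 minutes, which is useful to keep in mind when reading the time axis of the power and temperature graphs.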

Traditionally, this test used to record the clock frequencies - however, with the increasing number of cores in modern processors and fine-grained clock control, frequency information makes the graphs cluttered and doesn't contribute much to understanding the thermal performance of the system. The focus is now on the power consumption and temperature profiles to determine if throttling is in play.

Custom Stress Test - Power Consumption Profile

The cooling solution for the processor package is effective, and the package power doesn't dip below the PL1 value of 28W throughout the duration in which it is stressed. The iGPU alone seems to have a power budget of around 20W.

Custom Stress Test - Temperature Profile

On the temperature front, the package remains below 90C and there is no throttling from a package power consumption perspective. The SSD temperature is a bit worrisome, reaching as high as 80C even when it is not subject to active stress.
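The two observations used above to rule out throttling (package power staying at or above the 28W PL1 while the CPU is stressed, and package temperature staying below 90C) can be expressed as a simple check over logged samples. This is only a sketch: the Sample layout is hypothetical, and the 90C figure is taken from the observation above rather than from any Intel specification.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    minute: float
    package_watts: float
    package_temp_c: float
    cpu_stressed: bool  # False during idle and GPU-only phases


PL1_WATTS = 28.0      # PL1 value quoted in the review
TEMP_LIMIT_C = 90.0   # observed ceiling from the review, not a specification


def find_violations(
    samples: Iterable[Sample],
    pl1: float = PL1_WATTS,
    temp_limit: float = TEMP_LIMIT_C,
) -> List[Tuple[float, str]]:
    """Return (minute, reason) for every sample that hints at throttling."""
    issues: List[Tuple[float, str]] = []
    for s in samples:
        if s.cpu_stressed and s.package_watts < pl1:
            issues.append((s.minute, "package power below PL1 under CPU stress"))
        if s.package_temp_c >= temp_limit:
            issues.append((s.minute, "package temperature at or above limit"))
    return issues
```

An empty result over the CPU-stressed phases would correspond to the conclusion drawn from the graphs.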



Miscellaneous Aspects and Concluding Remarks

Networking and storage are aspects that may be of vital importance in specific PC use-cases. The ASRock NUCS BOX-1360P/D4 comes with the Wi-Fi 6E AX210 WLAN card that also includes Bluetooth 5.2 support. On the wired front, we have a 2.5 Gbps port backed by the Intel I226-LM controller. A dual LAN option in this form-factor would have been nice to have, but consumers needing that can always go in for the full-height NUC BOX-1360P/D4. Strangely, the vPro support available in the NUC BOX-1260P seems to be absent here.

On the storage side, the NUCS BOX-1360P does have support for PCIe 4.0 x4 NVMe SSDs (and we used one in our configuration). However, cooling those within the space constraints imposed by the form-factor of the slim NUC is very challenging, as we saw in the SSD temperature graph in the previous section. In the absence of an effective thermal solution, it might be a better option to stick with a PCIe 3.0 x4 NVMe SSD for this unit. From a benchmarking perspective, we provide results from the WPCstorage test of SPECworkstation 3.1. This benchmark replays access traces from various programs used in different verticals and compares the score against the one obtained with a 2017 SanDisk 512GB SATA SSD in the SPECworkstation 3.1 reference system.

SPECworkstation 3.1.0 - WPCstorage SPEC Ratio Scores

The graphs above present results for different verticals, as grouped by SPECworkstation 3.1. The storage workload consists of 60 subtests. Access traces from CFD solvers and programs such as Catia, Creo, and Solidworks come under 'Product Development'. Storage access traces from the NAMD and LAMMPS molecular dynamics simulators are under the 'Life Sciences' category. 'General Operations' includes access traces from 7-Zip and Mozilla programs. The 'Energy' category replays traces from the energy-02 SPECviewperf workload. The 'Media and Entertainment' vertical includes Handbrake, Maya, and 3dsmax. The gulf between the same SSD in the NUC BOX-1260P and the NUCS BOX-1360P would have been surprising if not for the fact that the new form-factor lacks airflow and cooling support for the SSD.

To investigate the in-band ECC feature further, I dug up one of the DDR4 SODIMMs that I had junked earlier for failing MemTest. Placing that in the system and running MemTest again on it delivered interesting results based on the chosen in-band ECC option. The following screenshots are from the display output recorded via an HDMI capture card - the timestamps in the video capture are to be noted.

PassMark MemTest Processing on Faulty SODIMM

With the default settings (in-band ECC turned off), the memory test reports failure in due course. On the other hand, with in-band ECC activated, MemTest ends up hanging, most likely upon encountering the first memory error. Before the testing program can get access to the faulty read data, the memory / ECC handling controller has probably detected uncorrectable errors and triggered an interrupt that MemTest is unable to handle.

It appears that the in-band ECC has been present since Tiger Lake in a variety of processors. While its availability in the embedded processors line is clearly advertised on ark (example), its presence in the desktop (12th Gen. / 13th Gen.) and mobile (PDF) product line is buried under layers of documentation and not advertised widely. The murky aspect here is that error correction with standard DRAM is specified to be supported only in Chrome systems. It is not clear what is specific to Chrome (and not available or implementable in Windows and Linux) that enables it to take advantage of the in-band ECC feature. This is perhaps something worthy of follow-up by the open-source community, as the feature is of considerable interest to NAS operating systems.

Closing Thoughts

The ASRock NUCS BOX-1360P/D4 provided us with the opportunity to evaluate one of the first end products based on Intel's latest Raptor Lake-P platform. The product fulfills all our basic expectations from ASRock Industrial - we have seen that their UCFF systems run in a stable manner and provide good performance without much of an attempt at overall power consumption optimization. That trend continues with the NUCS BOX-1360P/D4. This approach has allowed ASRock Industrial to attract early adopters, and at the same time also target use-cases where the idle power consumption aspect is not much of a concern. The NUCS BOX-1360P/D4 is a worthy upgrade to the NUC BOX-1260P in terms of performance. On the I/O front, the trifurcation of the product line by ASRock Industrial has resulted in the need to wait for the other NUC BOX-1300 models if dual LAN and USB 3.2 Gen 2x2 capabilities are desired. Compared to their previous UCFF systems, the NUCS BOX-1300 series has a reduced height, and this does affect the thermal situation of the M.2 SSD. Hopefully, ASRock Industrial can address this with a solution similar to the one employed by Intel in their slim NUC kits.

Intel's Raptor Lake-P is quite close to Alder Lake-P architecturally. The core counts, cache sizes, and heterogeneous combinations are pretty much equivalent. Under such circumstances, it is indeed surprising that just process advancements have enabled Raptor Lake-P to provide satisfactory improvements in performance as well as power efficiency over Alder Lake-P. We are able to reach this conclusion despite evaluating a system that has not yet been optimized for power based on the energy consumption numbers for key UCFF PC workloads. A comparison of the power consumption profile of the NUC BOX-1260P and the equivalent Wall Street Canyon NUC shows that ASRock Industrial leaves plenty on the table for further optimization.

The expected generational improvements aside, it is heartening to see in-band ECC support getting more visibility. The ability to improve upon memory integrity without having to spring for special types of RAM or additional board area / traces can be a major selling point in a variety of consumer applications also. We are hopeful that Intel will not restrict it to industrial and embedded SKUs alone. On Raptor Lake-P specifically, we hope Intel will help enable it on operating systems beyond just Chrome OS. Obviously, there is no free lunch, and we do see some loss in performance for specific workloads (such as ones involving heavy iGPU activity). In situations requiring memory protection, the delta is not big enough to be a deal-breaker.

ASRock Industrial plans to market the NUCS BOX-1360P/D4 at a price point similar to the NUC BOX-1260P - $700 (Update: USD 691 at launch on February 7, 2023). Given the performance improvements over the previous generation NUC BOX-1260P as well as other UCFF systems, and the new features enabled in Raptor Lake-P, the pricing is justified. We look forward to the company optimizing the BIOS to address the idle power consumption issue, and tacking on an appropriate thermal solution for the M.2 SSD. Those minor quibbles aside, the product is on completely solid ground in terms of both price and performance.
