Spoelie - Thursday, September 6, 2018 - link
Two short questions:
- What happened to the Plextor M9Pe? Performance is hugely different from the review back in March.
- I know this has already been the case for a year or so, but what happened to the performance consistency graphs? Where can I deduce the same information from?
hyno111 - Thursday, September 6, 2018 - link
The M9Pe had firmware updates; not sure if they were applied or related, though.
DanNeely - Thursday, September 6, 2018 - link
I don't recall the details, but something went wrong with generating the performance consistency data, and the graphs were pulled pending a fix due to concerns they were no longer valid. If you have the patience to dig through the archive, IIRC the situation was explained in the first review without them.
Billy Tallis - Thursday, September 6, 2018 - link
I think both of those are a result of me switching to a new version of the test suite at the same time that I applied the Spectre/Meltdown patches and re-tested everything. The Windows and Linux installations were updated, and a few tweaks were made to the synthetic test configuration (such as separating the sequential read results according to whether the test data was written sequentially or randomly). I also applied all the drive firmware updates I could find in the April-May timeframe.
The steady-state random write test as it existed a few years ago is gone for good, because it really doesn't say anything relevant about drives that use SLC caching, which is now basically every consumer SSD (except Optane and Samsung MLC drives). I also wasn't too happy with the standard deviation-based consistency metric, because I don't think a drive should be penalized for occasionally being much faster than normal, only much slower than normal.
To judge performance consistency, I prefer to look at the 99th percentile latencies for the ATSB real-world workload traces. Those tend to clearly identify which drives are subject to stuttering performance under load, without exaggerating things as much as an hour-long steady-state torture test.
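As a toy illustration of the difference between those two metrics (a sketch with made-up latencies, not the ATSB tooling), a standard-deviation score gets worse when a few I/Os are unusually fast, while a 99th-percentile figure only moves when I/Os get slower:

```python
import statistics

def consistency_stddev(latencies_ms):
    # Standard-deviation-based score: penalizes any deviation from the mean,
    # including runs where the drive is *faster* than normal.
    return statistics.pstdev(latencies_ms) / statistics.mean(latencies_ms)

def latency_p99(latencies_ms):
    # 99th-percentile latency: only the slowest ~1% of I/Os matter, so being
    # occasionally faster than normal is never penalized.
    ordered = sorted(latencies_ms)
    return ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]

baseline      = [1.0] * 1000                # steady 1 ms per I/O
fast_outliers = [1.0] * 990 + [0.1] * 10    # occasionally much faster than normal
slow_outliers = [1.0] * 990 + [50.0] * 10   # occasional GC-style stalls

for name, trace in [("baseline", baseline),
                    ("fast outliers", fast_outliers),
                    ("slow outliers", slow_outliers)]:
    print(f"{name:14s}  stddev score: {consistency_stddev(trace):.3f}  "
          f"p99: {latency_p99(trace):.1f} ms")
```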
I may eventually introduce some more QoS measures for the synthetic tests, but at the moment most of them aren't set up to produce meaningful latency statistics. (Testing at a fixed queue depth leads to the coordinated omission problem, potentially drastically understating the severity of things like garbage collection pauses.) At some point I'll also start graphing the performance as a drive is filled, but with the intention of observing things like SLC cache sizes, not for the sake of seeing how the drive behaves when you keep torturing it after it's full.
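To see why a fixed queue depth hides pauses, here is a toy closed-loop QD1 simulation (invented numbers, not the actual synthetic test setup): the device stalls once for 100 ms, but because the tester only issues the next I/O after the previous one completes, only a single recorded sample ever sees the stall. Measuring each I/O against the time it was *intended* to be issued exposes the backlog instead:

```python
# Toy simulation of coordinated omission. All numbers are invented.
SERVICE_MS = 0.1              # normal per-I/O service time
STALL_INDEX = 500             # this I/O hits the stall
STALL_MS = 100.0              # length of the stall (think GC pause)
N_IOS = 1000
INTENDED_INTERVAL_MS = 0.2    # the schedule we intended to issue at

closed_loop = []              # what a fixed-QD tester records
corrected = []                # latency measured from the intended issue time
now = 0.0
for i in range(N_IOS):
    service = SERVICE_MS + (STALL_MS if i == STALL_INDEX else 0.0)
    intended_issue = i * INTENDED_INTERVAL_MS
    issue = max(now, intended_issue)              # closed loop: wait for the previous I/O
    complete = issue + service
    closed_loop.append(complete - issue)          # only one sample sees the stall
    corrected.append(complete - intended_issue)   # delayed I/Os see it too
    now = complete

def p99(samples):
    ordered = sorted(samples)
    return ordered[int(0.99 * len(ordered))]

print(f"closed-loop p99: {p99(closed_loop):7.1f} ms")   # still looks fine
print(f"corrected   p99: {p99(corrected):7.1f} ms")     # exposes the pause
```

The closed-loop 99th percentile still looks like ~0.1 ms, while the schedule-corrected figure comes out roughly a thousand times worse.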
I will be testing a few consumer SSDs for one of my upcoming enterprise SSD reviews, and that will include steady-state full drive performance for every test.
svan1971 - Thursday, September 6, 2018 - link
I wish current reviews would use current hardware; the 970 Pro replaced the 960 Pro months ago.
Billy Tallis - Thursday, September 6, 2018 - link
I've had trouble getting a sample of that one; Samsung's consumer SSD sampling has been very erratic this year. But the 970 Pro is definitely a different class of product from a mainstream TLC-based drive like the XG6. I would only include 970 Pro results here for the same reason that I include Optane results. They're both products for people who don't really care about price at all. There's no sensible reason to be considering a 970 Pro and an XG6-like retail drive as both potential choices for the same purchasing decision.
mapesdhs - Thursday, September 6, 2018 - link
Please never stop including older models; the comparisons are always useful. Kinda wish the 950 Pro was in there too.
Spunjji - Friday, September 7, 2018 - link
I second this. I know that I am (and feel most other savvy consumers would be) more likely to compare an older high-end product to a newer mid-range product, partly to see if it's worth buying the older gear at a discount and partly to see when there is no performance trade-off in dropping a cost tier.
jajig - Friday, September 7, 2018 - link
I third it. I want to know if an upgrade is worthwhile.
dave_the_nerd - Sunday, September 9, 2018 - link
Very much this. And not all of us upgrade our gear every year or two.
DanNeely - Friday, September 7, 2018 - link
As long as the tests are the same, you can always pull the comparisons up yourself in Bench.
While I sympathize with wanting them in the article tables, three to six years of historical low/mid/high-end SSDs would end up either eating a lot of the table space (reducing the number of current drives listed) or making the tables much longer, so I fully understand why very little of that data is in the main tables.
wumpus - Thursday, September 6, 2018 - link
The DRAM buffer isn't mentioned, but the board has 4 chips on it: two are obviously flash chips, one is the Toshiba controller, and one is by Nanya, a DRAM manufacturer. The kicker is that, as an OEM part, the final customer has no way of telling if that chip is populated before purchase (and the lack of specs makes it easier to leave it off).
Hopefully if these make it to the open market we can at least tell if they have the DRAM or not. Note that some of the cheaper NVMe drives (think ADATA XPG 6000) seem to do fine without DRAM, but they are priced to compete with SATA, not other NVMe drives.
Billy Tallis - Thursday, September 6, 2018 - link
No XG6-based OEM drive is going to be DRAMless. Toshiba has the BG series for that purpose, with an entirely different controller.
wumpus - Thursday, September 6, 2018 - link
Really? Then who took that photo? Is the board in the photo the board that you reviewed? That board clearly has this chip on it: http://www.nanya.com/en/Product/4228/NT6CL128M32CM...
That's a 4Gb (512MB) LPDDR3 DRAM chip. Don't tell me that the board in the photograph doesn't have DRAM. They might not ship DRAM with the OEM devices, but that doesn't mean they didn't give you an SSD with DRAM to review.
MrSpadge - Thursday, September 6, 2018 - link
He did not say "No, XG6-based OEM drive is going to be DRAMless",
just
"No XG6-based OEM drive is going to be DRAMless."
i.e. none of these drives will be DRAMless.
wumpus - Thursday, September 6, 2018 - link
My eyes are going. Should I go get a monitor with less dot pitch or get a Mac where it doesn't force dot pitch to the monitor size? Decisions, decisions.
Commas are just too small for modern monitors. I was planning on getting higher dot pitch, but now I'm wondering.
Valantar - Friday, September 7, 2018 - link
I doubt a different monitor would help if your eyes are inserting punctuation where there is none - missing it when it's there is another matter. Besides, the fact that the sentence with an inserted comma doesn't add up grammatically should have tipped you off.
Walkeer - Thursday, September 6, 2018 - link
Testing SSD performance on an Intel platform is like testing racing slicks on a child's pedal car. Intel I/O performance went down by tens of percent with all the Meltdown/Spectre mitigations. Please use an AMD platform instead.
MrSpadge - Thursday, September 6, 2018 - link
Usually they keep testing environments consistent for 1-2 years exactly due to such changing software conditions. It could well be that the next test suite will feature AMD CPUs and, as always, yield results not strictly comparable to the older ones.
29a - Thursday, September 6, 2018 - link
If that were the case, they wouldn't use the Spectre/Meltdown patches.
Valantar - Friday, September 7, 2018 - link
AFAIK they're very careful which patches are applied to test beds, and if they affect performance, older drives are retested to account for this. Benchmarks like this are never really applicable outside of the system they're tested in, but the system is designed to provide a level playing field and repeatable results. That's really the best you can hope for. Unless the test bed has a consistent >10% performance deficit to most other systems out there, there's no reason to change it unless it's becoming outdated in other significant areas.iwod - Thursday, September 6, 2018 - link
So we are limited by the PCIe interface again. Since the birth of the SSD, we pushed past SATA 3Gbps/6Gbps, then PCIe 2.0 x4 (2GB/s), and now PCIe 3.0 x4 (4GB/s).
When are we going to get PCIe 4.0? Or, since 5.0 is only just around the corner, we may as well wait for it. That is 16GB/s, plenty of room for SSD makers to figure out how to get there.
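For reference, a back-of-the-envelope sketch of the theoretical x4 numbers behind those figures (real drives lose a bit more to protocol overhead):

```python
# Rough theoretical x4 link bandwidth per PCIe generation, counting only the
# line-encoding overhead. Approximate figures, for illustration only.
GENS = {
    # generation: (per-lane rate in GT/s, encoding efficiency)
    "PCIe 2.0": (5.0,  8 / 10),     # 8b/10b encoding
    "PCIe 3.0": (8.0,  128 / 130),  # 128b/130b encoding
    "PCIe 4.0": (16.0, 128 / 130),
    "PCIe 5.0": (32.0, 128 / 130),
}
LANES = 4
for gen, (gt_s, eff) in GENS.items():
    gb_s = gt_s * eff / 8 * LANES   # GT/s -> GB/s per lane, times four lanes
    print(f"{gen} x{LANES}: ~{gb_s:.1f} GB/s")
```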
MrSpadge - Thursday, September 6, 2018 - link
There's no need to rush there. If you need higher performance, use multiple drives, maybe on an HEDT or enterprise platform if you need extreme performance.
But don't be surprised if that doesn't help your PC as much as you thought. The ultimate limit currently is a RAMdisk. Launch a game from there or install some software - it's still surprisingly slow, because the CPU becomes the bottleneck. And that already applies to modern SSDs, which is obvious in benchmarks that test copying, installing, application launching, etc.
abufrejoval - Friday, September 7, 2018 - link
Could also be the OS or the RAMdisk driver. When I finished building my 128GB 18-core system with a leftover 2.4TB FusionIO card and 10Gbit Ethernet, I obviously wanted to bench it on Windows and Linux. I was rather shocked to see how slow things generally remained and how pretty much all these 36 HT-"CPU"s were just yawning.
In the end I never found out whether it was the last free version (3.4.8) of SoftPerfect's RAM disk that didn't seem to make use of all four Xeon E5 memory channels, or some bottleneck in Windows (I've never seen Windows Update use more than a single core), but I never got anywhere near the 70GB/s Johan had me dream of (https://www.anandtech.com/show/8423/intel-xeon-e5-...). I don't think I even saturated the 10GBase-T network, if I recall correctly.
It was quite different in many cases on Linux, but I do remember running an entire Oracle database on tmpfs once, and then an OLTP benchmark on that... again earning myself a totally bored system under the most intensive benchmark hammering I could orchestrate.
There are so many serialization points in all parts of that stack, you never really get the performance you pay for until someone has gone all the way and rewritten the entire software stack from scratch for parallel and in-memory.
Latency is the killer for performance in storage, not bandwidth. You can saturate all bandwidth capacities with HDDs, even tape. Thing is, with dozens of cores (modern CPUs) or thousands (modern GPGPUs), SSDs *become tape*, because of the latencies incurred on non-linear access patterns.
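As a rough worked example of that point (round, made-up latency figures, not measurements from this review): at low queue depth, per-request latency alone caps random 4KB throughput far below what any of these links can carry.

```python
# Why latency, not link bandwidth, caps random-access throughput at low queue
# depth. Latency figures below are illustrative round numbers.
QD = 1                      # outstanding requests (queue depth)
BLOCK_KB = 4                # random-read block size
LATENCY_US = {              # rough per-request completion latency
    "HDD (seek)":   8000,
    "SATA TLC SSD":  100,
    "NVMe TLC SSD":   80,
    "Optane":         10,
}
for device, lat_us in LATENCY_US.items():
    iops = QD * 1_000_000 / lat_us        # Little's law: throughput = QD / latency
    mb_s = iops * BLOCK_KB / 1024
    print(f"{device:13s} ~{iops:8.0f} IOPS, ~{mb_s:7.1f} MB/s at QD{QD}")
```

Even the fastest entry is nowhere near a PCIe 3.0 x4 link until you pile on queue depth and parallelism, which real workloads often can't.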
That's why after NVMe, NV-DIMMs or true non-volatile RAM is becoming so important. You might argue that a cache line read from main memory still looks like a tape library change against the register file of an xPU, but it's still way better than PCIe-5-10 with a kernel based block layer abstraction could ever be.
Linear speed and loops are dead: If you cannot unroll, you'll have to crawl.
halcyon - Monday, September 10, 2018 - link
Thank you for writing this.
Quantum Mechanix - Monday, September 10, 2018 - link
Awesome write-up - my favorite kind of comment, where I walk away just a *tiny* bit less ignorant. Thank you! :)
DanNeely - Thursday, September 6, 2018 - link
We've been 3.0 x4 bottlenecked for a few years.
From what I've read about implementing 4.0/5.0 on a mobo, I'm not convinced we'll see them on consumer boards, at least not in their current form. The maximum PCB trace length without expensive boosters is too short; AIUI 4.0 is marginal to the top PCIe slot/chipset, and 5.0 would need signal boosters even to go that far. Estimates I've seen were $50-100 (I think for an x16 slot) to make a 4.0 slot and several times that for 5.0. Cables can apparently go several times longer than PCB traces while maintaining signal quality, but I'm skeptical about them getting snaked around consumer mobos.
And as MrSpadge pointed out, in many applications scaling out wider is an option, and from what I've read that's what enterprise storage is looking at. Instead of x4 slots that have 2x/4x the bandwidth of current ones, that market is more interested in 5.0 x1 connections that have the same bandwidth as current devices but would allow them to connect four times as many drives. That seems plausible to me, since enterprise drive firmware is generally tuned for steady-state performance, not bursts, and most of those drives don't come as close to saturating buses as high-end consumer drives do for shorter/more intermittent workloads.
abufrejoval - Friday, September 7, 2018 - link
I guess that's why they are working on silicon photonics: PCB voltage levels, densities, layers, trace lengths... Wherever you look there are walls of physics rising into mountains. If only PCBs weren't so much cheaper than silicon interposers, photonics and other new and rare things!
darwiniandude - Sunday, September 9, 2018 - link
Any testing under Windows on current MacBook Pro hardware? Those SSDs, I would've thought, are much, much faster, but I'd love to see the same test on them.
halcyon - Monday, September 10, 2018 - link
Thanks for the review. For the future, could you consider segregating the drives into different tiers based on results, e.g. video editing, database, generic OS/boot/app drive, compilation, whatnot?
Now it seems that one drive is better in one thing, and another drive in another scenario. But not having your in-depth knowledge makes it harder to assess which drive would be closest to optimal in which scenario.
halcyon - Monday, September 10, 2018 - link
Sorry for the typos, mobile posting on the fly... Wish there was at least a 1min edit/fix window for new posts...