23 Comments
dastruch - Monday, April 12, 2010 - link
Thanks, AnandTech! I've been waiting a year for this very moment, and if only those 25nm Lyndonville SSDs were here too... :)

thunng8 - Monday, April 12, 2010 - link
For reference, IBM just released their octal-chip POWER7 3.8GHz result for the SAP 2-tier benchmark: 202,180 SAPS, approximately 2.32x faster than the octal-chip Nehalem-EX.

Jammrock - Monday, April 12, 2010 - link
The article cover on the front page mentions a 1TB maximum on the R810, and then 512GB on page one. The R910 is the 1TB version; the R810 is "only" 512GB. You can also run a single processor in the R810, though why you would drop the cash on an R810 for a single proc, I don't know.

vol7ron - Tuesday, April 13, 2010 - link
I wish I could afford something like this! I'm also curious how good it would be at gaming :) I know in many cases these server setups underperform high-end gaming machines, but I'd settle :) Still, something like this would be nice for my side business.
whatever1951 - Tuesday, April 13, 2010 - link
None of the Nehalem-EX numbers are accurate, because Nehalem-EX kernel optimization isn't in Windows 2008 Enterprise. Only three commercial OSes currently have Nehalem-EX optimization: Windows Server 2008 R2 (with SQL Server 2008 R2), RHEL 5.5, and SLES 11, plus the soon-to-be-released CentOS 5.5 based on RHEL 5.5. Windows 2008 R1 has trouble scaling to 64 threads, and SQL Server 2008 R1 absolutely hates Nehalem-EX. You are cutting the Nehalem-EX benchmarks short by 20% or so by using Windows 2008 R1.

The problem isn't as severe for Magny-Cours, because the OS sees 4 or 8 sockets of 6 cores each via the enumerator, and thus treats it with the same optimizations as an 8-socket 8400-series system.
So, please rerun all the benchmarks.
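For what it's worth, the 64-thread ceiling is structural: before R2, Windows addresses at most 64 logical processors through a single affinity mask, and R2 introduced processor groups to get past that. As a quick sanity check of what the OS actually exposes, here is a minimal sketch using the Win32 processor-group APIs (an illustration; these APIs only exist on Windows 7 / Server 2008 R2 and later, which is the point):

```c
/* Sketch: report how many logical processors the OS exposes.
   Assumes the processor-group APIs introduced with Windows 7 /
   Server 2008 R2; on Server 2008 R1 this will not even link. */
#define _WIN32_WINNT 0x0601
#include <windows.h>
#include <stdio.h>

int main(void)
{
    WORD groups = GetActiveProcessorGroupCount();
    DWORD total = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);

    printf("processor groups:   %u\n", (unsigned)groups);
    printf("logical processors: %lu\n", (unsigned long)total);
    for (WORD g = 0; g < groups; g++)
        printf("  group %u: %lu logical processors\n",
               (unsigned)g, (unsigned long)GetActiveProcessorCount(g));
    return 0;
}
```

An octal Nehalem-EX box has 128 threads, so an OS or application that isn't group-aware simply cannot schedule onto half the machine.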
JohanAnandtech - Tuesday, April 13, 2010 - link
It is a small mistake in our table. We have been using R2 for months now. We do use Windows 2008 R2 Enterprise.

whatever1951 - Tuesday, April 13, 2010 - link
OK. Please change the table to reflect the Windows Server 2008 R2 and SQL Server 2008 R2 information, then.

Any explanation for such poor memory bandwidth? Damn, those SMBs must really slow things down, or there must be a software error.
whatever1951 - Tuesday, April 13, 2010 - link
It is hard to imagine 4 channels of DDR3-1066 being 1/3 slower than even the Westmere-EPs. Can you remove half of the memory DIMMs to make sure it isn't Dell's flex memory technology intentionally slowing things down to push sales toward the R910?

whatever1951 - Tuesday, April 13, 2010 - link
As far as I know, when you only populate two sockets on the R810, Dell's flex memory technology routes the 16 DIMMs that would otherwise be connected to the 2 empty sockets over to the 2 center CPUs; there could be significant memory bandwidth penalties induced by that.

whatever1951 - Tuesday, April 13, 2010 - link
"This should add a little bit of latency, but more importantly it means that in a four-CPU configuration, the R810 uses only one memory controller per CPU. The same is true for the M910, the blade server version. The result is that the quad-CPU configuration has only half the bandwidth of a server like the Dell R910 which gives each CPU two memory controllers."Sorry, should have read a little slower. Damn, Dell cut half the memory channels from the R810!!!! That's a retarded design, no wonder the memory bandwidth is so low!!!!!
JohanAnandtech - Tuesday, April 13, 2010 - link
"Damn, Dell cut half the memory channels from the R810!"You read too fast again :-). Only in Quad CPU config. In dual CPU config, you get 4 memory controllers, which connect each two SMBs. So in a dual Config, you get the same bandwidth as you would in another server.
AFAIK, the R810 targets those who are not after the highest CPU processing power, but who want the RAS features and 32 DIMM slots.
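To put those configurations in perspective, here is a back-of-the-envelope peak bandwidth calculation. The figures are standard DDR3-1066 numbers, and the channel counts are my assumption from the article's description (one IMC = two SMBs = four channels per socket in the quad-CPU R810):

```c
#include <stdio.h>

/* Back-of-the-envelope DDR3 peak bandwidth. */
static double peak_gbs(int channels)
{
    double mt_per_s  = 1066e6; /* DDR3-1066: 1066 MT/s per channel */
    double bus_bytes = 8;      /* 64-bit wide channel              */
    return channels * mt_per_s * bus_bytes / 1e9;
}

int main(void)
{
    printf("R810 quad-CPU, per socket (4 channels): %.1f GB/s\n", peak_gbs(4));
    printf("Full Nehalem-EX fan-out   (8 channels): %.1f GB/s\n", peak_gbs(8));
    return 0;
}
```

Of course these are theoretical ceilings; the SMB serial-to-parallel transition discussed below eats into what you can sustain in practice.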
whatever1951 - Tuesday, April 13, 2010 - link
2 channels of DDR3-1066 per socket in a fully populated R810, and if you populate only 2 sockets you get the flex memory routing penalty... damn! The R810 sucks.

Sindarin - Tuesday, April 13, 2010 - link
whatever1951, you lost me @ hello... and I thought Sauron was tough!! lol

JohanAnandtech - Tuesday, April 13, 2010 - link
"It is hard to imagine 4 channels of DDR3-1066 to be 1/3 slower than even the westmere-eps."On one side you have a parallel half duplex DDR-3 DIMM. On the other side of the SMB you have a serial full duplex SMI. The buffers might not perform this transition fast enough, and there has to be some overhead. I also am still searching for the clockspeed of the IMC. The SMIs are on a different (I/O) clockdomain than the L3-cache.
We will test with the Intel/QSSC quad-CPU system to see whether the FlexMem bridge has any influence, but I don't think it will do much. It might add a bit of latency, but essentially the R810 works like a dual-CPU system with four IMCs, just as any other (dual-CPU) Nehalem-EX server system would.
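In the meantime, for anyone who wants to sanity-check bandwidth numbers themselves: the heart of such a test is a STREAM-style triad loop. Below is a minimal single-threaded sketch (an illustration only, not our actual benchmark, which runs one pinned thread per socket); the arrays are sized to overflow the X7560's 24MB L3. Compile with -O2 on Linux:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)  /* 3 arrays x 512MB: far beyond the L3 */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    for (size_t i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];   /* the STREAM "triad" kernel */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* two reads plus one write of N doubles per iteration */
    printf("triad: %.2f GB/s\n", 3.0 * N * sizeof(double) / secs / 1e9);
    free(a); free(b); free(c);
    return 0;
}
```

A single thread will not saturate a Nehalem-EX socket, so the absolute number matters less than the relative drop when you move DIMMs around.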
whatever1951 - Tuesday, April 13, 2010 - link
Thanks for the useful info. The R810 doesn't meet my standards, then.

Johan, is there any way you can get your hands on a 4-processor R910 system from Dell and bench the memory bandwidth, to see how much that FlexMem chip costs in terms of bandwidth?
IntelUser2000 - Tuesday, April 13, 2010 - link
The uncore of the X7560 runs at 2.4GHz.

JohanAnandtech - Wednesday, April 14, 2010 - link
Do you have a source for that? Must have missed it.

Etern205 - Thursday, April 15, 2010 - link
I think AT needs to fix this "RE:RE:RE...:" problem.

amalinov - Wednesday, April 14, 2010 - link
Great article! I like the way in which you describe the memory subsystem; I have read the Intel datasheets and many news articles about the Xeon 7500, but your description is the best so far.

You say "So each CPU has two memory interfaces that connect to two SMBs that can each drive two channels with two DIMMS. Thus, each CPU supports eight registered DDR3 DIMMs ...", but if I do the math it seems: 2 SMIs x 2 SMBs x 2 channels x 2 DIMMs = 16 DDR3 DIMMs, not 8 as written in the second sentence. Later in the article you mention 16 in several places, so it seems it is really 16 and not 8.
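Spelled out as a fan-out calculation (a sketch of my reading of the datasheet; the key assumption is one SMB per SMI link):

```c
#include <stdio.h>

int main(void)
{
    /* Xeon 7500 per-socket memory fan-out (assumed topology) */
    int imcs         = 2;  /* integrated memory controllers per socket */
    int smi_per_imc  = 2;  /* serial SMI links per controller          */
    int ch_per_smb   = 2;  /* DDR3 channels behind each SMB            */
    int dimms_per_ch = 2;  /* DIMMs per channel                        */

    int smbs  = imcs * smi_per_imc;               /* 4 SMBs per socket */
    int dimms = smbs * ch_per_smb * dimms_per_ch; /* 16, not 8         */
    printf("SMBs per socket:  %d\n", smbs);
    printf("DIMMs per socket: %d\n", dimms);
    return 0;
}
```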
What about an Itanium 9300 review (including general background on the plans of OEMs/Intel for the IA-64 platform)? A comparison of the scalability (HT/QPI), memory, and RAS features of the Xeon 7500, Itanium 9300, and Opteron 6000 would be welcome. I would also like to see a performance comparison, using appropriate applications for the RISC mainframe market (HPC?), of 4- and 8-socket AMD, Intel Xeon, Intel Itanium, POWER7, and the newest SPARC.
jeha - Thursday, April 15, 2010 - link
You really should review the IBM 3850 X5, I think. They have some interesting solutions when it comes to handling memory expansion, etc.
klstay - Thursday, April 15, 2010 - link
I agree. Being able to use all the DIMM slots in the R810 with only half the CPU sockets populated is a neat trick, and I do like having up to 16 drive bays in the R910, but overall the latest IBM 3850 is much more flexible than either of those systems: from a 2-socket system with 4 cores each and 32GB of RAM, up to an 8-socket system with 8 cores each and 3TB of RAM. Barring some big surprises at HP's announcement in a couple of weeks, IBM will be the one to beat in Nehalem-EX for the foreseeable future.

Etern205 - Thursday, April 15, 2010 - link
The AMD Opteron 6128 isn't $523. It's $299.99!
http://www.newegg.com/Product/Product.aspx?Item=N8...
(credited to: zpdixon @ DT for providing the link)
yuhong - Tuesday, June 15, 2010 - link
"but when a dual-CPU configuration outperforms quad-CPU configurations of your top-of-the-line CPU, something is wrong. "Remember Xeon 7100 vs Xeon 5300?