17 Comments
TrogdorJW - Thursday, February 19, 2004 - link
If anyone is interested in reading more speculation on Prescott and how it gets 64-bits, I posted some of my *theories* over at the FiringSquad forums. Here's the link for the complete discussion: http://forums.firingsquad.com/firingsquad/board/me...
The important part is as follows:
----------------------------------
The big question now is, how well will Prescott-64 perform? I think that they can get the heat under control. (More speculation.) However, maximizing 64-bit performance might be a bit more difficult. Look at AMD with the stuff they've licensed from Intel. Intel still beats them in MMX, SSE, and now SSE2 performance (although they are getting closer with each new processor release).
Some other interesting things about the news: Intel is going to clock the ALUs (Arithmetic Logic Units) at core speed when running 64-bit code, apparently. Actually, they say 7 GHz in 32-bit mode and 4 GHz in 64-bit mode. That's a little odd, since the current ALUs run at twice the core speed in 32-bit mode, so 7 GHz would be from a 3.5 GHz processor. Why they would run at 4 GHz and not 3.5 GHz I couldn't say. Maybe because they can?
How will that affect performance? It depends on how the 64-bit extensions were added. If they use the same setup as the regular P4 core, with the only difference being that they added registers and made them 64 bits wide, then it would likely hurt performance relative to 32-bit mode. However, it is *possible* that the 64-bit support was added on as a completely separate module. If this is the case, they might have separate 64-bit ALUs/AGUs. In other words, the current NetBurst design has 7 functional units: two simple ALUs that run at 2X core speed, one complex ALU running at core speed, an FPU/SSE Move/Store, a full FPU/SSE that handles all of those operations, and two AGUs (Address Generation Units). The 64-bit extensions in Prescott/Nocona/Potomac (called "Clackamas Technology") could have their own AGUs and ALUs.
That would make sense to me, since as I mentioned in my earlier speculation, the core is currently about 73 million transistors compared to the Northwood's 29 million. Northwood has 7 functional units with 20 pipeline stages, giving about 207,000 transistors per stage per unit (29 million / (20 * 7)). If the Prescott design simply extended the NetBurst architecture to 64 bits and 31 stages, it would be around 336,000 transistors per stage per unit (73 million / (31 * 7)). On the other hand, if the 64-bit extensions are added in a separate module with their own AGUs and ALUs, the Prescott would now have 11 functional units. That would give 214,000 transistors per stage per unit (73 million / (31 * 11)). An even more radical approach might be to have three 64-bit AGUs and ALUs. Then you would only have about 181,000 transistors per stage per unit (73 million / (31 * 13)). (That's a little harder to believe, but since Intel is being forced to adopt AMD's instruction set, they might want to adopt the architecture for performance reasons.) Note: These are all very rough estimates. FPUs generally have more pipeline stages, and there are lots of other factors to consider, like the L1 cache and trace cache. This is just a baseline estimate.
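For anyone who wants to check the arithmetic, here is a minimal Python sketch of the "transistors per stage per unit" estimate. The function name and the scenario labels are made up for illustration, and the 13-unit case assumes that "three 64-bit AGUs and ALUs" means six extra units on top of the existing seven. The results should land within a few thousand of the rough figures quoted above.

```python
def transistors_per_stage_per_unit(transistors, stages, units):
    """Rough density metric from the post: transistors / (stages * units)."""
    return transistors / (stages * units)


# Core transistor counts as quoted in the post (rough estimates, not official figures).
NORTHWOOD_CORE = 29_000_000
PRESCOTT_CORE = 73_000_000

# (transistors, pipeline stages, functional units) for each scenario discussed.
scenarios = {
    "Northwood, 20 stages, 7 units": (NORTHWOOD_CORE, 20, 7),
    "Prescott, 31 stages, 7 units": (PRESCOTT_CORE, 31, 7),
    # Assumption: separate 64-bit module adds two ALUs and two AGUs.
    "Prescott, 31 stages, 11 units": (PRESCOTT_CORE, 31, 11),
    # Assumption: three extra ALUs plus three extra AGUs.
    "Prescott, 31 stages, 13 units": (PRESCOTT_CORE, 31, 13),
}

results = {
    name: transistors_per_stage_per_unit(*args)
    for name, args in scenarios.items()
}

for name, value in results.items():
    print(f"{name}: {value:,.0f} transistors per stage per unit")
```

The point of the comparison is only relative: the 11-unit case comes out close to Northwood, while the 7-unit case is well above it.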
As I stated earlier, increasing the number of transistors by such a large amount without adding more functional units would make the Prescott design scale worse than the Northwood design. Why would Intel do that!? Going to 31 stages would have been done to decrease the average number of transistors per stage, and they would likely aim to be at worst about the same as the Northwood. I certainly don't know for sure what was done, but various rumors and the fact that the 32-bit and 64-bit ALUs run at different speeds make me wonder. I suppose we'll know more in about two or three months, if not before then.
TrogdorJW - Thursday, February 19, 2004 - link
PC3200 is 3.2 GB/s single channel, and dual-channel it is 6.4 GB/s. XDR single-channel is 6.4 GB/s, so in a dual-channel setup (which is very likely, since almost all Rambus implementations in the past were dual-channel other than i820 - and we all know what a fiasco that was!) XDR will be 12.8 GB/s.
It is important to note that DDR is normally a 64-bit bus, whereas RDRAM/XDR apparently use a 16-bit bus. Running 64 traces over a motherboard at high clock speeds is difficult at best, but if you cut that to 16 traces, it is not as hard. That's what Rambus was all about initially. Now, DDR is running 200 MHz (400 effective) with 128 traces in dual-channel operation. XDR is countering by running 16 traces at 400 MHz (3200 effective).
I find it interesting that the clock speed of XDR is really 400 MHz externally, but then internally they send eight bits per clock. From what was said in the article, I guess they first multiply the clock by four, and then they more or less use DDR tactics where you send a bit on the rising and falling clock. The end result, though, is the same. DDR2 does something similar, I believe. "1 GHz" DDR2 is really running at 250 MHz, with four bits per clock. So they double the clock and then send data on the rising and falling clock signal.
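The bandwidth numbers in this post all follow from bus width times clock times transfers per clock. Below is a rough Python sketch of that calculation, using only the figures quoted here. The helper name is hypothetical, "GB" is taken as 10^9 bytes, and the 64-bit bus for DDR2 is my assumption since the post does not state it. These are theoretical peaks and say nothing about latency or real-world throughput.

```python
def peak_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1_000_000 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second * channels / 1e9


# PC3200 (DDR400): 64-bit bus, 200 MHz clock, 2 transfers per clock.
pc3200_single = peak_bandwidth_gb_s(64, 200, 2)
pc3200_dual = peak_bandwidth_gb_s(64, 200, 2, channels=2)

# XDR as described in the post: 16-bit bus, 400 MHz clock, 8 bits per pin per clock.
xdr_single = peak_bandwidth_gb_s(16, 400, 8)
xdr_dual = peak_bandwidth_gb_s(16, 400, 8, channels=2)

# "1 GHz" DDR2 as described in the post: 250 MHz with four bits per clock.
# Assumption: the same 64-bit bus width as DDR.
ddr2_1ghz_single = peak_bandwidth_gb_s(64, 250, 4)

print(f"PC3200 single: {pc3200_single:.1f} GB/s, dual: {pc3200_dual:.1f} GB/s")
print(f"XDR single:    {xdr_single:.1f} GB/s, dual: {xdr_dual:.1f} GB/s")
print(f"DDR2 '1 GHz' single: {ddr2_1ghz_single:.1f} GB/s")
```

Note how the narrow 16-bit XDR bus only keeps up because of the much higher per-pin transfer rate.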
In order to match XDR, DDR2 would have to run at 200 MHz and an effective 800 MHz. We're seeing that on graphics cards, but it looks like that is still a ways off for motherboards. The latency question is still not really being answered by Rambus. "Low latency" at 3.2 GHz effective speeds could mean anything. I have seen that DDR2 is only offering CAS Latencies of 3, 4, and 5. I wonder what the equivalent XDR latency is - probably something like 6, 8, and 10.
If/when retail boards are released using XDR, it could be an exciting matchup. Prescott at 4 GHz could make very good use of added memory bandwidth, I bet. Integrated graphics with a 12.8 GB/s memory subsystem might actually not suck that hard! :)
Malladine - Thursday, February 19, 2004 - link
Oops...actually wanted to ask #13 if that means that PC3200 is 1.6 GB/s single channel?
Malladine - Thursday, February 19, 2004 - link
bhtooefr - Thursday, February 19, 2004 - link
#3, he said dual channel. Single channel IS 3.2 GB/s, but dual channel is 6.4. I was going to point out that DC-DDR was the same speed as XDR.
(my own comment) Remember when Northwood came out, and it didn't have HyperThreading enabled, but later releases enabled it? Well, I wouldn't be surprised if the P4-F or P4-G is a Prescott-64.
KalTorak - Thursday, February 19, 2004 - link
The processor data bus has been 64 bits wide since the original Pentium processor, as I recall.
AgaBooga - Thursday, February 19, 2004 - link
I predicted from reading Anand's articles months ago that the 64-bit feature would be a lot like Hyper-Threading, which was not enabled in the Willamette cores. I'm expecting something similar to happen this time around. I think Intel's whole timing argument is true to an extent, but had the 64-bit mode helped performance, or at least not decreased it by a lot, they might have released it enabled.
Pumpkinierre - Wednesday, February 18, 2004 - link
Even if you do enable 64-bit functionality in Prescott, won't you need a mobo or at least a BIOS upgrade to handle it? You probably don't need the memory size extension on the address bus, but I don't know the size of the data bus on the Prescott. If it is only 32 bits wide then it would need to carry out two fetches for full 64-bit functionality (plus internal 64-bit manipulation), but this would require a change to the microcode in the BIOS. Unless this is already present in Prescott BIOS upgrades for mobos (i875/865), you may have difficulties even if you 'switch on' the x86-64 commands. I suspect they are not going to enable it until Socket 775.
Jeff7181 - Wednesday, February 18, 2004 - link
One thing I wonder about is how flexible x86-64 is. Could it go through a revision that drops support for 32-bit instructions to enhance 64-bit performance when 64-bit software is the only software you can buy?
KalTorak - Wednesday, February 18, 2004 - link
And given the iffy bandwidth available at Moscone West, I think Derek's doing pretty well to get these reports in :)
The XDR stuff _was_ pretty cool; I didn't realize there was a clock source with low enough jitter to make that thing work.
Mrburns2007 - Wednesday, February 18, 2004 - link
XDR has 6.4 GB/s per chip, not per module.
Ecmaster76 - Wednesday, February 18, 2004 - link
"ultra emulated x86 with 8-way-hyperthreading and a +5 Dynamic Compiler of Doom"
Sweet! Where can I get one? Is it compatible with my DRAM skin armor?
Someone's been playing too much Baldur's Gate, and not just me.
(Think of all the processors Intel could sell with marketing like that.)
DerekWilson - Wednesday, February 18, 2004 - link
Actually, PCI-X is completely different from PCI Express ... PCI-X is a parallel architecture that's wider and faster than the 32-bit 33 MHz PCI bus ... PCI Express is a specification for a point-to-point serial bus protocol (and multiple serial data streams can be sent to the same peripheral, thus the x16 PCI Express graphics card).
And when I was talking about ATI's "next gen" chip I wasn't talking about their current PCI Express solution RV380. I was talking about some unspecified demo that I'm going to assume was R420 or R423... I just didn't want to mention a card since ATI wouldn't tell me which card it was that was powering the box.
I think I fixed all the typos, sorry bout that ... I've been working by jumping between hotspots and hand coding HTML rather than using the Dreamweaver over broadband that I'm used to ;-)
Lonyo - Wednesday, February 18, 2004 - link
Intel was pretty much always going to use compatible 64-bit extensions.
They have to work with the OS, since MS is pretty much dictating that.
AMD set up the initial spec (I would assume), and Intel didn't have much choice but to follow.
ATI and NVIDIA have to stick to the PCI Express spec to make their next gen graphics cards, and that was designed by Intel. It's just a similar thing.
AMD obviously did well to get there first though and set the standards.
Malladine - Wednesday, February 18, 2004 - link
KillaKilla's older brother: PC3200 bandwidth is 3.2 GB/s :)
http://www.kingston.com/newtech/ddrbandwidth.asp
KillaKilla - Wednesday, February 18, 2004 - link
whoops forgot a few things, guess I "jumped the gun[post]"...
1st, this isn't KillaKilla, he's my brother, I don't have my own nick yet, sorry...
What did you mean by "2x to 3x performance gains" for native PCI-X (PCI Express is PCI-X, right? I've seen it as PCI-E, but that was from before?) Also, what are these "HD streams" (2nd to last paragraph, 2nd page) you talk about?
3rd page:
"The upcoming XDR chips were on display up at the RAMBUS both across from a demo of Toshiba chips running at very high speeds (the bandwidth of XDR is 6.4GB/s)." Isn't DC-DDR 3200/400's bandwidth 6.4GB/s?
4th page:
I'm not surprised that Intel cross-licensed x86-64... it was only logical seeing MS-XP64. Kudos to AMD for making a better 64-bit solution (extension set).
-KillaKilla's older brother
KillaKilla - Wednesday, February 18, 2004 - link
Once again, first post.
Anyway, there are a few typos.
the Borad in the title?
The open tags on 2nd page