Santa Clara (CA) - Today, Intel lifted the embargo on its latest Core architecture revision, Nehalem (pronounced Nuh hay lem). The reviews are in, and Core i7 is a force to be reckoned with. Perhaps Bert Toepelt of Tom's Hardware sums it up best: "Put bluntly, you'd need two and a half Phenom X4 processors to compete with Intel's current Core i7 flagship model." With eight virtual cores via Hyper-Threading, a 25% clock-for-clock performance increase over Core 2, a new on-die memory controller and lower power consumption, does it have a downside? No model is pin-compatible with Core 2, so it requires a new motherboard (which currently costs more than the CPU). It also works only with DDR3 memory.


Nehalem - Core i7's three models

Intel released the CPU in three models, all requiring a new socket and the expensive X58 chipset: the 920 (2.66 GHz), 940 (2.93 GHz) and 965 Extreme (3.20 GHz), priced at $284, $562 and $999 respectively. X58 motherboards will cost between $300 and $420. The chips also require DDR3 memory, which currently costs nearly twice as much as DDR2. All models use a 133 MHz base clock; the 920 and 940 use a 6x or 8x memory multiplier, for DDR3-800 or DDR3-1066, while the 965 allows 10x and 12x, for DDR3-1333 and DDR3-1600.
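The DDR3 speed grades above follow from multiplying the base clock by the memory multiplier. A minimal sketch, assuming the "133 MHz" base clock is exactly 400/3 MHz (about 133.33 MHz) and that speed-grade names simply truncate the result; `ddr3_rate` is a hypothetical helper, not an Intel tool:

```python
# Hypothetical helper: DDR3 data rate = base clock x memory multiplier.
# Assumption: the "133 MHz" base clock is exactly 400/3 MHz, and DDR3 speed
# grades are named by truncating the resulting rate in MT/s.
BCLK_NUMERATOR, BCLK_DENOMINATOR = 400, 3  # base clock as an exact fraction (MHz)

MEMORY_MULTIPLIERS = {
    "Core i7 920/940": (6, 8),
    "Core i7 965 Extreme": (10, 12),
}


def ddr3_rate(multiplier: int) -> int:
    """Return the DDR3 data rate in MT/s, truncated to a whole number."""
    # Integer arithmetic avoids floating-point rounding (1066.67 -> 1066).
    return BCLK_NUMERATOR * multiplier // BCLK_DENOMINATOR


if __name__ == "__main__":
    for model, multipliers in MEMORY_MULTIPLIERS.items():
        grades = ", ".join(f"DDR3-{ddr3_rate(m)}" for m in multipliers)
        print(f"{model}: {grades}")
```

Running it prints DDR3-800 and DDR3-1066 for the 920/940 and DDR3-1333 and DDR3-1600 for the 965, matching the figures in the paragraph above.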

Intel has introduced a new "Overspeed Protection" feature that limits aggressive overclocking. By monitoring both core voltage and current draw, Intel effectively caps maximum power at 130 watts total.
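Because electrical power is voltage times current, watching both values is enough to enforce a power ceiling. The sketch below is a simplified illustration, not Intel's actual on-chip logic: it assumes power is estimated as core voltage x current and that the limit is the 130 W quoted above. The function names and example operating points are hypothetical.

```python
# Simplified, hypothetical model of a power cap like "Overspeed Protection".
# Assumptions: power is estimated as core voltage x current; limit is 130 W.
POWER_LIMIT_WATTS = 130.0


def estimated_power(vcore_volts: float, current_amps: float) -> float:
    """Electrical power in watts: P = V * I."""
    return vcore_volts * current_amps


def within_power_limit(vcore_volts: float, current_amps: float,
                       limit_watts: float = POWER_LIMIT_WATTS) -> bool:
    """True if the estimated power stays at or below the cap."""
    return estimated_power(vcore_volts, current_amps) <= limit_watts


def max_current_at(vcore_volts: float,
                   limit_watts: float = POWER_LIMIT_WATTS) -> float:
    """Highest current (A) allowed at a given voltage before hitting the cap."""
    return limit_watts / vcore_volts


if __name__ == "__main__":
    # Hypothetical operating points, not measured values.
    for volts, amps in [(1.25, 100.0), (1.40, 100.0)]:
        status = "OK" if within_power_limit(volts, amps) else "capped"
        watts = estimated_power(volts, amps)
        print(f"{volts:.2f} V x {amps:.0f} A = {watts:.1f} W -> {status}")
    print(f"Max current at 1.25 V: {max_current_at(1.25):.1f} A")
```

In this model, raising the voltage to push clocks higher lowers the current the chip may draw before reaching the cap, which is why such a limit restricts overclocking headroom.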

Nehalem has four physical cores, each with Hyper-Threading, for a total of eight virtual cores. Each core has its own 32 KB instruction and 32 KB data L1 caches and a 256 KB L2 cache, while all four cores share an 8 MB L3 cache. The chip also supports SSE4.2.

All 8 MB of L3 are available to every core, physical or virtual, so a single-threaded application can use the full 8 MB if needed. Multi-threaded applications working on the same data set can also keep more of it in cache, because threads on different cores share one copy of the data instead of keeping duplicates that must be synchronized. Keeping this cache coherency on a single die also reduces latency. Reported cache latencies are 4 cycles for the 64 KB (32 KB + 32 KB) L1, 11 cycles for the 256 KB L2 and 39 cycles for the 8 MB L3.
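Cycle counts translate into wall-clock time once a clock speed is fixed. A minimal sketch, assuming a constant core clock equal to the 965 Extreme's rated 3.20 GHz; the dictionary and function names are illustrative:

```python
# Convert reported cache latencies (core clock cycles) to nanoseconds.
# Assumption: a fixed core clock of 3.20 GHz (the 965 Extreme's rated speed).
CACHE_LATENCY_CYCLES = {
    "L1 (32 KB + 32 KB)": 4,
    "L2 (256 KB)": 11,
    "L3 (8 MB, shared)": 39,
}


def cycles_to_ns(cycles: int, clock_ghz: float) -> float:
    """At f GHz one cycle lasts 1/f ns, so latency in ns = cycles / f."""
    return cycles / clock_ghz


if __name__ == "__main__":
    for level, cycles in CACHE_LATENCY_CYCLES.items():
        print(f"{level}: {cycles} cycles = {cycles_to_ns(cycles, 3.20):.2f} ns")
```

At 3.20 GHz this works out to about 1.25 ns for L1, 3.4 ns for L2 and 12.2 ns for L3.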


Overclocking made hard

Only the 965 Extreme Edition is unlocked for overclocking, and Intel promises that these high-end models will always remain unlocked. In the past, many enthusiasts bought low-end CPUs and high-end cooling, then overclocked their chips to reach Extreme-level performance or better at a lower cost. Overspeed Protection is Intel's answer to that practice: it places a hard cap of 130 watts on power consumption, and the chip will not go beyond it.

Testers were able to overclock the 965 Extreme to 3.80 GHz (an 18.75% overclock), though the 965 Extreme's stock performance is already so high that it didn't make much difference.


Core i7, Hyper-Threading and transistor count

Hyper-Threading presents two virtual cores per physical core. Testing shows this often increases performance in multi-threaded applications without the cost of additional physical cores, both in die area and the heat it brings and in price to the consumer. In some cases a small penalty of a couple of percentage points is observed. Overall, gains average more than 5%, with some applications benefiting by 15% or more.

Despite having eight virtual cores and a larger die, the 45 nm Core i7 has fewer transistors (731 million) than the 45 nm quad-core Core 2 (820 million, from two dies of 410 million each). Core i7 also has 1366 pins, compared to Core 2's 775. These extra pins serve the on-die memory controller and Intel's QuickPath interconnect.


Integrated on-die memory controller and SSE4.2

Core i7's on-die memory controller requires a new motherboard and chipset. The platform delivers up to 15.4 GB/s read throughput, 13.8 GB/s write throughput and 19.4 GB/s copy throughput, with a remarkably low 34.3 ns latency (compared to 68.8 ns on Core 2 and 56.3 ns on AMD's Phenom X4). This advantage is clearly evident in memory benchmarks, where Core i7 sets a new level of performance.

Core i7 completes the SSE4 instruction set with SSE4.2, adding the packed string-comparison instructions PCMPESTRI, PCMPESTRM, PCMPISTRI and PCMPISTRM, the packed 64-bit comparison PCMPGTQ, and the POPCNT and CRC32 instructions.
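For readers unfamiliar with the last two instructions: CRC32 accumulates a CRC-32C checksum (the Castagnoli polynomial, 0x1EDC6F41), and POPCNT counts the set bits in a register. The pure-software sketch below computes the same results far more slowly. It is illustrative only, and it assumes the conventional initial and final inversion that software usually wraps around the raw hardware CRC step.

```python
# Pure-software equivalents of two instructions that shipped with Nehalem.
# Illustrative only: the hardware versions operate on registers, not Python ints.
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial 0x1EDC6F41, bit-reversed
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C with the usual initial/final inversion.

    Pass a previous result as `crc` to continue a checksum across chunks.
    """
    crc ^= MASK32
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, fold in the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ MASK32


def popcnt64(x: int) -> int:
    """Count the set bits in the low 64 bits of x, like a 64-bit POPCNT."""
    return bin(x & MASK64).count("1")


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard CRC-32C check value
    print(popcnt64(0b1011))           # 3
```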


Performance

Core i7 does not disappoint in any area. The reviews show that this processor sets a new standard of performance, one so high that, in our opinion, AMD may never catch up.

On every CPU-intensive benchmark that is not bottlenecked by another component (such as the 3D graphics card), Intel's Core i7 not only dominates but embarrasses AMD, and in some cases Core 2. Core i7 often scores double or more what AMD's Phenom X4 9950 Black Edition achieves.

In the 3DMark Vantage CPU test, the 920, 940 and 965 score 16352, 18264 and 19996 at stock speeds, and the overclocked 965 reaches 23460; AMD's Phenom X4 9950 scores 8777. In SiSoft Sandra's Arithmetic Dhrystone test, the 920, 940 and 965 score 6770, 74853 and 81165, compared to 34570 for the Phenom X4 9950. In Whetstone, the 920, 940 and 965 score 61400, 66045 and 72778, compared to 33661 for the Phenom X4 9950.
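A quick way to check how often Core i7 doubles the Phenom's score is to divide each Core i7 result by the Phenom X4 9950's, treating the Phenom as 100% (the same convention the conclusion uses). This sketch uses only the 3DMark Vantage CPU and Whetstone figures listed above; the names are illustrative.

```python
# Relative scores versus the Phenom X4 9950 (higher is better), using the
# 3DMark Vantage CPU and Sandra Whetstone figures quoted in the article.
PHENOM_X4_9950 = {"3DMark Vantage CPU": 8777, "Sandra Whetstone": 33661}

CORE_I7 = {
    "3DMark Vantage CPU": {"920": 16352, "940": 18264, "965": 19996, "965 (OC)": 23460},
    "Sandra Whetstone": {"920": 61400, "940": 66045, "965": 72778},
}


def relative_percent(score: float, baseline: float) -> float:
    """Score as a percentage of the baseline (baseline = 100%)."""
    return 100.0 * score / baseline


if __name__ == "__main__":
    for test, results in CORE_I7.items():
        baseline = PHENOM_X4_9950[test]
        for model, score in results.items():
            print(f"{test} {model}: {relative_percent(score, baseline):.0f}%")
```

In these two tests the 965 clears 200% in both, the 940 lands right around it, and the 920 comes in at roughly 182% to 186%.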

At 1680 x 1050, Crysis runs at 140.9, 150.6 and 162.7 frames per second on the 920, 940 and 965, compared to 107.8 on the Phenom X4 9950. Unreal Tournament 3 at 1680 x 1050 shows 136.9, 146.8 and 154.5 on the 920, 940 and 965, compared to 106.7 on the Phenom X4 9950. World in Conflict at 1680 x 1050 shows 172, 196 and 218 on the 920, 940 and 965, compared to 103 on the Phenom X4 9950. Supreme Commander shows 32.55, 33.78 and 35.20 on the 920, 940 and 965, compared to 30.77 on the Phenom X4 9950.

Read the reviews at Tom's Hardware Guide, AnandTech, TweakTown, PC Perspective and Hot Hardware, along with their video spotlight.


Conclusion

If we take the Phenom X4 9950 as a baseline of 100%, we can express the overall relative performance of the 920, 940 and 965. Tom's Hardware's results (focused mostly on compute) put them at 156%, 170% and 191%. AnandTech's results (focused mostly on gaming) put them at 148%, 159% and 170%.
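How each site combined its individual results into a single figure is not stated above. One common approach, assumed for this sketch, is the geometric mean of per-test ratios against the baseline. Applied only to the four game results listed earlier, it will not reproduce the Tom's Hardware or AnandTech percentages, which draw on larger test suites and possibly different methods.

```python
# Aggregate per-game results into one relative figure (Phenom X4 9950 = 100%).
# Assumption: geometric mean of per-test ratios; the reviewers' methods may differ.
from statistics import geometric_mean
from typing import Sequence

# Results from the article: Crysis, UT3, World in Conflict, Supreme Commander.
PHENOM_X4_9950 = [107.8, 106.7, 103, 30.77]
CORE_I7 = {
    "920": [140.9, 136.9, 172, 32.55],
    "940": [150.6, 146.8, 196, 33.78],
    "965": [162.7, 154.5, 218, 35.20],
}


def relative_index(results: Sequence[float], baseline: Sequence[float]) -> float:
    """Geometric mean of result/baseline ratios, expressed as a percentage."""
    ratios = [r / b for r, b in zip(results, baseline)]
    return 100.0 * geometric_mean(ratios)


if __name__ == "__main__":
    for model, results in CORE_I7.items():
        print(f"Core i7 {model}: {relative_index(results, PHENOM_X4_9950):.0f}%")
```

The resulting figures, roughly 131%, 142% and 152%, sit below AnandTech's, which shows how strongly an aggregate depends on the tests that go into it.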


UPDATED: November 3, 2008 11:02am
Core 2 allowed its full 4 MB L2 cache to be used by one or both cores. Nehalem allows its 8 MB L3 cache to be used by anywhere from one to all four cores (eight virtual cores).