Intel Pentium 4 and NetBurst Micro-Architecture

Course Paper
Advanced Computer Architecture: CS6810
Al Davis
3 December 2002

Jason Waltman, MS Graduate Student
School of Computing
University of Utah

All rights reserved.  No part of this report may be reprinted or reproduced without permission in writing from the author.




1   Introduction

Intel Corporation (INTC), founded in 1968, developed the first microprocessor in 1971. Approximately 75% of all computers in use today use Intel processors, making Intel the world’s largest microprocessor manufacturer. Intel has 45 offices worldwide, 80,000 employees, and reported $26.5 billion in revenues in 2001.

Intel’s current microprocessor is the Pentium 4. The processor uses a redesigned micro-architecture that the Intel marketing group named ‘NetBurst’. NetBurst marks the end of the P6 micro-architecture era, which began with the Pentium Pro in 1995 and continued through the Pentium II and Pentium III.

This paper will discuss the architecture of the Pentium 4 (P4). I will give only a brief overview of the IA-32 instruction set architecture (ISA) before discussing the more interesting new NetBurst micro-architecture, including details about the P4 memory and I/O architectures. I will conclude with some analysis of the Pentium 4, specifically, with what I think is good and maybe not so good about the current design and recent Intel design decisions.
 


2   IA-32 Instruction Set Architecture

The IA-32 (32-bit Intel Architecture) ISA used in the Pentium 4 is the same base ISA that Intel introduced in 1978 with its first-generation 80x86 processor, the 8086. (Even though the 8086 was a 16-bit architecture, Intel uses the name IA-32 to refer to both the 32- and 16-bit architectures, since they are compatible.) The ISA persists today for compatibility with legacy applications: an application written for the 8086 over 20 years ago will still run on the Pentium 4 and on every 80x86 IA-32 processor in between.

IA-32 is a CISC-like, general-purpose register architecture. The complex instruction set contains over 300 instructions. Beginning in 1995 with the P6 micro-architecture and the Pentium Pro, Intel switched to a more RISC-like core, adding a front-end decoder that translates IA-32 instructions into simple micro-ops for the execution units to process (each IA-32 instruction decodes into one or more micro-ops).


Figure 1: Intel Pentium 4, 2.4 GHz, 131 mm² die.

As a result of its humble beginnings, IA-32 has only 6 general-purpose architected registers (8 if you count the stack pointer and base pointer, which are normally reserved), 6 segment registers, a status flag register, and an instruction pointer. These registers were each 16 bits (one word) wide in the 8086. The general-purpose registers, the flags register, and the instruction pointer were extended to 32 bits with the 386 processor, and later additions to the instruction set added 64- and 128-bit registers for MMX and SSE/SSE2. IA-32 is a byte-addressable, little-endian architecture and can address up to 64 GB of physical memory (using its 36-bit physical address extensions; plain 32-bit addressing reaches 4 GB). A logical IA-32 address combines a 16-bit segment selector with a 32-bit offset; the segment’s base address plus the offset forms a linear address, which paging then divides into pages and translates into a physical address.

IA-32 contains all the common instructions you’d expect in an ISA, including instructions for data transfer, arithmetic, logical operations, bit shifts and rotates, control transfer, and string manipulation. The semantics of a few common instructions differ from those on many other architectures. For example, IA-32 compare instructions set bits in the flags register (EFLAGS), which conditional branch instructions then test to decide whether or not to jump. A call instruction pushes the return address (the address of the instruction following the call) onto the stack; a return instruction pops it back into the instruction pointer.

Most instructions take zero to two explicit operands; at most one may be a memory location, and the others may be registers or constants. Some instructions, such as division and multiplication, implicitly use a specific pair of registers. For example, the 32-bit DIV instruction takes its 64-bit dividend from the EDX:EAX register pair and leaves the quotient in EAX and the remainder in EDX.

IA-32 has seven addressing modes. For brevity, I won’t explain them, but their meaning should be relatively clear from their names: immediate, register, direct, register indirect, base+offset indirect, index+offset indirect, and base+index+offset indirect (the index register can also be scaled by 1, 2, 4, or 8). IA-32 has three execution modes: protected mode (the native processor state), system management mode (a mode transparent to the operating system, used for platform-specific functions such as power management and system security), and real-address mode (the original 8086 processor environment).
 


3   Intel NetBurst Micro-Architecture

Details of the NetBurst micro-architecture, the main focus of this paper, were released to the public in late 2000. The Pentium 4 and current Xeon processors are the first to use the new micro-architecture. Intel’s goals for NetBurst, the successor to the P6, were twofold: (1) to execute legacy IA-32 and SIMD (single instruction, multiple data) applications, and (2) to operate at high clock rates that will scale easily in the near future. Intel markets the features of NetBurst that attempt to reach these goals as the following:

   · Hyper-Pipelined Technology
   · 533 MHz Front Side Bus
   · Advanced Dynamic Execution
   · Rapid Execution Engine
   · Execution Trace Cache
   · Advanced Transfer Cache
   · Streaming SIMD Extensions 2 (SSE2) instructions
   · Hyper-Threading Technology

Rather than divide this section into the above categories, I’ll talk about the architecture in terms of processor instruction execution and defer some details of the marketing terminology to later sections.


Figure 2: Intel NetBurst 20-stage Pipeline.

First, however, the “Hyper-Pipelined Technology” refers to the 20-stage pipeline of the NetBurst micro-architecture (see figure 2). This is twice as long as the pipeline on the P6 micro-architecture and is the primary reason Intel is able to get such fast clock rates (if less work is being done on each clock tick, clock ticks can happen faster). Intel claims that the 20-stage pipeline will allow them to reach 10 GHz clock frequencies in the future without another micro-architecture change.

NetBurst instruction execution is broken into three main parts: an in-order front end, an out-of-order (OOO) superscalar execution core, and in-order retirement. The job of the front end is to feed a continuous stream of micro-ops to the execution core. The fetch/decode unit can decode one IA-32 instruction per clock (complex instructions are looked up in a microcode ROM) and passes micro-ops to the Execution Trace Cache (described later). The trace cache passes micro-ops to the execution core and can issue 3 micro-ops per cycle.

NetBurst’s “Advanced Dynamic Execution” is implemented by an OOO, superscalar execution core. Up to 126 micro-ops may be ‘in-flight’ at once; 48 of these may be loads and 24 may be stores. The core can dispatch up to 6 micro-ops per cycle. Instead of holding renamed register values in the reorder buffer, as the P6 micro-architecture did, it renames the 8 architected general-purpose registers onto a much larger physical register file. In this way, if one instruction is delayed waiting on data or a contended resource, other instructions whose operands and execution units are available may execute out of order. Stores may be forwarded to dependent loads before being written to memory, and loads may execute speculatively (but a speculative load cannot cause a page fault).

The execution core has seven execution units accessible through four issue ports. One or more ports may be used on each clock cycle (see figure 3), and each port may dispatch one or two micro-ops per clock. The possibility of dispatching two micro-ops per clock may come as a surprise: two integer ALUs on the P4 actually run at twice the core processor frequency (this is what Intel calls the “Rapid Execution Engine”) and can execute a simple integer micro-op in half a clock cycle. A port dispatches two micro-ops by sending one in the first half of the cycle and the second in the second half (provided, of course, that the port has access to these “double speed” ALUs).


Figure 3: NetBurst Issue Ports and Execution Units.

Retirement in NetBurst is done in order to ensure that the architectural state is left as the programmer intended after each instruction. Exceptions are raised only when the faulting instruction retires, so they never occur speculatively. Up to 3 micro-ops can be retired per cycle.

The branch predictor in the Pentium 4 is more advanced than that of any previous Intel processor. A correctly predicted branch can cost as little as zero clock cycles; a mispredicted branch costs roughly 20 cycles (about the length of the pipeline). The predictor handles all near branches, including conditional branches, calls, returns, and indirect branches, but it does not predict far branches. The predictor is dynamic: the retirement logic feeds the outcomes of previous branches into a 4K-entry branch target buffer (8 times larger than the Pentium III’s). If no dynamic information is available for a branch, a static predictor is used, which predicts backward branches as taken and forward branches as not taken.

The NetBurst hardware prefetcher automatically brings data or instruction cache lines into the L2 cache based on prior reference patterns. Basing prefetching decisions on prior references is a new feature. The hardware can also perform conventional linear (sequential) prefetching of instructions. In addition, the SSE instruction set that NetBurst inherits from the Pentium III includes prefetch instructions that let a programmer request a data prefetch in software for code with irregular access patterns.

NetBurst adds the next level of multimedia extensions to the 80x86 processor line. SSE2 adds 144 instructions on top of Intel’s MMX and its first set of Streaming SIMD Extensions (SSE). Among other things, these instructions add 128-bit SIMD integer and double-precision floating-point operations, as well as support for new cache and memory management operations.
 


4   Hyper-Threading Technology

Less than one month ago, on 14 November 2002, Intel released a 3.06 GHz Pentium 4 with “Hyper-Threading Technology” enabled. Hyper-threading (HT), the ability of a processor to execute more than one thread at a time and appear to the OS as two logical processors, had actually been on the P4 die for a while, since the additional logic takes up less than 5% of the total die area. Had Intel been ready to go public with HT earlier, we would have seen it in an earlier P4 incarnation.

HT works on the theory that typical processor utilization is only about 35% of maximum. By sending two threads into the processor at a time, execution units that would otherwise sit idle can be used by the second thread. Intel claims that with HT enabled, processor utilization can increase to 50%. The interesting thing about HT is that the execution core and memory hierarchy hardly have to change in order for it to work. The trace cache is shared between the two threads, and both the trace cache and the retirement logic alternate between threads so that both logical processors make forward progress.

HT needs OS support in order to work properly. Both Windows XP and Linux 2.4.x support HT. Older OSs will have problems with HT enabled, because the system BIOS reports a P4 with HT as two processors and the OS cannot distinguish between a logical and a physical processor.
 


5   Pentium 4--NetBurst Memory Details

NetBurst supports caches, TLBs (translation lookaside buffers), and a store buffer for temporary instruction and data storage. The memory architecture is byte addressable, uses segmented-paged addressing (a segment is an independent address space, e.g. to separate code and stack), and supports three page sizes (normal pages are 4 KB; “large” pages are 2 MB or 4 MB).

The Execution Trace Cache (TC) is one of the more interesting additions in the NetBurst micro-architecture. The idea behind the TC is simple. The IA-32 decoder has one of the highest gate counts of any block of logic on the P4, a sign of how complex (and slow) decoding variable-length IA-32 instructions is. A traditional decoder has to run every time an instruction is fetched and, even worse, again whenever the pipeline refills after a mispredicted branch. The TC stores decoded micro-ops in program execution order (it can hold 12K micro-ops) so they don’t have to be decoded again. The TC works with the branch prediction hardware to build logical “traces” of micro-ops that follow predicted branches. When the instructions being fetched are already in the TC, a trace is sent to the execution core directly, without being decoded. The P4 TC is 8-way set associative, can deliver up to 3 micro-ops per clock, and replaces a traditional L1 instruction cache.

The P4 L1 data cache is 8 KB (half the size of the Pentium III’s), 4-way set associative, and has 64-byte cache lines. The L1 interface has two 64-byte channels and can handle one load and one store per clock. L1 data load latency is 2 cycles.


Figure 4: NetBurst Memory Hierarchy.

Depending on the manufacturing process (0.18- or 0.13-micron), the P4 has either a 256 KB or a 512 KB unified, on-chip L2 cache. The L2 has a 256-bit interface to the execution core and can transfer data on every core clock cycle. The L2 is non-blocking, runs at full core clock speed (hence the name “Advanced Transfer Cache”), is 8-way set associative, and has 128-byte cache lines. The total load latency from L2 is 7 cycles. Data is received from the northbridge in 64-byte pieces.

The levels of the NetBurst memory hierarchy are non-inclusive: a line present in one level (for example, L1) is not guaranteed to also be present in the next level (L2). Replacement is pseudo-LRU (least recently used). L1 is write-through (buffered); L2 is write-back.

TLBs cache the most recently used page-directory and page-table entries. The P4 instruction TLB has 128 entries and is 4-way set associative; the data TLB has 64 entries and is fully associative.

The P4 store buffer allows writes to be delayed for more efficient use of memory-access bus cycles. The store buffer has 24 entries.
 


6   Pentium 4--NetBurst I/O Details

The advertised front side bus (FSB) speed of the P4 is 400 or 533 MHz (depending on the age of the processor). This is the effective transfer rate of the bus; the FSB is actually “quad pumped” and clocked at only 100 or 133 MHz. A quad-pumped bus transfers data four times per bus clock instead of once, using source-synchronous data strobes rather than the bus clock alone. At 100 MHz there are only 10 ns between clock edges, so each of the four transfers gets just 2.5 ns, which is quite a feat. Shortening the clock period to 7.5 ns (with the 133 MHz bus), or under 1.9 ns per transfer, is even more impressive and translates into very high bandwidth: 4.2 GB/s across the 64-bit data bus. A buffering scheme is used to sustain 533 MHz data transfers.

Intel released the P4 with its i850 chipset (which is similar to the Pentium III’s i840). The i850 supports 4X AGP, Ultra ATA/100, dual-channel RDRAM, 24 Mb/s of USB bandwidth (2 controllers over 4 ports), 6-channel audio, and a LAN Connect Interface. In addition, the i850 was designed to support the P4’s quad-pumped FSB. The choice to support only RDRAM with the i850 caused quite a stir among microprocessor enthusiasts. Intel nonetheless chose to pair its fast FSB with RDRAM, which was faster (thanks to dual-channel support, which requires RIMMs to be installed in pairs) but more expensive. Since the processor’s introduction in 2000, third-party chipset designers as well as Intel have released controllers that work with various types of SDRAM, including the popular DDR-SDRAM.


Figure 5: From left to right: Socket-423 Pentium 4, Socket-603 Xeon, Socket-478 Pentium 4.

The P4 has either a 423- or a 478-pin interface to the motherboard. The 478-pin interface was introduced with the 2.0 GHz version of the processor; it uses a more densely packed arrangement of pins called a micro Pin Grid Array, making the package physically smaller with a smaller motherboard footprint (see figure 5).
 


7   Pentium 4 Implementation

The first Pentium 4 processors were released in November 2000 at 1.4 and 1.5 GHz, using a 0.18-micron fabrication process and a 256 KB L2 cache. Since then, the 20-stage pipeline of the NetBurst micro-architecture and the move to a 0.13-micron process have given Intel the speed scaling it had hoped for. The first 0.13-micron P4 (see figure 6) was released in January 2002 at 2.0 GHz (and was available only with the 478-pin micro Pin Grid Array interface). The smaller transistors left room on the die for a bigger L2 cache, which Intel doubled from 256 KB to 512 KB. In April 2002, Intel switched from 200 mm to 300 mm wafers (see figure 7) with the opening of its new plant in Hillsboro, Oregon. In May 2002, Intel released the 533 MHz FSB and a new i850E chipset to support the faster bus. At launch two years ago, the Pentium 4 had 42 million transistors, a 217 mm² die, a thermal dissipation of 52 W, and a VCore voltage of 1.75 V. More recent versions of the processor have 55 million transistors (due to the larger L2), a 131 mm² die, a thermal dissipation of about 68 W, and a VCore voltage of 1.5 V. The most recent Pentium 4 release was the 3.06 GHz processor with hyper-threading enabled, on 14 November 2002.


Figure 6 (top): Close up of 0.13-micron Pentium 4 die.
Figure 7 (bottom): A 300mm Pentium 4 wafer.
 


8   Analysis

Personally, I think that any current Intel processor is an amazing device. The fact that Intel has been able to build a modern microprocessor on top of the 8086 ISA, a design that was never meant to be used the way it is today, is remarkable. Luckily for Intel, the 8086 ISA was general enough to be morphed into today’s high-performance processors. The continued popularity of 80x86 processors rests on two things consumers have grown to expect: compatibility with older applications and a continuous increase in speed.

8.1   Performance

When first released, the Pentium 4 was, in certain situations, actually slower than the then-fastest Pentium III; it was also consistently slower than AMD’s then-best processor. While this was something of an embarrassment for Intel in 2000, I believe Intel understood the ramifications of what it was doing and what the NetBurst micro-architecture would be able to excel at in the near future. Computer geeks and hardcore computing consumers like controversy. They want some company to hate, they want the underdog to win sometimes, and they don’t like things to remain static. Intel was overdue for a new micro-architecture and the computer geeks knew it. AMD’s results kept getting better, and Intel needed to win back the spotlight. One could argue that delivering a new ‘slow’ processor might cast Intel in the wrong light, but I don’t believe so. Intel’s goal was simply to get the processor out and get a few people to use it. If the marketing hype was good enough, consumers would never notice the speed issue.

NetBurst was designed not to deliver a killer processor in 2000, but as a platform on which to develop the killer processors of the future. It took a couple of years, but in May 2002, after the switch to 300 mm wafers, the 0.13-micron process, the 512 KB L2, the 533 MHz FSB, and a core clock of 2.53 GHz, the P4 was finally outperforming AMD consistently, in all benchmarks.

AMD will respond, and soon, but I expect Intel’s innovation with NetBurst to keep it in the lead for a while. Intel’s trace cache is one of the most interesting pieces of NetBurst, and it has proved to be a big performance booster. The SSE2 extensions and the recently released hyper-threading technology also have extraordinary potential. As programmers (especially game programmers) begin programming for these new technologies, code on a new Intel processor will have a great advantage over an AMD chip (until AMD implements SSE2 and hyper-threading itself).

8.2   Execution

The 20-stage pipeline of the NetBurst micro-architecture was an interesting move. To the untrained eye, Intel’s high clock speed looks much better than AMD’s; however, Intel is not fooling any computer geek. Going to the 20-stage pipeline reduced the Pentium 4’s IPC (instructions per clock), increased the penalty for mispredicted branches, and forced more record keeping (recall that up to 126 OOO instructions can be in flight at once!). Intel was well aware of these issues. The longer pipeline was needed for future scalability (Intel estimates that NetBurst can reach clock frequencies of 10 GHz), so Intel simply designed around the problems the longer pipeline would cause.

One of the surprising ‘work-arounds’ was the “Rapid Execution Engine”, that is, the double-speed integer ALUs. This decision (or at least something similar) was actually necessary to keep the initial P4s on par with the Pentium IIIs. It’s as good a solution as any as far as I’m concerned. Integer code is usually less predictable than floating-point code, which means that branch mispredictions are going to happen more often with integer code. If you can double the speed of integer execution, it should hide at least some of that latency. What will be interesting to see is how long Intel can keep up the double-speed ALUs. If the NetBurst architecture is supposed to reach 10 GHz, the integer ALUs are going to need to fly at 20 GHz.

Another current decision that should shine in the future is the quad pumped FSB. The Pentium III only got 1.06 GB/s bandwidth from its FSB. In comparison, the P4 gets 4.2 GB/s—the most of any current processor by a large margin. With new memory technologies that can utilize this bandwidth currently in the works, and new multimedia applications that want to use it, Intel is sitting pretty as far as bandwidth is concerned.

The new hardware automatic prefetcher that looks at prior reference patterns is also ingenious. It’s exactly the kind of enhancement programmers like to see as it’s going to improve code we’ve already written.

8.3   Caches

As I’ve mentioned before, replacing the traditional L1 instruction cache with a trace cache is probably one of the best things that could have been done for the aging IA-32 ISA. Intel followed the optimization rule that comes from Amdahl’s Law: optimize the part that takes the most time, and the decoder was definitely taking a lot of time. The TC is also a very natural progression from an L1 instruction cache: if the processor is executing micro-ops, why cache IA-32 instructions?

Of note is the fact that the branch predictor was enhanced along with the addition of the TC. I would assume that this was absolutely necessary (and it looks that way, since the branch target buffer is 8 times larger than the Pentium III’s), because the TC stores decoded micro-ops in execution order across predicted branches. Intel claims that in practice the Pentium 4’s branch predictor is so good that the penalties for traces built along a wrongly predicted path are quite small.

The Pentium 4 L1 data cache is small (half the size of the Pentium III’s). The decision was probably driven by latency, since a smaller cache can be accessed faster. It would be interesting to see an analysis of a larger L1 with higher latency and how that affects performance. I don’t doubt Intel’s decision; I’m sure they did their own analysis. The latency explanation is even more convincing given that, after the switch to a 0.13-micron process, Intel increased the size of the L2 but not the L1.


Figure 8: The Intel Pentium 4, 2.0 GHz, 478-pin Processor.

8.4   RAM

When Intel released the Pentium 4, it released the i850 chipset, which supported only RDRAM. I believe this was done in part to promote Intel’s own investment in RDRAM technology. The RDRAM advantage is two independent channels connecting the memory subsystem to the northbridge, which gives twice the bandwidth of a similar technology in a single-channel design. With the new, faster FSB (which could handle the increased bandwidth from RDRAM) and the fear that the P4 would not perform as well as consumers would like, Intel had good reason to use RDRAM.

However, whenever the computer geeks aren’t given a choice, they like to yell about it. And when their only choice is an expensive one, they really yell. RDRAM was expensive, which is the root of the problem and one reason RDRAM started showing up on lists of Intel blunders alongside the Pentium floating-point division (FDIV) bug.

I don’t believe that Intel was simply trying to force the community to use its investment. I believe Intel actually thought that RDRAM was a better technology, that prices would drop, and that the Pentium 4 would be better off (why else would Intel give Rambus so much money?). The problem was that prices didn’t fall and the performance wasn’t that much better. Intel ended up relenting and releasing a chipset that supports SDRAM, as other chipset makers also did. Now dual-channel DDR-SDRAM is competing with RDRAM, and Intel has publicly committed to supporting DDR. I suspect that this decision came only after Intel realized that dual-channel DDR would provide bandwidth similar to RDRAM’s, which is really what Intel was looking for in RDRAM in the first place.

8.5   Hyper-threading

Hyper-threading marks the beginning of Intel’s move away from mining ILP (instruction-level parallelism) in favor of the untapped TLP (thread-level parallelism) on the desktop. Like the trace cache, hyper-threading seems like a natural enhancement. If the processor’s execution resources are only about 35% utilized, why not take a page out of the supercomputer manuals and get the idle execution units to do something else? Execution units usually become idle while a thread is waiting, typically on memory. With two threads in execution, another thread gets a shot at the execution units while one thread is waiting. If implemented correctly, HT won’t hurt; it can only help multi-threaded applications and multi-tasking environments. (Based on preliminary tests, turning HT on in the new 3.06 GHz processor doesn’t degrade performance in standard, single-threaded benchmarks.) As programmers begin coding for HT, we’ll see more significant improvements in high-performance applications.

What’s interesting is why Intel waited until now to unveil HT when it has been on the chip for some time. Intel had a problem with HT on a previous Xeon processor (degraded performance). I’m guessing Intel just wanted to make sure the same thing wouldn’t happen with the Pentium 4. Given the somewhat poor results of the processor at its release in 2000, a failed HT launch (again) would have been bad publicity for Intel.
 


9   A Note on Windows XP

Starting with Windows 2000, Microsoft no longer supports MIPS or Alpha processors. [If you’re reading this and are surprised that Windows OSs before 2000 ran on processors other than 80x86 architectures, you’re not alone.] The decision was made for performance reasons, so the OS could be tailored specifically for the IA-32 architecture. Windows XP, in fact, was designed specifically for the Pentium 4, and Intel and Microsoft aren’t shy about advertising this fact (sort of a you-scratch-my-back, I’ll-scratch-yours arrangement). In particular, XP uses the new SSE2 SIMD instructions and the fast FSB to its advantage. Windows, like the rest of the computing world, is becoming more ‘multimedia-centric’. Microsoft’s new DirectX 8 uses SSE2, and Direct3D even has an SSE2-specific math library. XP moved its GUI to GDI+ to make direct use of DirectX’s enhancements, and XP’s Encrypting File System exploits the performance of SSE2’s 128-bit integer operations. With further collaboration between Microsoft and Intel, the Windows+80x86 combination should continue to become an increasingly reliable solution for consumers.
 


10   Summary

Intel’s Pentium 4 and NetBurst micro-architecture may have had a rocky start, but NetBurst was designed for scalability, and now, two years after its introduction, that design is paying off. New innovations such as the Execution Trace Cache, a quad-pumped FSB, improved branch prediction, and hyper-threading show that Intel won’t be slowing down in the processor race anytime soon. Competition from AMD, and consumers’ unwillingness to overlook details, will keep future advancement alive.
 


References

INTEL CORP. 2002. IA-32 Intel architecture software developer’s manual, Volume 1: basic architecture. Available 3 December 2002: http://developer.intel.com/design/Pentium4/manuals/.

INTEL CORP. 2002. IA-32 Intel architecture software developer’s manual, Volume 2: instruction set. Available 3 December 2002: http://developer.intel.com/design/Pentium4/manuals/.

INTEL CORP. 2002. IA-32 Intel architecture software developer’s manual, Volume 3: system programming guide. Available 3 December 2002: http://developer.intel.com/design/Pentium4/manuals/.

INTEL CORP. 2002. Intel Pentium 4 and Intel Xeon processor optimization reference manual. Available 3 December 2002: http://developer.intel.com/design/Pentium4/papers/.

INTEL CORP. 2002. Intel Pentium 4 processor with 512-KB L2 cache on 0.13 micron process datasheet. Available 3 December 2002: http://developer.intel.com/design/Pentium4/datashts/.

INTEL CORP. 2002. Intel Pentium 4 processor website. Available 3 December 2002: http://www.intel.com.

RIST, O. 2001. Windows XP on the Pentium 4 processor. Available 3 December 2002: http://cedar.intel.com/cgi-bin/ids.dll/topic.jsp?catCode=CVD.

SHELBURNE, B. 1998. A brief introduction to Intel 80x86 assembler programming. COMP 255 Computer Organization Class Notes, Wittenberg University.

SHIMPI, A. L. 2000. Intel Pentium 4 1.4GHz & 1.5GHz. Available 3 December 2002: http://www2.anandtech.com/showdoc.html?i=1360.

SHIMPI, A. L. 2000. Intel’s NetBurst architecture – the Pentium 4’s innards get a name. Available 3 December 2002: http://www.anandtech.com/cpu/showdoc.html?i=1301&p=1.

SHIMPI, A. L. 2001. Intel Pentium 4 2.0GHz: the clock strikes two. Available 3 December 2002: http://www2.anandtech.com/showdoc.html?i=1524&rndr=11182002094337.

SHIMPI, A. L. 2002. AMD’s Athlon XP 2000+ vs Intel’s 0.13-micron Northwood. Available 3 December 2002: http://www.anandtech.com/cpu/showdoc.html?i=1574&p=1.

SHIMPI, A. L. 2002. Intel introduces 533MHz FSB CPUs – Pentium 4 2.53GHz. Available 3 December 2002: http://www2.anandtech.com/cpu/showdoc.html?i=1615&p=1.

SHIMPI, A. L. 2002. Intel’s Pentium 4 2.4GHz: taking the lead. Available 3 December 2002: http://www.anandtech.com/cpu/showdoc.html?i=1605&p=1.

SHIMPI, A. L. 2002. Intel’s Pentium 4 3.06GHz: hyper-threading on desktops. Available 3 December 2002: http://www2.anandtech.com/cpu/showdoc.html?i=1746.

STOKES, J. 2002. Dual-channel DDR comes to the P4. Available 3 December 2002: http://www.arstechnica.com/wankerdesk/3q02/dual-ddr.html.

VISIONARY. 2002. Intel Pentium 4 2.4Ghz review. Available 3 December 2002: http://www.vr-zone.com/reviews/Intel/P42400/.

VOLKE, F., TOPELT, B., AND SCHEFFEL, U. 2002. Single CPU in dual operation: P4 3.06 GHz with hyper-threading technology. Available 3 December 2002: http://www17.tomshardware.com/cpu/02q4/021114/index.html.

email at jasonwaltman dot com

(c) 2000-2007 jason waltman