Can AMD push Intel outside?

Pro

1 April 2005

For some observers, AMD’s progress to the point where its 64bit Opteron processor is now a serious challenge to Intel’s Itanium is the ultimate Cinderella story. From such a viewpoint, AMD is the plucky little startup that challenged the might of the world’s largest chip company and eventually got to play in the same ballpark.

It would be utter hubris to say that AMD has ‘won’ in any meaningful sense. It is still only a fraction of Intel’s size, and its efforts are spread over a much wider section of the chip industry than Intel’s, which concentrate on core technology for the computer market.

Nevertheless, the fact that the Opteron chip has attracted custom from such computer giants as IBM, HP and even Sun Microsystems is evidence that the company once dismissed by Intel boss Andy Grove as one whose ‘last innovative thought was to copy Intel’ is at last being treated as a serious competitor.

In fact, Intel and AMD go way back as partners. In the days when Intel was itself a struggling startup, its customers, the large computer makers of the day, required it to enter into ‘second-source agreements’ with rival chip makers. The underlying theory was that if Intel went bust, the computer companies would at least have an alternative supplier of key components at the ready. One of Intel’s partners was AMD, but the relationship went very sour in the mid-1980s when, with the development of the 80386 processor, Intel felt strong enough to dispense with its second-source arrangements and take sole responsibility for making its own chips.

Years of litigation followed, along with ‘clean-room cloning’ of Intel devices by AMD and others. Even after the legal situation was resolved a decade ago, AMD concentrated its efforts on the high-volume but lower-priced end of the market. Intel was happy to leave that segment alone as it brought out ever more powerful processors for the more lucrative end of the market, and it built enough manufacturing capacity (not least at Leixlip in Kildare) to cut AMD’s profitability to the bone. In fact, AMD has had only one profitable year since 1996 and has accumulated losses in excess of $1bn in that time.

However, AMD’s persistence finally appears to be paying off. After decades of AMD aping Intel architectures, its own AMD64 architecture, embodied in the Opteron and Athlon 64 processors, has actually been imitated by Intel in the form of Nocona, Intel’s 64bit version of Xeon. In a stunning reversal of fortune, Intel was forced to build that chip because Opteron was invading a server market that Intel’s Itanium was supposed to dominate.

Now, with Intel mad as hell and hot on AMD’s heels, can AMD gain enough sales traction to withstand the punishing onslaught everyone knows is coming?

Anyone shopping for servers needs to consider that question seriously, because Opteron creates a new 64bit path to follow: one that continues the x86 tradition rather than, as Itanium does, consigning that architecture to the dustbin of history. To understand the crossroads at which AMD finds itself and what it implies, shrewd observers must take a hard look at the company’s technologies and market position, both now and in the past.

My friend, my enemy
Historically, AMD tended to concentrate on chips that could be ‘drop-in’ replacements for Intel processors, ie chips that could fit into standard PC boards and use standard support chipsets. With the introduction of Athlon in 1999, AMD showed real promise that it could make it on its own. Athlon was not a drop-in replacement for any Intel processor. It required its own motherboard and chipset, which, although software-compatible with Intel’s, forced manufacturers to alter their production lines and testing procedures.

That gave Intel an easy way to keep AMD in its cage. Intel went to its OEMs, which it controlled through allocated distribution of parts, to discourage them from investing in retooling for Athlon-compatible products. Rather than risk supply problems with Intel, the majority of manufacturers effectively froze AMD out. And when Microsoft got serious about its enterprise push with the release of Windows 2000 Server, Intel had Pentium II, Pentium III, and Xeon ready to rush the entry-level server space that Unix players Hewlett-Packard, IBM, and Sun Microsystems had neglected. Intel was aggressively grabbing IT unit share, and although AMD wanted a piece of that, it had no way to blunt Intel’s advance.

AMD’s strategy for breaching Intel’s blockade was an unusual one. It turned to small VARs and individual system builders, who were strong proponents of choice and in dire need of a stable supply of chips, and to the Web sites that served the enthusiast communities, including gamers and PC technophiles, which were no fans of a one-supplier market.

These small-fry operators built their systems from components, and motherboards made by Asian manufacturers were standard fare in custom-built systems. AMD made a motherboard of its own as a reference implementation and shipped it to every online source that might give Athlon some exposure. Soon these sites were benchmarking Pentium II and Pentium III against Athlon, and as AMD knew it would, Athlon kept up with Intel’s CPUs cycle for cycle and wouldn’t quit.

In Athlon, the grassroots IT community saw an alternative to Intel’s tightly controlled lineup. It pressured Asian motherboard manufacturers, and a crack in the wall appeared when a lone motherboard maker stepped forward, only to retreat quickly after what was assumed to be a tap on the shoulder from Intel. But it was the grassroots, low-volume builders who kept AMD motherboard makers in business. One by one, companies such as MSI, Supermicro, and Tyan bowed to system builders’ demand for choice.

As motherboards began to show up, system builders, including midvolume manufacturers, were overjoyed to see faster processors, faster memory, faster bus speeds, and more solid overall design than they could afford from Intel. Athlon proved consistently fast and stable, and compatibility ceased to be an issue. AMD also earned its entry-level server credentials with the solid performance of its dual-processor CPU, Athlon MP, showing that it could break through the ceiling Intel had tried to impose.

Hammering out partnerships
To get out of the bargain server market, AMD bet its life on Hammer, its first total system architecture that wasn’t a carbon copy of Intel’s. The details of AMD’s Hammer processor were published well in advance of its unveiling. But Intel did not suspect that the processor was the tip of a wedge. AMD built a whole-market strategy around Hammer with high-powered partnerships, inventive marketing that ignored Intel’s existence, strong engagement with grassroots and commercial developers, and the brass ring: Windows.

In April 2003, AMD rolled out its first Hammer processor, the 64bit Opteron. At the very instant Hammer became Opteron, Microsoft lifted it into parity with Itanium on its 64bit Windows road map. The commitment from Microsoft to build non-Intel architecture into 64bit client and server editions of Windows was the grail that AMD had been chasing ever since Intel tore up its contract.

In little more than a year, Opteron grew from a new dual-processor technology with one first-tier OEM to a quad-processor technology with HP, IBM, and Sun signed on. Sun is committed to a 64bit edition of its Solaris operating system for its growing lineup of Opteron systems. AMD is on target to deliver eight-processor Opteron and has already extended its 64bit technology down to notebooks, desktops, and workstations. AMD has also expanded its 32bit, Athlon-based processor line and raised its performance, creating new options at every level from dual-processor servers down to value home PCs. Not only has AMD stopped waiting for Intel, it has created a broader product line and is now encroaching on Intel’s unit sales.

Where AMD stands with IT
Today, IT operations can safely and affordably purchase Opteron servers as upgrades to Windows Xeon servers. And benchmarks have shown that 32bit Windows applications that rely heavily on I/O and smooth scalability get an immediate and substantial kick from Opteron. In addition, 64bit Windows and the development tools that follow will raise 32bit enterprise application performance and capacity another notch, making way for 64bit optimised server software. This is the first time IT has been offered a three-stage migration path (32bit, then 32bit applications on a 64bit operating system, then pure 64bit) on a single architecture, and none of these steps requires so much as pulling a server out of the rack.

Those companies, organisations, and technologists leery of buying on potential can wait (Opteron will only get faster over time), or they can move to pure 64bit Linux PC servers now. AMD64 support has been in the Linux kernel tree for some time, so most free and commercial Linux distributions, including Suse (now part of Novell), Mandrake and TurboLinux, already run on Opteron in 64bit mode. Version 5.2.1 of FreeBSD also runs on Opteron, and the full GNU suite of development tools can compile and debug Opteron code.

The toughest challenge AMD faces now is neither surviving Intel’s competition nor ramping up supply; it is convincing businesses that they want 64bit capabilities anywhere other than in high-performance servers. It’s an easy sell for developers targeting Opteron, but beyond that, the benefits aren’t obvious. Many will be won over by 64bit Windows. On the desktop, gamers and enthusiasts will once again do the convincing.

During 2005, AMD’s strength and influence should grow as new fabs and processes come online. The 90-nanometer process that AMD already has online will be used to boost the performance and reduce the power consumption of its Opteron, Athlon 64 desktop and mobile CPUs, and Athlon 64 FX performance desktop product lines. In the second half of 2005, AMD plans to exploit the new process to bring dual-core (two CPUs on one die) Opteron and Athlon 64 FX processors to market. With Opteron’s direct CPU-to-CPU interconnects, servers built with eight dual-core AMD processors can house 16 CPUs in one chassis, with no intermediary logic (or associated overhead) to slow traffic between processors.

Intel has remade itself more than once and won’t allow itself to remain, even if only in perception, a step behind AMD. But AMD no longer has to kowtow to Intel. And IT will never again be forced to accept whatever prices and technology Intel unilaterally decides the market needs.

Intel strikes back with Nocona
Infatuated with Itanium, Intel has long resisted the obvious: creating a 64bit chip that simply extends the x86 architecture on which the company built its fortune. Instead, Intel has ceded that ground to AMD, whose 64bit, x86-compatible Opteron has steadily eaten into the market shares of both Itanium and the 32bit Xeon throughout the past year. For many, Itanium has been too expensive and too much of a departure. And Xeon has lacked the 64bit headroom provided by Opteron.

Intel’s latest Xeon processor, code-named Nocona and announced in June, promises to change the game. It was released with little fanfare, but it is in fact a major departure, given that Intel has adopted the x86-64 instruction set that AMD developed for the Opteron. Intel’s new architecture, the EM64T (Extended Memory 64 Technology), brings a Xeon chip into direct competition with the Opteron for the first time.

The newly released CPU will take a while to make its mark in the 64bit computing world because all x86-64 code written so far has targeted Opteron. Software tweaks and optimisations are needed before recently developed x86-64 code will run on the processor. For example, the 64bit beta version of Windows Server 2003 will not run on this chip, because it was coded for the Opteron and will not install if its CPU detection fails to find one. This will no doubt be remedied soon, but similar issues will abound for some time. Likewise, Red Hat Advanced Server Update 1 will not install on an EM64T system, although Update 3 will, because EM64T support has been added.

On the surface, Nocona’s improvements appear typical of Intel’s dual-processor Xeon evolutionary tradition: a higher clock speed and a faster front-side bus. But this time, in a departure from its usual formula, Intel has added several noteworthy twists intended to stem customer and OEM defections from Xeon to AMD’s fast-tracked Opteron.

As a 32bit x86 processor, Nocona is a killer that instantly makes its predecessor, Xeon DP, obsolete. Nocona is manufactured using a 90nm (0.09micron) process rather than Xeon DP’s 130nm process, allowing Intel to pack more transistors into a smaller space and to drive the chip at a lower voltage. This helps to offset the heat and power draw associated with higher clock speeds. Intel also exploited the extra real estate by raising the Level 2 cache size to 1Mbyte from Xeon DP’s 512kbyte. The larger Level 2 cache allowed Intel to remove the Level 3 cache it had incorporated in late-model Xeon DP processors. It is the doubling of Nocona’s Level 2 cache, which runs at the CPU’s full clock speed, that will have the greatest impact on the performance of Xeon-optimised applications.

Nocona’s 16kbyte Level 1 data cache is also twice the size of Xeon DP’s. Although seemingly small, this cache is vital because it sits closest to the chip’s execution units. It is critical to Nocona’s ability to perform multiple parallel operations per clock cycle, so its doubling will also have a noticeable impact on performance.

Nocona’s execution pipeline, the series of stages each instruction passes through on its way to execution, is 31 stages long, up from Xeon DP’s 20. Critics point to Xeon’s long pipelines as evidence of the inefficiency of Intel’s x86 designs. The pipeline holds not only operations that are certain to be executed but also those that the processor predicts will be executed as the result of conditional instructions, for example a branch taken when a register’s value is greater than a specific number. When Nocona predicts the execution path correctly, operations stream through the pipeline very rapidly and the processor’s performance is astounding. But when its predictions fail, the pipeline has to be flushed and refilled from scratch, and the longer the pipeline, the more cycles that refill wastes.
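To put a rough number on the cost of a failed prediction, here is a minimal back-of-the-envelope sketch in Python. It uses a simplified textbook model, not Intel data: each mispredicted branch is assumed to cost about one cycle per pipeline stage, and the branch frequency and misprediction rate are hypothetical figures chosen only for illustration.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction including branch-misprediction stalls.

    Simplified model (an assumption, not vendor data): every mispredicted
    branch stalls the processor for `flush_penalty` cycles while the
    pipeline refills.
    """
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty


# Hypothetical workload: 20% of instructions are branches, 5% of which
# are mispredicted. The flush penalty is approximated by pipeline depth.
for name, stages in (("Xeon DP", 20), ("Nocona", 31)):
    cpi = effective_cpi(base_cpi=1.0, branch_fraction=0.20,
                        mispredict_rate=0.05, flush_penalty=stages)
    print(f"{name}: {stages} stages -> effective CPI {cpi:.2f}")
# Xeon DP: 20 stages -> effective CPI 1.20
# Nocona: 31 stages -> effective CPI 1.31
```

On these assumed figures, the deeper pipeline adds roughly half as much again to the misprediction overhead. The model deliberately ignores the other side of the trade-off: a longer pipeline is what lets Intel drive the chip at higher clock speeds.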

The Nocona feature that has grabbed the most attention, Intel’s new EM64T (Extended Memory 64 Technology), might be the least interesting. EM64T breaks the 4Gbyte RAM barrier associated with all x86 processors (except Nocona, Prescott, and AMD’s Athlon 64, Athlon 64 FX, and Opteron chips). Basically, Intel created a hack for using chunks of memory above the 4Gbyte mark that will effectively reduce the system’s contiguous address space. Yes, developers will be happy to see a big address space for their new 64bit applications, but Intel’s system architecture will likely hamper, not improve, performance as more RAM is added to the system.
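The 4Gbyte barrier follows directly from address width: a 32bit address can name only 2^32 distinct bytes. The short Python sketch below just does that arithmetic; the helper names are illustrative, and the 64bit figure is a theoretical ceiling, since shipping processors implement fewer physical and virtual address bits than a full 64.

```python
GBYTE = 2 ** 30  # binary gigabyte, the unit conventionally used for RAM sizes


def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits


# A 32bit pointer reaches 2**32 bytes: the 4Gbyte barrier.
print(addressable_bytes(32) // GBYTE, "Gbyte")  # 4 Gbyte
# A full 64bit pointer could in principle reach 2**64 bytes, although
# real chips wire up only part of that range.
print(addressable_bytes(64) // GBYTE, "Gbyte")  # 17179869184 Gbyte
```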

In the end, Nocona is a Xeon killer, not an Opteron killer (see panel ‘It’s the system, stupid’). The fine points of chip architecture aside, differences at the system level relegate Nocona to the dual-processor bush league. For example, Intel could not match Opteron’s glueless I/O system. Processors in an Opteron server talk directly to one another without going through the silicon intermediaries forced by Intel’s design. Likewise, Opteron links each CPU directly to memory, and Opteron systems’ memory bandwidth scales upward with the number of CPUs. Nocona’s bandwidth remains static.

Nocona’s design is Intel’s best yet. It’s a beautiful Xeon. But to catch Opteron, Intel would have to ditch its time-honoured system architecture. Given where Itanium has (or hasn’t) gone, and the incredible pace at which AMD is advancing its technology, Nocona looks like a well-manicured rest stop alongside the very bumpy road that Intel faces.

Opteron vs. Nocona: It’s the system, stupid
If you think AMD’s Opteron and Intel’s Nocona (or, more formally, ‘Xeon Processor with 800MHz System Bus’) are cut from the same 64bit cloth, look closer. Yes, they’re compatible at the instruction-set and register levels, as they should be, because both are based on AMD’s x86-64 specification. But the total system architecture surrounding these chips, which includes the pathways to other CPUs, memory, and peripherals, exhibits several differences that factor into buying decisions and developers’ platform targeting.

At its core, Nocona is a NetBurst Xeon DP, a Pentium 4 equipped for dual-processor operation. It has 1Mbyte of Level 2 cache and a top clock speed of 3.6GHz. All memory and I/O data, interrupts, interprocessor communication, and address requests flow over a fast shared bus with a maximum bandwidth of 6.4Gbyte/s. It’s a highly evolved design, on the leading edge while remaining faithful to the legacy design principles that Intel is expected to maintain.
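The 6.4Gbyte/s figure can be reproduced from the bus rating. The Python sketch below assumes, as is the case for Intel’s front-side bus, a 64bit (8-byte) data path, and treats the ‘800MHz’ rating as 800 million transfers per second. The per-CPU share is a best-case split for the situation where both processors are saturating the bus; the function names are illustrative.

```python
def bus_bandwidth_gbytes(mega_transfers_per_sec, width_bits=64):
    """Peak bandwidth of a parallel bus in Gbyte/s (decimal units).

    Assumption: a 64bit (8-byte) data path, the width of Intel's
    front-side bus.
    """
    bytes_per_transfer = width_bits // 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


def per_cpu_share(bus_gbytes, cpus):
    """Best-case share of a shared bus when every CPU is saturating it."""
    return bus_gbytes / cpus


nocona_bus = bus_bandwidth_gbytes(800)  # '800MHz' system bus
print(f"Nocona front-side bus: {nocona_bus:.1f} Gbyte/s")              # 6.4
print(f"Each of two busy CPUs: {per_cpu_share(nocona_bus, 2):.1f} Gbyte/s")  # 3.2
```

Plugging in the older 533MHz rate instead gives about 4.3Gbyte/s, which is the headroom Intel gained by moving to the faster bus.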

The 64bit technology common to both processors is easy to explain: more memory and more registers. When you’re running a 64bit OS, standard PC caps on physical and virtual memory go away. (Well, almost: Opteron has a larger total address space than Nocona, but Nocona can accommodate twice as much physical memory as a current dual-CPU Opteron system, 32Gbyte vs 16Gbyte.) Registers are the fastest type of storage a CPU has. The more registers you have and the more bits each register holds, the more compilers can optimise application performance. Having more registers and using them well can also improve the speed and smoothness of task switching, which has an effect similar to that of Intel’s Hyper-Threading technology.

Beyond the instruction set and address space, however, these two processors have nothing in common. And where they diverge most is in their total system architectures.

As with all Xeons, Nocona’s shared bus is the Achilles’ heel of Intel’s architecture. That only gets worse with SMP systems in which multiple CPUs must funnel their data, I/O, addresses, memory access, and interprocessor communication through a single bus and compete for access to a single pool of memory.

There are two ways to improve a shared bus: make it faster or divide it into independent buses. Intel sped things up, raising the bus speed from 533MHz to 800MHz, and dropped a hint that it is moving towards an independent bus design. The new touch is PCI Express. The chip that directs traffic on Xeon’s shared bus now has three onboard serial communications channels, each with a theoretical maximum throughput of 4Gbyte/s. Nocona can’t come close to the channels’ aggregate potential of 12Gbyte/s with a 6.4Gbyte/s shared bus, but faster buses will inch it closer to that limit.
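The mismatch between the chipset’s I/O potential and the shared bus is simple arithmetic, sketched below in Python. The model is deliberately crude and follows the article’s framing: it treats the 6.4Gbyte/s shared bus as the path all of that I/O traffic must take to reach the processors, and ignores the memory traffic competing for the same bus. The function name is illustrative.

```python
def deliverable_io(channels, per_channel_gbytes, bus_gbytes):
    """Return (aggregate I/O potential, portion the shared bus can carry).

    Simplifying assumption: all channel traffic bound for the CPUs must
    cross the shared front-side bus, so the bus caps the usable total.
    """
    aggregate = channels * per_channel_gbytes
    return aggregate, min(aggregate, bus_gbytes)


# Article figures: three 4Gbyte/s channels behind a 6.4Gbyte/s shared bus.
aggregate, usable = deliverable_io(3, 4.0, 6.4)
print(f"Aggregate PCI Express potential: {aggregate:.1f} Gbyte/s")  # 12.0
print(f"Deliverable over the shared bus: {usable:.1f} Gbyte/s "
      f"({usable / aggregate:.0%} of the potential)")                # 6.4, 53%
```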

By contrast, Opteron implements as many as four independent high-speed buses on each processor, depending on the model of the CPU. One bus on each processor is dedicated to memory traffic, with a maximum bandwidth of 6.4Gbyte/s. The Opteron architecture gives each processor its own bank of memory, so theoretically, bandwidth rises and contention decreases as more processors are added to a server.
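The scaling argument can be sketched as two tiny functions, using only the article’s 6.4Gbyte/s memory figure. These are theoretical peaks under a stated assumption: Opteron’s totals add up only when each processor works mostly from its own memory bank, while Nocona’s processors all share one bus to one pool. Nocona is a dual-processor part, so only one and two CPUs are compared.

```python
MEMORY_BUS_GBYTES = 6.4  # per processor (Opteron) or shared (Nocona), article figure


def opteron_memory_bandwidth(cpus):
    """Each Opteron has its own memory controller and bank, so peaks add up."""
    return cpus * MEMORY_BUS_GBYTES


def nocona_memory_bandwidth(cpus):
    """All Nocona CPUs share one bus to one memory pool, so the peak is fixed.

    `cpus` is accepted only to mirror the Opteron function's signature.
    """
    return MEMORY_BUS_GBYTES


for cpus in (1, 2):
    print(f"{cpus} CPU(s): Opteron {opteron_memory_bandwidth(cpus):.1f} Gbyte/s, "
          f"Nocona {nocona_memory_bandwidth(cpus):.1f} Gbyte/s")
# 1 CPU(s): Opteron 6.4 Gbyte/s, Nocona 6.4 Gbyte/s
# 2 CPU(s): Opteron 12.8 Gbyte/s, Nocona 6.4 Gbyte/s
```

A four-way Opteron server would reach a theoretical 25.6Gbyte/s on the same reasoning, whereas adding processors to a shared-bus design only divides the same 6.4Gbyte/s more ways.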

Communication with nonmemory system components, including other processors and peripherals, is handled by HyperTransport bus controllers built into the CPU. This parallel bus, developed by AMD and licensed by others, including Apple and Transmeta, has a bandwidth of 6.4Gbyte/s per link (3.2Gbyte/s each way). With three links per processor, that gives a total potential bandwidth of 19.2Gbyte/s, independent of memory traffic. Direct HyperTransport links between CPUs allow all processors to share all the system’s memory, split though it is across processors, at full speed.
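The 19.2Gbyte/s total is three links at 6.4Gbyte/s each, and each link’s figure is the sum of its two 3.2Gbyte/s directions. The Python sketch below reproduces that arithmetic from the article’s numbers and, for comparison, adds the dedicated 6.4Gbyte/s memory bus to give a per-processor theoretical peak. The constant and function names are illustrative.

```python
HT_GBYTES_EACH_WAY = 3.2  # per HyperTransport link, per direction (article figure)
HT_LINKS_PER_CPU = 3      # links per processor, implied by the 19.2Gbyte/s total
MEMORY_GBYTES = 6.4       # dedicated memory bus per processor (article figure)


def hypertransport_total(links=HT_LINKS_PER_CPU, each_way=HT_GBYTES_EACH_WAY):
    """Total HyperTransport bandwidth per processor, counting both directions."""
    per_link = 2 * each_way   # 3.2 in + 3.2 out = 6.4Gbyte/s per link
    return links * per_link   # three links -> 19.2Gbyte/s


ht = hypertransport_total()
print(f"HyperTransport: {ht:.1f} Gbyte/s per processor")                   # 19.2
print(f"With memory bus: {ht + MEMORY_GBYTES:.1f} Gbyte/s theoretical peak")  # 25.6
```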

The bottom line is that the Opteron architecture, with HyperTransport, sets the stage for blazing multiprocessing performance. And at this stage, Intel’s Xeon line has nothing to match it.

22/11/04


TechCentral.ie