InfoSatellite.com - Inside the AMD Hammer microprocessor

Inside the AMD Hammer microprocessor

By Ana Letícia Sigvartsen
InfoSatellite.com
October 22, 2001

AMD's new Hammer processor is expected to be the world's fastest 32-bit processor and to compete seriously with Intel's Itanium. Unlike other 64-bit microprocessor architectures, Hammer's is specifically designed to allow migration from 32-bit to 64-bit code while providing performance for both. AMD Vice President and Chief Technical Officer Fred Weber presented the upcoming Hammer processors at Microprocessor Forum, where he revealed several details about the Hammer series. Below is a transcript of Weber's presentation.

AMD’s Next Generation Microprocessor Architecture

Fred Weber
October 2001

"Hammer" Goals

  • Build a next-generation system architecture which serves as the foundation for future processor platforms.
  • Enable a full line of server and workstation products
    – Leading edge x86 (32-bit) performance and compatibility
    – Native 64-bit support
    – Establish x86-64 Instruction Set Architecture
    – Extensive Multiprocessor support
    – RAS features
  • Provide top-to-bottom desktop and mobile processors

x86-64 Technology

Why 64-Bit Computing?

  • Required for large memory programs
    – Large databases
    – Scientific and Engineering Problems
      • Designing CPUs
  • But,
    – Limited Demand for Applications which require 64 bits
      • Most applications can remain 32-bit x86 instructions, if the processor continues to deliver leading-edge x86 performance
  • And,
    – Software is a huge investment (tool chains, applications, certifications)
    – Instruction set is first and foremost a vehicle for compatibility
      • Binary compatibility
      • Interpreter/JIT support is increasingly important

x86-64 Instruction Set Architecture

  • x86-64 mode built on x86
    – Similar to the previous extension from 16-bit to 32-bit
    – Vast majority of opcodes and features unchanged
    – Integer/Address register files and datapaths are native 64-bit
    – 48-Bit Virtual Address Space, 40-Bit Physical Address Space
  • Enhancements
    – Add 8 new integer registers
    – Add PC relative addressing
    – Add full support for SSE/SSE2 based Floating Point Application Binary Interface (ABI)
      • including 16 registers
    – Additional registers and data size added by reclaiming the one-byte increment/decrement opcodes (0x40-0x4F) for use as a single optional prefix
  • Public specification
    – www.x86-64.org
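
The optional prefix byte described in the list above can be illustrated with a small decoder. This is a minimal sketch, not AMD's reference code. The slide gives only the byte range 0x40-0x4F; the four field names (W, R, X, B) and their bit positions are taken from the published x86-64 specification, where the prefix is called REX. The function names `decode_rex` and `extended_register` are hypothetical.

```python
def decode_rex(byte):
    """Return the four REX fields for a prefix byte in 0x40-0x4F, else None."""
    if not 0x40 <= byte <= 0x4F:
        return None  # not one of the reclaimed inc/dec opcodes
    return {
        "W": (byte >> 3) & 1,  # 1 selects 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends the ModRM rm or SIB base field
    }


def extended_register(rex_bit, three_bit_field):
    """Combine one REX extension bit with a 3-bit register field (result 0-15)."""
    return (rex_bit << 3) | (three_bit_field & 0b111)


if __name__ == "__main__":
    # 48 89 C3 encodes `mov rbx, rax`: 64-bit operands, no extended registers.
    print(decode_rex(0x48))
    # 4D 89 C8 encodes `mov r8, r9`: both register fields use the new registers.
    rex = decode_rex(0x4D)
    print(rex, extended_register(rex["R"], 0b001), extended_register(rex["B"], 0b000))
```

One extension bit per register field is what turns the 8 legacy integer registers into 16, and a single W bit is what selects the 64-bit data size, which is why one prefix byte covers both enhancements.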

x86-64 Code Generation and Quality

  • Compiler and tool chain are a straightforward port
  • Instruction set is designed to offer all the advantages of CISC and RISC
    – Code density of CISC
    – Register usage and ABI models of RISC
    – Enables easy application of standard compiler optimizations
  • SPECint2000 Code Generation (compared to 32-bit x86)
    – Code size grows <10%
      • Due mostly to instruction prefixes
    – Static Instruction Count SHRINKS by 10%
    – Dynamic Instruction Count SHRINKS by at least 5%
    – Dynamic Load/Store Count SHRINKS by 20%
    – All without any specific code optimizations
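
Taken together, the code-size and static-instruction-count figures above bound how much longer the average encoded instruction becomes. The following is a rough worked calculation, assuming the worst-case 10% code growth and exactly 10% fewer static instructions. The slide gives only these rounded figures, and the function name `avg_length_ratio` is hypothetical.

```python
def avg_length_ratio(code_size_ratio, static_count_ratio):
    """Ratio of average bytes per instruction, x86-64 relative to 32-bit x86."""
    return code_size_ratio / static_count_ratio


if __name__ == "__main__":
    # Code size grows by less than 10%, so 1.10 is an upper bound (from the slide).
    # Static instruction count shrinks by 10%, so the ratio is 0.90 (from the slide).
    bound = avg_length_ratio(1.10, 0.90)
    print(f"average instruction is at most about {100 * (bound - 1):.0f}% longer")
```

In other words, each instruction may be up to roughly a fifth longer on average, which is consistent with the slide's remark that the growth comes mostly from prefixes, while fewer instructions are needed overall.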

x86-64 Summary

  • Processor is fully x86 capable
    – Full native performance with 32-bit applications and OS
    – Full compatibility (BIOS, OS, Drivers)
  • Flexible deployment
    – Best-in-class 32-bit, x86 performance
    – Excellent 64-bit, x86-64 instruction execution when needed
  • Server, Workstation, Desktop, and Mobile share same architecture
    – OS, Drivers and Applications can be the same
    – CPU vendors' focus is not split, and ISVs' focus is not split
    – Support, optimization, etc. are all designed to be the same

The "Hammer" Architecture

DDR Memory Controller

  • Integrated Memory Controller Details
    – Memory controller details
      • 8 or 16-byte interface
      • 16-byte interface supports
        – Direct connection to 8 registered DIMMs
        – Chipkill ECC
      • Unbuffered or Registered DIMMs
      • PC1600, PC2100, and PC2700 DDR memory
  • Integrated Memory Controller Benefits
    – Significantly reduces DRAM latency
    – Memory latency improves as CPU and HyperTransport link speeds improve
    – Bandwidth and capacity grow with number of CPUs
    – Snoop probe throughput scales with CPU frequency
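
The interface widths and module grades listed above translate directly into peak DRAM bandwidth per processor. The sketch below is an illustration only. The memory clock rates (100, 133⅓ and 166⅔ MHz) are the standard DDR speeds behind the PC1600, PC2100 and PC2700 names and are not stated on the slide, and the results are theoretical peaks, not sustained figures. The function name `peak_bandwidth_mb_s` is hypothetical.

```python
# Memory clock in MHz for each module grade (standard DDR values, an assumption
# in the sense that the slide lists only the module names).
DDR_CLOCK_MHZ = {
    "PC1600": 100.0,
    "PC2100": 400.0 / 3.0,
    "PC2700": 500.0 / 3.0,
}


def peak_bandwidth_mb_s(module, interface_bytes):
    """Peak transfer rate in MB/s: clock x 2 transfers per clock x bus width."""
    if interface_bytes not in (8, 16):
        raise ValueError("the slide describes an 8- or 16-byte interface")
    return DDR_CLOCK_MHZ[module] * 2 * interface_bytes


if __name__ == "__main__":
    for module in DDR_CLOCK_MHZ:
        narrow = peak_bandwidth_mb_s(module, 8)
        wide = peak_bandwidth_mb_s(module, 16)
        print(f"{module}: {narrow:.0f} MB/s (8-byte), {wide:.0f} MB/s (16-byte)")
```

Because every processor brings its own controller, these per-CPU peaks add up across a multiprocessor system, which is the sense in which bandwidth grows with the number of CPUs.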

Reliability and Availability

  • L1 Data Cache ECC Protected
  • L2 Cache AND Cache Tags ECC Protected
  • DRAM ECC Protected
    – With Chipkill ECC support
  • On-chip and off-chip ECC-protected arrays include background hardware scrubbers
  • Remaining arrays parity protected
    – L1 Instruction Cache, TLBs, Tags
    – Generally read only data which can be recovered
  • Machine Check Architecture
    – Report failures and predictive failure results
    – Mechanism for hardware/software error containment and recovery
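
The split above between ECC-protected and parity-protected arrays comes down to correction versus detection. The toy example below shows the difference under loudly stated assumptions: the slide does not say which codes "Hammer" uses, real caches and DRAM use much wider codes, and Chipkill is stronger than anything shown here. A Hamming(7,4) code stands in for ECC purely for illustration, and all function names are hypothetical.

```python
def parity_bit(bits):
    """Even-parity bit (XOR of all bits). Detects a single flip, cannot locate it."""
    p = 0
    for b in bits:
        p ^= b
    return p


def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code):
    """Return (data bits, error position). Position is 1-7, or 0 if no error."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 | (s2 << 1) | (s4 << 2)  # syndrome names the flipped bit
    if position:
        c[position - 1] ^= 1               # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    stored = hamming74_encode(word)
    stored[4] ^= 1                          # simulate a single-bit upset
    recovered, position = hamming74_correct(stored)
    print("ECC recovered data:", recovered == word, "error at position", position)

    line = [1, 0, 1, 1]
    check = parity_bit(line)
    line[2] ^= 1                            # same kind of upset, parity only
    print("parity mismatch detected:", parity_bit(line) != check)
```

Parity is enough for the instruction cache, TLBs and tags because, as the slide notes, that data is generally read-only and can be fetched again after an error is detected. Arrays holding modified data need a code that can repair the bit in place.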

HyperTransport Technology

  • Next-generation computing performance goes beyond the microprocessor
  • Screaming I/O for chip-to-chip communication
    – High bandwidth
    – Reduced pin count
    – Point-to-point links
    – Split transaction and full duplex
  • Open standard
    – Industry enabler for building high bandwidth I/O subsystems
    – I/O subsystems: PCI-X, Gigabit Ethernet, InfiniBand, etc.
  • Strong Industry Acceptance
    – 100+ companies evaluating the specification and several licensing the technology through AMD (2000)
    – First HyperTransport technology-based south bridge announced by nVIDIA (June 2001)
  • Enables scalable 2-8 processor SMP systems
    – Glueless MP

"Hammer" Architecture Summary

  • 8th Generation microprocessor core
    – Improved IPC and operating frequency
    – Support for large workloads
  • Cache subsystem
    – Enhanced TLB structures
    – Improved branch prediction
  • Integrated DDR memory controller
    – Reduced DRAM latency
  • HyperTransport technology
    – Screaming I/O for chip-to-chip communication
    – Enables glueless MP

"Hammer" System Architecture

MP System Architecture

  • Software view of memory is SMP
    – Physical address space is flat and fully coherent
    – Latency difference between local and remote memory in an 8P system is comparable to the difference between a DRAM page hit and DRAM page conflict
    – DRAM location can be contiguous or interleaved
  • Multiprocessor support designed in from the beginning
    – Lower overall chip count
    – All MP system functions use CPU technology and frequency
  • 8P System parameters
    – 64 DIMMs (up to 128GB) directly connected
    – 4 HyperTransport links available for IO (25GB/s)
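
The "contiguous or interleaved" DRAM placement mentioned above can be sketched as two ways of mapping a physical address to the processor whose memory controller owns it. This is an illustration under stated assumptions, not the actual "Hammer" address-map logic. The node count, DIMM count and total capacity come from the slide. The 4 KiB interleave granule is an arbitrary assumption, 128GB is treated as 128 GiB, and the function names are hypothetical.

```python
GIB = 1 << 30

NUM_NODES = 8                               # 8P system (from the slide)
TOTAL_DIMMS = 64                            # from the slide
TOTAL_DRAM = 128 * GIB                      # "up to 128GB" (from the slide)

DIMMS_PER_NODE = TOTAL_DIMMS // NUM_NODES   # 8 DIMMs directly attached per CPU
BYTES_PER_DIMM = TOTAL_DRAM // TOTAL_DIMMS  # 2 GiB per DIMM at full capacity
BYTES_PER_NODE = TOTAL_DRAM // NUM_NODES    # 16 GiB behind each CPU

INTERLEAVE_GRANULE = 4096                   # ASSUMPTION: not given on the slide


def home_node_contiguous(addr):
    """Each node owns one contiguous 16 GiB slice of the flat address space."""
    return addr // BYTES_PER_NODE


def home_node_interleaved(addr):
    """Consecutive granules rotate round-robin across all nodes."""
    return (addr // INTERLEAVE_GRANULE) % NUM_NODES


if __name__ == "__main__":
    for addr in (0, 4096, 8 * 4096, 16 * GIB, TOTAL_DRAM - 1):
        print(hex(addr), home_node_contiguous(addr), home_node_interleaved(addr))
```

Either way software sees one flat, coherent address space. The choice only changes which controller services a given address: contiguous placement keeps a region local to one CPU, while interleaving spreads traffic across all controllers.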

The Rewards of Good Plumbing

  • Bandwidth
    – 4P system designed to achieve 8GB/s aggregate memory copy bandwidth
    • With data spread throughout system
    – Leading-edge bus-based systems limited to about 2.1GB/s aggregate bandwidth (3.2GB/s theoretical peak)
  • Latency
    – Average unloaded latency in 4P system (page miss) is designed to be 140ns
    – Average unloaded latency in 8P system (page miss) is designed to be 160ns
    – Latency under load planned to increase much more slowly than in bus-based systems due to available bandwidth
    – Latency shrinks quickly with increasing CPU clock speed and HyperTransport link speed
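
The bandwidth and latency figures above are easier to compare as ratios. The short calculation below uses only numbers from the slide, which are design targets as of October 2001 and not measurements. The function name `plumbing_ratios` is hypothetical.

```python
def plumbing_ratios():
    """Derive comparison ratios from the design targets quoted on the slide."""
    hammer_4p_copy_gb_s = 8.0     # 4P aggregate memory copy bandwidth
    bus_achieved_gb_s = 2.1       # bus-based system, achieved
    bus_peak_gb_s = 3.2           # bus-based system, theoretical peak
    latency_4p_ns = 140.0         # unloaded page-miss latency, 4P
    latency_8p_ns = 160.0         # unloaded page-miss latency, 8P
    return {
        # Fraction of its theoretical peak that the shared bus delivers.
        "bus_efficiency": bus_achieved_gb_s / bus_peak_gb_s,
        # Aggregate bandwidth advantage of the 4P design over the bus.
        "bandwidth_advantage": hammer_4p_copy_gb_s / bus_achieved_gb_s,
        # Relative latency cost of doubling from 4 to 8 processors.
        "latency_increase_4p_to_8p": latency_8p_ns / latency_4p_ns - 1.0,
    }


if __name__ == "__main__":
    for name, value in plumbing_ratios().items():
        print(f"{name}: {value:.2f}")
```

On these targets the shared bus delivers about two thirds of its theoretical peak, the 4P design aims for nearly four times the bus's achieved bandwidth, and doubling the processor count from 4 to 8 adds roughly 14% to unloaded latency.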

"Hammer" Summary

  • 8th generation CPU core
    – Delivering high performance through an optimum balance of IPC and operating frequency
  • x86-64 technology
    – Compelling 64-bit migration strategy without any significant sacrifice of existing code base
    – Full speed support for x86 code base
    – Unified architecture from notebook through server
  • DDR memory controller
    – Significantly reduces DRAM latency
  • HyperTransport technology
    – High-bandwidth I/O
    – Glueless MP
  • Foundation for future portfolio of processors
    – Top-to-bottom desktop and mobile processors
    – High-performance 1-, 2-, 4-, and 8-way servers and workstations

© 2001 Advanced Micro Devices, Inc.

(Sources: AMD)

