Chapter 1   Introduction And Overview    1

 1.1 The Importance Of Architecture    1
 1.2 Learning The Essentials    1
 1.3 Organization Of The Text    2
 1.4 What We Will Omit    3
 1.5 Terminology: Architecture And Design    3
 1.6 Summary    3

Chapter 2   Fundamentals Of Digital Logic    7

 2.1 Introduction    7
 2.2 Electrical Terminology: Voltage And Current    7
 2.3 The Transistor    8
 2.4 Logic Gates    9
 2.5 Symbols Used For Gates    10
 2.6 Construction Of Gates From Transistors    11
 2.7 Example Interconnection Of Gates    12
 2.8 Multiple Gates Per Integrated Circuit    14
 2.9 The Need For More Than Combinatorial Circuits    15
 2.10 Circuits That Maintain State    15
 2.11 Transition Diagrams    16
 2.12 Binary Counters    17
 2.13 Clocks And Sequences    18
 2.14 The Important Concept Of Feedback    20
 2.15 Starting A Sequence    22
 2.16 Iteration In Software Vs. Replication In Hardware    22
 2.17 Gate And Chip Minimization    23
 2.18 Using Spare Gates    24
 2.19 Power Distribution And Heat Dissipation    24
 2.20 Timing    25
 2.21 Physical Size And Process Technologies    26
 2.22 Circuit Boards And Layers    27
 2.23 Levels Of Abstraction    27
 2.24 Summary    28

Chapter 3   Data And Program Representation    29

 3.1 Introduction    29
 3.2 Digital Logic And Abstraction    29
 3.3 Bits And Bytes    30
 3.4 Byte Size And Possible Values    30
 3.5 Binary Arithmetic    31
 3.6 Hexadecimal Notation    32
 3.7 Notation For Hexadecimal And Binary Constants    33
 3.8 Character Sets    34
 3.9 Unicode    35
 3.10 Unsigned Integers, Overflow, And Underflow    35
 3.11 Numbering Bits And Bytes    36
 3.12 Signed Integers    37
 3.13 An Example Of Two's Complement Numbers    38
 3.14 Sign Extension    39
 3.15 Floating Point    40
 3.16 Special Values    42
 3.17 Range Of IEEE Floating Point Values    42
 3.18 Data Aggregates    42
 3.19 Program Representation    43
 3.20 Summary    43
 Exercises    44

Chapter 4   The Variety Of Processors And Computational Engines    47

 4.1 Introduction    47
 4.2 Von Neumann Architecture    47
 4.3 Definition Of A Processor    48
 4.4 The Range Of Processors    48
 4.5 Hierarchical Structure And Computational Engines    49
 4.6 Structure Of A Conventional Processor    51
 4.7 Definition Of An Arithmetic Logic Unit (ALU)    52
 4.8 Processor Categories And Roles    52
 4.9 Processor Technologies    54
 4.10 Stored Programs    54
 4.11 The Fetch-Execute Cycle    55
 4.12 Clock Rate And Instruction Rate    56
 4.13 Control: Getting Started And Stopping    57
 4.14 Starting The Fetch-Execute Cycle    57
 4.15 Summary    58
 Exercises    58

Chapter 5   Processor Types And Instruction Sets    61

 5.1 Introduction    61
 5.2 Mathematical Power, Convenience, And Cost    61
 5.3 Instruction Set And Representation    62
 5.4 Opcodes, Operands, And Results    63
 5.5 Typical Instruction Format    63
 5.6 Variable-Length Vs. Fixed-Length Instructions    63
 5.7 General-Purpose Registers    64
 5.8 Floating Point Registers And Register Identification    65
 5.9 Programming With Registers    65
 5.10 Register Banks    66
 5.11 Complex And Reduced Instruction Sets    67
 5.12 RISC Design And The Execution Pipeline    68
 5.13 Pipelines And Instruction Stalls    69
 5.14 Other Causes Of Pipeline Stalls    71
 5.15 Consequences For Programmers    71
 5.16 Programming, Stalls, And No-Op Instructions    72
 5.17 Forwarding    72
 5.18 Types Of Operations    73
 5.19 Program Counter, Fetch-Execute, And Branching    73
 5.20 Subroutine Calls, Arguments, And Register Windows    75
 5.21 An Example Instruction Set    76
 5.22 Minimalistic Instruction Set    78
 5.23 The Principle Of Orthogonality    79
 5.24 Condition Codes And Conditional Branching    80
 5.25 Summary    80
 Exercises    81

Chapter 6   Operand Addressing And Instruction Representation    83

 6.1 Introduction    83
 6.2 Zero, One, Two, Or Three Address Designs    83
 6.3 Zero Operands Per Instruction    84
 6.4 One Operand Per Instruction    85
 6.5 Two Operands Per Instruction    85
 6.6 Three Operands Per Instruction    86
 6.7 Operand Sources And Immediate Values    86
 6.8 The Von Neumann Bottleneck    87
 6.9 Explicit And Implicit Operand Encoding    88
  6.9.1 Implicit Operand Encoding    88
  6.9.2 Explicit Operand Encoding    88
 6.10 Operands That Combine Multiple Values    89
 6.11 Tradeoffs In The Choice Of Operands    90
 6.12 Values In Memory And Indirect Reference    91
 6.13 Operand Addressing Modes    92
 6.14 Summary    93
 Exercises    93

Chapter 7   CPUs: Microcode, Protection, And Processor Modes    95

 7.1 Introduction    95
 7.2 A Central Processor    95
 7.3 CPU Complexity    96
 7.4 Modes Of Execution    97
 7.5 Backward Compatibility    97
 7.6 Changing Modes    98
 7.7 Privilege And Protection    99
 7.8 Multiple Levels Of Protection    99
 7.9 Microcoded Instructions    100
 7.10 Microcode Variations    102
 7.11 The Advantage Of Microcode    102
 7.12 Making Microcode Visible To Programmers    103
 7.13 Vertical Microcode    103
 7.14 Horizontal Microcode    104
 7.15 Example Horizontal Microcode    105
 7.16 A Horizontal Microcode Example    107
 7.17 Operations That Require Multiple Cycles    108
 7.18 Horizontal Microcode And Parallel Execution    109
 7.19 Look-Ahead And High Performance Execution    110
 7.20 Parallelism And Execution Order    111
 7.21 Out-Of-Order Instruction Execution    111
 7.22 Conditional Branches And Branch Prediction    112
 7.23 Consequences For Programmers    113
 7.24 Summary    113
 Exercises    114

Chapter 8   Assembly Languages And Programming Paradigm    115

 8.1 Introduction    115
 8.2 Characteristics Of A High-Level Programming Language    115
 8.3 Characteristics Of A Low-Level Programming Language    116
 8.4 Assembly Language    117
 8.5 Assembly Language Syntax And Opcodes    118
  8.5.1 Statement Format    118
  8.5.2 Opcode Names    118
  8.5.3 Commenting Conventions    119
 8.6 Operand Order    120
 8.7 Register Names    121
 8.8 Operand Types    122
 8.9 Assembly Language Programming Paradigm And Idioms    122
 8.10 Assembly Code For Conditional Execution    123
 8.11 Assembly Code For A Conditional Alternative    124
 8.12 Assembly Code For Definite Iteration    124
 8.13 Assembly Code For Indefinite Iteration    125
 8.14 Assembly Code For Procedure Invocation    125
 8.15 Assembly Code For Parameterized Procedure Invocation    126
 8.16 Consequence For Programmers    127
 8.17 Assembly Code For Function Invocation    128
 8.18 Interaction Between Assembly And High-Level Languages    128
 8.19 Assembly Code For Variables And Storage    129
 8.20 Two-Pass Assembler    130
 8.21 Assembly Language Macros    131
 8.22 Summary    134

Chapter 9   Memory And Storage    137

 9.1 Introduction    137
 9.2 Definition    137
 9.3 The Key Aspects Of Memory    138
 9.4 Characteristics Of Memory Technologies    138
  9.4.1 Memory Volatility    138
  9.4.2 Memory Access Paradigm    139
  9.4.3 Permanence Of Values    139
  9.4.4 Primary And Secondary Memory    139
 9.5 The Important Concept Of A Memory Hierarchy    140
 9.6 Instruction And Data Store    140
 9.7 The Fetch-Store Paradigm    141
 9.8 Summary    141

Chapter 10   Physical Memory And Physical Addressing    143

 10.1 Introduction    143
 10.2 Characteristics Of Computer Memory    143
 10.3 Static And Dynamic RAM Technologies    144
 10.4 Measures Of Memory Technology    145
 10.5 Density    146
 10.6 Separation Of Read And Write Performance    146
 10.7 Latency And Memory Controllers    146
 10.8 Synchronized Memory Technologies    147
 10.9 Multiple Data Rate Memory Technologies    148
 10.10 Examples Of Memory Technologies    148
 10.11 Memory Organization    148
 10.12 Memory Access And Memory Bus    149
 10.13 Memory Transfer Size    150
 10.14 Physical Addresses And Words    150
 10.15 Physical Memory Operations    150
 10.16 Word Size And Other Data Types    151
 10.17 An Extreme Case: Byte Addressing    151
 10.18 Byte Addressing With Word Transfers    152
 10.19 Using Powers Of Two    153
 10.20 Byte Alignment And Programming    154
 10.21 Memory Size And Address Space    154
 10.22 Programming With Word Addressing    155
 10.23 Measures Of Memory Size    155
 10.24 Pointers And Data Structures    156
 10.25 A Memory Dump    156
 10.26 Indirection And Indirect Operands    158
 10.27 Memory Banks And Interleaving    158
 10.28 Content Addressable Memory    159
 10.29 Ternary CAM    160
 10.30 Summary    160
 Exercises    161

Chapter 11   Virtual Memory Technologies And Virtual Addressing    163

 11.1 Introduction    163
 11.2 Definition    163
 11.3 A Virtual Example: Byte Addressing    164
 11.4 Virtual Memory Terminology    164
 11.5 An Interface To Multiple Physical Memory Systems    164
 11.6 Address Translation Or Address Mapping    166
 11.7 Avoiding Arithmetic Calculation    167
 11.8 Discontiguous Address Spaces    168
 11.9 Other Memory Organizations    169
 11.10 Motivation For Virtual Memory    169
 11.11 Multiple Virtual Spaces And Multiprogramming    170
 11.12 Multiple Levels Of Virtualization    171
 11.13 Creating Virtual Spaces Dynamically    171
 11.14 Base-Bound Registers    172
 11.15 Changing The Virtual Space    172
 11.16 Virtual Memory, Base-Bound, And Protection    173
 11.17 Segmentation    174
 11.18 Demand Paging    175
 11.19 Hardware And Software For Demand Paging    175
 11.20 Page Replacement    176
 11.21 Paging Terminology And Data Structures    176
 11.22 Address Translation In A Paging System    177
 11.23 Using Powers Of Two    178
 11.24 Presence, Use, And Modified Bits    179
 11.25 Page Table Storage    180
 11.26 Paging Efficiency And A Translation Lookaside Buffer    181
 11.27 Consequences For Programmers    182
 11.28 Summary    183
 Exercises    183

Chapter 12   Caches And Caching    185

 12.1 Introduction    185
 12.2 Definition    185
 12.3 Characteristics Of A Cache    186
 12.4 The Importance Of Caching    187
 12.5 Examples Of Caching    188
 12.6 Cache Terminology    188
 12.7 Best And Worst Case Cache Performance    189
 12.8 Cache Performance On A Typical Sequence    190
 12.9 Cache Replacement Policy    190
 12.10 LRU Replacement    191
 12.11 Multi-Level Cache Hierarchy    191
 12.12 Preloading Caches    192
 12.13 Caches Used With Memory    192
 12.14 TLB As A Cache    193
 12.15 Demand Paging As A Form Of Caching    193
 12.16 Physical Memory Cache    194
 12.17 Write Through And Write Back    194
 12.18 Cache Coherence    195
 12.19 L1, L2, And L3 Caches    196
 12.20 Sizes Of L1, L2, And L3 Caches    197
 12.21 Instruction And Data Caches    197
 12.22 Virtual Memory Caching And A Cache Flush    198
 12.23 Implementation Of Memory Caching    199
 12.24 Direct Mapping Memory Cache    200
 12.25 Using Powers Of Two For Efficiency    201
 12.26 Set Associative Memory Cache    202
 12.27 Consequences For Programmers    203
 12.28 Summary    204
 Exercises    204

Chapter 13   Input/Output Concepts And Terminology    207

 13.1 Introduction    207
 13.2 Input And Output Devices    207
 13.3 Control Of An External Device    208
 13.4 Data Transfer    209
 13.5 Serial And Parallel Data Transfers    209
 13.6 Self-Clocking Data    210
 13.7 Full-Duplex And Half-Duplex Interaction    210
 13.8 Interface Latency And Throughput    211
 13.9 The Fundamental Idea Of Multiplexing    211
 13.10 Multiple Devices Per External Interface    212
 13.11 A Processor's View Of I/O    213
 13.12 Summary    213

Chapter 14   Buses And Bus Architectures    215

 14.1 Introduction    215
 14.2 Definition Of A Bus    215
 14.3 Processors, I/O Devices, And Buses    216
 14.4 Proprietary And Standardized Buses    216
 14.5 Shared Buses And An Access Protocol    217
 14.6 Multiple Buses    217
 14.7 A Parallel, Passive Mechanism    217
 14.8 Physical Connections    217
 14.9 Bus Interface    218
 14.10 Address, Control, And Data Lines    219
 14.11 The Fetch-Store Paradigm    220
 14.12 Fetch-Store Over A Bus    220
 14.13 The Width Of A Bus    220
 14.14 Multiplexing    221
 14.15 Bus Width And Size Of Data Items    222
 14.16 Bus Address Space    223
 14.17 Potential Errors    224
 14.18 Address Configuration And Sockets    225
 14.19 Many Buses Or One Bus    226
 14.20 Using Fetch-Store With Devices    226
 14.21 An Example Of Device Control Using Fetch-Store    226
 14.22 Operation Of An Interface    227
 14.23 Asymmetric Assignments    228
 14.24 Unified Memory And Device Addressing    228
 14.25 Holes In The Address Space    230
 14.26 Address Map    230
 14.27 Program Interface To A Bus    231
 14.28 Bridging Between Two Buses    232
 14.29 Main And Auxiliary Buses    232
 14.30 Consequences For Programmers    234
 14.31 Switching Fabrics    234
 14.32 Summary    235
 Exercises    236

Chapter 15   Programmed And Interrupt-Driven I/O    237

 15.1 Introduction    237
 15.2 I/O Paradigms    237
 15.3 Programmed I/O    238
 15.4 Synchronization    238
 15.5 Polling    239
 15.6 Code For Polling    239
 15.7 Control And Status Registers    241
 15.8 Processor Use And Polling    241
 15.9 First, Second, And Third Generation Computers    242
 15.10 Interrupt-Driven I/O    242
 15.11 A Hardware Interrupt Mechanism    243
 15.12 Interrupts And The Fetch-Execute Cycle    243
 15.13 Handling An Interrupt    244
 15.14 Interrupt Vectors    245
 15.15 Initialization And Enabling And Disabling Interrupts    246
 15.16 Preventing Interrupt Code From Being Interrupted    246
 15.17 Multiple Levels Of Interrupts    246
 15.18 Assignment Of Interrupt Vectors And Priorities    247
 15.19 Dynamic Bus Connections And Pluggable Devices    248
 15.20 The Advantage Of Interrupts    249
 15.21 Smart Devices And Improved I/O Performance    249
 15.22 Direct Memory Access (DMA)    250
 15.23 Buffer Chaining    251
 15.24 Scatter Read And Gather Write Operations    252
 15.25 Operation Chaining    252
 15.26 Summary    253
 Exercises    253

Chapter 16   A Programmer's View Of Devices, I/O, And Buffering    255

 16.1 Introduction    255
 16.2 Definition Of A Device Driver    256
 16.3 Device Independence, Encapsulation, And Hiding    256
 16.4 Conceptual Parts Of A Device Driver    257
 16.5 Two Types Of Devices    258
 16.6 Example Flow Through A Device Driver    258
 16.7 Queued Output Operations    259
 16.8 Forcing An Interrupt    261
 16.9 Queued Input Operations    261
 16.10 Devices That Support Bi-Directional Transfer    262
 16.11 Asynchronous Vs. Synchronous Programming Paradigm    263
 16.12 Asynchrony, Smart Devices, And Mutual Exclusion    264
 16.13 I/O As Viewed By An Application    264
 16.14 Run-Time I/O Libraries    265
 16.15 The Library/Operating System Dichotomy    266
 16.16 I/O Operations The OS Supports    267
 16.17 The Cost Of I/O Operations    268
 16.18 Reducing The System Call Overhead    268
 16.19 The Important Concept Of Buffering    269
 16.20 Implementation Of Buffering    270
 16.21 Flushing A Buffer    271
 16.22 Buffering On Input    272
 16.23 Effectiveness Of Buffering    272
 16.24 Buffering In An Operating System    273
 16.25 Relation To Caching    274
 16.26 An Example: The Unix Standard I/O Library    274
 16.27 Summary    274
 Exercises    275

Chapter 17   Parallelism    279

 17.1 Introduction    279
 17.2 Parallel And Pipelined Architectures    279
 17.3 Characterizations Of Parallelism    280
 17.4 Microscopic Vs. Macroscopic    280
 17.5 Examples Of Microscopic Parallelism    281
 17.6 Examples Of Macroscopic Parallelism    281
 17.7 Symmetric Vs. Asymmetric    282
 17.8 Fine-grain Vs. Coarse-grain Parallelism    282
 17.9 Explicit Vs. Implicit Parallelism    283
 17.10 Parallel Architectures    283
 17.11 Types Of Parallel Architectures (Flynn Classification)    283
 17.12 Single Instruction Single Data (SISD)    284
 17.13 Single Instruction Multiple Data (SIMD)    284
 17.14 Multiple Instructions Multiple Data (MIMD)    286
 17.15 Communication, Coordination, And Contention    288
 17.16 Performance Of Multiprocessors    290
 17.17 Consequences For Programmers    292
  17.17.1 Locks And Mutual Exclusion    292
  17.17.2 Programming Explicit And Implicit Parallel Computers    293
  17.17.3 Programming Symmetric And Asymmetric Multiprocessors    294
 17.18 Redundant Parallel Architectures    294
 17.19 Distributed And Cluster Computers    295
 17.20 Summary    296
 Exercises    297

Chapter 18   Pipelining    299

 18.1 Introduction    299
 18.2 The Concept Of Pipelining    299
 18.3 Software Pipelining    301
 18.4 Software Pipeline Performance And Overhead    302
 18.5 Hardware Pipelining    303
 18.6 How Hardware Pipelining Increases Performance    303
 18.7 When Pipelining Can Be Used    306
 18.8 The Conceptual Division Of Processing    307
 18.9 Pipeline Architectures    307
 18.10 Pipeline Setup, Stall, And Flush Times    308
 18.11 Definition Of Superpipeline Architecture    308
 18.12 Summary    309
 Exercises    309

Chapter 19   Assessing Performance    311

 19.1 Introduction    311
 19.2 Measuring Power And Performance    311
 19.3 Measures Of Computational Power    312
 19.4 Application Specific Instruction Counts    313
 19.5 Instruction Mix    314
 19.6 Standardized Benchmarks    315
 19.7 I/O And Memory Bottlenecks    316
 19.8 Boundary Between Hardware And Software    316
 19.9 Choosing Items To Optimize    317
 19.10 Amdahl's Law And Parallel Systems    317
 19.11 Summary    318
 Exercises    318

Chapter 20   Architecture Examples And Hierarchy    319

 20.1 Introduction    319
 20.2 Architectural Levels    319
 20.3 System-Level Architecture: A Personal Computer    320
 20.4 Bus Interconnection And Bridging    321
 20.5 Controller Chips And Physical Architecture    322
 20.6 Virtual Buses    323
 20.7 Connection Speeds    325
 20.8 Bridging Functionality And Virtual Buses    325
 20.9 Board-Level Architecture    325
 20.10 Chip-Level Architecture    327
 20.11 Structure Of Functional Units On A Chip    328
 20.12 Summary    329
 20.13 Hierarchy Beyond Computer Architectures    329

Appendix 1   Lab Exercises For A Computer Architecture Course    331

 A1.1 Introduction    331
 A1.2 Digital Hardware For A Lab    332
 A1.3 Solderless Breadboard    332
 A1.4 Using A Solderless Breadboard    333
 A1.5 Testing    334
 A1.6 Power And Ground Connections    334
 A1.7 Lab Exercises    335