## Table of Contents For Essentials Of Computer Architecture

### Chapter 1   Introduction And Overview    3

 1.1 The Importance Of Architecture    3 1.2 Learning The Essentials    3 1.3 The Cycle Of New Hardware And New Software    4 1.4 What We Will Cover    5 1.5 What We Will Omit    6 1.6 What We Will Emphasize    6 1.7 Architecture, Design, and Implementation    7 1.8 Hardware Designs And Unexpected Constraints    8 1.9 Summary    8

### Chapter 2   Program Interpretation And Transformation    11

 2.1 Introduction    11 2.2 Specification Of Computation    11 2.3 Automated Program Interpretation    12 2.4 What Level Of Abstraction Should Be Used For Programs?    13 2.5 Transforming A Source Program To Machine Language    13 2.6 Using Memory Addresses Instead Of Variable Names    15 2.7 Global And Local Variables    16 2.8 Segments And Their Location In Memory    17 2.9 Relocation And Memory Addresses    18 2.10 The ELF Standard for Storing Object Programs    20 2.11 Translation Of An Example C Program    21 Exercises    23

### Chapter 3   Data And Program Representation    27

 3.1 Introduction    27 3.2 Definitions Of Bit And Byte    27 3.3 Possible Values In A Byte    28 3.4 Binary Weighted Positional Representation    29 3.5 Bit Ordering    31 3.6 Hexadecimal Notation Used By Humans    31 3.7 Notation For Hexadecimal And Binary Constants    32 3.8 Character Sets    33 3.9 Unicode    34 3.10 Unsigned Integers And Endianness    35 3.11 Signed Binary Integers    36 3.12 Quirks Of Signed Representations    37 3.13 An Example Of Two's Complement Numbers    37 3.14 Sign Extension    38 3.15 Casting Integers    39 3.16 Floating Point    40 3.17 Range Of IEEE Floating Point Values    42 3.18 Biased Exponent Values    43 3.19 An Example Floating Point Number    43 3.20 Special Values And NaN    44 3.21 Binary Coded Decimal Representation    45 3.22 Signed, Fractional, And Packed BCD Representations    46 3.23 Data Aggregates    46 3.24 Instructions And Their Representation    47 3.25 Summary    48 Exercises    49

### Chapter 4   A High-Level Overview Of Processors    53

 4.1 Introduction    53 4.2 The Two Basic Architectural Approaches    53 4.3 The Harvard And Von Neumann Architectures    54 4.4 Definition Of A Processor    55 4.5 The Range Of Processors    56 4.6 Hierarchical Structure And Computational Engines    57 4.7 Structure Of A Conventional Processor    58 4.8 Processor Categories And Roles    59 4.9 Processor Technologies    61 4.10 Stored Programs    61 4.11 The Fetch-Execute Cycle    62 4.12 Instructions In Memory    63 4.13 Variable-length and Fixed-length Instructions    63 4.14 Clock Rate And Instruction Rate    65 4.15 Control: Getting Started And Stopping    65 4.16 Starting The Fetch-Execute Cycle    66 4.17 Summary    67 Exercises    67

### Chapter 5   Instruction Sets And Operands    71

 5.1 Introduction    71 5.2 Mathematics, Convenience, And Cost    71 5.3 Instruction Sets    72 5.4 Instruction Set Architecture    73 5.5 Opcodes, Operands, And Results    74 5.6 Typical Instruction Format    74 5.7 General-Purpose Registers    74 5.8 Floating Point Registers And Register Identification    75 5.9 Programming With Registers    75 5.10 Register Banks    76 5.11 Terminology: Complex And Reduced Instruction Sets    77 5.12 RISC Design And The Execution Pipeline    78 5.13 Pipelines And Instruction Stalls    80 5.14 Other Causes Of Pipeline Stalls    81 5.15 Consequences For Programmers    82 5.16 Programming, Stalls, And No-Op Instructions    82 5.17 Forwarding    83 5.18 Types Of Operations    83 5.19 Program Counter, Fetch-Execute, And Branching    84 5.20 Condition Codes And Conditional Branching    85 5.21 Subroutine And Function Calls    86 5.22 Argument Passing And Return Values    87 5.23 An Example Instruction Set    88 5.24 The Principle Of Minimalism    90 5.25 The Principles Of Elegance And Orthogonality    91 5.26 Summary    91 Exercises    92

### Chapter 6   Operand Addressing And Operand Types    97

 6.1 Introduction    97 6.2 Zero-, One-, Two-, And Three-Address Designs    97 6.3 Zero Operands Per Instruction    98 6.4 One Operand Per Instruction    99 6.5 Two Operands Per Instruction    99 6.6 Three Operands Per Instruction    100 6.7 Operand Sources And Immediate Values    100 6.8 The Von Neumann Bottleneck    101 6.9 Implicit And Explicit Operand Encoding    101 6.9.1 Implicit Operand Encoding    101 6.9.2 Explicit Operand Encoding    102 6.10 Operands That Combine Multiple Values    103 6.11 Tradeoffs In The Choice Of Operands    103 6.12 Direct And Indirect Operands In Memory    105 6.13 Illustration Of Operand Addressing Modes    105 6.14 Summary    106 Exercises    107

### Chapter 7   Assembly Languages And Programming Paradigm    111

 7.1 Introduction    111 7.2 The Reason To Learn Assembly Language    111 7.3 Characteristics Of High-Level And Low-Level Languages    112 7.4 Assembly Languages    113 7.5 Assembly Language Syntax And Opcodes    114 7.5.1 Statement Format    114 7.5.2 Opcode Names    114 7.5.3 Commenting Conventions    114 7.6 Operand Order    115 7.7 Register Names    116 7.8 Operand Syntax    117 7.9 Assembly Code For If-Then    118 7.10 Assembly Code For If-Then-Else    118 7.11 Assembly Code For Definite Iteration (For Loop)    119 7.12 Assembly Code For Indefinite Iteration (While Loop)    120 7.13 Assembly Code For A Function Call    120 7.14 Variable Declarations In Assembly Code    121 7.15 Example Assembly Language Code    122 7.15.1 The Fibonacci Example In C    122 7.15.2 The Fibonacci Example In x86 Assembly Language    124 7.15.3 The Fibonacci Example In ARM Assembly Language    125 7.16 How An Assembler Works: Two-Pass Translation    127 7.17 Assembly Language Macros    129 7.18 Summary    130 Exercises    130

### Chapter 8   Main Memory And Memory Addressing    135

 8.1 Introduction    135 8.2 Characteristics Of Computer Memory    135 8.3 Static And Dynamic RAM Technologies    136 8.4 Memory Performance And Higher Data Rate Technologies    137 8.5 Memory Addresses For An Array Of Bytes    139 8.6 The Fetch-Store Paradigm, Pointers, And Dereferencing    139 8.7 Memory Dumps    141 8.8 Aligned Memory Access And Aggregates In Memory    143 8.9 Ordering Items To Optimize Space With Aligned Access    144 8.10 Memory Organization And Aligned Memory Access    145 8.11 Memory Hardware, Words, And Word Operations    146 8.12 Byte Addressing, Word Addressing, And Alignment    147 8.13 Calculating A Word Address    148 8.14 Calculating Word Addresses Using Powers of Two    148 8.15 Multiple Memories With Separate Controllers    149 8.16 Memory Banks    150 8.17 Interleaving    151 8.18 Memory Sizes, Powers Of Two, And Prefixes    152 8.19 Content Addressable Memory    153 8.20 Ternary CAM    154 8.21 Summary    154 Exercises    155

### Chapter 9   Virtual Memory Technologies And Virtual Addressing    159

 9.1 Introduction    159 9.2 Definition Of Virtual Memory    159 9.3 Memory Management Unit And A Virtual Address Space    160 9.4 Address Translation Using Powers Of Two    161 9.5 Virtual Address Spaces For Concurrent Applications    163 9.6 Mechanisms Used To Create Virtual Address Spaces    164 9.7 Base-Bound Registers    164 9.8 Virtual Memory Isolation And Protection    165 9.9 Loading Program Pieces And Demand Paging    165 9.10 Hardware And Software For Demand Paging    166 9.11 Page Replacement    167 9.12 Paging Terminology And Data Structures    167 9.13 Address Translation With A Demand Paging System    168 9.14 Using Powers Of Two For Address Translation    169 9.15 Presence, Use, And Modified Bits    170 9.16 Page Table Storage    171 9.17 Efficient Paging With A Translation Lookaside Buffer    172 9.18 Consequences For Programmers    172 9.19 Single-Level Page Tables And Page Table Size    174 9.20 The Advantage Of Multi-Level Page Tables    174 9.21 Multi-Level Page Tables For 64-Bit Architectures    176 9.22 Sparseness And Address Space Layout Randomization    177 9.23 Page Table Walks And The Importance Of A TLB    177 9.24 Summary    177 Exercises    178

### Chapter 10   Caches And Caching    183

 10.1 Introduction    183 10.2 How Data Propagates Through A Storage Hierarchy    183 10.3 Definition of Caching    184 10.4 Characteristics Of A Cache    184 10.5 Cache Terminology    185 10.6 Best-Case And Worst-Case Cache Performance    186 10.7 Cache Performance On A Typical Sequence    187 10.8 Cache Replacement Policy    187 10.9 LRU Replacement    188 10.10 Multilevel Cache Hierarchy    188 10.11 Preloading Caches    189 10.12 Caches Used With Memory    190 10.13 Main Memory Cache    190 10.14 Write Through And Write Back    191 10.15 Cache Coherence    192 10.16 L1, L2, and L3 Caches    193 10.17 Sizes Of L1, L2, And L3 Caches    194 10.18 Instruction And Data Caches    194 10.19 Modified Harvard Architecture    195 10.20 Implementation Of Memory Caching    196 10.21 Direct Mapped Memory Cache    196 10.22 Using Powers Of Two For Efficiency    198 10.23 Hardware Implementation Of A Direct Mapped Cache    199 10.24 Set Associative Memory Cache    201 10.25 Consequences For Programmers    202 10.26 The Relationship Between Virtual Memory And Caching    202 10.27 Virtual Memory Caching And Cache Flush    203 10.28 A Note About Cache Performance    204 10.29 Summary    205 Exercises    205

### Chapter 11   Storage: File Systems, Blocks, And SSDs    209

 11.1 Introduction    209 11.2 Persistent Storage Mechanisms And Files    209 11.3 The History Of Persistent Storage    210 11.4 Solid-State Drives (SSDs)    211 11.5 The Block-Oriented Interface    211 11.6 DMA Hardware Used For Block Transfer    212 11.7 Storing Files In 512-Byte Blocks    213 11.8 Blocks Of Cells, Logical Blocks, And Erase-Before-Write    214 11.9 Flash Lifetime    215 11.10 The Internal Structure Of An SSD    215 11.11 Logical Block Storage And Update    216 11.12 Repeated Wear, Location Mapping, And Wear Leveling    217 11.13 Overprovisioning And Bad Block Management    218 11.14 Garbage Collection    218 11.15 Assessment Of SSD Technology    219 11.16 Summary    220 Exercises    220

### Chapter 12   A Programmer's View Of Devices, I\^/\^O, And Buffering    223

 12.1 Introduction    223 12.2 Devices And Device Hardware    223 12.3 Device-Independent I\^/\^O And Encapsulation    224 12.4 Conceptual Parts Of A Device Driver    225 12.5 Two Main Categories Of Devices    226 12.6 Example Flow Through A Device Driver    227 12.7 An Input Queue In A Device Driver    228 12.8 An Output Queue In A Device Driver    228 12.9 Isolating I\^/\^O Devices From Applications    229 12.10 The Motivation For A Standard I\^/\O Library    230 12.11 Reducing System Call Overhead    231 12.12 Standard I\^/\^O Functions    231 12.13 Buffered Input    232 12.14 Buffered Output    232 12.15 Flushing A Buffer    233 12.16 Using Buffered I\^/\^O With Devices    233 12.17 The Relationship Between Buffering And Caching    234 12.18 Summary    234 Exercises    235

### Chapter 13   Buses And Bus Architectures    239

 13.1 Introduction    239 13.2 Definition Of A Bus    239 13.3 Processors, I\^/\^O Devices, And Buses    240 13.3.1 Proprietary And Standardized Buses    240 13.3.2 Shared Buses And An Access Protocol    241 13.3.3 Multiple Buses    241 13.3.4 A Parallel Vs. Serial Bus    241 13.4 Physical Bus Connections And Sockets    242 13.5 Control, Address, And Data Lines In A Bus    242 13.6 The Fetch-Store Paradigm    243 13.7 Fetch-Store Operations On A Parallel Bus    244 13.8 Data Transfer Rate, Bus Width, And Serial Buses    244 13.9 Large Data Transfers And Direct Memory Access    246 13.10 Bus Transfer Size And The Size Of Data Items    246 13.11 Bus Address Space    247 13.12 Invalid Addresses And Bus Errors    248 13.13 Memory Addresses And Sockets    249 13.14 The Question Of Multiple Buses    249 13.15 Using Fetch-Store With Devices    250 13.16 Operation Of An Interface    251 13.17 Asymmetric Specifications And Bus Errors    252 13.18 An Example Bus Address Space And Address Map    252 13.19 Holes In A Bus Address Space And Utilization    253 13.20 The Program Interface To A Bus    254 13.21 Bridging Between Two Buses    255 13.22 An Example Bridge Mapping    256 13.23 Consequences For Device Driver Programmers    257 13.24 Switching Fabrics As An Alternative To Buses    257 13.25 Summary    259 Exercises    259

### Chapter 14   Programming Devices And Interrupt-Driven I/O    263

 14.1 Introduction    263 14.2 The Two I\^/\^O Paradigms    263 14.3 Programmed I\^/\^O    264 14.4 Synchronization    265 14.5 Synchronization Using Polling    265 14.6 Code For Polling    266 14.7 Control And Status Registers    268 14.8 Using A Struct To Define CSRs    269 14.9 Processor Use And Polling    270 14.10 Interrupt-Driven I\^/\^O    270 14.11 An Interrupt Mechanism And Fetch-Execute    271 14.12 Handling An Interrupt    272 14.13 Interrupt Vectors    273 14.14 Interrupt Initialization And Disabled Interrupts    274 14.15 Interrupting An Interrupt Handler    274 14.16 Configuration Of Interrupts    275 14.17 Dynamic Bus Connections And Pluggable Devices    276 14.18 Interrupts, Performance, And Smart Devices    276 14.19 Smart Devices, DMA, And Offloading    278 14.20 Extending DMA With Buffer Chaining    278 14.21 Scatter Read And Gather Write Operations    279 14.22 Operation Chaining    280 14.23 Summary    280 Exercises    281

### Chapter 15   Data Paths And Instruction Execution    285

 15.1 Introduction    285 15.2 Data Paths    285 15.3 An Example Instruction Set    286 15.4 Instructions In Memory    288 15.5 Moving To The Next Instruction    290 15.6 Fetching An Instruction    292 15.7 Decoding An Instruction    292 15.8 Connections To A Register Unit    294 15.9 Control And Coordination    294 15.10 Arithmetic Operations And Multiplexing    295 15.11 Operations Involving Data In Memory    296 15.12 Example Execution Sequences    297 15.13 Summary    298 Exercises    299

### Chapter 16   CPUs: Microcode, Protection, And Processor Modes    303

 16.1 Introduction    303 16.2 A Central Processor    303 16.3 CPU Complexity    304 16.4 Modes Of Execution    305 16.5 Backward Compatibility    305 16.6 Changing Modes    306 16.7 Privilege And Protection    306 16.8 Multiple Levels Of Protection    306 16.9 Microcoded Instructions    307 16.10 Microcode Variations    309 16.11 The Advantage Of Microcode    309 16.12 FPGAs And Changes To An Instruction Set    310 16.13 Vertical Microcode    311 16.14 Horizontal Microcode    311 16.15 Example Horizontal Microcode    313 16.16 A Horizontal Microcode Example    314 16.17 Operations That Require Multiple Cycles    315 16.18 Horizontal Microcode And Parallel Execution    316 16.19 Look-Ahead And High-Performance Execution    317 16.20 Parallelism And Execution Order    318 16.21 Out-Of-Order Instruction Execution    318 16.22 Conditional Branches And Branch Prediction    319 16.23 Consequences For Programmers    320 16.24 Summary    320 Exercises    321

### Chapter 17   Parallelism    325

 17.1 Introduction    325 17.2 Parallel And Pipelined Architectures    325 17.3 Characterizations Of Parallelism    326 17.4 Microscopic Vs. Macroscopic    326 17.5 Examples Of Microscopic Parallelism    327 17.6 Examples Of Macroscopic Parallelism    327 17.7 Symmetric Vs. Asymmetric    328 17.8 Fine-Grain Vs. Coarse-Grain Parallelism    328 17.9 Explicit Vs. Implicit Parallelism    329 17.10 Types Of Parallel Architectures (Flynn Classification)    329 17.11 Single Instruction Single Data (SISD)    330 17.12 Single Instruction Multiple Data (SIMD)    330 17.13 Multiple Instructions Multiple Data (MIMD)    332 17.14 Communication, Coordination, And Contention    334 17.15 Performance Of Multiprocessors    335 17.16 Consequences For Programmers    337 17.16.1 Locks And Mutual Exclusion    337 17.16.2 Programming Explicit And Implicit Parallel Computers    339 17.16.3 Programming Symmetric Vs. Asymmetric Multiprocessors    340 17.17 Redundant Parallel Architectures    340 17.18 Distributed And Cluster Computers    341 17.19 A Modern Supercomputer    342 17.20 Summary    342 Exercises    343

### Chapter 18   Data Pipelining    347

 18.1 Introduction    347 18.2 The Concept Of Pipelining    347 18.3 Software Pipelining    349 18.4 Hardware Pipelining    349 18.5 How Hardware Pipelining Increases Performance    350 18.6 When Pipelining Can Be Used    352 18.7 The Conceptual Division Of Processing    353 18.8 Pipeline Architectures    354 18.9 Pipeline Setup, Stall, And Flush Times    355 18.10 Definition Of Superpipeline Architecture    355 18.11 Summary    356 Exercises    356

### Chapter 19   Assessing Performance    361

 19.1 Introduction    361 19.2 Measuring Computational Power And Performance    361 19.3 Measures Of Computational Power    362 19.4 Application-Specific Instruction Counts    363 19.5 Instruction Mix    364 19.6 Standardized Benchmarks    365 19.7 I\^/\^O And Memory Bottlenecks    366 19.8 Moving The Boundary Between Hardware And Software    366 19.9 Choosing Items To Optimize, Amdahl's Law    367 19.10 Amdahl's Law And Parallel Systems    368 19.11 Summary    368 Exercises    369

### Chapter 20   Multicore Processors    373

 20.1 Introduction    373 20.2 The Move To Multicore Processor Chips    373 20.3 The Multicore Concept: Parallelism And Shared Memory    374 20.4 Multicore Processor Vs. Multiprocessor    375 20.5 Asymmetry    375 20.6 Direct Communication Among Cores    376 20.7 Tight Coupling And Extremely Low Latency    376 20.8 Shared Access To All I\^/\^O Devices    377 20.9 The Ability To Associate Interrupts With Specific Cores    377 20.1 The Ability To Start And Stop Cores    378 20.11 Shared Memory And Multicore Caching    379 20.12 Cache Inconsistency    379 20.13 Preventing Inconsistency: Cache Coherence    380 20.14 Programming Multicore: Threads And Scheduling    381 20.15 Thread Scheduling And Affinity    382 20.16 Simultaneous Multi-Threading (SMT)    383 20.17 Inter-core Communication Via Messages And Software Interrupts    384 20.18 Mutual Exclusion Among Cores    385 20.19 Locks, And Busy Waiting    386 20.2 Using A Memory Location As A Lock (Test-And-Set)    386 20.21 An Example Of Test-And-Set    387 20.22 Atomic Update And Cache Coherence    388 20.23 Summary    388

### Chapter 21   Power And Energy    391

 21.1 Introduction    391 21.2 Definition Of Power    391 21.3 Definition Of Energy    392 21.4 Power Consumption By A Digital Circuit    393 21.5 Switching Power Consumed By A CMOS Digital Circuit    394 21.6 Cooling, Power Density, And The Power Wall    395 21.7 Energy Use    395 21.8 Power Management    396 21.8.1 Voltage And Delay    396 21.8.2 Decreasing Clock Frequency    397 21.8.3 Slower Clock Frequency And Multicore Processors    398 21.9 Software Control Of Energy Use    399 21.10 Choosing When To Sleep And When To Awaken    400 21.11 Sleep Modes And Network Devices    402 21.12 Summary    402 Exercises    403

### Chapter 22   Building Blocks: Transistors, Gates, And Clocks    407

 22.1 Introduction    407 22.2 The History Of Digital Technologies    407 22.3 Electrical Terminology: Voltage And Current    408 22.4 The Transistor    408 22.5 Logic Gates    410 22.6 Implementation Of A Nand Logic Gate Using Transistors    412 22.7 Symbols Used For Logic Gates    413 22.8 Example Interconnection Of Gates    413 22.9 A Digital Circuit For Binary Addition    416 22.10 Multiple Gates Per Integrated Circuit    418 22.11 The Need For More Than Combinatorial Circuits    418 22.12 Circuits That Maintain State    419 22.13 Feedback And Propagation Delay    420 22.14 Using Latches To Create A Memory    420 22.15 Flip-Flops And Transition Diagrams    421 22.16 Binary Counters    423 22.17 Clocks, Feedback, And Sequences    424 22.18 The Importance Of Feedback    427 22.19 Starting A Sequence    428 22.20 Iteration In Software Vs. Replication In Hardware    429 22.21 Gate And Chip Minimization    430 22.22 Using Spare Gates    430 22.23 Power Distribution And Heat Dissipation    431 22.24 Timing And Clock Zones    431 22.25 Clockless Logic    433 22.26 Circuit Size And Moore's Law    434 22.27 Circuit Boards And Layers    435 22.28 Levels Of Abstraction    435 22.29 Summary    436 Exercises    437

### Chapter 23   Hardware Modularity    441

 23.1 Introduction    441 23.2 Motivations For Modularity    441 23.3 Software Modularity    442 23.4 Parameterization    442 23.5 Forms Of Hardware Modularity    442 23.6 Modular Chip Construction    443 23.6.1 SoC Design    443 23.6.2 Chiplet Technology    443 23.6.3 Chiplets And Fabrication Flaws    444 23.7 Replication And Parallelism    444 23.8 Basic Block Replication    445 23.9 An Example Modular Design: A Rebooter    445 23.10 High-Level Rebooter Design    445 23.11 A Building Block To Accommodate A Range Of Sizes    446 23.12 Parallel Interconnection    447 23.13 Module Selection    448 23.14 Summary    448 Exercises    449

### Appendix 1   Rules For Boolean Algebra Simplification    451

 A1.1 Introduction    451 A1.2 Notation Used    451 A1.3 Rules Of Boolean Algebra    452

### Appendix 2   A Quick Introduction To x86 Assembly Language    453

 A2.1 Introduction    453 A2.2 The x86 General-Purpose Registers    454 A2.3 Allowable Operands    455 A2.4 Intel And AT&T Forms Of x86 Assembly Language    456 A2.4.1 Characteristics Of Intel Assembly Language    457 A2.4.2 Characteristics Of AT&T Assembly Language    457 A2.5 Arithmetic Instructions    458 A2.6 Logical Operations    459 A2.7 Basic Data Types    460 A2.8 Data Blocks, Arrays, And Strings    462 A2.9 Memory References    463 A2.10 Data Size Inference And Explicit Size Directives    464 A2.11 Computing An Address    465 A2.12 The Stack Operations Push And Pop    466 A2.13 Flow Of Control And Unconditional Branch    467 A2.14 Conditional Branch And Condition Codes    467 A2.15 Subprogram Call And Return    468 A2.16 C Calling Conventions And Argument Passing    469 A2.17 Function Calls And A Return Value    471 A2.18 Extensions To Sixty-Four Bits (x64)    471 A2.19 Summary    472

### Appendix 3   ARM Register Definitions And Calling Sequence    475

 A3.1 Introduction    475 A3.2 Registers On An ARM Processor    475 A3.3 ARM Calling Conventions    476

### Appendix 4   Lab Exercises For A Computer Architecture Course    481

 A4.1 Introduction    481 A4.2 Hardware Required for Digital Logic Experiments    482 A4.3 Solderless Breadboard    482 A4.4 Using A Solderless Breadboard    483 A4.5 Power And Ground Connections    484 A4.6 Building And Testing Circuits    484 A4.7 Lab Exercises    485