1.1
| Network Systems And The Internet   1
|
1.2
| Applications Vs. Infrastructure   1
|
1.3
| Network System Engineering   2
|
1.4
| Packet Processing   2
|
1.5
| Achieving High Speed   3
|
1.6
| Network Speed   3
|
1.7
| Hardware, Software, And Hybrids   4
|
1.8
| Scope And Organization Of The Text   5
|
1.9
| Summary   5
|
| For Further Study   5
|
2.1
| Introduction   7
|
2.2
| Networks And Packets   7
|
2.3
| Connection-Oriented And Connectionless Paradigms   8
|
2.4
| Digital Circuits   8
|
2.5
| LAN And WAN Classifications   9
|
2.6
| The Internet And Heterogeneity   9
|
2.7
| Example Network Systems   9
|
2.8
| Broadcast Domains   10
|
2.9
| The Two Key Systems Used In The Internet   11
|
2.10
| Other Systems Used In The Internet   12
|
2.11
| Monitoring And Control Systems   12
|
2.12
| Summary   13
|
| For Further Study   13
|
3.1
| Introduction   15
|
3.2
| Protocols And Layering   15
|
3.3
| Layers 1 And 2 (Physical And Network Interface)   17
|
| 3.3.1
| Ethernet   17
|
| 3.3.2
| Ethernet Frame Format   17
|
| 3.3.3
| Ethernet Addresses   18
|
| 3.3.4
| Ethernet Type Field   19
|
3.4
| Layer 3 (Internet)   19
|
| 3.4.1
| The Internet Protocol   19
|
| 3.4.2
| IP Datagram Format   19
|
| 3.4.3
| IP Addresses   20
|
3.5
| Layer 4 (Transport)   20
|
| 3.5.1
| UDP And TCP   20
|
| 3.5.2
| UDP Datagram Format   21
|
| 3.5.3
| TCP Segment Format   21
|
3.6
| Protocol Port Numbers And Demultiplexing   22
|
3.7
| Encapsulation And Transmission   23
|
3.8
| Address Resolution Protocol   23
|
3.9
| Summary   24
|
| For Further Study   24
|
4.1
| Introduction   29
|
4.2
| A Conventional Computer System   29
|
4.3
| Network Interface Cards   30
|
4.4
| Definition Of A Bus   31
|
4.5
| The Bus Address Space   32
|
4.6
| The Fetch-Store Paradigm   33
|
4.7
| Network Interface Card Functionality   34
|
4.8
| NIC Optimizations For High Speed   34
|
4.9
| Onboard Address Recognition   35
|
| 4.9.1
| Unicast And Broadcast Recognition And Filtering   35
|
| 4.9.2
| Multicast Recognition And Filtering   35
|
4.10
| Onboard Packet Buffering   36
|
4.11
| Direct Memory Access   37
|
4.12
| Operation And Data Chaining   38
|
4.13
| Data Flow Diagram   39
|
4.14
| Promiscuous Mode   39
|
4.15
| Summary   40
|
| For Further Study   40
|
5.1
| Introduction   43
|
5.2
| State Information And Resource Exhaustion   43
|
5.3
| Packet Buffer Allocation   44
|
5.4
| Packet Buffer Size And Copying   45
|
5.5
| Protocol Layering And Copying   45
|
5.6
| Heterogeneity And Network Byte Order   46
|
5.7
| Bridge Algorithm   47
|
5.8
| Table Lookup And Hashing   49
|
5.9
| IP Datagram Fragmentation And Reassembly   50
|
| 5.9.1
| Interpretation Of The Flags Field   51
|
| 5.9.2
| Interpretation Of The Fragment Offset Field   51
|
| 5.9.3
| IP Fragmentation Algorithm   52
|
| 5.9.4
| Fragmenting A Fragment   53
|
| 5.9.5
| IP Reassembly   53
|
| 5.9.6
| Grouping Fragments Together   54
|
| 5.9.7
| Fragment Position   54
|
| 5.9.8
| IP Reassembly Algorithm   55
|
5.10
| IP Datagram Forwarding   56
|
5.11
| IP Forwarding Algorithm   57
|
5.12
| High-Speed IP Forwarding   57
|
5.13
| TCP Connection Recognition Algorithm   59
|
5.14
| TCP Splicing Algorithm   60
|
5.15
| Summary   62
|
| For Further Study   63
|
| Exercises   63
|
6.1
| Introduction   67
|
6.2
| Packet Processing   68
|
6.3
| Address Lookup And Packet Forwarding   68
|
6.4
| Error Detection And Correction   69
|
6.5
| Fragmentation, Segmentation, And Reassembly   70
|
6.6
| Frame And Protocol Demultiplexing   70
|
6.7
| Packet Classification   71
|
| 6.7.1
| Static And Dynamic Classification   71
|
| 6.7.2
| Demultiplexing Vs. Classification   71
|
| 6.7.3
| Optimized Packet Processing   72
|
| 6.7.4
| Classification Languages   72
|
6.8
| Queueing And Packet Discard   73
|
| 6.8.1
| Basic Queueing   73
|
| 6.8.2
| Priority Mechanisms   74
|
| 6.8.3
| Packet Discard   75
|
6.9
| Scheduling And Timing   75
|
6.10
| Security: Authentication And Privacy   76
|
6.11
| Traffic Measurement And Policing   76
|
6.12
| Traffic Shaping   77
|
6.13
| Timer Management   79
|
6.14
| Summary   80
|
| For Further Study   80
|
| Exercises   80
|
7.1
| Introduction   83
|
7.2
| Implementation Of Packet Processing In An Application   83
|
7.3
| Fast Packet Processing In Software   84
|
7.4
| Embedded Systems   84
|
7.5
| Operating Systems Implementation   85
|
7.6
| Software Interrupts And Priorities   85
|
7.7
| Multiple Priorities And Kernel Threads   87
|
7.8
| Thread Synchronization   88
|
7.9
| Software For Layered Protocols   88
|
| 7.9.1
| One Thread Per Layer   89
|
| 7.9.2
| One Thread Per Protocol   90
|
| 7.9.3
| Multiple Threads Per Protocol   90
|
| 7.9.4
| Separate Timer Management Threads   90
|
| 7.9.5
| One Thread Per Packet   91
|
7.10
| Asynchronous Vs. Synchronous Programming   92
|
7.11
| Summary   92
|
| For Further Study   93
|
| Exercises   93
|
8.1
| Introduction   95
|
8.2
| Network Systems Architecture   95
|
8.3
| The Traditional Software Router   96
|
8.4
| Aggregate Data Rate   97
|
8.5
| Aggregate Packet Rate   97
|
8.6
| Packet Rate And Software Router Feasibility   99
|
8.7
| Overcoming The Single CPU Bottleneck   101
|
8.8
| Fine-Grain Parallelism   102
|
8.9
| Symmetric Coarse-Grain Parallelism   102
|
8.10
| Asymmetric Coarse-Grain Parallelism   103
|
8.11
| Special Purpose Coprocessors   103
|
8.12
| ASIC Coprocessor Implementation   104
|
8.13
| NICs With Onboard Processing   105
|
8.14
| Smart NICs With Onboard Stacks   106
|
8.15
| Cell Switching   106
|
8.16
| Data Pipelines   107
|
8.17
| Summary   109
|
| For Further Study   109
|
| Exercises   109
|
9.1
| Introduction   113
|
9.2
| Inherent Limits Of Demultiplexing   113
|
9.3
| Packet Classification   114
|
9.4
| Software Implementation Of Classification   115
|
9.5
| Optimizing Software-Based Classification   116
|
9.6
| Software Classification On Special-Purpose Hardware   117
|
9.7
| Hardware Implementation Of Classification   117
|
9.8
| Optimized Classification Of Multiple Rule Sets   118
|
9.9
| Classification Of Variable-Size Headers   120
|
9.10
| Hybrid Hardware/Software Classification   121
|
9.11
| Dynamic Vs. Static Classification   122
|
9.12
| Fine-Grain Flow Creation   123
|
9.13
| Flow Forwarding In A Connection-Oriented Network   124
|
9.14
| Connectionless Network Classification And Forwarding   124
|
9.15
| Second Generation Network Systems   125
|
9.16
| Embedded Processors In Second Generation Systems   126
|
9.17
| Classification And Forwarding Chips   127
|
9.18
| Summary   128
|
| For Further Study   128
|
| Exercises   128
|
10.1
| Introduction   131
|
10.2
| Bandwidth Of An Internal Fast Path   131
|
10.3
| The Switching Fabric Concept   132
|
10.4
| Synchronous And Asynchronous Fabrics   133
|
10.5
| A Taxonomy Of Switching Fabric Architectures   134
|
10.6
| Dedicated Internal Paths And Port Contention   134
|
10.7
| Crossbar Architecture   135
|
10.8
| Basic Queueing   137
|
10.9
| Time Division Solutions: Sharing Data Paths   139
|
10.10
| Shared Bus Architecture   139
|
10.11
| Other Shared Medium Architectures   140
|
10.12
| Shared Memory Architecture   141
|
10.13
| Multistage Fabrics   142
|
10.14
| Banyan Architecture   143
|
10.15
| Scaling A Banyan Switch   144
|
10.16
| Commercial Technologies   146
|
10.17
| Summary   146
|
| For Further Study   147
|
| Exercises   147
|
11.1
| Introduction   151
|
11.2
| The CPU In A Second Generation Architecture   151
|
11.3
| Third Generation Network Systems   152
|
11.4
| The Motivation For Embedded Processors   153
|
11.5
| RISC Vs. CISC   153
|
11.6
| The Need For Custom Silicon   154
|
11.7
| Definition Of A Network Processor   155
|
11.8
| A Fundamental Idea: Flexibility Through Programmability   156
|
11.9
| Instruction Set   157
|
11.10
| Scalability With Parallelism And Pipelining   157
|
11.11
| The Costs And Benefits Of Network Processors   158
|
11.12
| The Status And Future Of Network Processors   159
|
11.13
| Summary   160
|
| For Further Study   160
|
| Exercises   160
|
12.1
| Introduction   163
|
12.2
| Network Processor Functionality   163
|
12.3
| Packet Processing Functions   164
|
12.4
| Ingress And Egress Processing   165
|
| 12.4.1
| Ingress Processing   165
|
| 12.4.2
| Egress Processing   166
|
12.5
| Parallel And Distributed Architecture   168
|
12.6
| The Architectural Roles Of Network Processors   169
|
12.7
| Consequences For Each Architectural Role   169
|
12.8
| Macroscopic Data Pipelining And Heterogeneity   171
|
12.9
| Network Processor Design And Software Emulation   171
|
12.10
| Summary   172
|
| For Further Study   172
|
| Exercises   173
|
13.1
| Introduction   175
|
13.2
| Architectural Variety   175
|
13.3
| Primary Architectural Characteristics   176
|
| 13.3.1
| Processor Hierarchy   176
|
| 13.3.2
| Memory Hierarchy   177
|
| 13.3.3
| Internal Transfer Mechanisms   179
|
| 13.3.4
| External Interface And Communication Mechanisms   180
|
| 13.3.5
| Special-Purpose Hardware   181
|
| 13.3.6
| Polling And Notification Mechanisms   181
|
| 13.3.7
| Concurrent Execution Support   182
|
| 13.3.8
| Hardware Support For Programming   183
|
| 13.3.9
| Hardware And Software Dispatch Mechanisms   183
|
| 13.3.10
| Implicit And Explicit Parallelism   184
|
13.4
| Architecture, Packet Flow, And Clock Rates   184
|
13.5
| Software Architecture   187
|
13.6
| Assigning Functionality To The Processor Hierarchy   187
|
13.7
| Summary   189
|
| For Further Study   190
|
| Exercises   190
|
14.1
| Introduction   193
|
14.2
| The Processing Hierarchy And Scaling   193
|
14.3
| Scaling By Making Processors Faster   194
|
14.4
| Scaling By Increasing The Number Of Processors   194
|
14.5
| Scaling By Increasing Processor Types   195
|
14.6
| Scaling A Memory Hierarchy   196
|
14.7
| Scaling By Increasing Memory Size   198
|
14.8
| Scaling By Increasing Memory Bandwidth   198
|
14.9
| Scaling By Increasing Types Of Memory   199
|
14.10
| Scaling By Adding Memory Caches   200
|
14.11
| Scaling With Content Addressable Memory   201
|
14.12
| Using CAM For Packet Classification   203
|
14.13
| Other Limitations On Scale   205
|
14.14
| Software Scalability   206
|
14.15
| Bottlenecks And Scale   207
|
14.16
| Summary   207
|
| Exercises   208
|
15.1
| Introduction   211
|
15.2
| An Explosion Of Commercial Products   211
|
15.3
| A Selection Of Products   212
|
15.4
| Multi-Chip Pipeline (Agere)   212
|
15.5
| Augmented RISC Processor (Alchemy)   216
|
15.6
| Embedded Processor Plus Coprocessors (AMCC)   217
|
15.7
| Pipeline Of Homogeneous Processors (Cisco)   219
|
15.8
| Configurable Instruction Set Processors (Cognigine)   220
|
15.9
| Pipeline Of Heterogeneous Processors (EZchip)   221
|
15.10
| Extensive And Diverse Processors (IBM)   223
|
15.11
| Flexible RISC Plus Coprocessors (Motorola)   225
|
15.12
| Summary   229
|
| For Further Study   229
|
| Exercises   229
|
16.1
| Introduction   231
|
16.2
| Optimized Classification   231
|
16.3
| Imperative And Declarative Paradigms   232
|
16.4
| A Programming Language For Classification   233
|
16.5
| Automated Translation   233
|
16.6
| Language Features That Aid Programming   234
|
16.7
| The Relationship Between Language And Hardware   234
|
16.8
| Efficiency And Execution Speed   235
|
16.9
| Commercial Classification Languages   236
|
16.10
| Intel's Network Classification Language (NCL)   236
|
16.11
| An Example Of NCL Code   237
|
16.12
| NCL Intrinsic Functions   240
|
16.13
| Predicates   241
|
16.14
| Conditional Rule Execution   241
|
16.15
| Incremental Protocol Definition   242
|
16.16
| NCL Set Facility   243
|
16.17
| Other NCL Features   244
|
16.18
| Agere's Functional Programming Language (FPL)   245
|
16.19
| Two Pass Processing   245
|
16.20
| Designating The First And Second Pass   247
|
16.21
| Using Patterns For Conditionals   247
|
16.22
| Symbolic Constants   249
|
16.23
| Example FPL Code For Second Pass Processing   249
|
16.24
| Sequential Pattern Matching Paradigm   250
|
16.25
| Tree Functions And The BITS Default   252
|
16.26
| Return Values   252
|
16.27
| Passing Information To The Routing Engine   252
|
16.28
| Access To Built-in And External Functions   253
|
16.29
| Other FPL Features   253
|
| 16.29.1
| FPL Constant Syntax   253
|
| 16.29.2
| FPL Variables   254
|
| 16.29.3
| FPL Support For Dynamic Classification   255
|
16.30
| Summary   255
|
| Exercises   255
|
17.1
| Introduction   259
|
17.2
| Low Cost Vs. Performance   259
|
17.3
| Programmability Vs. Processing Speed   260
|
17.4
| Performance: Packet Rate, Data Rate, And Bursts   260
|
17.5
| Speed Vs. Functionality   261
|
17.6
| Per-Interface Rate Vs. Aggregate Data Rate   261
|
17.7
| Network Processor Speed Vs. Bandwidth   261
|
17.8
| Coprocessor Design: Lookaside Vs. Flow-Through   262
|
17.9
| Pipelining: Uniform Vs. Synchronized   262
|
17.10
| Explicit Parallelism Vs. Cost And Programmability   262
|
17.11
| Parallelism: Scale Vs. Packet Ordering   263
|
17.12
| Parallelism: Speed Vs. Stateful Classification   263
|
17.13
| Memory: Speed Vs. Programmability   263
|
17.14
| I/O Performance Vs. Pin Count   264
|
17.15
| Programming Languages: A Three-Way Tradeoff   264
|
17.16
| Multithreading: Speed Vs. Programmability   264
|
17.17
| Traffic Management Vs. Blind Forwarding At Low Cost   265
|
17.18
| Generality Vs. Specific Architectural Role   265
|
17.19
| Memory Type: Special-Purpose Vs. General-Purpose   265
|
17.20
| Backward Compatibility Vs. Architectural Advances   266
|
17.21
| Parallelism Vs. Pipelining   266
|
17.22
| Summary   267
|
| Exercises   267
|
18.1
| Introduction   271
|
18.2
| Intel Terminology   271
|
18.3
| IXA: Internet Exchange Architecture   272
|
18.4
| IXP: Internet Exchange Processor   272
|
18.5
| Basic IXP1200 Features   273
|
18.6
| External Connections   273
|
| 18.6.1
| Serial Line Interface   275
|
| 18.6.2
| PCI Bus   275
|
| 18.6.3
| IX Bus   275
|
| 18.6.4
| SDRAM Bus   275
|
| 18.6.5
| SRAM Bus   276
|
18.7
| Internal Components   276
|
18.8
| IXP1200 Processor Hierarchy   277
|
| 18.8.1
| General-Purpose Processor   278
|
| 18.8.2
| Embedded RISC Processor (StrongARM)   278
|
| 18.8.3
| I/O Processors (Microengines)   278
|
| 18.8.4
| Coprocessors And Other Functional Units   279
|
| 18.8.5
| Physical Interface Processors   279
|
18.9
| IXP1200 Memory Hierarchy   279
|
18.10
| Word And Longword Addressing   281
|
18.11
| An Example Of Underlying Complexity   281
|
18.12
| Other Hardware Facilities   283
|
18.13
| Summary   283
|
| For Further Study   283
|
| Exercises   284
|
19.1
| Introduction   287
|
19.2
| Purpose Of An Embedded Processor   287
|
19.3
| StrongARM Architecture   289
|
19.4
| RISC Instruction Set And Registers   289
|
19.5
| StrongARM Memory Architecture   290
|
19.6
| StrongARM Memory Map   291
|
19.7
| Virtual Address Space And Memory Management   292
|
19.8
| Shared Memory And Address Translation   292
|
19.9
| Internal Peripheral Units   293
|
| 19.9.1
| Serial Connection Through UART Hardware   293
|
| 19.9.2
| Countdown Timers   293
|
| 19.9.3
| General-Purpose I/O Pins   294
|
| 19.9.4
| Real-Time Clock   294
|
19.10
| Other I/O   294
|
19.11
| User And Kernel Mode Operation   294
|
19.12
| Coprocessor 15   295
|
19.13
| Summary   295
|
| For Further Study   295
|
| Exercises   296
|
20.1
| Introduction   299
|
20.2
| The Purpose Of Microengines   299
|
20.3
| Microengine Architecture   300
|
20.4
| The Concept Of Microsequencing   300
|
20.5
| Microengine Instruction Set   301
|
20.6
| Separate Memory Address Spaces   303
|
20.7
| Instruction Pipeline   303
|
20.8
| The Concept Of Instruction Stalls   305
|
20.9
| Conditional Branching And Pipeline Abort   306
|
20.10
| Memory Access Delay   306
|
20.11
| Hardware Threads And Context Switching   307
|
20.12
| Microengine Instruction Store   309
|
20.13
| Microengine Hardware Registers   310
|
20.14
| General Purpose Registers   310
|
| 20.14.1
| Context-Relative Vs. Absolute Registers   310
|
| 20.14.2
| Register Banks   310
|
20.15
| Transfer Registers   312
|
20.16
| Local Control And Status Registers (CSRs)   313
|
20.17
| Inter-Processor Communication   313
|
20.18
| FBI Unit   314
|
20.19
| Transmit And Receive FIFOs   315
|
20.20
| FBI Architecture And Push/Pull Engines   315
|
20.21
| Scratchpad Memory   316
|
20.22
| Hash Unit   317
|
20.23
| Configuration, Control, And Status Registers   319
|
20.24
| Summary   319
|
| For Further Study   319
|
| Exercises   320
|
21.1
| Introduction   323
|
21.2
| Reference Systems   323
|
21.3
| The Intel Reference System   324
|
| 21.3.1
| Intel's Hardware Testbed   324
|
| 21.3.2
| Intel's Software Development Kit   325
|
21.4
| Host Operating System Choices   326
|
21.5
| Operating System Used On The StrongARM   326
|
21.6
| External File Access And Storage   327
|
21.7
| PCI Ethernet Emulation   328
|
21.8
| Bootstrapping The Reference Hardware   328
|
21.9
| Running Software   329
|
21.10
| System Reboot   330
|
21.11
| Alternative Cross-Development Software   330
|
21.12
| Summary   330
|
| For Further Study   331
|
22.1
| Introduction   333
|
22.2
| The ACE Abstraction   333
|
22.3
| ACE Definitions And Terminology   334
|
22.4
| Four Conceptual Parts Of An ACE   334
|
22.5
| Output Targets And Late Binding   335
|
22.6
| An Example Of ACE Interconnection   335
|
22.7
| Division Of An ACE Into Core And Microblocks   336
|
22.8
| Microblock Groups   337
|
22.9
| Replicated Microblock Groups   338
|
22.10
| Microblock Structure   338
|
22.11
| The Dispatch Loop   339
|
22.12
| Dispatch Loop Calling Conventions   340
|
22.13
| Packet Queues   341
|
22.14
| Exceptions   342
|
22.15
| Crosscalls   343
|
22.16
| Application Programs Outside The ACE Model   344
|
22.17
| Summary   344
|
23.1
| Introduction   347
|
23.2
| StrongARM Responsibilities   347
|
23.3
| Principal Run-Time Components   348
|
23.4
| Core Components Of ACEs   348
|
23.5
| Object Management Services (OMS)   349
|
| 23.5.1
| Resolver   349
|
| 23.5.2
| Name Server   350
|
23.6
| Resource Manager   350
|
23.7
| Operating System Specific Library (OSSL)   350
|
23.8
| Action Services Library   351
|
23.9
| Automated Microengine Assignment   351
|
23.10
| ACE Program Structure   352
|
23.11
| ACE Main Program And Event Loop   352
|
23.12
| ACE Event Loop And Blocking   353
|
23.13
| Asynchronous Programming Paradigm And Callbacks   354
|
23.14
| Asynchronous Execution And Mutual Exclusion   356
|
23.15
| Memory Allocation   357
|
23.16
| Loading And Starting An ACE (ixstart)   358
|
23.17
| ACE Control Block Allocation And Initialization   359
|
23.18
| Crosscalls   360
|
23.19
| Crosscall Declaration Using IDL   361
|
23.20
| Communications Access Point (CAP)   362
|
23.21
| Timer Management   362
|
23.22
| NCL Classification, Actions, And Default   364
|
23.23
| Summary   365
|
| For Further Study   366
|
| Exercises   366
|
24.1
| Introduction   369
|
24.2
| Intel's Microengine Assembler   369
|
24.3
| Microengine Assembly Language Syntax   370
|
24.4
| Example Operand Syntax   371
|
24.5
| Symbolic Register Names And Allocation   374
|
24.6
| Register Types And Syntax   375
|
24.7
| Local Register Scope, Nesting, And Shadowing   376
|
24.8
| Register Assignments And Conflicts   377
|
24.9
| The Macro Preprocessor   378
|
24.10
| Macro Definition   378
|
24.11
| Repeated Generation Of A Code Segment   380
|
24.12
| Structured Programming Directives   381
|
24.13
| Instructions That Can Cause A Context Switch   383
|
24.14
| Indirect Reference   384
|
24.15
| External Transfers   385
|
24.16
| Library Macros And Transfer Register Allocation   386
|
24.17
| Summary   387
|
| For Further Study   388
|
| Exercises   388
|
25.1
| Introduction   391
|
25.2
| Specialized Memory Operations   391
|
25.3
| Buffer Pool Manipulation   392
|
25.4
| Processor Coordination Via Bit Testing   392
|
25.5
| Atomic Memory Increment   393
|
25.6
| Processor Coordination Via Memory Locking   394
|
25.7
| Control And Status Registers   395
|
25.8
| Intel Dispatch Loop Macros   397
|
25.9
| Packet Queues And Selection   398
|
25.10
| Accessing Fields In A Packet Header   399
|
25.11
| Initialization Required For Dispatch Loop Macros   401
|
25.12
| Packet I/O And The Concept Of Mpackets   402
|
25.13
| Packet Input Without Interrupts   403
|
25.14
| Ingress Packet Transfer   404
|
25.15
| Packet Egress   404
|
25.16
| Other I/O Details   406
|
25.17
| Summary   407
|
| For Further Study   407
|
| Exercises   407
|