Figure 24-30: Port VC Capability Register 1 (Read-Only)
Table 24-14: Port VC Capability Register 1 (Read-Only)
Bit(s)  Description
2:0  Extended VC Count. The number of additional VCs supported by the device.
- 0 = only VC0 is supported.
- The maximum value is 7.
6:4  Low Priority Extended VC Count. Indicates the number of VCs (starting with VC0) that comprise the Low-Priority VC (LPVC) group.
- 0: There is no LPVC group, and the sequence in which the port's VC buffers transfer is governed by the fixed-priority scheme wherein VC0 has the lowest priority and the highest-numbered VC that is implemented has the highest priority.
- Non-zero value (n): VCs 0 through n are members of the LPVC group. The value specified cannot be greater than that specified in the Extended VC Count field of this register.
- The VCs above n are members of the high-priority group, where VCn+1 has the lowest priority and the highest VC has the highest priority.
- Control passes to the LPVC group only when the VCs in the upper group have no packets to transfer. The priority scheme used among the VCs that are members of the lower group is governed by the VC Arbitration Capability field in Port VC Capability Register 2 (see "Port VC Capability Register 2" on page 943).

9:8  Reference Clock. The reference clock for VCs that support time-based WRR Port Arbitration. This field is valid for an RCRB and for Switch Ports, and is not valid for Root Ports and Endpoint devices (must be hardwired to 0).
- 00b = 100ns reference clock.
- 01b-11b: reserved.
11:10  Port Arbitration Table Entry Size. Indicates the size (in bits) of each entry in the device's Port Arbitration table. This field is valid only for an RCRB and for any Switch Port. It is hardwired to 0 for Endpoint devices and Root Ports.
- 00b = each Port Arbitration table entry is 1 bit.
- 01b = each Port Arbitration table entry is 2 bits.
- 10b = each Port Arbitration table entry is 4 bits.
- 11b = each Port Arbitration table entry is 8 bits.
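
To make the bit layout above concrete, here is a minimal C sketch that decodes these fields from a raw register value. The helper name and the sample value are invented for illustration; only the bit positions come from Table 24-14.

    #include <stdint.h>
    #include <stdio.h>

    /* Decode Port VC Capability Register 1 using the bit layout in Table 24-14. */
    static void decode_port_vc_cap1(uint32_t cap1)
    {
        uint32_t ext_vc_count   = cap1 & 0x7;          /* bits 2:0   Extended VC Count */
        uint32_t lpvc_count     = (cap1 >> 4) & 0x7;   /* bits 6:4   Low Priority Extended VC Count */
        uint32_t ref_clock      = (cap1 >> 8) & 0x3;   /* bits 9:8   Reference Clock */
        uint32_t entry_size_enc = (cap1 >> 10) & 0x3;  /* bits 11:10 Port Arbitration Table Entry Size */

        printf("VCs implemented: VC0 plus %u extended VC(s)\n", ext_vc_count);
        if (lpvc_count == 0)
            printf("no LPVC group (fixed-priority scheme)\n");
        else
            printf("LPVC group: VC0 through VC%u\n", lpvc_count);
        printf("reference clock: %s\n", ref_clock == 0 ? "100ns" : "reserved encoding");
        printf("Port Arbitration Table entry size: %u bit(s)\n", 1u << entry_size_enc); /* 00b=1 ... 11b=8 */
    }

    int main(void)
    {
        /* Invented sample: 2 extra VCs, LPVC group VC0-VC1, 100ns clock, 4-bit entries. */
        decode_port_vc_cap1(0x00000812);
        return 0;
    }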

Port VC Capability Register 2

The register is illustrated in Figure 24-31 on page 943 and each bit field is described in Table 24-15 on page 944.
Figure 24-31: Port VC Capability Register 2 (Read-Only)

Table 24-15: Port VC Capability Register 2 (Read-Only)
Bit(s)  Description
7:0  VC Arbitration Capability. This bit mask indicates the arbitration scheme(s) supported by the device for the LPVC group. It is valid for all devices that report a Low Priority Extended VC Count greater than 0 (see the description in Table 24-14 on page 942). Each bit corresponds to an arbitration scheme defined below. When more than one bit is set, it indicates that the Port can be configured to provide different VC arbitration services.
- Bit 0: Hardwired, fixed arbitration scheme (e.g., Round Robin).
- Bit 1: Weighted Round Robin (WRR) arbitration with 32 phases.
- Bit 2: WRR arbitration with 64 phases.
- Bit 3: WRR arbitration with 128 phases.
- Bits 4-7: Reserved.
The desired arbitration scheme is selected via the VC Arbitration Select field in the Port VC Control Register (see Table 24-16 on page 945).
31:24  VC Arbitration Table Offset. Indicates the location of the VC Arbitration Table with reference to the start of the VC capability register set (specified in increments of dqwords, i.e., 16 bytes). A value of 0 indicates that the table is not present.
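
As a worked example of the offset arithmetic, the fragment below converts the dqword-granular offset field into a byte offset. The sample register value is invented.

    #include <stdint.h>
    #include <stdio.h>

    /* Convert the VC Arbitration Table Offset (bits 31:24 of Port VC Capability
     * Register 2) from dqwords (16 bytes) into a byte offset from the start of
     * the VC capability register set. */
    static uint32_t vc_arb_table_offset_bytes(uint32_t cap2)
    {
        return ((cap2 >> 24) & 0xFF) * 16;  /* 0 means no table is present */
    }

    int main(void)
    {
        uint32_t cap2 = 0x04000006;  /* invented: offset 4 dqwords; 32/64-phase WRR supported */
        uint32_t off  = vc_arb_table_offset_bytes(cap2);
        if (off == 0)
            printf("no VC Arbitration Table implemented\n");
        else
            printf("VC Arbitration Table begins %u bytes into the capability set\n", off);
        return 0;
    }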

Port VC Control Register

The register is illustrated in Figure 24-32 on page 944 and each bit field is described in Table 24-16 on page 945.
Figure 24-32: Port VC Control Register (Read-Write)
Table 24-16: Port VC Control Register (Read-Write)
Bit(s)  Description
0  Load VC Arbitration Table. To activate a port's VC Arbitration Table, the configuration software takes the following steps:
1. When software initially programs the VC Arbitration Table, or when any change is subsequently made to any entry in the table, the VC Arbitration Table Status bit in the Port VC Status register is automatically set to one by hardware.
2. Software then sets the Load VC Arbitration Table bit to one, causing the port to read the VC Arbitration Table from the capability register set and apply it.
3. When the port hardware has completed reading and applying the updated table, it automatically clears the VC Arbitration Table Status bit in the Port VC Status register.
4. Software can determine if the updated table has been applied by reading the state of the VC Arbitration Table Status bit in the Port VC Status register. 0 indicates the updated table has been read and applied; 1 indicates that the update is not yet complete.
This bit is valid for a device when the selected VC Arbitration type (see the next row in this table) uses the VC Arbitration Table. Clearing this bit has no effect, and it always returns 0 when read.
3:1  VC Arbitration Select. The configuration software selects one of the supported LPVC arbitration schemes by setting this field to the BCD value of the bit corresponding to the desired scheme (see the description of bits 7:0 in Table 24-15 on page 944). The configuration software must select the arbitration scheme prior to enabling more than one VC in the LPVC group.
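
The four-step sequence above maps naturally to a small polling routine. The C sketch below simulates the hardware side with a single static variable standing in for the Port VC Control/Status dword; treating Control and Status as one dword, and the helper names, are assumptions for illustration only.

    #include <stdint.h>
    #include <stdio.h>

    /* Simulated Port VC Control/Status dword; a real driver would perform MMIO
     * or configuration-space accesses at the offsets shown in the figures. */
    static uint32_t port_vc_ctrl_status = 1u << 16; /* table was edited: hw set the status bit */

    #define LOAD_VC_ARB_TABLE   (1u << 0)   /* Port VC Control bit 0 (reads back as 0)      */
    #define VC_ARB_TABLE_STATUS (1u << 16)  /* Port VC Status bit 0, seen through the dword */

    static uint32_t vc_reg_read(void) { return port_vc_ctrl_status; }

    static void vc_reg_write(uint32_t v)
    {
        if (v & LOAD_VC_ARB_TABLE)                       /* setting Load makes the port read   */
            port_vc_ctrl_status &= ~VC_ARB_TABLE_STATUS; /* and apply the table; hardware then */
    }                                                    /* clears the status bit when done    */

    int main(void)
    {
        /* Step 1 (already happened): programming the table entries set the status bit. */
        /* Step 2: set the Load VC Arbitration Table bit. */
        vc_reg_write(vc_reg_read() | LOAD_VC_ARB_TABLE);

        /* Steps 3-4: poll until hardware clears the status bit (0 = table applied). */
        while (vc_reg_read() & VC_ARB_TABLE_STATUS)
            ; /* a production driver would bound this loop with a timeout */

        printf("VC Arbitration Table applied\n");
        return 0;
    }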

Port VC Status Register

The register is illustrated in Figure 24-33 on page 946 and each bit field is described in Table 24-17 on page 946.

Figure 24-33: Port VC Status Register (Read-Only)
Table 24-17: Port VC Status Register (Read-Only)
Bit(s)  Description
0  VC Arbitration Table Status. See the description of the Load VC Arbitration Table bit in Table 24-16 on page 945.

VC Resource Registers

General. At a minimum, each port implements a single VC (VC0); optionally, it may implement up to eight VCs (VC0 through VC7). For each VC it supports, the port implements the following three registers:
  • VC Resource Capability register.
  • VC Resource Control register.
  • VC Resource Status register.
The following three sections provide a description of each of these registers.
Each VC implements:
  • A mandatory TC/VC bit map that defines the TCs that should be accepted into this VC.
  • An optional Port Arbitration Table that defines the order in which the VC accepts packets from the device ingress ports that source packets to it for transmission.
VC Resource Capability Register. The register is illustrated in Figure 24-34 on page 947 and each bit field is described in Table 24-18 on page 947.

Figure 24-34: VC Resource Capability Register
Table 24-18: VC Resource Capability Register
Bit(s)  Type  Description
7:0  RO  Port Arbitration Capability. This bit mask indicates the types of Port arbitration (one or more) supported by the VC. It is valid for all Switch Ports and an RCRB, but not for PCI Express Endpoint devices or Root Ports. Software selects one of these arbitration schemes by writing to the Port Arbitration Select field in the VC Resource Control register (see "VC Resource Control Register" on page 948).
- Bit 0: Hardwired, fixed arbitration scheme (e.g., Round Robin).
- Bit 1: Weighted Round Robin (WRR) arbitration with 32 phases.
- Bit 2: WRR arbitration with 64 phases.
- Bit 3: WRR arbitration with 128 phases.
- Bit 4: Time-based WRR with 128 phases.
- Bit 5: WRR arbitration with 256 phases.
- Bits 6-7: Reserved.
14  RO  Advanced Packet Switching.
- 1 = This VC only supports transactions optimized for Advanced Packet Switching (AS).
- 0 = The VC is capable of supporting all transactions defined by the spec (including AS transport packets).
This bit is valid for all PCI Express Ports and an RCRB.

15  HwInit  Reject Snoop Transactions.
- 0 = Transactions with or without the No Snoop bit set are allowed on this VC.
- 1 = Transactions with No Snoop = 0 are rejected as an Unsupported Request.
This bit is valid for Root Ports and an RCRB, but not for Endpoint devices or Switch Ports.
22:16  HwInit  Maximum Time Slots. The maximum number of time slots (minus one) that the VC supports when configured for time-based WRR Port Arbitration. This field is valid for all Switch Ports, Root Ports and an RCRB, but not for Endpoint devices. It is only valid when the Port Arbitration Capability field in this register indicates that the VC supports time-based WRR Port Arbitration.
31:24  RO  Port Arbitration Table Offset. Indicates the location of the Port Arbitration Table associated with this VC with reference to the start of the VC capability register set (specified in increments of dqwords, i.e., 16 bytes). A value of 0 indicates that the table is not present. This field is valid for all Switch Ports and an RCRB, but not for Endpoint devices or Root Ports.
VC Resource Control Register. The register is illustrated in Figure 24-35 on page 948 and each bit field is described in Table 24-19 on page 949.
Figure 24-35: VC Resource Control Register (Read-Write)
Table 24-19: VC Resource Control Register (Read-Write)
Bit(s)  Description
7:0  TC/VC Map. TC-to-VC mapping bit map. Each bit within this field corresponds to a TC that is mapped to this VC, and multiple bits may be set to one.
- Bit 7 = 1: TC7 is mapped to this VC.
- Bit 6 = 1: TC6 is mapped to this VC.
- Bit 5 = 1: TC5 is mapped to this VC.
- Bit 4 = 1: TC4 is mapped to this VC.
- Bit 3 = 1: TC3 is mapped to this VC.
- Bit 2 = 1: TC2 is mapped to this VC.
- Bit 1 = 1: TC1 is mapped to this VC.
- Bit 0 = 1: TC0 is mapped to this VC. Bit 0 is read-only: 1 for VC0 and 0 for all other enabled VCs.
Before removing one or more TCs from the TC/VC Map of an enabled VC, software must ensure that no new or outstanding transactions with those TC labels are targeted at the given Link. The default value is FFh for VC0 and 00h for the other VCs.
16  Load Port Arbitration Table. To activate a VC's Port Arbitration Table, the configuration software takes the following steps:
1. When software initially programs the VC's Port Arbitration Table, or when any change is subsequently made to any entry in the table, the Port Arbitration Table Status bit in the VC's VC Resource Status register (see "VC Resource Status Register" on page 950) is automatically set to one by hardware.
2. Software then sets the Load Port Arbitration Table bit to one, causing the VC to read the updated Port Arbitration Table from the capability register set and apply it.
3. When the VC hardware has completed reading and applying the updated table, it automatically clears the Port Arbitration Table Status bit in its VC Resource Status register.
4. Software can determine if the updated table has been applied by reading the state of the Port Arbitration Table Status bit in the VC's VC Resource Status register. 0 indicates the updated table has been read and applied; 1 indicates that the update is not yet complete.
This bit is valid for a device when the selected Port Arbitration type (see the next row in this table) uses the Port Arbitration Table, and it is valid for all Switch Ports and an RCRB, but not for Endpoint devices or Root Ports. Clearing this bit has no effect; it always returns 0 when read, and its default value is 0.

19:17  Port Arbitration Select. The configuration software selects one of the supported Port arbitration schemes by setting this field to the BCD value of the bit corresponding to the desired scheme (see the description of bits 7:0 in Table 24-18 on page 947). The configuration software must select the arbitration scheme prior to enabling more than one VC in the LPVC group.
26:24  VC ID. This field assigns a VC ID (between 0 and 7) to the VC (for VC0, it is hardwired to zero). It cannot be modified once the VC has been enabled.
31  VC Enable.
- 1 = VC enabled.
- 0 = VC disabled.
The state of this bit is qualified by the state of the VC Negotiation Pending bit (in the VC's VC Resource Status register; see "VC Resource Status Register" on page 950):
- 0 = negotiation has been completed (Flow Control initialization is completed for the PCI Express Port) and the VC Enable bit indicates the state of the VC.
- 1 = the negotiation process has not yet completed and the state of the VC Enable bit therefore remains indeterminate.
This bit is hardwired to 1 for VC0. It is read/write for the other VCs and its default is 0. To enable a VC, its VC Enable bit must be set to one in the ports at both ends of the link; to disable a VC, its VC Enable bit must be cleared to zero in the ports at both ends of the link. Before disabling a VC, software must ensure that no traffic is using the VC. Prior to re-enabling a VC, software must first fully disable the VC in both components on the Link.
VC Resource Status Register. The register is illustrated in Figure 24-36 on page 951 and each bit field is described in Table 24-20 on page 951.

Figure 24-36: VC Resource Status Register (Read-Only)
Table 24-20: VC Resource Status Register (Read-Only)
Bit(s)  Description
0  Port Arbitration Table Status. See the description of the Load Port Arbitration Table bit in Table 24-19 on page 949. The default value of this bit is 0.
1  VC Negotiation Pending. Indicates whether the VC negotiation process (initialization or disabling) is in the pending state. When this bit is set by hardware, it indicates that the VC is still in the process of negotiation. It is cleared by hardware after the VC negotiation completes. For VCs other than VC0, software uses this bit to enable or disable the VC. For VC0, this bit indicates the status of the Flow Control initialization process. Before using a VC, software must check that the VC Negotiation Pending bit is cleared in the components at both ends of the Link.
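
Tying Tables 24-19 and 24-20 together, the sketch below packs a VC Resource Control value that maps a set of TCs onto a VC and enables it. The field positions come from the tables above; the TC assignment and helper name are invented.

    #include <stdint.h>
    #include <stdio.h>

    /* Pack a VC Resource Control value per Table 24-19: TC/VC Map in bits 7:0,
     * VC ID in bits 26:24, VC Enable in bit 31. */
    static uint32_t vc_resource_control(uint8_t tc_map, uint8_t vc_id)
    {
        return (1u << 31)                       /* VC Enable */
             | ((uint32_t)(vc_id & 0x7) << 24)  /* VC ID, 0-7 */
             | tc_map;                          /* one bit per TC mapped to this VC */
    }

    int main(void)
    {
        /* Invented example: route TC3 and TC4 over VC1. The same value must be
         * programmed into the ports at BOTH ends of the link; software then
         * waits for VC Negotiation Pending (Table 24-20, bit 1) to clear in
         * both components before using the VC. */
        printf("VC Resource Control = 0x%08X\n",
               vc_resource_control((1u << 3) | (1u << 4), 1));
        return 0;
    }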

VC Arbitration Table

A port implements a VC Arbitration Table if both of the following are true:
  • The Port supports more than one VC.
  • The Port implements a WRR arbitration scheme.
The table consists of a set of read/write registers and is only used if the configuration software selects (via the VC Arbitration Select field in Table 24-16 on page 945) one of the implemented WRR VC arbitration schemes (see VC Arbitration Capability in Table 24-15 on page 944).

The configuration software configures the table with the arbitration scheme that the egress port logic uses to service the VC transmit buffers associated with the port. See the description of the Load VC Arbitration Table bit in Table 24-16 on page 945 for a description of how the table is uploaded into the port's logic. For a detailed description of the VC Arbitration Table, refer to "Loading the Virtual Channel Arbitration Table" on page 270.

Port Arbitration Tables

A VC implements a Port Arbitration Table if both of the following are true:
  • The Port supports more than one VC.
  • The VC implements a WRR arbitration scheme.
The table consists of a set of read/write registers and is only used if the configuration software selects (via the Port Arbitration Select field in Table 24-19 on page 949) one of the implemented WRR Port arbitration schemes (see Port Arbitration Capability in Table 24-18 on page 947).
The configuration software configures the table with the arbitration scheme that defines the order in which the VC accepts packets sourced from the ingress ports that pass packets to this VC's buffer on the egress port. See the description of the Load Port Arbitration Table bit in Table 24-19 on page 949 for a description of how the table is uploaded into the port's logic.
This register array is valid for all Switch Ports and RCRBs, but not for Endpoint devices or Root Ports. For a detailed description of the Port Arbitration Tables, refer to "The Port Arbitration Mechanisms" on page 277.

Device Serial Number Capability

This optional register set can be implemented on any PCI Express device in accordance with the following rules:
  • It consists of the Enhanced Capability Header pictured in Figure 24-37 on page 953 and the 64-bit Serial Number register pictured in Figure 24-38 on page 953.
  • The device serial number is a unique, read-only 64-bit value assigned to the device when it is manufactured.
  • A multifunction device with this feature implements it only on function 0, and other functions within the device must return the same serial number value as that reported by function 0.
  • Any component (e.g., a Switch) that contains multiple devices must return the same serial number for each device within the component.
Figure 24-37: Device Serial Number Enhanced Capability Header
Figure 24-38: Device Serial Number Register
The serial number is also known as the EUI-64. Refer to Figure 24-39 on page 954. A portion of the Extended Unique Identifier (EUI)-64 is assigned by a registration authority operating under the auspices of the IEEE organization. The EUI-64 consists of:
  • 24-bit company ID value assigned by IEEE. Bit 6, the Universal/Local scope bit, is always set to one (Universal scope ID, not assigned to anything else in the universe) in the value assigned by the IEEE.
  • 40-bit extension ID assigned by the company that "owns" the assigned company ID. The interpretation of the company-assigned extension is outside the scope of the spec. As an example, it may represent the device ID and manufacturer-assigned serial number.
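
A short C example of that split, composing the 64-bit value from a 24-bit company ID and a 40-bit extension (both values invented for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* Compose an EUI-64 from the 24-bit IEEE-assigned company ID (most
     * significant bits) and the 40-bit company-assigned extension. */
    static uint64_t make_eui64(uint32_t company_id_24, uint64_t extension_40)
    {
        return ((uint64_t)(company_id_24 & 0xFFFFFF) << 40)
             | (extension_40 & 0xFFFFFFFFFFULL);
    }

    int main(void)
    {
        uint64_t eui = make_eui64(0x123456, 0x89ABCDEF01ULL); /* invented values */
        printf("EUI-64: 0x%016llX\n", (unsigned long long)eui);

        /* The 64-bit Serial Number register spans two dwords in config space;
         * software typically reads it back as low and high halves. */
        uint32_t lo = (uint32_t)eui, hi = (uint32_t)(eui >> 32);
        printf("low dword: 0x%08X  high dword: 0x%08X\n", lo, hi);
        return 0;
    }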

Figure 24-39: EUI-64 Format

Power Budgeting Capability

General

Refer to Chapter 15, entitled "Power Budgeting," on page 557 for a detailed description of the Power Budgeting capability.
This optional capability permits the platform to properly allocate power to a device that is hot-plugged into the system during runtime. Using this register set, the device reports the following to the platform:
  • The power it consumes on a variety of power rails.
  • The power it consumes in different power management states.
  • The power it consumes under different operating conditions.
The platform (i.e., the system and the OS) uses this information to ensure that the system can provide the proper power and cooling levels to the device.
Implementation of this capability register set (see Figure 24-40 on page 955) is optional for devices that are implemented either in a form factor which does not require hot-plug support, or that are integrated on the system board. Although the spec states that "PCI Express form factor specifications may require support for power budgeting," it does not indicate any specific cases where this is required.
Figure 24-40 on page 955 illustrates the register set and Figure 24-41 on page 955 illustrates its Enhanced Capability Header register.

How It Works

The power budgeting data for the function consists of a table of n entries starting with entry 0. Each entry is read by placing an index value in the Power Budgeting Data Select register (Figure 24-40 on page 955) and then reading the value returned in the Power Budgeting Data register (Figure 24-42 on page 956). The end of the table is indicated by a return value of all 0s in the Data register.
In the Power Budgeting Capability register (see Figure 24-43 on page 956), the System Allocated bit is automatically set to one if the device is integrated onto the system board and its power requirements are therefore already taken into account in the system's power supply budget. In that case, the device's power requirements should be ignored by software in making power budgeting decisions.
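
The select-then-read loop is simple enough to sketch in C. Hardware access is simulated with a small array standing in for the device's table; helper names and entry values are invented.

    #include <stdint.h>
    #include <stdio.h>

    /* Simulated device: the array stands in for the power budgeting table that
     * the Data Select / Data register pair exposes one entry at a time. */
    static const uint32_t sim_table[] = { 0x00001234, 0x00002345 }; /* invented entries */
    static uint8_t selected;

    static void write_data_select(uint8_t idx) { selected = idx; }

    static uint32_t read_data(void)
    {
        return selected < sizeof sim_table / sizeof sim_table[0]
             ? sim_table[selected] : 0;  /* past the end: all 0s */
    }

    int main(void)
    {
        for (uint8_t i = 0; ; i++) {
            write_data_select(i);          /* index the entry to read */
            uint32_t entry = read_data();  /* Power Budgeting Data register */
            if (entry == 0)                /* all 0s marks the end of the table */
                break;
            printf("entry %u: 0x%08X\n", i, entry);
        }
        return 0;
    }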
Figure 24-40: Power Budget Register Set
Figure 24-41: Power Budgeting Enhanced Capability Header

Figure 24-42: Power Budgeting Data Register
Figure 24-43: Power Budgeting Capability Register

RCRB

General

As mentioned in "Root Complex Register Blocks (RCRBs)" on page 765, a Root Port may optionally implement a Root Complex Register Block (RCRB) as a 4KB block of memory-mapped IO registers that can include one or more of the optional PCI Express extended capabilities and other implementation-specific registers that apply to the Root Complex. An RCRB must not reside in the same memory-mapped IO address space as that defined for normal PCI Express functions. Multiple Root Ports or internal devices may be associated with the same RCRB (see Figure 24-44 on page 958 for an example).

Firmware Gives OS Base Address of Each RCRB

The spec requires platform firmware to communicate the base address of the RCRB for each Root Port or internal device in the Root Complex to the OS. How this is accomplished is outside the scope of the spec.

Misaligned or Locked Accesses To an RCRB

A Root Complex is not required to support memory access requests to an RCRB that cross dword address boundaries or that are accomplished using a locked transaction series. Software should therefore not attempt an access of either type to an RCRB unless it has device-specific knowledge that the Root Complex supports this access type.

Extended Capabilities in an RCRB

Any Extended Capability register sets in an RCRB must always begin at offset 0h within the RCRB's 4KB memory-mapped IO address space. If the RCRB does not implement any of the optional extended capability register sets, this is indicated by an Enhanced Capability header with a Capability ID of FFFFh and a Next Capability Offset of 0h.
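
A minimal C sketch of walking that capability list, assuming the standard Enhanced Capability Header layout (Capability ID in bits 15:0, Next Capability Offset in bits 31:20) and simulating the memory-mapped RCRB with an array:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t rcrb_read32(const uint32_t *rcrb, uint32_t offset)
    {
        return rcrb[offset / 4]; /* stand-in for a memory-mapped IO read */
    }

    static void walk_rcrb_caps(const uint32_t *rcrb)
    {
        uint32_t offset = 0;     /* first header always at offset 0h */
        do {
            uint32_t hdr  = rcrb_read32(rcrb, offset);
            uint16_t id   = hdr & 0xFFFF;
            uint32_t next = hdr >> 20;
            if (id == 0xFFFF && next == 0) {   /* "nothing implemented" marker */
                printf("no extended capabilities implemented\n");
                return;
            }
            printf("capability ID %04Xh at offset %03Xh\n",
                   (unsigned)id, (unsigned)offset);
            offset = next;
        } while (offset != 0);
    }

    int main(void)
    {
        /* Tiny simulated RCRB: one VC capability (ID 0002h) at offset 0. */
        static uint32_t rcrb[64] = { 0x00010002 };
        walk_rcrb_caps(rcrb);
        return 0;
    }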

The RCRB Missing Link

The 1.0a version of the spec does not identify any method for discovering the existence of RCRBs that may reside within a Root Complex, nor does it identify any method for associating an RCRB with one or more Root Ports. As of the time of this writing (6/6/03), there is a draft ECN (Engineering Change Notice) to the 1.0a spec that has not yet been approved that addresses this issue. As soon as it is approved, MindShare will immediately include this information in classes taught by MindShare and, of course, this information will be provided in the Second Edition of this book. It is not included in this edition because draft changes have a habit of mutating before they reach their finalized, approved form.
Figure 24-44: RCRB Example
Appendices

Appendix A Test, Debug and Verification of PCI Express™ Designs

by Gordon Getty, Agilent Technologies

Scope

The need for greater I/O bandwidth in the computer industry has caused designers to shift from using parallel buses like ISA, PCI™ and PCI-X™ to using multi-lane serial interconnects running at gigabit speeds. The industry has settled on PCI Express™ technology as the key I/O technology of the future, as it delivers on the higher bandwidth requirements, helps to reduce cost for silicon vendors and leverages the software environment from the pervasive PCI/PCI-X technology. While the change from parallel buses to multi-lane serial buses sounds like a small step, it presented a whole set of new debug and validation challenges to designers.
Serial technology requires a different approach to testing, starting from the physical layer and moving up through the transaction layer. In many cases, the parallel bus had several slots connected to the same physical lines, which allowed you to connect test equipment to the same bus and monitor other devices. With the point-to-point nature of serial technologies, this is no longer possible, and with the speed moving from the megahertz range to the gigahertz range, probing of the signal becomes a real challenge.
The second generation of PCI Express, known as PCI Express 2.0 (PCIe™ 2.0), is based on PCI Express 1.0 principles, but it supports speeds of up to 5 GT/s. Preserving backwards compatibility with PCI Express 1.0 presents its own set of challenges. Also, new and extended capabilities related to energy savings - including active state power management (ASPM) and dynamic link width negotiation - make achieving interoperability between devices more challenging, especially if these features are implemented incorrectly. Careful design and validation processes can help you avoid costly chip re-spins to fix interoperability issues.
This chapter will guide you through overcoming the challenges faced when you debug and validate your PCI Express devices.

Electrical Testing at the Physical Layer

The PCI Express specification requires devices to have a built-in mechanism for testing the electrical characteristics of devices such as those found on motherboards and in systems. When the transmit lanes of a device are terminated with a 50-ohm load, the transmit lanes are forced into a special mode known as compliance mode.
When a device is in compliance mode, it automatically generates a specific pattern known as the compliance pattern. Two different de-emphasis modes are introduced with the 5.0Gb/s transfer rate. All add-in cards should be tested at the 2.5Gb/s speed with 3.5dB de-emphasis (Figure A-1), and at 5.0Gb/s (Figure A-2) with both 3.5dB and 6dB de-emphasis.
Figure A-1: 2.5-GT/s PCIe Compliance Pattern

Figure A-2: 5-GT/s PCIe Compliance Pattern
The equipment required to carry out electrical testing on PCIe 2.0 devices includes a high-performance oscilloscope such as the Agilent Technologies DSO81304B 13-GHz Infiniium scope and a board into which you can plug an add-in card to provide a load on its transmitters. Alternatively, you can use a load board that plugs into a system and forces its transmitters into compliance mode, ensuring that the device is generating a measurable signal.
The PCI Express specifications (V1.1 and later) require you to capture and process one million unit intervals of data to be able to make a valid measurement. The Agilent 81304B scope has a "QuickMeas" (QM) function that provides user-defined macros and data-capture functionality intended to meet needs that may be very specific to a given application or measurement.
The PCI-SIG® provides a compliance base board and a compliance load board to help accomplish these tasks. These boards provide a consistent platform for making electrical measurements. Figures A-3 and A-4 show typical setups.

Figure A-3: Typical Setup for Testing an Add-In Card
Figure A-4: Typical Setup for Testing a Motherboard

With the setups shown in Figures A-3 and A-4, data is captured on the oscilloscope. Post-processing is used to measure jitter on the reference clock and to measure the random and deterministic jitter on the data lines. In electrical testing, you need to test each individual lane independently, as each lane is likely to have different electrical characteristics. The data is captured and then post-processed to form an eye diagram, such as the one shown in Figure A-5.
Figure A-5: Oscilloscope Eye Diagram
Using the eye diagram, you can measure the tolerances of voltage and jitter against the specification to determine if the device is compliant electrically. If you find the device is not compliant, you have an early indicator that interoper-ability is a potential issue.

Link Layer Testing

Once you have determined that the electrical characteristics of your device are within the specification, your device should be able to successfully establish a link with another device. For example, you should be able to plug an add-in card into a system motherboard and have the devices negotiate and complete link training. If you have followed the specification for the link training and status state machine (LTSSM) design, the two devices will be able to negotiate the number of lanes and the speed of the link. For PCI Express 1.0, the only speed required is 2.5Gb/s, whereas in PCI Express 2.0, speeds may go up to 5Gb/s. Speed negotiation occurs during link training.
Figure A-6 shows a test setup screen of an Agilent Protocol Exerciser, through which one can test the LTSSM.
Figure A-6: Testing the LTSSM with the Agilent N5309A Exerciser for PCIe 2.0
PCI Express has a more robust set of error checking mechanisms than conventional PCI. It is possible to recover from certain types of errors that may be caused by conditions such as a marginal signal quality at the electrical layer. There are cases where you will want to test the device against error conditions to ensure the robustness of your design and to ensure that your device will continue to operate correctly in a real operating environment. This is especially important for data integrity reasons. In conventional PCI, a simple parity check was applied to the data by using a dedicated line on the bus. Should a parity error occur, the system would typically signal it by asserting the PERR signal and potentially the SERR signal, which would most likely cause a "blue screen" scenario where the system would halt.
The data link layer (DLL) in PCI Express is designed to ensure that data transferred over the physical layer reaches its destination intact without errors. This is done using an acknowledge (ACK) or not acknowledge (NAK) message protocol. The DLL adds an LCRC (Link Cyclic Redundancy Check) code to the packet that is checked at the receiving end for integrity, and if the calculation turns out to be different, it is considered an error at the DLL. It potentially signals that the data being transferred was somehow changed and is now not valid. In conventional PCI, such an error would result in a system halt. In PCI Express, the correct behavior would be to send a NAK back to the originator, indicating that it should re-send the same packet since there was an error in the transmission of the packet the previous time it was sent. The originator would hold the packet in its replay buffer until it has received some kind of acknowledgement from the other end of the link.
PCI Express also has a message generation mechanism defined in the specification that allows the devices to report that an error has occurred but it has been noticed and fixed. A correctable error (ERR_COR) message is sent in this particular case and the relevant bit is set in the configuration space.
When a packet is sent, a timer is also started, and if no acknowledgement (ACK or NAK) is sent by the time this timer expires, the device re-sends the packet stored in the replay buffer. If this happens four times in a row, the DLL forces the physical layer to retrain the link, since there is potentially a problem that cannot be resolved without retraining or re-initializing the link.
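
The ACK/NAK and replay-limit behavior just described can be modeled as a tiny event handler. The C sketch below is a toy model, not a real DLL: the retrain threshold follows the rule stated above, and the event sequence is invented.

    #include <stdio.h>

    /* Toy model of the DLL replay rule described above: on NAK or replay-timer
     * timeout, the packet is re-sent from the replay buffer; after four
     * consecutive replays the DLL asks the physical layer to retrain the link. */
    enum dll_event { ACK, NAK, TIMEOUT };

    static int replay_count;

    static void on_tx_event(enum dll_event e)
    {
        switch (e) {
        case ACK:
            replay_count = 0;                 /* delivered: purge the replay buffer */
            printf("ACK: packet retired\n");
            break;
        case NAK:
        case TIMEOUT:
            if (++replay_count >= 4) {        /* fourth consecutive replay attempt */
                printf("replay limit reached: retrain the link\n");
                replay_count = 0;
            } else {
                printf("re-send from replay buffer (attempt %d)\n", replay_count);
            }
            break;
        }
    }

    int main(void)
    {
        enum dll_event events[] = { NAK, TIMEOUT, NAK, NAK, ACK }; /* invented trace */
        for (unsigned i = 0; i < sizeof events / sizeof events[0]; i++)
            on_tx_event(events[i]);
        return 0;
    }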
It is very difficult to create the scenarios described above using real devices since the condition technically is not built into the design. However, to ensure your devices behave properly if such a condition arises, you can use a protocol test tool such as the Agilent E2960B exerciser for PCI Express 2.0. This stimulus tool has the capability to generate traffic and non-standard conditions and test that the results are correct.
The protocol analyzer gives a view of the link at the data link and transaction layers while also providing logical physical layer information in the form of the 8b/10b symbols transmitted on the wire (See Figure A-7). No electrical signal information is provided. The protocol analyzer has the intelligence to identify and trigger on complex conditions on the bus consisting of combinations of symbols in the form of packets and, at a higher level, transactions. The protocol analyzer is connected between the two devices on the link and observes real traffic between two devices. You can probe the link, either using an interposer card that fits into a slot or a mid-bus probe that uses a predefined layout on the PCB and brings the signals to the surface of the board where they can be probed. Other probing options, especially for embedded PCI Express applications, include flying leads, which can be used when no slot or midbus footprint is available.
The exerciser and protocol analyzer tools are quite different from physical-layer tools, such as the oscilloscopes discussed earlier, in that they operate under the assumption that the physical layer is working properly. However, physical-layer errors may show up at the data link layer and transaction layer in the form of CRC or disparity errors. These types of errors can be emulated by the exerciser because it has a programmable PCI Express interface. The PCI Express exerciser is actually a fully functioning PCI Express device that can behave either as a root complex or an endpoint and has additional deterministic behavior capabilities on the link, including the injection of errors. The exerciser is designed to establish a link based on parameters such as link speed, link width, and scrambling enabled or disabled, among others. In addition, particularly for PCIe 2.0, the de-emphasis level can be set to off, 3.5dB or 6dB.
The PCIe 2.0 specification introduced additional features. Similar features have already shown interoperability problems with existing PCI Express 1.0a and 1.1 devices. During link training, a PCIe 2.0 device advertises its speed capability in the training control register of the training sequence. The bit used to indicate this was previously a reserved bit in the 1.0a and 1.1 specifications. When some devices that implement PCIe 1.0a and 1.1 are plugged into a 5-Gb/s capable system, these reserved bits are not set to zero, and you may not be able to successfully establish a link. Using the Agilent exerciser, it is possible to easily test the behavior of a PCIe 1.x device when the higher speed class is advertised.
PCIe 2.0 makes extensive use of the recovery state of the LTSSM, both to allow backwards compatibility with PCIe 1.0 and to allow the negotiation of the higher speed. Testing the LTSSM again provides assurance that devices will be able to link and operate properly in a real environment.

Figure A-7: Representation of Traffic on a PCIe Link Using an Agilent Protocol Analyzer

You would use an LTSSM test when one end of the link attempts to initiate a speed change to the higher speed and fails. For example, the LTSSM test can be used after exiting the Recovery.Speed substate when one device remains at the lower speed while the other is at the higher speed. This scenario is easily tested using the Agilent exerciser in combination with the Agilent protocol analyzer.
A very important role of the DLL is to manage flow control. This is a credit-based mechanism that allows each end of the link to determine the size of the buffer for receiving packets and data at the other end of the link. Immediately after the link has trained, the flow control credits are initialized. Each device advertises the number of headers for posted, non-posted and completion packets in addition to the amount of data payload it can handle. When transaction layer packets are being sent back and forth, the DLL is responsible for sending periodic updates of the available credits to ensure no deadlock condition happens on the link. Flow control has a huge impact on the performance of the link. The exerciser can again be used to emulate different flow control scenarios, as it allows you to manually set flow control parameters from advertising no credits to advertising unlimited credits. The exerciser allows you to arbitrarily send out a DLLP, potentially with incorrect flow control data to test the behavior of the device.
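
At the heart of this mechanism is a simple credit gate. The C sketch below is a simplification that tracks one counter pair with wrap-around arithmetic; real hardware keeps separate counters for posted, non-posted and completion headers and data, with field sizes defined by the spec.

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified credit gate in the spirit of PCIe flow control: a TLP may be
     * sent only if the credits it consumes do not pass the advertised limit. */
    typedef struct {
        uint16_t limit;     /* credit limit, learned from InitFC/UpdateFC DLLPs */
        uint16_t consumed;  /* credits consumed so far */
    } fc_counter;

    static int fc_can_send(const fc_counter *fc, uint16_t needed)
    {
        /* Counters wrap, so compare the difference modulo 2^16. */
        return (uint16_t)(fc->limit - (fc->consumed + needed)) <= 0x7FFF;
    }

    int main(void)
    {
        fc_counter hdr = { .limit = 32, .consumed = 30 };  /* invented state */
        printf("send 1-credit TLP: %s\n", fc_can_send(&hdr, 1) ? "yes" : "stall");
        printf("send 4-credit TLP: %s\n", fc_can_send(&hdr, 4) ? "yes" : "stall");
        return 0;
    }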

Transaction Layer Testing

The transaction layer in PCIe 2.0 is very similar to that of the 1.x specifications, which provides a great advantage in terms of software compatibility between the two specifications. The layered approach separates the physical connection from the upper-layer protocol. However, the new speed also presents different challenges related to performance at the DLL and transaction layers.
The protocol analyzer also allows you to measure characteristics such as actual throughput on the bus and transaction latencies. This information is extremely valuable for optimizing the performance of devices. The protocol analyzer is also a key instrument for finding errors and performing root cause analysis on error cases such as the one shown in Figure A-8. Unlike physical layer errors that may appear at the data link layer as disparity errors or CRC errors, errors at the transaction layer do not necessarily appear in the data link layer; they need to be dealt with by the receiver transaction layer. Figure A-9 shows an analyzer screen through which you can enable/disable error conditions to trigger on.
Figure A-8: Finding a Particular Condition or Sequence of Events Using a Trigger Sequencer

Figure A-9: Typical PCI Express Error Conditions Triggerable on Agilent Protocol Analyzer
It is strategically important for you to know the performance of a device during the design stages. It is common, especially in the early stages of a technology, to have performance testing capabilities.
Stressing the system by generating back-to-back packets, using different packet lengths and types and injecting errors, is an ideal way to identify potential problems early that could eventually lead to costly redesigns. Stressing the system also helps you avoid competitive disadvantages.
Testing the system's data integrity is also critical for the success of a device. Running write/read/compare tests with known data and deterministic patterns over a long period of time allows you to test corner cases thoroughly in a short timeframe, giving you confidence that interoperability will not be an issue for the device.
The Agilent exerciser and analyzer are invaluable tools for evaluating and optimizing PCI Express designs. In the past, validation tools were limited: you could use off-the-shelf devices, test tools that change static link parameters, and pattern generators. New test tools from Agilent integrate testing of the highly complex and dynamic nature of PCI Express devices in one platform driven by a user-friendly GUI, as shown in Figure A-10.

Figure A-10: Agilent PCIe 2.0 Exerciser Provides a Powerful and Flexible Validation Platform

The exerciser is a standard-size PCI Express card, as shown in Figure A-11. It is in fact fully programmable from an external host connected through USB. You can control the behavior of the tool independently of the system it is plugged into.
Figure A-11: The Agilent PCIe 2.0 N5309A Exerciser

Via a user-friendly GUI, it is possible to set up PCI Express traffic and have it downloaded via the USB interface to the PCI Express exerciser. The exerciser provides templates, as shown in Figure A-12, for creating different types of individual packets or requests that may contain multiple packets. Each of the requests has an associated behavior. This allows you to create traffic patterns via the exerciser deterministically and also create error cases that would not be possible in a real situation. The errors may occur, but it may be unpredictable as to when and why they happen.
Figure A-12: Templates for Creating Traffic Using the Exerciser

The exerciser allows you to recreate error scenarios and perform root cause analysis on them. As shown in Figure A-13, it is possible to edit a packet and inject error conditions. The erroneous packet is then generated by the exerciser and error analysis is performed with the aid of an Agilent Analyzer.
The exerciser will behave as a requester or master, and also as a target, depending on how you set it up. You can also control the behavior of completions. For example, when the exerciser receives a request in the form of a memory read, the default response would be to respond with a successful completion. However, you can also program it to respond in other ways, including an unsuccessful completion or a completer abort. Figure A-14 shows a completion packet editor via which you can inject errors into the completer-generated completion packet. In addition, it is possible to have the exerciser delay sending the completion by a specific amount of time, which facilitates testing of the completion timeout mechanism on the requester.
Figure A-13: Adding Single or Multiple Error Scenarios to each Request
(The Edit Packet dialog exposes fields for Priority, Automatic Tag, TLP Digest, LCRC, Disparity, Payload Size, TLP Poisoned, TLP Nullified, Replace STP, Replace END, and Offset Sequence Number.)

Figure A-14: Programming Completer Behaviors
(The completion editor exposes fields for Completion Status, Read Completion Boundary, Repeat, Priority, TLP Digest, LCRC, Disparity, Payload Size, TLP Poisoned, TLP Nullified, Replace STP, Replace END, Offset Sequence Number, and Discard Completion.)

To create framing errors, you can replace the start-of-transaction-layer-packet (STP) symbol and the end-of-packet (END) symbol with other arbitrary values. Since the exerciser is a fully programmable device, it is also possible to change values for the replay timer.
You can create another type of transaction layer error by changing the length field within the TLP so that it differs from the actual length of the data in the payload. Most likely, this would be treated as a malformed packet. You can easily set up these errors using the exerciser, as shown in Figure A-15.
Figure A-15: Inserting Request and Completion Errors Using the Exerciser
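
The receiver-side check being provoked here reduces to a comparison, sketched below with deliberately simplified structures (real hardware also validates format, type and other header rules):

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified view of a received TLP: the header's Length field versus the
     * payload dwords that actually arrived on the link. */
    typedef struct {
        uint16_t length_dw;   /* Length field from the TLP header, in dwords */
        uint16_t payload_dw;  /* dwords actually received in the payload */
    } rx_tlp;

    static int tlp_is_malformed(const rx_tlp *t)
    {
        return t->length_dw != t->payload_dw;  /* mismatch: malformed packet */
    }

    int main(void)
    {
        rx_tlp good = { 4, 4 }, bad = { 4, 6 };  /* invented examples */
        printf("good: %s\n", tlp_is_malformed(&good) ? "malformed" : "ok");
        printf("bad:  %s\n", tlp_is_malformed(&bad)  ? "malformed" : "ok");
        return 0;
    }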

In addition to generating error conditions, you can use the exerciser as a tool to stress the link. This is done by programming the exerciser to issue a block of requests, then repeat them continuously. It is possible to have multiple different requests repeated in continuous mode, as shown in Figure A-16, and also, if required, to add error-case scenarios.
Figure A-16: Using Continuous Mode on the Exerciser Running a Loop of Memory and I/O Transactions
You can program the configuration space of the exerciser to emulate different types of devices. PCIe devices may implement base address registers (BARs) and decoders as required, according to the resources needed by that particular device. The BIOS on the system then assigns resources to these devices at system startup time. It is important for BIOS engineers to be able to verify that they can provide the correct resources for any combination of devices that may be plugged into the system. It is critical that address ranges do not overlap.
The exerciser has more than one way of carrying out this testing. Since the exerciser is a real PCIe device, you can manually configure and program the decoders prior to starting up the system. You can also map these decoders to different completion behaviors and priorities and map them to specific areas in the internal data memory of the exerciser. You have complete control of the location, size and type of each of these decoders, as shown in Figure A-17.

Figure A-17: The Exerciser Has Fully Programmable Memory and I/O Decoders
(The decoder setup screen shows five BARs, each with Decoder enable, Location (32-bit Memory or I/O), Prefetchable, Size, Base Address, Data Memory Base Address, and Completion Queue settings.)
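
For context on what the BIOS does with such a decoder, the sketch below models the classic BAR-sizing probe (write all 1s, read back, derive the size from the writable bits) against a simulated 128-byte 32-bit memory BAR of the kind shown in Figure A-17. All names and values are illustrative.

    #include <stdint.h>
    #include <stdio.h>

    /* Simulated 128-byte, 32-bit memory BAR: the low address bits are not
     * writable, which is exactly what the sizing probe relies on. */
    #define BAR_SIZE      128u
    #define BAR_TYPE_BITS 0xFu   /* memory/IO type and prefetchable bits (read as 0 here) */

    static uint32_t bar_reg;
    static void     bar_write(uint32_t v) { bar_reg = v & ~(BAR_SIZE - 1); }
    static uint32_t bar_read(void)        { return bar_reg; }

    int main(void)
    {
        /* Sizing probe: write all 1s; the bits that stay 0 encode the size. */
        bar_write(0xFFFFFFFFu);
        uint32_t size = ~(bar_read() & ~BAR_TYPE_BITS) + 1;
        printf("BAR requests %u bytes\n", size);

        /* Assign a base address, as the BIOS would (value from Figure A-17). */
        bar_write(0xFB000000u);
        printf("BAR base = 0x%08X\n", bar_read());
        return 0;
    }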

A second method is available for testing system BIOS. This method also uses the exerciser card as an interface to the PCIe port on the system, but it provides device emulation in software via the USB port. The advantage of this approach is that you can emulate and test many different topologies, including multiple levels of switches or bridges. It can also test the BIOS against different types of devices and resource requirements, such as a bridge device that requests memory resources - a valid configuration, but one that certain BIOSes in the past would not support. It is important that the behavior be correct if an optional feature is not implemented. This method is also helpful if a device requires more resources than are available on a system; the BIOS should handle this properly and disable the device.
Topology testing has been used since the days of conventional PCI and PCI-X, where a PCI or PCI-X exerciser was used to emulate different devices and then check that the addresses assigned were correct and not overlapping. The same principle was applied to PCI Express testing from the outset, using the Agilent E2969A protocol test card for PCI Express, and this method of testing has been ported to the Gen 2 PCI Express exerciser card. The same set of tests used for Gen 1 PCI Express is used; the main difference is that the speed is now 5GT/s rather than 2.5GT/s. Since the BIOS operates at the application layer, above the transaction layer, the principles of testing it are independent of the physical link width and speed. The same test cases used in the original PCI and PCI-X testing are still quite valid.
Testing can be extended to cover many complex topologies. Figure A-18 is a screen through which you can set up topology tests.

Figure A-18: Topology Tests Using the Agilent E2969A Protocol Test Card
(The Compliance Test Suite screen lists Legacy test cases covering 64-bit, prefetchable, Mem32 and IO BAR/request variations, plus FunctionTopology cases such as a 9-port switch, eight functions of Type 0 and Type 1, and 5 levels of 4-port switches.)

Summary

When you are designing and validating PCI Express designs, it is important to cover all aspects of testing, as problems in the lower layers often result in problems at the upper layers, which ultimately lead to interoperability issues. By designing and testing to the PCIe specification, you can be assured that your devices will work properly and will not face real-world compatibility problems.

Contact Agilent Technologies

For more information on the complete set of test tools from Agilent Technologies, please visit our Web site at: www.agilent.com
For PCI Express applications, please visit www.agilent.com/find/pciexpress
For oscilloscopes, please visit www.agilent.com/find/oscilloscopes
For protocol test tools, please visit www.agilent.com/find/e2960_series
  • PCI Express and PCI-X are registered trademarks of PCI-SIG.
  • PCIe is a trademark of PCI-SIG.

Appendix B Markets & Applications for the PCI Express™ Architecture

By Larry Chisvin, Akber Kazmi, and Danny Chi (PLX Technology, Inc.)

Introduction

Since its definition in the early 1990s, PCI has become one of the most successful interconnect technologies ever used in computers. Originally intended for personal computer systems, the PCI architecture has penetrated into virtually every computing platform category, including servers, storage, communications, and a wide range of embedded control applications. From its early incarnation as a 32-bit 33MHz interconnect, it has been expanded to offer higher speeds (currently in widespread use at 64-bit 133MHz, with faster versions on the way). Most importantly, each advancement in PCI bus speed and width provided backward software compatibility, allowing designers to leverage the broad code base.
As successful as the PCI architecture has become, there is a limit to what can be accomplished with a multi-drop, parallel shared bus interconnect technology. Issues such as clock skew, high pin count, trace routing restrictions in printed circuit boards (PCB), bandwidth and latency requirements, physical scalability, and the need to support Quality of Service (QoS) within a system for a wide variety of applications led to the definition of the PCI Express™ architecture.
PCI Express is the natural successor to PCI, and was developed to provide the advantages of a state-of-the-art, high-speed serial interconnect technology and packet-based layered architecture, while maintaining backward compatibility with the large PCI software infrastructure. The key goal was to provide an optimized and universal interconnect solution for a great variety of future platforms, including desktop, server, workstation, storage, communications and embedded systems.
Figure B-1: Migration from PCI to PCI Express
This chapter provides an overview of the markets and applications that PCI Express is expected to serve, with an explanation of how the technology will be integrated into each application, and some exploration of the advantages that PCI Express brings to each usage.
Let's review the key benefits of the PCI Express architecture before we discuss its application in different markets. Some of the key features of the architecture we reviewed in this book are:
  • Packet-based layered architecture
  • Serial interconnection at 2.5GHz (5GHz being considered)
  • Link-to-link and end-to-end error detection (CRC check)
  • Point-to-point data flow
  • Differential low voltage signals for noise immunity
  • Quality of Service (QoS) and Virtual Channels (VC)
  • Scalable from 1x to 32x lanes
  • Software (backward) compatibility with legacy PCI systems

Enterprise Computing Systems

PCI Express is expected to be deployed initially in desktop and server systems. These computers typically utilize a chipset solution that includes one or more microprocessors and two types of special interconnect devices, called northbridges and southbridges. Northbridges connect the CPU with memory, graphics and I/O. Southbridges connect to standardized I/O devices such as hard disk drives, networking modules or devices, and often PCI expansion slots.

Desktop Systems

Typical use of PCI Express in a desktop application is shown in Figure B-2 on page 992. The PCI Express ports come directly out of the northbridge, and are bridged to PCI slots that are used for legacy plug-in cards. In some implementations the PCI Express interconnections will be completely hidden from the user behind PCI bridges, and in other implementations there will be PCI Express slots in a new PCI Express connector form factor.
The major benefit of using PCI Express in this application is the low pin count associated with serial interface technology, which will translate into lower cost. This low pin count provides the ability to create northbridges and I/O bridges with smaller footprints, and significantly fewer board traces between the components. This provides a major reduction in the area and complexity of the signal/trace routing in PCBs.

Server Systems

Figure B-3 on page 993 shows PCI Express used in an enterprise server system. This system has similarities to the desktop system, since there is a northbridge and southbridge providing functions that parallel their roles in the desktop system, and the form factor of the system is often similar. Servers, however, place a greater emphasis on performance than desktop systems do.
Figure B-2: PCI Express in a Desktop System
To achieve their performance and time-to-market objectives, server designers have adopted PCI-X. The primary attraction of PCI-X has been increased throughput, but with PCI code compatibility. PCI-X offers clear benefits compared to PCI, and will remain in server systems for a long while, but it suffers from the same shared bus limitations that have already been discussed. The high throughput of the PCI Express serial interconnection provides a measurable benefit versus legacy interconnect technologies, especially as the speed of the I/O interconnect and the number of high speed I/O ports on each card increases.
Some systems will only provide PCI-X slots, but many newer systems will also offer several PCI Express slots. The number of PCI Express slots will grow over time compared to the PCI-X slots, and eventually will become dominant in the same way that PCI did with previous interconnect technologies. Since bandwidth is a primary motivator for a server, typical PCI Express slots will be either x4 or x8 lanes.
In most low to midrange server systems, the PCI-X bridging and PCI Express slots will be provided by using the ports right off of the northbridge. However, high-end systems will require more I/O slots of both kinds. Since PCI Express is a point-to-point technology, the only way to provide additional connection links is through a device called a fan out switch. Specifically, the purpose of a fan out switch is to multiply the number of PCI Express lanes from an upstream host port to a higher number of downstream PCI Express devices. Figure B-3 shows a PCI Express switch used in the system for this purpose.
Figure B-3: PCI Express in a Server System

Embedded Control

One of the many areas that PCI has penetrated is embedded-control systems. This describes a wide range of applications that measure, test, monitor, or display data, and includes applications such as industrial control, office automation, test equipment, and imaging.
In these applications, system designers typically utilize embedded processors. In many instances, leading-edge companies will differentiate their products by utilizing some custom logic in the form of an ASIC or FPGA. A bridge is often used to translate the simple custom interface and connect it to the bus.
It is expected that the embedded-control market will quickly migrate to PCI Express, with a typical example shown in Figure B-4 on page 994. Applications such as imaging and video streaming are always hungry for bandwidth, and the additional throughput of ×4 or ×8 PCI Express links will translate into higher video resolution, or the handling of more video streams by the system. Others will implement PCI Express because of the noise resistance its LVDS traces provide, or because of its efficient routing and its ability to hook together subsystems through a standard cable. Still others will choose PCI Express simply because of its ubiquity.
Figure B-4: PCI Express in Embedded-Control Applications

Storage Systems

PCI has become a common backplane technology for mainstream storage systems. Although it provides a good mix of features, low cost, and throughput, the "bus" has become a performance bottleneck. Figure B-5 on page 995 shows the use of PCI Express in a storage system. Systems similar to the one shown in Figure B-5 on page 995 can be built on a motherboard, or as part of a backplane. The discussion in this section applies to both form factors.
We have highlighted increased bandwidth as one of the advantages of moving to PCI Express, and nowhere is it more beneficial and obvious than in storage. The bandwidth demanded by I/O connections such as Ethernet, Fibre Channel, SCSI, and InfiniBand is increasing rapidly, and the ability to move data between I/O modules and the host processor is critical to overall system performance.


In RAID-based storage systems, for example, data to be archived is distributed across several disk drives to provide faster data retrieval and fault tolerance. As performance and complexity increase in these systems, the need for faster read and write operations from multiple I/O locations (disk drives) becomes extremely important. PCI Express, with its high-performance, point-to-point architecture, becomes very desirable for this application.
PCI Express provides a key reliability benefit in storage applications as well. The specification provides two different error-checking (CRC) schemes: a link-level CRC on each link to ensure a reliable connection, and an optional end-to-end CRC that travels with the data from source to destination.
In High Availability (HA) applications, a separate host can reside in the system (as shown in Figure B-5 on page 995) for failover. If and when the primary host becomes unstable or non-operational, the secondary host will take over control of the system. This is an important feature for system level reliability when the designer is attempting to eliminate as many single points of failure as possible. This secondary host will be integrated into the system using non-transparent bridging (a detailed discussion of non-transparent bridging is provided in Appendix C).
Figure B-5: PCI Express in a Storage System

Communications Systems

The last application that we will explore is the use of PCI Express in communications systems. As with previous usage models, PCI technology has in the past made significant inroads into communication systems, but over time it has become less desirable due to the inherent limitations of a shared bus. In general, serial interconnects such as PCI Express have become attractive to backplane system designers by providing switch-based topologies that enable higher reliability, scalability, and robustness.
High-end communications systems are based on one or more racks, with a mid-plane or backplane chassis used to interconnect each subsystem. Many systems use the CompactPCI architecture for their backplane implementations, and in some cases proprietary bus solutions are used to interconnect line cards, the switch fabric, and the control modules. Some vendors are moving toward the AdvancedTCA™ (ATCA) architecture, which supports a variety of different fabrics based upon a standard chassis for communication applications. ATCA has a range of benefits, but the PCI Express version of ATCA provides a smooth migration to higher speeds and a set of features that fit well with the communications paradigm.
One feature common to many communications systems is the ability to assign priorities to different data streams based on Quality of Service (QoS). PCI Express offers Traffic Classes (TC) that can be used to differentiate types of data. These TCs are then mapped onto Virtual Channels (VC) within the hardware. Each VC has its own set of queues in the subsystem, providing a separate path through the switch or bridge. This mechanism can be used to provide separate channels for different types of traffic (I/O, data, special messages).
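As a rough illustration of the mapping step, the C sketch below indexes a TC-to-VC table. The table contents (which TCs share which VC) are hypothetical; in a real system the mapping is programmed by configuration software through the port's VC resource registers.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical TC-to-VC map: index = Traffic Class (0-7), value = the
 * Virtual Channel its packets use. Real mappings are set up by
 * configuration software, not fixed at compile time. */
static const uint8_t tc_to_vc[8] = {
    0, 0, 0, 0,   /* TC0-TC3: ordinary I/O and data share VC0  */
    1, 1, 1,      /* TC4-TC6: latency-sensitive traffic on VC1 */
    2             /* TC7: control and special messages on VC2  */
};

int main(void)
{
    for (int tc = 0; tc < 8; tc++)
        printf("TC%d -> VC%d\n", tc, (int)tc_to_vc[tc]);
    return 0;
}

Because each VC has its own queues through a switch or bridge, a mapping like this one keeps, say, control messages from queuing behind bulk data.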
Figure B-6 on page 997 shows a typical communication switch or router. Only 6 slots are shown for illustration purposes, but actual systems typically have 10+ slots. As shown in the figure, PCI Express technology can be used to support redundant switch fabrics and control modules, allowing communications equipment vendors to build high availability systems with a faster time to market.


Figure B-6: PCI Express in Communications Systems

Summary

PCI Express technology offers an improvement in performance and the promise of features beyond PCI, but does so in a way that preserves the investment made in PCI software over the last ten years. The combination of increased bandwidth, reduced cost, and extended capabilities with an easy migration path is likely to make the PCI Express architecture the next ubiquitous interconnection technology for a wide variety of applications.

Appendix C

Implementing Intelligent Adapters and Multi-Host Systems With PCI Express™ Technology
By Jack Regula, Danny Chi and Tim Canepa (PLX Technology, Inc.)

Introduction

Intelligent adapters, host failover mechanisms and multiprocessor systems are three usage models that are common today and expected to become even more prevalent in next-generation systems. Despite the fact that each of these was developed in response to completely different market demands, all share a common requirement: multiple processors must co-exist within the same system. This appendix outlines how PCI Express can address these needs through non-transparent bridging.
Because of the widespread popularity of systems using intelligent adapters, host failover and multihost technologies, PCI Express silicon vendors must provide a means to support them. This is actually a relatively low-risk endeavor, given that PCI Express is software compatible with PCI, and PCI systems have long implemented distributed processing. The most obvious approach, and the one that PLX espouses, is to emulate in PCI Express the most popular implementation used in the PCI space. This strategy allows system designers to use an implementation that is not only familiar but also a proven methodology, one that can provide significant software reuse as they migrate from PCI to PCI Express.
This paper outlines how multiprocessor PCI Express systems will be implemented using industry-standard practices established in the PCI paradigm. First, however, we will define the different usage models and review the successful efforts in the PCI community to develop mechanisms that accommodate these requirements. Finally, we will cover how PCI Express systems will utilize non-transparent bridging to provide the functionality needed for these types of systems.

Usage Models

Intelligent Adapters

Intelligent adapters are typically peripheral devices that use a local processor to offload tasks from the host. Examples of intelligent adapters include RAID controllers, modem cards, and content processing blades that perform tasks such as security and flow processing. Generally, these tasks are either computationally onerous or require significant I/O bandwidth if performed by the host. By adding a local processor to the endpoint, system designers can enjoy significant incremental performance. In the RAID market, a significant number of products utilize local intelligence for their I/O processing.
Another example of an intelligent adapter is an e-commerce blade. Because general-purpose host processors are not optimized for the exponential mathematics necessary for SSL, utilizing a host processor to perform an SSL handshake typically reduces system performance by over 90%. Furthermore, one of the requirements for the SSL handshake operation is a true random number generator. Many general-purpose processors do not have this feature, so it is actually difficult to perform SSL handshakes without dedicated hardware. Similar examples abound throughout the intelligent adapter marketplace; in fact, this usage model is so prevalent that for many applications it has become the de facto standard implementation.

Host Failover

Host failover capabilities are designed into systems that require high availability. High availability has become an increasingly important requirement, especially in storage and communication platforms. The only practical way to ensure that the overall system remains operational is to provide redundancy for all components. Host failover systems typically include a host based system attached to several endpoints. In addition, a backup host is attached to the system and is configured to monitor the system status. When the primary host fails, the backup host processor must not only recognize the failure, but then take steps to assume primary control, remove the failed host to prevent additional disruptions, reconstitute the system state, and continue the operation of the system without losing any data.


Multiprocessor Systems

Multiprocessor systems provide greater processing bandwidth by allowing multiple computational engines to simultaneously work on sections of a complex problem. Unlike systems utilizing host failover, where the backup processor is essentially idle, multiprocessor systems utilize all the engines to boost computational throughput. This enables a system to reach performance levels not possible by using only a single host processor. Multiprocessor systems typically consist of two or more complete sub-systems that can pass data between themselves via a special interconnect. A good example of a multihost system is a blade server chassis. Each blade is a complete subsystem, often replete with its own CPU, Direct Attached Storage, and I/O.

The History of Multi-Processor Implementations Using PCI

To better understand the implementation proposed for PCI Express, one needs to first understand the PCI implementation.
PCI was originally defined in 1992 for personal computers. Because of the nature of PCs at that time, the protocol architects did not anticipate the need for multiprocessors. Therefore, they designed the system assuming that the host processor would enumerate the entire memory space. Obviously, if another processor is added, the system operation would fail as both processors would attempt to service the system requests.
Several methodologies were subsequently invented to accommodate the requirement for multiprocessor capabilities using PCI. The most popular implementation, and the one discussed in this paper for PCI Express, is the use of non-transparent bridging between the processing subsystems to isolate their memory spaces.1

  1. Unless explicitly noted, the architecture for multiprocessor systems using PCI and PCI Express is similar, and the two may be discussed interchangeably.
Because the host does not know the system topology when it is first powered up or reset, it must perform discovery to learn what devices are present and then map them into the memory space. To support standard discovery and configuration software, the PCI specification defines a standard format for the Control and Status Registers (CSRs) of compliant devices. The standard PCI-to-PCI bridge CSR header, called a Type 1 header, includes primary, secondary, and subordinate bus number registers that, when written by the host, define the CSR addresses of devices on the other side of the bridge. Bridges that employ a Type 1 CSR header are called transparent bridges.
A Type 0 header is used for endpoints. A Type 0 CSR header includes base address registers (BARs) used to request memory or I/O apertures from the host. Both Type 1 and Type 0 headers include a class code register that indicates what kind of bridge or endpoint is represented, with further information available in a subclass field and in device ID and vendor ID registers. The CSR header format and addressing rules allow the processor to search all the branches of a PCI hierarchy, from the host bridge down to each of its leaves, reading the class code registers of each device it finds as it proceeds, and assigning bus numbers as appropriate as it discovers PCI-to-PCI bridges along the way. At the completion of discovery, the host knows which devices are present and the memory and I/O space each device requires to function. These concepts are illustrated in Figure C-1.
Figure C-1: Enumeration Using Transparent Bridges
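The discovery walk can be made concrete with a short C sketch. Everything below is illustrative: cfg_read32() is a hypothetical configuration-space accessor (stubbed so the sketch compiles and runs), and the bookkeeping a real enumerator performs (writing bridge bus-number registers, sizing BARs, checking multi-function bits) is reduced to comments.

#include <stdint.h>
#include <stdio.h>

/* Stub: a real implementation would issue a configuration read
 * (e.g., via the 0xCF8/0xCFC mechanism on x86). Returning all-ones
 * means "no device present", so the sketch runs standalone. */
static uint32_t cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
    (void)bus; (void)dev; (void)fn; (void)off;
    return 0xFFFFFFFFu;
}

static uint8_t next_bus = 1;            /* next bus number to hand out */

static void scan_bus(uint8_t bus)
{
    for (uint8_t dev = 0; dev < 32; dev++) {
        uint32_t id = cfg_read32(bus, dev, 0, 0x00);   /* Vendor/Device ID */
        if ((id & 0xFFFFu) == 0xFFFFu)
            continue;                                  /* empty slot */

        uint32_t class_rev = cfg_read32(bus, dev, 0, 0x08);
        uint8_t  hdr_type  = (uint8_t)((cfg_read32(bus, dev, 0, 0x0C) >> 16) & 0x7Fu);

        printf("bus %d dev %d: class code %06Xh\n",
               (int)bus, (int)dev, (unsigned)(class_rev >> 8));

        if (hdr_type == 0x01) {          /* Type 1 header: PCI-to-PCI bridge */
            uint8_t secondary = next_bus++;
            /* ...write primary/secondary/subordinate bus numbers here,
             * then recurse to discover the branch behind the bridge.  */
            scan_bus(secondary);
        }
        /* Type 0 headers (endpoints) terminate the branch; this is
         * where their BARs would be sized and assigned.              */
    }
}

int main(void)
{
    scan_bus(0);    /* begin at the host bridge's bus 0 */
    return 0;
}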

Implementing Multi-host/Intelligent Adapters in PCI Express Base Systems

Up to this point, our discussion has been limited to one processor with one memory space. As technology progressed, system designers began developing endpoints with their own native processors built in. The problem this caused was that both the host processor and the intelligent adapter would, upon power-up or reset, attempt to enumerate the entire system, causing conflicts and ultimately a non-functional system.2
To get around this, architects designed non-transparent bridges. A non-transparent PCI-to-PCI Bridge, or PCI Express-to-PCI Express Bridge, is a bridge that exposes a Type 0 CSR header on both sides and forwards transactions from one side to the other with address translation, through apertures created by the BARs of those CSR headers. Because it exposes a Type 0 CSR header, the bridge appears to be an endpoint to discovery and configuration software, eliminating potential discovery software conflicts. Each BAR on each side of the bridge creates a tunnel or window into the memory space on the other side of the bridge. To facilitate communication between the processing domains on each side, the non-transparent bridge also typically includes doorbell registers to send interrupts from each side of the bridge to the other, and scratchpad registers accessible from both sides.
A non-transparent bridge is functionally similar to a transparent bridge in that both provide a path between two independent PCI buses (or PCI Express links). The key difference is that when a non-transparent bridge is used, devices on the downstream side of the bridge (relative to the system host) are not visible from the upstream side. This allows an intelligent controller on the downstream side to manage the devices in its local domain, while at the same time making them appear as a single device to the upstream controller. The path between the two buses allows the devices on the downstream side to transfer data directly to the upstream side of the bus without directly involving the intelligent controller in the data movement. Thus transactions are forwarded across the bus unfettered just as in a PCI-to-PCI Bridge, but the resources responsible are hidden from the host, which sees a single device.

  2. While we are using an intelligent endpoint as the example, we should note that a similar problem exists for multi-host systems.


Because we now have two memory spaces, the PCI Express system needs to translate the addresses of transactions that cross from one memory space to the other. This is accomplished via Translation and Limit Registers associated with the BARs. See "Address Translation" on page 1013 for a detailed description; Figure C-2 on page 1004 provides a conceptual rendering of Direct Address Translation. Address translation can be done by Direct Address Translation (essentially replacement of the data under a mask), by table lookup, or by adding an offset to an address. Figure C-3 on page 1005 shows Table Lookup Translation used to create multiple windows spread across system memory space for packets originated in a local I/O processor's domain, as well as Direct Address Translation used to create a single window in the opposite direction.
Figure C-2: Direct Address Translation


Figure C-3: Look Up Table Translation Creates Multiple Windows

Example: Implementing Intelligent Adapters in a PCI Express Base System

Intelligent adapters will be pervasive in PCI Express systems, and will likely be the most widely used example of systems with "multiple processors".
Figure C-4 on page 1006 illustrates how PCI Express systems will implement intelligent adapters. The system diagram consists of a system host, a root complex (the PCI Express version of a Northbridge), a three-port switch, an example endpoint, and an intelligent add-in card. Similar to the system architecture, the add-in card contains a local host, a root complex, a three-port switch, and an example endpoint. However, we should note two significant differences: the intelligent add-in card contains an EEPROM, and one port of the switch contains a back-to-back non-transparent bridge.
Figure C-4: Intelligent Adapters in PCI and PCI Express Systems
Upon power up, the system host will begin enumerating to determine the topology. It will pass through the Root Complex and enter the first switch (Switch A). Upon entering the topmost port, it will see a transparent bridge, so it will know to continue to enumerate. The host will then poll the leftmost port and, upon finding a Type 0 CSR header, will consider it an endpoint and explore no deeper along that branch of the PCI hierarchy. The host will then use the information in the endpoint's CSR header to configure base and limit registers in bridges and BARs in endpoints to complete the memory map for this branch of the system.


The host will then explore the rightmost port of Switch A and read the CSR header registers associated with the top port of Switch B. Because this port is a non-transparent bridge, the host finds a Type 0 CSR header. The host processor therefore believes that this is an endpoint and explores no deeper along that branch of the PCI hierarchy. The host reads the BARs of the top port of Switch B to determine the memory requirements for windows into the memory space on the other side of the bridge. The memory space requirements can be preloaded from an EEPROM into the BAR Setup Registers of Switch B's non-transparent port or can be configured by the processor that is local to Switch B prior to allowing the system host to complete discovery.
Similar to the host processor power up sequence, the local host will also begin enumerating its own system. Like the system host processor, it will allocate memory for end points and continue to enumerate when it encounters a transparent bridge. When the host reaches the topmost port of Switch B, it sees a non-transparent bridge with a Type 0 CSR header. Accordingly, it reads the BARs of the CSR header to determine the memory aperture requirements, then terminates discovery along this branch of its PCI tree. Again, the memory aperture information can be supplied by an EEPROM, or by the system host.
Communication between the two processor domains is achieved via a mailbox system and doorbell interrupts. The doorbell facility allows each processor to send interrupts to the other. The mailbox facility is a set of dual-ported registers that are both readable and writable by both processors. Shared memory-mapped mechanisms via the BARs may also be used for inter-processor communication.
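A minimal sketch of that facility in C follows, assuming a purely hypothetical register layout (real NT bridges define their own offsets); in practice nt would point at a BAR mapped into the local address space.

#include <stdint.h>

/* Hypothetical doorbell/mailbox block of a non-transparent port. */
typedef struct {
    volatile uint32_t doorbell_set;    /* write 1s: assert doorbell bits to the far side */
    volatile uint32_t doorbell_clear;  /* write 1s: clear doorbell bits received here    */
    volatile uint32_t scratchpad[8];   /* dual-ported mailbox, readable/writable by both */
} nt_regs_t;

/* Post a message for the peer processor and ring its doorbell. */
void nt_send(nt_regs_t *nt, uint32_t msg)
{
    nt->scratchpad[0] = msg;    /* both domains see the mailbox contents    */
    nt->doorbell_set  = 1u;     /* raises an interrupt in the peer's domain */
}

int main(void)
{
    static nt_regs_t fake;      /* stand-in for the mapped register block */
    nt_send(&fake, 0x1234u);
    return 0;
}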

Example: Implementing Host Failover in a PCI Express System

Figure C-5 on page 1008 illustrates how most PCI Express systems will implement host failover. The primary host processor in this illustration is on the left side of the diagram, with the backup host on the right side. Like most systems with which we are familiar, the host processor connects to a root complex. In turn, the root complex routes its traffic to the switch. In this example, the switch has two ports to endpoints in addition to the upstream port for the primary host we have just described. Furthermore, this system also has another processor, which is connected to the switch via another root complex. The switch ports to both processors need to be configurable to behave either as a transparent bridge or as a non-transparent bridge. An EEPROM or strap pins on the switch can be used to initially bootstrap this configuration.
Figure C-5: Host Failover in PCI and PCI Express Systems
Under normal operation, upon power up, the primary host begins to enumerate the system. In our example, as the primary host processor begins its discovery protocol through the fabric, it discovers the two end points, and their memory requirements, by sizing their BARs. When it gets to the upper right port, it finds a Type 0 CSR header. This signifies to the primary host processor that it should not attempt discovery on the far side of the associated switch port. As in the previous example, the BARs associated with the non-transparent switch port may have been configured by EEPROM load prior to discovery or might be configured by software running on the local processor.


Again, similar to the previous example, the backup processor powers up and begins to enumerate. In this example, the backup processor chipset consists of the root complex and the backup processor only. It discovers the non-transparent switch port and terminates its discovery there. It is keyed by EEPROM-loaded Device ID and Vendor ID registers to load an appropriate driver.
During the course of normal operation, the host processor performs all of its normal duties as it actively manages the system. In addition, it sends the backup processor heartbeat messages, which indicate the continued good health of the originating processor. A heartbeat message might be as simple as a doorbell interrupt assertion, but typically includes some data to reduce the possibility of a false positive. Checkpoint and journal messages are alternative approaches to providing the backup processor with a starting point, should it need to take over. In the journal methodology, the backup is provided with a list, or journal, of completed transactions (in the application-specific sense, not in the sense of bus transactions). In the checkpoint methodology, the backup is periodically provided with a complete system state from which it can restart if necessary; typically this data captures the latest activities and the state of all the peripherals.
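The backup's monitoring loop can be sketched as follows; heartbeat_pending() and take_over() are hypothetical stand-ins (stubbed here) for reading the NT port's doorbell status and for the takeover sequence described in the next paragraph, and the timeout value is arbitrary.

#include <stdbool.h>
#include <time.h>

/* Stubs: a real implementation would poll (or be interrupted by) the
 * NT port's doorbell status register, and takeover would reprogram
 * the switch as described below. */
static bool heartbeat_pending(void) { return false; }
static void take_over(void)         { /* demote primary, assume control */ }

#define HEARTBEAT_TIMEOUT_S 3

int main(void)
{
    time_t last = time(NULL);
    for (;;) {
        if (heartbeat_pending()) {
            last = time(NULL);                  /* primary still healthy    */
        } else if (difftime(time(NULL), last) > HEARTBEAT_TIMEOUT_S) {
            take_over();                        /* heartbeat lost: failover */
            break;
        }
    }
    return 0;
}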
If the backup processor fails to receive timely heartbeat messages, it will begin assuming control. One of its first tasks is to demote the primary port to prevent the failed processor from interacting with the rest of the system. This is accomplished by reprogramming the CSRs of the switch using a memory mapped view of the switch's CSRs provided via a BAR in the non-transparent port. To take over, the backup processor reverses the transparent/non-transparent modes at both its port and the primary processor's port and takes down the link to the primary processor. After cleaning up any transactions left in the queues or left in an incomplete state as a result of the host failure, the backup processor reconfigures the system so that it can serve as the host. Finally, it uses the data in the checkpoint or journal messages to restart the system.

Example: Implementing Dual Host in a PCI Express Base System

Figure C-6 on page 1010 illustrates how PCI Express systems might implement a dual host system.3 In this example, the leftmost blocks are a typical complete system, with the rightmost blocks being a separate subsystem. As previously discussed, connecting the leftmost and rightmost blocks is a set of non-transparent bridges.
Figure C-6: Dual Host in a PCI and PCI Express System
  3. Back-to-back non-transparent (NT) ports are unnecessary but occur as a result of the use of identical single board computers for both hosts. A transparent backplane fabric would typically be interposed between the two NT ports.


Upon power up, both processors will begin enumerating. As before, the hosts will search out the endpoints by reading the CSR and then allocate memory appropriately. When the hosts encounter the non-transparent bridge port in each of their private switches, they will assume it is an endpoint and, using the data in the EEPROM, allocate resources. Both systems will use the doorbell and mailbox registers described above to communicate with each other.
The dual-host system model may be extended to a fully redundant dual-star system by using additional switches to dual-port the hosts and line cards into a redundant fabric, as shown in Figure C-7 on page 1012. This is particularly attractive to vendors who employ chassis-based systems for their flexibility, scalability and reliability.
Two host cards are shown. Host A is the primary host of Fabric A and the secondary host of Fabric B. Similarly, Host B is the primary host of Fabric B and the secondary host of Fabric A.
Each host is connected to the fabric it serves via a transparent bridge/switch port and to the fabric for which it provides only backup via a non-transparent bridge/switch port. These non-transparent ports are used for host-to-host communications and also support cross-domain peer-to-peer transfers where address maps do not allow a more direct connection.


Figure C-7: Dual-Star Fabric

Summary

Through non-transparent bridging, PCI Express Base offers vendors the ability to integrate intelligent adapters and multi-host systems into their next generation designs. This appendix demonstrated how these features will be deployed using de-facto standard techniques adopted in the PCI environment and showed how they would be utilized for various applications. Because of this, we can expect this methodology to become the industry standard in the PCI Express paradigm.


Address Translation

This section provides an in-depth description of how systems that use non-transparent bridges communicate using address translation. We provide details about the mechanism by which systems determine not only the size of the memory allocated, but also how memory pointers are employed. Implementations using both Direct Address Translation and Lookup Table Based Address Translation are discussed. By carrying the standardized architectural implementation of non-transparent bridging popularized in the PCI paradigm over into the PCI Express environment, interconnect vendors can speed market adoption of PCI Express in markets requiring intelligent adapters, host failover and multihost capabilities.
The transparent bridge uses base and limit registers in I/O space, non-prefetchable memory space, and prefetchable memory space to map transactions in the downstream direction across the bridge. All downstream devices are required to be mapped in contiguous address regions such that a single aperture in each space is sufficient. Upstream mapping is done via inverse decoding relative to the same registers. A transparent bridge does not translate the addresses of forwarded transactions/packets.
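The decode rule can be stated in a few lines of C; the structure below is illustrative, with one base/limit pair standing in for the bridge's three spaces.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One of the transparent bridge's apertures (say, non-prefetchable
 * memory). Downstream forwarding is a simple range check; upstream
 * forwarding is the inverse decode of the same registers. Note that
 * no address translation occurs in either direction. */
typedef struct {
    uint32_t base;     /* bottom of the downstream aperture */
    uint32_t limit;    /* top of the downstream aperture    */
} bridge_window_t;

static bool forward_downstream(const bridge_window_t *w, uint32_t addr)
{
    return addr >= w->base && addr <= w->limit;
}

static bool forward_upstream(const bridge_window_t *w, uint32_t addr)
{
    return !forward_downstream(w, addr);       /* inverse decode */
}

int main(void)
{
    bridge_window_t w = { 0xA0000000u, 0xBFFFFFFFu };
    printf("%d %d\n", forward_downstream(&w, 0xA0010000u),   /* 1: inside  */
                      forward_upstream(&w, 0x80000000u));    /* 1: outside */
    return 0;
}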
The non-transparent bridge uses the standard set of BARs in each of its Type 0 CSR headers to define apertures into the memory space on the other side of the bridge. There are two sets of BARs: one on the Primary side and one on the Secondary side. The BARs define resource apertures that allow the forwarding of transactions to the opposite-side interface.
For each bridge BAR there exists a set of associated control and setup registers, usually writable from the other side of the bridge. Each BAR has a "setup" register, which defines the size and type of its aperture, and an address translation register. Some BARs also have a limit register that can be used to restrict the aperture's size. These registers need to be programmed before access from outside the local subsystem is allowed. This is typically done by software running on a local processor or by loading the registers from EEPROM.
In PCI Express, the Transaction ID fields of packets passing through these apertures are also translated to support Device ID routing. These Device IDs are used to route the completions for non-posted requests, as well as ID-routed messages.
The transparent bridge forwards CSR transactions in the downstream direction according to the secondary and subordinate bus number registers, converting Type 1 CSRs to Type 0 CSRs as required. The non-transparent bridge accepts only those CSR transactions addressed to it and returns an unsupported request response to all others.

Direct Address Translation

The addresses of all upstream and downstream transactions are translated (except for those through BARs that access CSRs). With the exception of the cases in the following two sections, addresses that are forwarded from one interface to the other are translated by adding a Base Address to their offset within the BAR that they landed in, as seen in Figure C-8 on page 1014. The BAR Base Translation Registers are used to set up these base translations for the individual BARs.
Figure C-8: Direct Address Translation
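In code, Direct Address Translation reduces to re-basing an offset; the function below is a sketch with illustrative parameter names (the translation base models the BAR Base Translation Register mentioned above).

#include <stdint.h>
#include <stdio.h>

/* Re-base an address that hit a BAR aperture into the other domain:
 * the offset within the BAR is preserved and the BAR base is replaced
 * by the programmed translation base. */
uint32_t direct_translate(uint32_t addr, uint32_t bar_base, uint32_t xlat_base)
{
    uint32_t offset = addr - bar_base;   /* offset within the BAR aperture */
    return xlat_base + offset;           /* same offset, new base          */
}

int main(void)
{
    /* A packet to 90012000h hitting a BAR at 90000000h is re-based to
     * 40000000h in the other domain, yielding 40012000h. */
    printf("%08Xh\n", (unsigned)direct_translate(0x90012000u, 0x90000000u, 0x40000000u));
    return 0;
}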

Lookup Table Based Address Translation

Following the de facto standard adopted by the PCI community, a PCI Express non-transparent bridge provides several BARs for the purposes of allocating resources. All of the BARs request memory allocations; in accordance with PCI industry conventions, BAR 0 contains the CSR information and BAR 1 contains I/O information, while BAR 2 and BAR 3 are utilized for Lookup Table Based Translation and BAR 4 and BAR 5 are utilized for Direct Address Translation.
On the secondary side, BAR 3 uses a special lookup-table-based address translation for transactions that fall inside its window, as seen in Figure C-9 on page 1015. The lookup table provides more flexibility in mapping secondary bus local addresses to primary bus addresses. The location of the index field within the address is programmable to adjust aperture size.
Figure C-9: Lookup Table Based Translation
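A sketch of the lookup mechanism follows; the index position, table depth, and table contents are all hypothetical (in hardware the index field's location is programmable, as noted above).

#include <stdint.h>
#include <stdio.h>

#define INDEX_SHIFT 20                   /* illustrative: 1 MB windows    */
#define INDEX_MASK  0x7u                 /* illustrative: 8 table entries */

/* Hypothetical table contents: each entry is the translated base of
 * one window, so the windows land scattered across the other domain. */
static const uint32_t xlat_table[8] = {
    0x10000000u, 0x24000000u, 0x38000000u, 0x4C000000u,
    0x60000000u, 0x74000000u, 0x88000000u, 0x9C000000u,
};

uint32_t lut_translate(uint32_t addr)
{
    uint32_t index  = (addr >> INDEX_SHIFT) & INDEX_MASK;   /* selects window */
    uint32_t offset = addr & ((1u << INDEX_SHIFT) - 1u);    /* within window  */
    return xlat_table[index] + offset;
}

int main(void)
{
    /* Address 00234567h: index 2, offset 34567h -> 38034567h. */
    printf("%08Xh\n", (unsigned)lut_translate(0x00234567u));
    return 0;
}

Spreading the windows this way lets a local I/O processor reach several scattered regions of the system host's memory map without needing one huge contiguous aperture.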


Downstream BAR Limit Registers

The two downstream BARs on the primary side (BAR 2/3 and BAR 4/5) also have Limit registers, programmable from the local side, to further restrict the size of the window they expose, as seen in Figure C-10 on page 1016. BARs can only be assigned memory resources in "power of two" granularity. The Limit registers provide a means to obtain better granularity by "capping" the size of the BAR within that "power of two" granularity. Only transactions below the Limit registers are forwarded to the secondary bus. Transactions above the limit are discarded, or, on reads, return 0xFFFFFFFF (a master-abort-equivalent packet).


Figure C-10: Use of Limit Register
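The limit check itself is tiny; the sketch below caps a hypothetical 1 MB aperture at 640 KB and returns all-ones for reads beyond the cap, the master-abort-equivalent behavior described above.

#include <stdint.h>
#include <stdio.h>

/* Stub for the forwarded read; a real bridge would send the request
 * on to the secondary bus. */
static uint32_t secondary_read(uint32_t addr) { (void)addr; return 0; }

typedef struct {
    uint32_t base;     /* power-of-two aligned BAR base        */
    uint32_t limit;    /* caps the usable part of the aperture */
} nt_window_t;

uint32_t nt_read(const nt_window_t *w, uint32_t addr)
{
    if (addr > w->limit)
        return 0xFFFFFFFFu;        /* above the limit: not forwarded */
    return secondary_read(addr);   /* within the capped window       */
}

int main(void)
{
    nt_window_t w = { 0xC0000000u, 0xC009FFFFu };           /* 1 MB BAR capped at 640 KB */
    printf("%08Xh\n", (unsigned)nt_read(&w, 0xC00A0000u));  /* FFFFFFFFh */
    return 0;
}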

Forwarding 64-bit Address Memory Transactions

Certain BARs can be configured to work in pairs to provide the base address and translation for transactions containing 64-bit addresses. Transactions that hit within these 64-bit BARs are forwarded using Direct Address Translation. As in the case of 32-bit transactions, when a memory transaction is forwarded from the primary to the secondary bus, the primary address can be mapped to another address in the secondary bus domain. The mapping is performed by substituting a new base address for the base of the original address.


A 64-bit BAR pair on the system side of the bridge is used to translate a window of 64-bit addresses in packets originated on the system side of the bridge down below 2^32 in local space.

Appendix D

Class Codes

This appendix lists the class codes, sub-class codes, and programming interface byte definitions currently provided in the PCI 2.3 specification.
Figure D-1: Class Code Register
(The Class Code register is three bytes: bits 23:16 select the base class, bits 15:8 the sub-class, and bits 7:0 the programming interface.)
Table D-1: Defined Class Codes
Class    Description
00h      Function built before class codes were defined (in other words: before rev 2.0 of the PCI spec).
01h      Mass storage controller.
02h      Network controller.
03h      Display controller.
04h      Multimedia device.
05h      Memory controller.
06h      Bridge device.
07h      Simple communications controllers.
08h      Base system peripherals.
09h      Input devices.
0Ah      Docking stations.
0Bh      Processors.
0Ch      Serial bus controllers.
0Dh      Wireless controllers.
0Eh      Intelligent IO controllers.
0Fh      Satellite communications controllers.
10h      Encryption/Decryption controllers.
11h      Data acquisition and signal processing controllers.
12h-FEh  Reserved.
FFh      Device does not fit any of the defined class codes.
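Software typically decodes the Class Code register by splitting it into its three bytes and consulting these tables; the sketch below shows the split and names a few base classes from Table D-1 (the helper and its abbreviated switch are illustrative only).

#include <stdint.h>
#include <stdio.h>

/* Name a few of the base classes from Table D-1 (abbreviated). */
static const char *base_class_name(uint8_t base)
{
    switch (base) {
    case 0x01: return "Mass storage controller";
    case 0x02: return "Network controller";
    case 0x03: return "Display controller";
    case 0x06: return "Bridge device";
    case 0x0C: return "Serial bus controller";
    case 0xFF: return "Does not fit any defined class code";
    default:   return "(see Table D-1)";
    }
}

int main(void)
{
    uint32_t class_code = 0x060400;      /* example: a PCI/PCI bridge */
    uint8_t base = (uint8_t)(class_code >> 16);
    uint8_t sub  = (uint8_t)(class_code >> 8);
    uint8_t pif  = (uint8_t) class_code;
    printf("%02Xh/%02Xh/%02Xh: %s\n",
           (unsigned)base, (unsigned)sub, (unsigned)pif, base_class_name(base));
    return 0;
}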
Table D-2: Class Code 0 (PCI rev 1.0)
Sub-Class  Prog. I/F  Description
00h        00h        All devices other than VGA.
01h        01h        VGA-compatible device.
Table D-3: Class Code 1: Mass Storage Controllers
Sub-Class  Prog. I/F  Description
00h        00h        SCSI controller.
01h        xxh        IDE controller. See Table D-20 on page 1031 for definition of the Programming Interface byte.
02h        00h        Floppy disk controller.
03h        00h        IPI controller.
04h        00h        RAID controller.
05h        20h        ATA controller with single DMA.
           30h        ATA controller with chained DMA.
80h        00h        Other mass storage controller.
Table D-4: Class Code 2: Network Controllers
Sub-Class  Prog. I/F  Description
00h        00h        Ethernet controller.
01h        00h        Token ring controller.
02h        00h        FDDI controller.
03h        00h        ATM controller.
04h        00h        ISDN controller.
05h        00h        WorldFip controller.
06h        xxh        PICMG 2.14 Multi Computing. For information on the use of the Programming Interface byte, see the PICMG 2.14 Multi Computing Specification (http://www.picmg.com).
80h        00h        Other network controller.
Table D-5: Class Code 3: Display Controllers
Sub-Class  Prog. I/F  Description
00h        00h        VGA-compatible controller, responding to memory addresses 000A0000h through 000BFFFFh (Video Frame Buffer), IO addresses 03B0h through 03BBh and 03C0h through 03DFh, and all aliases of these addresses.
           01h        8514-compatible controller, responding to IO address 02E8h and its aliases, 02EAh and 02EFh.
01h        00h        XGA controller.
02h        00h        3D Controller.
80h        00h        Other display controller.
Table D-6: Class Code 4: Multimedia Devices
Sub-Class  Prog. I/F  Description
00h        00h        Video device.
01h        00h        Audio device.
02h        00h        Computer Telephony device.
80h        00h        Other multimedia device.
Table D-7: Class Code 5: Memory Controllers
Sub-Class  Prog. I/F  Description
00h        00h        RAM memory controller.
01h        00h        Flash memory controller.
80h        00h        Other memory controller.
Table D-8: Class Code 6: Bridge Devices
Sub-Class  Prog. I/F  Description
00h        00h        Host/PCI bridge.
01h        00h        PCI/ISA bridge.
02h        00h        PCI/EISA bridge.
03h        00h        PCI/Micro Channel bridge.
04h        00h        PCI/PCI bridge.
           01h        Subtractive decode PCI-to-PCI bridge. Supports subtractive decode in addition to normal PCI-to-PCI bridge functions. For a detailed discussion of this bridge type, refer to the MindShare PCI System Architecture book, Fourth Edition (published by Addison-Wesley).
05h        00h        PCI/PCMCIA bridge.
06h        00h        PCI/NuBus bridge.
07h        00h        PCI/CardBus bridge.
08h        xxh        RACEway bridge. RACEway is an ANSI standard (ANSI/VITA 5-1994) switching fabric. Bits 7:1 of the Interface bits are reserved, read-only, and return zeros. Bit 0 is read-only and, if 0, indicates that the bridge is in Transparent mode, while 1 indicates that it is in End-Point mode.
09h        40h        Semi-transparent PCI-to-PCI bridge with the primary PCI bus side facing the system host processor.
           80h        Semi-transparent PCI-to-PCI bridge with the secondary PCI bus side facing the system host processor.
0Ah        00h        InfiniBand-to-PCI host bridge.
80h        00h        Other bridge type.
Table D-9: Class Code 7: Simple Communications Controllers
Sub-Class  Prog. I/F  Description
00h        00h        Generic XT-compatible serial controller.
           01h        16450-compatible serial controller.
           02h        16550-compatible serial controller.
           03h        16650-compatible serial controller.
           04h        16750-compatible serial controller.
           05h        16850-compatible serial controller.
           06h        16950-compatible serial controller.
01h        00h        Parallel port.
           01h        Bi-directional parallel port.
           02h        ECP 1.X-compliant parallel port.
           03h        IEEE 1284 controller.
           FEh        IEEE 1284 target device (not a controller).
02h        00h        Multiport serial controller.
03h        00h        Generic modem.
           01h        Hayes-compatible modem, 16450-compatible interface. BAR 0 maps the modem's register set. The register set can be either memory- or IO-mapped (as indicated by the type of BAR).
           02h        Hayes-compatible modem, 16550-compatible interface. BAR 0 maps the modem's register set. The register set can be either memory- or IO-mapped (as indicated by the type of BAR).
           03h        Hayes-compatible modem, 16650-compatible interface. BAR 0 maps the modem's register set. The register set can be either memory- or IO-mapped (as indicated by the type of BAR).
           04h        Hayes-compatible modem, 16750-compatible interface. BAR 0 maps the modem's register set. The register set can be either memory- or IO-mapped (as indicated by the type of BAR).
04h        00h        GPIB (IEEE 488.1/2) controller.
05h        00h        Smart Card.
80h        00h        Other communications device.
Table D-10: Class Code 8: Base System Peripherals
Sub-Class  Prog. I/F  Description
00h        00h        Generic 8259 programmable interrupt controller (PIC).
           01h        ISA PIC.
           02h        EISA PIC.
           10h        IO APIC. Base Address Register 0 is used to request a minimum of 32 bytes of non-prefetchable memory. Two registers within that space are located at Base + 00h (IO Select Register) and Base + 10h (IO Window Register). For a full description of the use of these registers, refer to the data sheet for the Intel 8237EB in the 82420/82430 PCIset EISA Bridge Databook #290483-003.
           20h        IO(x) APIC interrupt controller.
01h        00h        Generic 8237 DMA controller.
           01h        ISA DMA controller.
           02h        EISA DMA controller.
02h        00h        Generic 8254 timer.
           01h        ISA system timers.
           02h        EISA system timers.
03h        00h        Generic RTC controller.
           01h        ISA RTC controller.
04h        00h        Generic PCI Hot-Plug controller.
80h        00h        Other system peripheral.
Table D-11: Class Code 9: Input Devices
Sub-Class  Prog. I/F  Description
00h        00h        Keyboard controller.
01h        00h        Digitizer (pen).
02h        00h        Mouse controller.
03h        00h        Scanner controller.
04h        00h        Generic gameport controller.
           10h        Gameport controller. A Programming Interface of 10h indicates that, for any Base Address registers in this function that request/assign IO address space, the registers in that IO space conform to the standard "legacy" game ports. The byte at offset 00h in an IO region behaves as a legacy gameport interface where reads to the byte return joystick/gamepad information and writes to the byte start the RC timer. The byte at offset 01h is an alias of the byte at offset 00h. All other bytes in an IO region are unspecified and can be used in vendor unique ways.
80h        00h        Other input controller.
Table D-12: Class Code A: Docking Stations
Sub-Class  Prog. I/F  Description
00h        00h        Generic docking station.
80h        00h        Other type of docking station.
Table D-13: Class Code B: Processors
Sub-Class  Prog. I/F  Description
00h        00h        386.
01h        00h        486.
02h        00h        Pentium.
10h        00h        Alpha.
20h        00h        PowerPC.
30h        00h        MIPS.
40h        00h        Co-processor.
Table D-14: Class Code C: Serial Bus Controllers
Sub-Class  Prog. I/F  Description
00h        00h        Firewire (IEEE 1394).
           10h        IEEE 1394 using 1394 OpenHCI spec.
01h        00h        ACCESS.bus.
02h        00h        SSA (Serial Storage Architecture).
03h        00h        USB (Universal Serial Bus) controller using Universal Host Controller spec.
           10h        USB (Universal Serial Bus) controller using Open Host Controller spec.
           80h        USB (Universal Serial Bus) controller with no specific programming interface.
           FEh        USB device (not Host Controller).
04h        00h        Fibre Channel.
05h        00h        SMBus (System Management Bus).
06h        00h        InfiniBand.
07h        00h        IPMI SMIC Interface. The register interface definitions for the Intelligent Platform Management Interface (Sub-Class 07h) are in the IPMI specification.
           01h        IPMI Kybd Controller Style Interface.
           02h        IPMI Block Transfer Interface.
08h        00h        SERCOS Interface Standard (IEC 61491). There is no register level definition for the SERCOS Interface standard. For more information see IEC 61491.
09h        00h        CANbus.
80h        00h        Other type of Serial Bus Controller.
Table D-15: Class Code D: Wireless Controllers
Sub-Class  Interface  Meaning
00h        00h        iRDA compatible controller.
01h        00h        Consumer IR controller.
10h        00h        RF controller.
11h        00h        Bluetooth.
12h        00h        Broadband.
80h        00h        Other type of wireless controller.
Table D-16: Class Code E: Intelligent IO Controllers
Sub-Class  Interface  Meaning
00h        xxh        Intelligent IO controller adhering to the I2O Architecture spec. The spec can be downloaded from ftp.intel.com/pub/IAL/i2o/.
           00h        Message FIFO at offset 40h.
80h        00h        Other type of Intelligent IO Controller.
Table D-17: Class Code F: Satellite Communications Controllers
Sub-Class  Interface  Meaning
01h        00h        TV.
02h        00h        Audio.
03h        00h        Voice.
04h        00h        Data.
80h        00h        Other type of Satellite Communications Controller.
Table D-18: Class Code 10h: Encryption/Decryption Controllers
Sub-Class  Interface  Meaning
00h        00h        Network and computing Encrypt/Decrypt.
10h        00h        Entertainment Encrypt/Decrypt.
80h        00h        Other Encrypt/Decrypt.
Table D-19: Class Code 11h: Data Acquisition and Signal Processing Controllers
Sub-Class  Interface  Meaning
00h        00h        DPIO modules.
01h        00h        Performance counters.
10h        00h        Communications synchronization plus time and frequency test/measurement.
20h        00h        Management card.
80h        00h        Other Data Acquisition and Signal Processing Controllers.
Table D-20: Definition of IDE Programmer's Interface Byte Encoding
Bit(s)  Description
0       Operating mode (primary).
1       Programmable indicator (primary).
2       Operating mode (secondary).
3       Programmable indicator (secondary).
6:4     Reserved. Hardwired to zero.
7       Master IDE device.
Note: The SIG document PCI IDE Controller Specification completely describes the layout and meaning of bits 0 through 3 in the Programming Interface byte. The document Bus Master Programming Interface for IDE ATA Controllers describes the meaning of bit 7 in the Programming Interface byte. While the PCI 2.1 spec stated that this document could be obtained via FAX by calling (408)741-1600 and requesting document 8038, that reference was removed from the 2.3 spec.
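As a quick illustration, the bit assignments in Table D-20 can be unpacked as follows (the example value is arbitrary):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t pif = 0x8A;   /* arbitrary example value */
    printf("operating mode (primary):           %u\n", (pif >> 0) & 1u);
    printf("programmable indicator (primary):   %u\n", (pif >> 1) & 1u);
    printf("operating mode (secondary):         %u\n", (pif >> 2) & 1u);
    printf("programmable indicator (secondary): %u\n", (pif >> 3) & 1u);
    printf("master IDE device (bit 7):          %u\n", (pif >> 7) & 1u);
    return 0;
}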

Appendix E

Locked Transactions Series

Introduction

Native PCI Express implementations do not support lock. Support for Locked transaction sequences exists solely to support legacy device software executing on the host processor that performs a locked RMW (read-modify-write) operation on a memory semaphore that may reside within the memory of a legacy PCI device. This appendix describes the protocol defined by PCI Express for supporting locked access sequences that target legacy devices. Failure to support lock may result in deadlocks.

Background

PCI Express continues the PCI 2.3 tradition of supporting locked transaction sequences (RMW, or read-modify-write) to support legacy device software. PCI Express devices and their software drivers are never allowed to use instructions that cause the CPU to generate locked operations that target memory residing beneath the Root Complex level.
Locked operations consist of the basic RMW sequence (sketched in code after this list), that is:
  1. One or more memory reads from the target location to obtain the semaphore value.
  2. The modification of the data within a processor register.
  3. One or more writes to write the modified semaphore value back to the target memory location.
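In driver terms the sequence looks like the sketch below; bus_lock() and bus_unlock() are hypothetical stand-ins (stubbed here), since the actual locking is carried out by the MRdLk/CplDLk packets and the Unlock message described later, not by driver-visible calls.

#include <stdint.h>
#include <stdio.h>

/* Stubs standing in for the fabric-level lock and its release. */
static void bus_lock(void)   { /* path locked: competing accesses blocked */ }
static void bus_unlock(void) { /* Unlock message releases the path        */ }

/* The locked RMW applied to a semaphore in legacy device memory
 * (modeled here as an ordinary variable). */
static uint32_t locked_test_and_set(volatile uint32_t *semaphore)
{
    bus_lock();
    uint32_t old = *semaphore;   /* step 1: locked read of the semaphore */
    *semaphore = old | 1u;       /* steps 2-3: modify, then write back   */
    bus_unlock();
    return old;
}

int main(void)
{
    volatile uint32_t sem = 0;
    uint32_t was = locked_test_and_set(&sem);
    printf("was %u, now %u\n", (unsigned)was, (unsigned)sem);
    return 0;
}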


This transaction sequence must be performed such that no other accesses are permitted to the target locations (or device) during the locked sequence. This requires blocking other transactions during the operation, which can result in deadlocks and poor performance.
The devices required to support locked sequences are:
  • The Root Complex.
  • Any Switches in the path leading to a legacy device that may be the target of a locked transaction series.
  • A PCI Express-to-PCI Bridge.
  • A PCI Express-to-PCI-X Bridge.
  • Any legacy devices whose device drivers issue locked transactions to memory residing within the legacy device.
No other devices are required to support locked transactions, and they must ignore any locked transactions that they receive.
Lock in the PCI environment is achieved, in part, via the use of the PCI LOCK# signal. The equivalent functionality in PCI Express is accomplished via a transaction that emulates the LOCK signal functionality.

The PCI Express Lock Protocol

The only source of lock supported by PCI Express is the system processor, and, as a consequence, the source of all locked operations in PCI Express is the Root Complex (acting as the processor's surrogate). A locked operation is performed between a Root Complex downstream port and the PCI Express downstream port to which the targeted legacy device is attached. In most systems, the legacy device is typically a PCI Express-to-PCI or PCI Express-to-PCI-X bridge. Only one locked sequence at a time is supported for a given hierarchical path.
PCI Express limits locked transactions to Traffic Class 0 and Virtual Channel 0. All transactions with TC values other than zero that are mapped to a VC other than zero are permitted to traverse the fabric without regard to the locked operation. All transactions that are mapped to VC0 are subject to the lock rules described in this appendix. The discussion of the locked protocol in this appendix presumes that all transactions have been assigned to TC0 (unless otherwise indicated).

Lock Messages — The Virtual Lock Signal

PCI Express defines the following transactions that, together, act as a virtual wire replacing the PCI LOCK# signal; a compact code restatement follows the list.
  • Memory Read Lock Request (MRdLk) - Originates a locked sequence. The first MRdLk transaction blocks other requests from reaching the target device. One or more of these locked read requests may be issued during the sequence.
  • Memory Read Lock Completion with Data (CplDLk) - Returns data and confirms that the path to the target is locked. A successful read Completion that returns data for the first Memory Read Lock request results in the path between the Root Complex and the target device being locked. That is, transactions traversing the same path from other ports are blocked from reaching either the root port or the target port. Transactions being routed in buffers for VC1-VC7 are unaffected by the lock.
  • Memory Read Lock Completion without Data (CplLk) - A Completion without a data payload indicates that the lock sequence cannot complete currently and the path remains unlocked.
  • Unlock Message - An unlock message is issued by the Root Complex from the locked root port. This message unlocks the path between the root port and the target port.
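For reference, the four packet types can be summarized in code form; the enum and strings below are merely a restatement of the list above (the names are illustrative, not spec-defined symbols).

#include <stdio.h>

typedef enum { TLP_MRDLK, TLP_CPLDLK, TLP_CPLLK, MSG_UNLOCK } lock_tlp_t;

static const char *lock_tlp_name(lock_tlp_t t)
{
    switch (t) {
    case TLP_MRDLK:  return "MRdLk: locked read request, begins the sequence";
    case TLP_CPLDLK: return "CplDLk: data returned, path now locked";
    case TLP_CPLLK:  return "CplLk: no data, lock not established";
    case MSG_UNLOCK: return "Unlock: releases the locked path";
    }
    return "?";
}

int main(void)
{
    for (int t = TLP_MRDLK; t <= MSG_UNLOCK; t++)
        puts(lock_tlp_name((lock_tlp_t)t));
    return 0;
}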

The Lock Protocol Sequence: an Example

This section explains the PCI Express lock protocol by example. The example includes the following devices:
  • The Root Complex that initiates the Locked transaction series on behalf of the host processor.
  • A Switch in the path between the root port and targeted legacy endpoint.
  • A PCI Express-to-PCI Bridge in the path to the target.
  • The target PCI device whose Device Driver initiated the locked RMW.
  • A PCI Express endpoint is included to describe Switch behavior during lock.
In this example, the locked operation completes normally. The steps that occur during the operation are described in the two sections that follow.

The Memory Read Lock Operation

Figure E-1 on page 1037 illustrates the first step in the Locked transaction series (i.e., the initial memory read to obtain the semaphore):
  1. The CPU initiates the locked sequence (a Locked Memory Read) as a result of a driver executing a locked RMW instruction that targets a PCI target.
  2. The Root Port issues a Memory Read Lock Request from port 2. The Root Complex is always the source of a locked sequence.
  3. The Switch receives the lock request on its upstream port and forwards the request to the target egress port (3). The Switch, upon forwarding the request to the egress port, must block all requests from ports other than the ingress port (1) from being sent from the egress port.
  4. A subsequent peer-to-peer transfer from the illustrated PCI Express endpoint to the PCI bus (switch port 2 to switch port 3) would be blocked until the lock is cleared. Note that the lock is not yet established in the other direction. Transactions from the PCI Express endpoint could be sent to the Root Complex.
  5. The Memory Read Lock Request is sent from the Switch's egress port to the PCI Express-to-PCI Bridge. This bridge will implement PCI lock semantics (see the MindShare book entitled PCI System Architecture, Fourth Edition, for details regarding PCI lock).
  6. The bridge performs the Memory Read transaction on the PCI bus with the PCI LOCK# signal asserted. The target memory device returns the requested semaphore data to the bridge.
  7. Read data is returned to the Bridge and is delivered back to the Switch via a Memory Read Lock Completion with Data (CplDLk).
  8. The Switch uses ID routing to return the packet upstream towards the host processor. When the CplDLk packet is forwarded to the upstream port of the Switch, it establishes a lock in the upstream direction to prevent traffic from other ports from being routed upstream. The PCI Express endpoint is completely blocked from sending any transaction to the Switch ports via the path of the locked operation. Note that transfers between Switch ports not involved in the locked operation would be permitted (not shown in this example).
  9. Upon detecting the CplDLk packet, the Root Complex knows that the lock has been established along the path between it and the target device, and the completion data is sent to the CPU.
Figure E-1: Lock Sequence Begins with Memory Read Lock Request

Read Data Modified and Written to Target and Lock Completes

The device driver receives the semaphore value, alters it, and then initiates a memory write to update the semaphore within the memory of the legacy PCI device. Figure E-2 on page 1038 illustrates the write sequence followed by the Root Complex's transmission of the Unlock message that releases the lock:
  1. The Root Complex issues the Memory Write Request across the locked path to the target device.
  2. The Switch forwards the transaction to the target egress port (3). The memory address of the Memory Write must be the same as that of the initial Memory Read request.
  3. The bridge forwards the transaction to the PCI bus.
  4. The target device receives the memory write data.
  5. Once the Root Complex has sent the Memory Write transaction, it sends an Unlock message to instruct the Switches and any PCI/PCI-X bridges in the locked path to release the lock. Note that the Root Complex presumes the operation has completed normally (because memory writes are posted and no Completion is returned to verify success).
  6. The Switch receives the Unlock message, unlocks its ports, and forwards the message to the egress port that was locked to notify any other Switches and/or bridges in the locked path that the lock must be cleared.
  7. Upon detecting the Unlock message, the bridge must also release the lock on the PCI bus.
Figure E-2: Lock Completes with Memory Write Followed by Unlock Message

Notification of an Unsuccessful Lock

A locked transaction series is aborted when the initial Memory Read Lock Request receives a Completion packet with no data (CplLk). This means that the locked sequence must terminate because no data was returned. This could result from an error associated with the memory read transaction, or perhaps the target device is busy and cannot respond at this time.

Summary of Locking Rules

Following is a list of the locking rules that apply to the Root Complex, Switches, and Bridges.

Rules Related To the Initiation and Propagation of Locked Transactions

  • Locked Requests which are completed with a status other than Successful Completion do not establish lock.
  • Regardless of the status of any of the Completions associated with a locked sequence, all locked sequences and attempted locked sequences must be terminated by the transmission of an Unlock Message.
  • MRdLk, CplDLk and Unlock semantics are allowed only for the default Traffic Class (TC0).
  • Only one locked transaction sequence attempt may be in progress at a given time within a single hierarchy domain.
  • Any device which is not involved in the locked sequence must ignore the Unlock Message.
The initiation and propagation of a locked transaction sequence through the PCI Express fabric is performed as follows:
  • A locked transaction sequence is started with a MRdLk Request:
  • Any successive reads associated with the locked transaction sequence must also use MRdLk Requests.
  • The Completions for any successful MRdLk Request use the CplDLk Completion type; unsuccessful Requests are completed with the CplLk Completion type.
  • If any read associated with a locked sequence is completed unsuccessfully, the Requester must assume that the atomicity of the lock is no longer assured, and that the path between the Requester and Completer is no longer locked.
  • All writes associated with a locked sequence must use MWr Requests.
  • The Unlock Message is used to indicate the end of a locked sequence. A Switch propagates Unlock Messages through the locked Egress Port.
  • Upon receiving an Unlock Message, a legacy Endpoint or Bridge must unlock itself if it is in a locked state. If it is not locked, or if the Receiver is a PCI Express Endpoint or Bridge which does not support lock, the Unlock Message is ignored and discarded.

Rules Related to Switches

Switches must distinguish transactions associated with locked sequences from other transactions, to prevent other transactions from interfering with the lock and potentially causing deadlock. The following rules cover how this is done; a code sketch of the resulting forwarding decision follows the list. Note that locked accesses are limited to TC0, which is always mapped to VC0.
  • When a Switch propagates a MRdLk Request from an Ingress Port to the Egress Port, it must block all Requests which map to the default Virtual Channel (VC0) from being propagated to the Egress Port. If a subsequent MRdLk Request is received at this Ingress Port addressing a different Egress Port, the behavior of the Switch is undefined. Note that this sort of split-lock access is not supported by PCI Express and software must not cause such a locked access. System deadlock may result from such accesses.
  • When the CplDLk for the first MRdLk Request is returned, if the Completion indicates a Successful Completion status, the Switch must block all Requests from all other Ports from being propagated to either of the Ports involved in the locked access, except for Requests which map to channels other than VC0 on the Egress Port.
  • The two Ports involved in the locked sequence must remain blocked until the Switch receives the Unlock Message (at the Ingress Port which received the initial MRdLk Request).
  • The Unlock Message must be forwarded to the locked Egress Port.
  • The Unlock Message may be broadcast to all other Ports.
  • The Ingress Port is unblocked once the Unlock Message arrives, and the Egress Port(s) which were blocked are unblocked following the transmission of the Unlock Message out of the Egress Port(s). Ports that were not involved in the locked access are unaffected by the Unlock Message.
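A compact way to see these rules is as a per-request forwarding predicate; the model below is a sketch under simplified assumptions (a single lock, ports as integers), not a description of any real switch implementation.

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool locked;         /* set between a successful CplDLk and the Unlock */
    int  lock_ingress;   /* port that received the initial MRdLk           */
    int  lock_egress;    /* port leading toward the locked target          */
} switch_lock_state_t;

/* May a request from src_port be forwarded to dst_port on channel vc? */
bool may_forward(const switch_lock_state_t *s, int src_port, int dst_port, int vc)
{
    if (!s->locked || vc != 0)
        return true;     /* no lock in effect, or non-VC0 traffic: unaffected */
    if (dst_port != s->lock_ingress && dst_port != s->lock_egress)
        return true;     /* traffic between uninvolved ports is permitted */
    /* Only the locked flow itself may target the two locked ports. */
    return (src_port == s->lock_ingress && dst_port == s->lock_egress) ||
           (src_port == s->lock_egress  && dst_port == s->lock_ingress);
}

int main(void)
{
    switch_lock_state_t s = { true, 0, 2 };         /* locked: ingress 0, egress 2 */
    printf("%d\n", may_forward(&s, 1, 2, 0));       /* 0: uninvolved port blocked  */
    return 0;
}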

Rules Related To PCI Express/PCI Bridges

The requirements for PCI Express/PCI Bridges are similar to those for Switches, except that, because these Bridges only use TC0 and VC0, all other traffic is blocked during the locked access. The requirements on the PCI bus side are described in the MindShare book entitled PCI System Architecture, Fourth Edition (published by Addison-Wesley).

Rules Related To the Root Complex

A Root Complex is permitted to support locked transactions as a Requester. If locked transactions are supported, a Root Complex must follow the rules already described to perform a locked access. The mechanism(s) used by the Root Complex to interface to the host processor's FSB (Front-Side Bus) are outside the scope of the spec.

Rules Related To Legacy Endpoints

Legacy Endpoints are permitted to support locked accesses, although their use is discouraged. If locked accesses are supported, legacy Endpoints must handle them as follows:
  • The legacy Endpoint becomes locked when it transmits the first Completion for the first read request of the locked transaction series with a Successful Completion status:
  • If the completion status is not Successful Completion, the legacy Endpoint does not become locked.
  • Once locked, the legacy Endpoint must remain locked until it receives the Unlock Message.
  • While locked, a legacy Endpoint must not issue any Requests using Traffic Classes which map to the default Virtual Channel (VC0). Note that this requirement applies to all possible sources of Requests within the Endpoint, in the case where there is more than one possible source of Requests. Requests may be issued using Traffic Classes which map to VCs other than VC0.

Rules Related To PCI Express Endpoints

Native PCI Express Endpoints do not support lock. A PCI Express Endpoint must treat a MRdLk Request as an Unsupported Request.

Index

Numerics

12x Packet Format 413
1x Packet Format 412
4x Packet Format 412
8b/10b Decoder 402
8b/10b Encoder 400, 424
A
ACK 211
ACK DLLP 91, 92, 202, 219
ACK/NAK Latency 217, 237
ACK/NAK Protocol 90, 211, 212, 220
ACKD_SEQ Count 214
ACKNAK_Latency_Timer 217, 237
ACPI 577
ACPI Driver 570, 579
ACPI Machine Language 578, 580
ACPI Source Language 578, 580
ACPI spec 569
ACPI tables 577
Active State Power Management 46, 87, 403, 608
Advanced Configuration and Power Interface 569, 577
Advanced Correctable Error Reporting 385
Advanced Correctable Error Status 385
Advanced Correctable Errors 384
Advanced Error Capabilities and Control Register 935
Advanced Error Correctable Error Mask Register 935
Advanced Error Correctable Error Status Register 936
Advanced Error Reporting 382
Advanced Error Reporting Capability Register Set 931
Advanced Error Root Error Command Register 938
Advanced Error Root Error Status Register 938
Advanced Error Uncorrectable and Uncorrectable Error Source ID Register 938
Advanced Error Uncorrectable Error Mask Register 936
Advanced Error Uncorrectable Error Severity Register 937
Advanced Error Uncorrectable Error Status Register 937
Advanced Source ID Register 391
Advanced Uncorrectable Error Handling 386
Advanced Uncorrectable Error Status 387
AGP Capability 845
AGP Command Register 846
AGP Command register 846
AGP Status and AGP Command registers 845
AGP Status Register 845
AGP Status register 845
AML 578, 580
AML token interpreter 578
APIC 16, 25, 353
ASL 578, 580
ASPM 568
ASPM Exit Latency 628
Assert_INTx messages 348
Assigning VC Numbers 260
Async Notice of Slot Status Change 683
Attention Button Pressed Message 679
Attention Indicator 657, 664
Attention_Indicator_Blink Message 679
Attention_Indicator_Off Message 679
Attention_Indicator_On Message 679
Aux_Current field 598
B
BARs 793
Base Address Registers 792, 811
Beacon 469, 497, 642, 643
BER 455, 466
BIOS 577, 656, 886, 890
BIST 778
BIST register 778
Bit Error Rate 455
Bit Lock 94, 440, 441, 465
Bridge Control Register 835
Built-In Self-Test 778
Bus Enumerator 890
Bus Master 21, 833
Bus Number register 726, 805
Byte Count Modified 188
byte merging 801
Byte Striping 408
Byte Striping logic 400
C
Capabilities List bit 336, 585, 779, 837, 840
Capabilities Pointer register 585, 779, 780
Capability ID 332, 585, 780, 859
Card Connector Power Switching Logic 657
Card Information Structure 782
Card Insertion 658
Card Insertion Procedure 661
Card Present 657
Card Removal 658
Card Removal Procedure 659
Card Reset Logic 657
Cardbus 770, 777, 782
Character 72, 77, 400
Characters 405
Chassis and Slot Number Assignment 861
Chassis Number 860
Chassis, Expansion 862
Chassis, main 862
Chassis/Slot Numbering Registers 859, 863
CIS 782
Class Code 775, 875, 876, 882, 884, 1019
class code 0 1020
class code 1 1020
class code 10h 1030
class code 11h 1031
class code 2 1021
class code 3 1022
class code 4 1022
class code 5 1022
class code 6 1023
class code 7 1024
class code 8 1026
class code 9 1027
class code A 1027
class code B 1028
class code C 1028
class code D 1029
class code E 1030
class code F 1030
Class Code register 774
Class driver 570, 774
code image 875, 878
Code Type 883, 885
Cold Reset 95, 488
Collapsing INTx Signals 349
Command register 832
company ID 953
Completer 37, 49, 50
Completer Abort 366
Completion 160
Completion Packet 184
Completion Status 187, 371
Completion Time-out 367
Completion W/Data 160
Completion-Locked 160
Completions 183
Config Type 0 Read Request 160
Config Type 0 Write Request 160
Config Type 1 Read Request 160
Config Type 1 Write Request 160
Configuration Address Port 724, 725, 726
Configuration Command Register 373
Configuration Data Port 724, 725
Configuration Request Packet 180
Configuration Requests 179
Configuration Space Layout 895
Configuration Status Register 374
Control Character Encoding 430
Control Method 578, 579
Correctable Errors 369
CRD 423
Credit Allocated Count 291
CREDIT_ALLOCATED 292
Credits Received Counter 291
CREDITS_CONSUMED 292
CREDITS_RECEIVED 291
Current Running Disparity 423
Cut-Through 102, 248
D
D characters 405
D0 573, 576, 586
D0 Active 587
D0 Uninitialized 586
D1 574, 576, 587
D1_Support bit 598
D2 574, 576, 589
D2_Support bit 597
D3 574, 576, 590
D3cold 592
D3hot 591
Data Link Layer Packet 71, 74
Data Poisoning 362
Data Register 603
Data_Scale field 601
Data_Select field 602
DDIM 887
Deassert_INTx messages 348
decoders 792
De-emphasis 455, 466
Default Device Class Power Management spec 576
Definition of On and Off 658
De-Scrambler 402
Device Capabilities Register 900
Device Class Power Management specs 576
Device Context 574
Device Control Register 905
Device Driver 656, 774, 791, 844, 872, 888, 891, 905
Device Driver Initialization Model 887
Device ID 773, 876, 882, 883
Device PM States 573, 586
Device ROM 783, 872
Device Serial Number Capability 952
Device Status Register 378, 909
Device-Specific Initialization (DSI) bit 599
Differential Receiver 439
Digest 166
Discard Timer SERR# Enable 837
Discard Timer Status 837
Discard unused prefetch data 801
Disparity 423
DLLP 71, 74, 75, 111, 154, 198, 201
Downstream 805
Downstream Port 50
Driver 681
DSI bit 599
Dual Simplex 41, 399
E
ECRC 166, 167
ECRC Generation and Checking 361, 383
EDB 412
Egress Port 44, 50
EISA 724
Elastic Buffer 402
Electrical Idle 41, 77, 108, 109, 432, 434, 454, 464
Enabling Error Reporting 377
END 412
End Tag descriptor 851
Endpoint 44, 48, 49, 51, 55
End-to-End CRC 166
Error Classifications 368
Error Handling 393
Error Handling Mechanisms 360
Error Logging 389
Error Messages 370
Error Reporting Mechanisms 359
Error Severity 388
EUI-64 953
Expansion ROM 872
Expansion ROM Base Address Register 783, 811, 872
Expansion ROM Enable bit 784
Expansion Slot 860
Extension ID 953
F
Fast Back-to-Back Enable 834, 836
FC Initialization Sequence 305
Fcode device driver 889
Fcode interpreter 889
First DW Byte Enables 164, 167
First-In-Chassis bit 864
Flag 317
Flow Control Buffer Size (max) 297
Flow Control Buffers 288
Flow Control Credits 286, 289
Flow Control Elements 290, 295
Flow Control Initialization 294, 304
Flow Control Packet Format 205
Flow Control Packets 293
Flow Control Update Frequency 310
Flow Control Updates 308
Forth 889
Framing Symbols 156, 400
FTS 109, 434
Function PM State Transitions 593
Function State Transition Delays 596
Fundamental Reset 95, 487, 488
G
General Purpose Event 579
GPE 579
GPE handler 579
H
Hardware Fixed VC Arbitration 269
Hardware-Fixed Port Arbitration 278
Header space 779
Header Type One 777
Header Type register 777
Header Type Two 777
Header Type Zero 777
Header Type/Format Field 165
Hierarchy 49
Hierarchy Domain 49
Host/PCI bridge 727
Hot Plug Elements 655
Hot Plug Messages 197
Hot Reset 95, 487, 491
Hot-Plug Controller 656
Hot-Plug primitives 682
Hot-Plug Service 655
Hot-Plug System Driver 655
Hub Link 32, 33, 35, 51
I
IDE 774, 872, 1031
Identifier String descriptor 851
IEEE 953
IEEE 1394 Bus Driver 577
IEEE standard 1275-1994 888
In-band Reset 491
Indicator Byte 883, 885
Infinite Flow Control Credits 301
Ingress Port 44, 50
InitFC1-P DLLP 201
Initial Program Load 872
Initialization code 885
Initialization code image 876
Initiator 118
input device 872
Interrupt Disable 346
Interrupt Latency 341
interrupt latency 341
Interrupt Line Register 345, 791
Interrupt Pin Register 343, 792
Interrupt Service Routine 886
Interrupt Status 346
Interrupt-Related Registers 844
Inter-symbol Interference 466, 467
INTx Message 193
INTx Message Format 351
INTx# Pins 342
INTx# Signaling 345
IO Base Address Register 797
IO Base and IO Limit registers 812
IO Decoder 797
IO decoder, Legacy 798
IO Extension registers 812
IO Read Request 160
IO Request Packet 172, 579
IO Requests 66, 171
IO Write Request 160
IPL 872
IRP 579
ISA Enable bit 836
ISA Plug-and-Play specification 887
Isochronous Transactions 252
K
K character 405
keywords 851, 853
L
L0 State 46, 403, 482
L0s State 611
L1 ASPM 606, 609, 614
L1 ASPM Negotiation 616
L1 State 629
L2 State 637
L2/L3 Ready state 633, 634
Lane 94, 95, 400, 408, 411, 415, 444
Lane Reversal 95
Last DW Byte Enables 165, 167
Latency Timer Registers 843
LCRC 72, 213, 216, 221
Legacy Endpoint 49, 330, 332, 335, 352
Link 13, 14, 41, 94, 101
Link Capabilities Register 609, 912
Link Control Register 915
Link Errors 379
Link Flow Control-Related Errors 363
Link Power Management 606
Link Status Register 918
Link Training and Initialization 94, 403, 496
Link Width 14, 41, 94, 913
Low-priority VC Arbitration 267
LTSSM 213
M
Malformed TLP 364
Master Abort Mode 836
MCH 28, 33
Memory Base Address Register 794
Memory Base and Limit registers 823, 830
Memory Read Lock Request 160
Memory Read Request 160
Memory Request Packet 175
Memory Requests 64, 68, 174
Memory Space bit 784
Memory Write and Invalidate Enable 834
Memory Write Request 160
Memory-Mapped IO 793, 823, 830
Message Address Register 335, 336
Message Control Register 333, 336
Message Data register 335, 336
Message Request Packet 190
Message Request W/Data 160
Message Requests 63, 160, 190
Message Signaled Interrupts 331
Miniport Driver 570
MSI 331, 791
MSI Capability Register 332
MSI Configuration 336
Multiple Message Capable field 336
Multiple Messages 339
N
NAK DLLP 87, 90, 202, 219
NAK Scheduling 236
NAK_SCHEDULED Flag 217, 233
Namespace 577
New Capabilities list 837
Next Capability Pointer 859
NEXT_RCV_SEQ 203, 216, 219, 230
Non-Prefetchable Memory 796
North Bridge 16, 23, 29
Nullified Packet 384, 431
Number of Expansion Slots 864
O
OnNow Design Initiative 571
Open Firmware 888
OpenBoot 885, 888
Order Management 324
Ordered-Sets 405
Ordering Rules Summary 327
OS boot process 888
Output device 872
P
PA/RISC executable code 885
Parity Error Response 834, 836
Pause command 656, 681
Pausing a Driver 681
PCI Bus Driver 570, 571, 577
PCI Bus PM Interface Specification 569
PCI Data Structure 880
PCI Express Capability ID Register 898
PCI Express Capability Register Set 897
PCI Express Endpoint 49
PCI Interrupt Signaling 342
PCI PM 569
PCI power management 557, 567, 649
PCI-Compatible Error Reporting 372
PCI-to-PCI Address Decode-Related Registers 809
PCI-to-PCI bridge 727, 770, 777
PCI-to-PCI bridge terminology 805
Physical Slot ID 681
PM Capabilities (PMC) Register 597
PM Capability Registers 585
PM Control/Status (PMCSR) Register 599
PM Event (PME) Context 575
PM Registers 596
PM_Active_State_Request_L1 201
PM_Enter_L1 DLLP 201
PM_Enter_L23 201
PM_Request_Ack 201
PMC Register 597
PMCSR Register 599
PME Clock bit 599
PME Context 575
PME# 575
PME_En bit 602
PME_Status bit 601
PME_Support field 597
Polarity Inversion 94, 95
Port 42, 44, 50
Port Arbitration 45, 84, 85, 86, 263, 274, 277, 939
Port Arbitration Table 276, 280, 952
Port VC Capability Register 1 941
Port VC Control Register 944
Port VC Status Register 945
POST 881
Power Budget Register Set 955
Power Budgeting Capability Register 956
Power Budgeting Data Register 956
Power Budgeting Enhanced Capability Header 955
Power Indicator 665
Power IRP 579
power management 557, 567, 649
Power Management DLLP Packet 204
Power Management Messages 194
Power Management Policy Owner 577
Power Management Register Set 586, 596
Power_Indicator_Blink Message 679
Power_Indicator_Off Message 679
Power_Indicator_On Message 679
PowerPC 727
PowerState field 603
Prefetchable Attribute bit 795, 824
Prefetchable Memory 801, 829
Prefetchable Memory Base and Limit registers 823
Primary bus 805
Primary Bus Number register 806
Primary Discard Timeout 837
Primitives, hot-plug 655, 682
Producer/Consumer Model 317
Programming Interface byte 774
Q
QoS 11, 80, 82, 83, 264
Query Hot-Plug System Driver 682
Query Slot Status 683
Quiesce 681
Quiesce command 656
Quiescing Card and Driver 681
R
RCRB 275, 957, 958
Read/Write VPD Keywords 856
Relaxed Ordering 319
Replay 88
Replay Buffer 88
Replay Timer 384
Requester 37, 49, 50
Resume command 656
Retention Latch 666
Retention Latch Sensor 666
Revision ID 773
ROM Data Structure 878, 881
ROM Detection 872
ROM Header 878, 879
ROM shadowing 875
Root Complex 42, 48, 49, 107, 131, 330, 352, 370, 390, 714, 718, 722, 727, 742, 753, 757, 761, 765
Root Complex Error Status 390
Root Complex Register Block 957
Root Control Register 926
Root Error Command Register 392
Root Status Register 928
Round Robin VC Arbitration 270
RST# 657
Run time code image 876
Rx Clock Recovery 440
S
SCI 579
Scrambler 400, 416
SCSI 872
SDP 411
Secondary Bus 805
Secondary Bus Number register 806
Secondary Bus Reset bit 836
Secondary Discard Timeout 837
Secondary Latency Timer Register 843
Secondary Status Register 840
Sequence Number 213, 216, 234
SERR# Enable 834, 836
Set Slot Status 683
Severity of Error 388
shadow RAM 875
SKIP 431, 432, 434
Slot Capabilities Register 670, 920
Slot Control 672
Slot Control Register 923
Slot Number Assignment 861
Slot Numbering Identification 668
Slot Numbering Registers 859, 863
Slot Power Limit Control 672
Slot Power Limit Message 196
Slot Status Register 925
Soft Off 573
South Bridge 16, 32
Special Cycle 834
Split Completion, bridge claiming of 807
Start command 656
Status Register (Primary Bus) 837
Stepping Control bit 834
Sticky Bits 383
STP 411
Strict Priority VC Arbitration 265
String Identifier descriptor 855
Strong Ordering 321
Sub Class 774
Subordinate bus 805
Subordinate Bus Number register 726, 807
Subsystem ID 776
Subsystem Vendor ID 776
Surprise Removal Notification 652
Switch 11, 42, 48, 50, 86, 282
Symbol 43, 93, 400, 405, 421
Symbol Lock 94, 405, 441
System Control Interrupt 579
System PM States 572
T
Target 20, 23, 24
TC 44
TC filtering 363
TC/VC Mapping 262
Time-Based, Weighted Round Robin Arbitration 279
TLP 55, 57, 71, 154, 156
Token 578, 580, 889
Traffic Class 44, 81, 87, 161, 164, 252, 256, 262, 318, 321, 363
Training Sequence 1 405
Training Sequence 2 405
Transaction Descriptor 169
Transaction ID 169
Transaction Layer Packet 55
Transaction Types 113
Transactions 43
Translating Slot IDs 681
TS1 109, 405, 434
TS2 109, 405
TSIZ 727
Turning Slot On 659
Tx Buffer 404
Type 0 configuration transaction 727
Type 1 configuration transaction 727
U
UDF Supported bit 885
Uncorrectable Error Reporting 388
Uncorrectable Error Severity 388
Uncorrectable Fatal Errors 369
Uncorrectable Non-Fatal Errors 369
Unexpected Completion 367
Universal Device Driver 889
Universal/Local Bit 953
Unlock Message 196
Unsupported Request 365
Upstream 805
Upstream Port 50
USB Bus Driver 577
V
VC 44
VC Arbitration 44, 50, 84, 85, 86, 264, 267, 270, 765, 939, 944
VC Arbitration Table 951
VC Resource Capability Register 946
VC Resource Control Register 948
VC Resource Status Register 950
Vendor ID 773, 876, 882, 883
VGA 774
VGA device ROM 875
VGA Enable bit 836
VGA Palette Snoop bit 834
Virtual Channel 4, 83, 256, 260, 263, 270, 286, 288, 323, 324
Vital Product Data 848, 882, 884
VPD 848, 882, 884
VPD Checksum 855
VPD data structure 857
VPD-R descriptor 851, 853
VPD-W descriptor 852, 856
W
WAKE# Signal 642, 696
Warm Reset 95, 488
WDM Device Driver 570, 577, 579
Weak Ordering 324
Weighted Round Robin Port Arbitration 279
Weighted Round Robin VC Arbitration 269
Windows 95/98/NT/2000 569
Windows Driver Model 579
Working state 572

PC Programming/Hardware

"PCI Express System Architecture is a high-quality and comprehensive must-have reference for any engineer working with PCI Express. Highly recommended."
-David Churchill, Agilent Technologies
PCI Express is the third-generation Peripheral Component Interconnect technology for a wide range of systems and peripheral devices. Incorporating recent advances in high-speed, point-to-point interconnects, PCI Express provides significantly higher performance, reliability, and enhanced capabilities, at a lower cost, than the previous PCI and PCI-X standards. Therefore, anyone working on next-generation PC systems, BIOS and device driver development, and peripheral device design will need a thorough understanding of PCI Express.
PCI Express System Architecture provides an in-depth description and comprehensive reference to the PCI Express standard. The book contains information needed for design, verification, and test, as well as background information essential for writing low-level BIOS and device drivers. In addition, it offers valuable insight into the technology's evolution and cutting-edge features.
Following an overview of the PCI Express architecture, the book moves on to cover transaction protocols, the physical/electrical layer, power management, configuration, and more. Specific topics covered include:
  • Split transaction protocol
  • Packet format and definition, including use of each field
  • ACK/NAK protocol
  • Traffic Class and Virtual Channel applications and use
  • Flow control initialization and operation
  • Error checking mechanisms and reporting options
  • Switch design issues
  • Advanced Power Management mechanisms and use
  • Active State Link power management
  • Hot Plug design and operation
  • Message transactions
  • Physical layer functions
  • Electrical signaling characteristics and issues
  • PCI Express enumeration procedures
  • Configuration register definitions
Thoughtfully organized, featuring a plethora of illustrations, and comprehensive in scope, PCI Express System Architecture is an essential resource for anyone working with this important technology.
MindShare's PC System Architecture Series is a crisply written and comprehensive set of guides to the most important PC hardware standards. Books in the series are intended for use by hardware and software designers, programmers, and support personnel.
MindShare, Inc., is one of the leading technical training companies in the hardware industry, providing innovative courses for dozens of companies, including IBM, HP, PLX, Sun, and Texas Instruments.
Ravi Budruk is a senior staff engineer and instructor with MindShare, Inc., where he has trained hundreds of engineers. He is an industry expert on such topics as Intel Processor and PC architecture, as well as such bus architectures as PCI Express, PCI, PCI-X, HyperTransport, IEEE 1394, and ISA. Before working at MindShare, Mr. Budruk was a PC chipset architect and designer at VLSI Technology, Inc.
Don Anderson is an expert on digital electronics and system design. He passes on his wealth of experience by training engineers, programmers, and technicians at MindShare, Inc., and is the author of numerous MindShare books.
Tom Shanley is President of MindShare, Inc., and one of the world's foremost authorities on computer system architecture.
www.informit.com/aw
www.mindshare.com
Cover design by Barbara T. Advisson
Cover photograph by Tassahiko Shimada/Photonics
Contact www.mindshare.com for Training on This Subject