Figure 24-30: Port VC Capability Register 1 (Read-Only)
Table 24-14: Port VC Capability Register 1 (Read-Only)
Bit(s)  Description
2:0  Extended VC Count. The number of additional VCs supported by the device.
- 0 = only VC0 is supported.
- The maximum value is 7.
6:4  Low Priority Extended VC Count. Indicates the number of VCs (starting with VC0) that comprise the Low-Priority VC (LPVC) group.
- 0: There is no LPVC group, and the sequence in which the port's VC buffers transfer is governed by the fixed-priority scheme wherein VC0 has the lowest priority and the highest-numbered VC that is implemented has the highest priority.
- Non-zero value (n): VCs 0 through n are members of the LPVC group. The value specified cannot be greater than that specified in the Extended VC Count field of this register.
- The VCs above n are members of the high-priority group, where VCn+1 has the lowest priority and the highest VC has the highest priority.
- Control passes to the LPVC group only when the VCs in the upper group have no packets to transfer. The priority scheme used among the VCs that are members of the lower group is governed by the VC Arbitration Capability field in Port VC Capability Register 2 (see "Port VC Capability Register 2" on page 943).

9:8  Reference Clock. The reference clock for VCs that support time-based WRR Port Arbitration. This field is valid for an RCRB and for Switch Ports, and is not valid for Root Ports and Endpoint devices (must be hardwired to 0).
- 00b = 100ns reference clock.
- 01b-11b: reserved.
11:10  Port Arbitration Table Entry Size. Indicates the size (in bits) of each entry in the device's Port Arbitration table. This field is valid only for an RCRB and for any Switch Port. It is hardwired to 0 for Endpoint devices and Root Ports.
- 00b = each Port Arbitration table entry is 1 bit.
- 01b = each Port Arbitration table entry is 2 bits.
- 10b = each Port Arbitration table entry is 4 bits.
- 11b = each Port Arbitration table entry is 8 bits.
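
To make the bit layout above concrete, here is a minimal C sketch that decodes these fields from a raw register value. The helper name and the sample value are invented for illustration; only the bit positions come from Table 24-14.

    #include <stdint.h>
    #include <stdio.h>

    /* Decode Port VC Capability Register 1 using the bit layout in Table 24-14. */
    static void decode_port_vc_cap1(uint32_t cap1)
    {
        uint32_t ext_vc_count   = cap1 & 0x7;          /* bits 2:0   Extended VC Count */
        uint32_t lpvc_count     = (cap1 >> 4) & 0x7;   /* bits 6:4   Low Priority Extended VC Count */
        uint32_t ref_clock      = (cap1 >> 8) & 0x3;   /* bits 9:8   Reference Clock */
        uint32_t entry_size_enc = (cap1 >> 10) & 0x3;  /* bits 11:10 Port Arbitration Table Entry Size */

        printf("VCs implemented: VC0 plus %u extended VC(s)\n", ext_vc_count);
        if (lpvc_count == 0)
            printf("no LPVC group (fixed-priority scheme)\n");
        else
            printf("LPVC group: VC0 through VC%u\n", lpvc_count);
        printf("reference clock: %s\n", ref_clock == 0 ? "100ns" : "reserved encoding");
        printf("Port Arbitration Table entry size: %u bit(s)\n", 1u << entry_size_enc); /* 00b=1 ... 11b=8 */
    }

    int main(void)
    {
        /* Invented sample: 2 extra VCs, LPVC group VC0-VC1, 100ns clock, 4-bit entries. */
        decode_port_vc_cap1(0x00000812);
        return 0;
    }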

Port VC Capability Register 2

The register is illustrated in Figure 24-31 on page 943 and each bit field is described in Table 24-15 on page 944.
Figure 24-31: Port VC Capability Register 2 (Read-Only)

Table 24-15: Port VC Capability Register 2 (Read-Only)
Bit(s)  Description
7:0  VC Arbitration Capability. This bit mask indicates the arbitration scheme(s) supported by the device for the LPVC group. It is valid for all devices that report a Low Priority Extended VC Count greater than 0 (see the description in Table 24-14 on page 942). Each bit corresponds to an arbitration scheme defined below. When more than one bit is set, it indicates that the Port can be configured to provide different VC arbitration services.
- Bit 0: Hardwired, fixed arbitration scheme (e.g., Round Robin).
- Bit 1: Weighted Round Robin (WRR) arbitration with 32 phases.
- Bit 2: WRR arbitration with 64 phases.
- Bit 3: WRR arbitration with 128 phases.
- Bits 4-7: Reserved.
The desired arbitration scheme is selected via the VC Arbitration Select field in the Port VC Control Register (see Table 24-16 on page 945).
31:24  VC Arbitration Table Offset. Indicates the location of the VC Arbitration Table with reference to the start of the VC capability register set (specified in increments of dqwords, i.e., 16 bytes). A value of 0 indicates that the table is not present.
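
As a worked example of the offset arithmetic, the fragment below converts the dqword-granular offset field into a byte offset. The sample register value is invented.

    #include <stdint.h>
    #include <stdio.h>

    /* Convert the VC Arbitration Table Offset (bits 31:24 of Port VC Capability
     * Register 2) from dqwords (16 bytes) into a byte offset from the start of
     * the VC capability register set. */
    static uint32_t vc_arb_table_offset_bytes(uint32_t cap2)
    {
        return ((cap2 >> 24) & 0xFF) * 16;  /* 0 means no table is present */
    }

    int main(void)
    {
        uint32_t cap2 = 0x04000006;  /* invented: offset 4 dqwords; 32/64-phase WRR supported */
        uint32_t off  = vc_arb_table_offset_bytes(cap2);
        if (off == 0)
            printf("no VC Arbitration Table implemented\n");
        else
            printf("VC Arbitration Table begins %u bytes into the capability set\n", off);
        return 0;
    }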

Port VC Control Register

The register is illustrated in Figure 24-32 on page 944 and each bit field is described in Table 24-16 on page 945.
Figure 24-32: Port VC Control Register (Read-Write)
Table 24-16: Port VC Control Register (Read-Write)
Bit(s)  Description
0  Load VC Arbitration Table. To activate a port's VC Arbitration Table, the configuration software takes the following steps:
1. When software initially programs the VC Arbitration Table, or when any change is subsequently made to any entry in the table, the VC Arbitration Table Status bit in the Port VC Status register is automatically set to one by hardware.
2. Software then sets the Load VC Arbitration Table bit to one, causing the port to read the VC Arbitration Table from the capability register set and apply it.
3. When the port hardware has completed reading and applying the updated table, it automatically clears the VC Arbitration Table Status bit in the Port VC Status register.
4. Software can determine if the updated table has been applied by reading the state of the VC Arbitration Table Status bit in the Port VC Status register. 0 indicates the updated table has been read and applied; 1 indicates that the update is not yet complete.
This bit is valid for a device when the selected VC Arbitration type (see the next row in this table) uses the VC Arbitration Table. Clearing this bit has no effect, and it always returns 0 when read.
3:1  VC Arbitration Select. The configuration software selects one of the supported LPVC arbitration schemes by setting this field to the BCD value of the bit corresponding to the desired scheme (see the description of bits 7:0 in Table 24-15 on page 944). The configuration software must select the arbitration scheme prior to enabling more than one VC in the LPVC group.
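
The four-step sequence above maps naturally to a small polling routine. The C sketch below simulates the hardware side with a single static variable standing in for the Port VC Control/Status dword; treating Control and Status as one dword, and the helper names, are assumptions for illustration only.

    #include <stdint.h>
    #include <stdio.h>

    /* Simulated Port VC Control/Status dword; a real driver would perform MMIO
     * or configuration-space accesses at the offsets shown in the figures. */
    static uint32_t port_vc_ctrl_status = 1u << 16; /* table was edited: hw set the status bit */

    #define LOAD_VC_ARB_TABLE   (1u << 0)   /* Port VC Control bit 0 (reads back as 0)      */
    #define VC_ARB_TABLE_STATUS (1u << 16)  /* Port VC Status bit 0, seen through the dword */

    static uint32_t vc_reg_read(void) { return port_vc_ctrl_status; }

    static void vc_reg_write(uint32_t v)
    {
        if (v & LOAD_VC_ARB_TABLE)                       /* setting Load makes the port read   */
            port_vc_ctrl_status &= ~VC_ARB_TABLE_STATUS; /* and apply the table; hardware then */
    }                                                    /* clears the status bit when done    */

    int main(void)
    {
        /* Step 1 (already happened): programming the table entries set the status bit. */
        /* Step 2: set the Load VC Arbitration Table bit. */
        vc_reg_write(vc_reg_read() | LOAD_VC_ARB_TABLE);

        /* Steps 3-4: poll until hardware clears the status bit (0 = table applied). */
        while (vc_reg_read() & VC_ARB_TABLE_STATUS)
            ; /* a production driver would bound this loop with a timeout */

        printf("VC Arbitration Table applied\n");
        return 0;
    }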

Port VC Status Register

The register is illustrated in Figure 24-33 on page 946 and each bit field is described in Table 24-17 on page 946.

Figure 24-33: Port VC Status Register (Read-Only)
Table 24-17: Port VC Status Register (Read-Only)
Bit(s)  Description
0  VC Arbitration Table Status. See the description of the Load VC Arbitration Table bit in Table 24-16 on page 945.

VC Resource Registers

General. At a minimum, each port implements a single VC (VC0); optionally, it may implement up to eight VCs (VC0 through VC7). For each VC it supports, the port implements the following three registers:
  • VC Resource Capability register.
  • VC Resource Control register.
  • VC Resource Status register.
The following three sections provide a description of each of these registers.
Each VC implements:
  • A mandatory TC/VC bit map that defines the TCs that should be accepted into this VC.
  • An optional Port Arbitration Table that defines the order in which the VC accepts packets from the device ingress ports that source packets to it for transmission.
VC Resource Capability Register. The register is illustrated in Figure 24-34 on page 947 and each bit field is described in Table 24-18 on page 947.

Figure 24-34: VC Resource Capability Register
Table 24-18: VC Resource Capability Register
Bit(s)  Type  Description
7:0  RO  Port Arbitration Capability. This bit mask indicates the types of Port arbitration (one or more) supported by the VC. It is valid for all Switch Ports and an RCRB, but not for PCI Express Endpoint devices or Root Ports. Software selects one of these arbitration schemes by writing to the Port Arbitration Select field in the VC Resource Control register (see "VC Resource Control Register" on page 948).
- Bit 0: Hardwired, fixed arbitration scheme (e.g., Round Robin).
- Bit 1: Weighted Round Robin (WRR) arbitration with 32 phases.
- Bit 2: WRR arbitration with 64 phases.
- Bit 3: WRR arbitration with 128 phases.
- Bit 4: Time-based WRR with 128 phases.
- Bit 5: WRR arbitration with 256 phases.
- Bits 6-7: Reserved.
14  RO  Advanced Packet Switching.
- 1 = This VC only supports transactions optimized for Advanced Packet Switching (AS).
- 0 = The VC is capable of supporting all transactions defined by the spec (including AS transport packets).
This bit is valid for all PCI Express Ports and an RCRB.

15  HwInit  Reject Snoop Transactions.
- 0 = Transactions with or without the No Snoop bit set are allowed on this VC.
- 1 = Transactions with No Snoop = 0 are rejected as an Unsupported Request.
This bit is valid for Root Ports and an RCRB, but not for Endpoint devices or Switch Ports.
22:16  HwInit  Maximum Time Slots. The maximum number of time slots (minus one) that the VC supports when configured for time-based WRR Port Arbitration. This field is valid for all Switch Ports, Root Ports and an RCRB, but not for Endpoint devices. It is only valid when the Port Arbitration Capability field in this register indicates that the VC supports time-based WRR Port Arbitration.
31:24  RO  Port Arbitration Table Offset. Indicates the location of the Port Arbitration Table associated with this VC with reference to the start of the VC capability register set (specified in increments of dqwords, i.e., 16 bytes). A value of 0 indicates that the table is not present. This field is valid for all Switch Ports and an RCRB, but not for Endpoint devices or Root Ports.
VC Resource Control Register. The register is illustrated in Figure 24-35 on page 948 and each bit field is described in Table 24-19 on page 949.
Figure 24-35: VC Resource Control Register (Read-Write)
Table 24-19: VC Resource Control Register (Read-Write)
Bit(s)  Description
7:0  TC/VC Map. TC-to-VC mapping bit map. Each bit within this field corresponds to a TC that is mapped to this VC, and multiple bits may be set to one.
- Bit 7 = 1: TC7 is mapped to this VC.
- Bit 6 = 1: TC6 is mapped to this VC.
- Bit 5 = 1: TC5 is mapped to this VC.
- Bit 4 = 1: TC4 is mapped to this VC.
- Bit 3 = 1: TC3 is mapped to this VC.
- Bit 2 = 1: TC2 is mapped to this VC.
- Bit 1 = 1: TC1 is mapped to this VC.
- Bit 0 = 1: TC0 is mapped to this VC. Bit 0 is read-only: 1 for VC0 and 0 for all other enabled VCs.
Before removing one or more TCs from the TC/VC Map of an enabled VC, software must ensure that no new or outstanding transactions with those TC labels are targeted at the given Link. The default value is FFh for VC0 and 00h for the other VCs.
16  Load Port Arbitration Table. To activate a VC's Port Arbitration Table, the configuration software takes the following steps:
1. When software initially programs the VC's Port Arbitration Table, or when any change is subsequently made to any entry in the table, the Port Arbitration Table Status bit in the VC's VC Resource Status register (see "VC Resource Status Register" on page 950) is automatically set to one by hardware.
2. Software then sets the Load Port Arbitration Table bit to one, causing the VC to read the updated Port Arbitration Table from the capability register set and apply it.
3. When the VC hardware has completed reading and applying the updated table, it automatically clears the Port Arbitration Table Status bit in its VC Resource Status register.
4. Software can determine if the updated table has been applied by reading the state of the Port Arbitration Table Status bit in the VC's VC Resource Status register. 0 indicates the updated table has been read and applied; 1 indicates that the update is not yet complete.
This bit is valid for a device when the selected Port Arbitration type (see the next row in this table) uses the Port Arbitration Table, and it is valid for all Switch Ports and an RCRB, but not for Endpoint devices or Root Ports. Clearing this bit has no effect; it always returns 0 when read, and its default value is 0.

19:17  Port Arbitration Select. The configuration software selects one of the supported Port arbitration schemes by setting this field to the BCD value of the bit corresponding to the desired scheme (see the description of bits 7:0 in Table 24-18 on page 947). The configuration software must select the arbitration scheme prior to enabling more than one VC in the LPVC group.
26:24  VC ID. This field assigns a VC ID (between 0 and 7) to the VC (for VC0, it is hardwired to zero). It cannot be modified once the VC has been enabled.
31  VC Enable.
- 1 = VC enabled.
- 0 = VC disabled.
The state of this bit is qualified by the state of the VC Negotiation Pending bit (in the VC's VC Resource Status register; see "VC Resource Status Register" on page 950):
- 0 = negotiation has been completed (Flow Control initialization is completed for the PCI Express Port) and the VC Enable bit indicates the state of the VC.
- 1 = the negotiation process has not yet completed and the state of the VC Enable bit therefore remains indeterminate.
This bit is hardwired to 1 for VC0. It is read/write for the other VCs and its default is 0. To enable a VC, its VC Enable bit must be set to one in the ports at both ends of the link; to disable a VC, its VC Enable bit must be cleared to zero in the ports at both ends of the link. Before disabling a VC, software must ensure that no traffic is using the VC. Prior to re-enabling a VC, software must first fully disable the VC in both components on the Link.
VC Resource Status Register. The register is illustrated in Figure 24-36 on page 951 and each bit field is described in Table 24-20 on page 951.

Figure 24-36: VC Resource Status Register (Read-Only)
Table 24-20: VC Resource Status Register (Read-Only)
Bit(s)  Description
0  Port Arbitration Table Status. See the description of the Load Port Arbitration Table bit in Table 24-19 on page 949. The default value of this bit is 0.
1  VC Negotiation Pending. Indicates whether the VC negotiation process (initialization or disabling) is in the pending state. When this bit is set by hardware, it indicates that the VC is still in the process of negotiation. It is cleared by hardware after the VC negotiation completes. For VCs other than VC0, software uses this bit to enable or disable the VC. For VC0, this bit indicates the status of the Flow Control initialization process. Before using a VC, software must check that the VC Negotiation Pending bit is cleared in the components at both ends of the Link.
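
Tying Tables 24-19 and 24-20 together, the sketch below packs a VC Resource Control value that maps a set of TCs onto a VC and enables it. The field positions come from the tables above; the TC assignment and helper name are invented.

    #include <stdint.h>
    #include <stdio.h>

    /* Pack a VC Resource Control value per Table 24-19: TC/VC Map in bits 7:0,
     * VC ID in bits 26:24, VC Enable in bit 31. */
    static uint32_t vc_resource_control(uint8_t tc_map, uint8_t vc_id)
    {
        return (1u << 31)                       /* VC Enable */
             | ((uint32_t)(vc_id & 0x7) << 24)  /* VC ID, 0-7 */
             | tc_map;                          /* one bit per TC mapped to this VC */
    }

    int main(void)
    {
        /* Invented example: route TC3 and TC4 over VC1. The same value must be
         * programmed into the ports at BOTH ends of the link; software then
         * waits for VC Negotiation Pending (Table 24-20, bit 1) to clear in
         * both components before using the VC. */
        printf("VC Resource Control = 0x%08X\n",
               vc_resource_control((1u << 3) | (1u << 4), 1));
        return 0;
    }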

VC Arbitration Table

A port implements a VC Arbitration Table if both of the following are true:
  • The Port supports more than one VC.
  • The Port implements a WRR arbitration scheme.
The table consists of a set of read/write registers and is only used if the configuration software selects (via the VC Arbitration Select field in Table 24-16 on page 945) one of the implemented WRR VC arbitration schemes (see VC Arbitration Capability in Table 24-15 on page 944).

The configuration software configures the table with the arbitration scheme that the egress port logic uses to service the VC transmit buffers associated with the port. See the description of the Load VC Arbitration Table bit in Table 24-16 on page 945 for a description of how the table is uploaded into the port's logic. For a detailed description of the VC Arbitration Table, refer to "Loading the Virtual Channel Arbitration Table" on page 270.

Port Arbitration Tables

A VC implements a Port Arbitration Table if both of the following are true:
  • The Port supports more than one VC.
  • The VC implements a WRR arbitration scheme.
The table consists of a set of read/write registers and is only used if the configuration software selects (via the Port Arbitration Select field in Table 24-19 on page 949) one of the implemented WRR Port arbitration schemes (see Port Arbitration Capability in Table 24-18 on page 947).
The configuration software configures the table with the arbitration scheme that defines the order in which the VC accepts packets sourced from the ingress ports that pass packets to this VC's buffer on the egress port. See the description of the Load Port Arbitration Table bit in Table 24-19 on page 949 for a description of how the table is uploaded into the port's logic.
This register array is valid for all Switch Ports and RCRBs, but not for Endpoint devices or Root Ports. For a detailed description of the Port Arbitration Tables, refer to "The Port Arbitration Mechanisms" on page 277.

Device Serial Number Capability

This optional register set can be implemented on any PCI Express device in accordance with the following rules:
  • It consists of the Enhanced Capability Header pictured in Figure 24-37 on page 953 and the 64-bit Serial Number register pictured in Figure 24-38 on page 953.
  • The device serial number is a unique, read-only 64-bit value assigned to the device when it is manufactured.
  • A multifunction device with this feature implements it only on function 0, and other functions within the device must return the same serial number value as that reported by function 0.
  • Any component (e.g., a Switch) that contains multiple devices must return the same serial number for each device within the component.
Figure 24-37: Device Serial Number Enhanced Capability Header
Figure 24-38: Device Serial Number Register
The serial number is also known as the EUI-64. Refer to Figure 24-39 on page 954. A portion of the Extended Unique Identifier (EUI)-64 is assigned by a registration authority operating under the auspices of the IEEE organization. The EUI-64 consists of:
  • 24-bit company ID value assigned by IEEE. Bit 6, the Universal/Local scope bit, is always set to one (Universal scope ID, not assigned to anything else in the universe) in the value assigned by the IEEE.
  • 40-bit extension ID assigned by the company that "owns" the assigned company ID. The interpretation of the company-assigned extension is outside the scope of the spec. As an example, it may represent the device ID and manufacturer-assigned serial number.
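
A short C example of that split, composing the 64-bit value from a 24-bit company ID and a 40-bit extension (both values invented for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* Compose an EUI-64 from the 24-bit IEEE-assigned company ID (most
     * significant bits) and the 40-bit company-assigned extension. */
    static uint64_t make_eui64(uint32_t company_id_24, uint64_t extension_40)
    {
        return ((uint64_t)(company_id_24 & 0xFFFFFF) << 40)
             | (extension_40 & 0xFFFFFFFFFFULL);
    }

    int main(void)
    {
        uint64_t eui = make_eui64(0x123456, 0x89ABCDEF01ULL); /* invented values */
        printf("EUI-64: 0x%016llX\n", (unsigned long long)eui);

        /* The 64-bit Serial Number register spans two dwords in config space;
         * software typically reads it back as low and high halves. */
        uint32_t lo = (uint32_t)eui, hi = (uint32_t)(eui >> 32);
        printf("low dword: 0x%08X  high dword: 0x%08X\n", lo, hi);
        return 0;
    }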

Figure 24-39: EUI-64 Format

Power Budgeting Capability

General

Refer to Chapter 15, entitled "Power Budgeting," on page 557 for a detailed description of the Power Budgeting capability.
This optional capability permits the platform to properly allocate power to a device that is hot-plugged into the system during runtime. Using this register set, the device reports the following to the platform:
  • The power it consumes on a variety of power rails.
  • The power it consumes in different power management states.
  • The power it consumes under different operating conditions.
The platform (i.e., the system and the OS) uses this information to ensure that the system can provide the proper power and cooling levels to the device.
Implementation of this capability register set (see Figure 24-40 on page 955) is optional for devices that are implemented either in a form factor which does not require hot-plug support, or that are integrated on the system board. Although the spec states that "PCI Express form factor specifications may require support for power budgeting," it does not indicate any specific cases where this is required.
Figure 24-40 on page 955 illustrates the register set and Figure 24-41 on page 955 illustrates its Enhanced Capability Header register.

How It Works

The power budgeting data for the function consists of a table of n entries starting with entry 0. Each entry is read by placing an index value in the Power Budgeting Data Select register (Figure 24-40 on page 955) and then reading the value returned in the Power Budgeting Data register (Figure 24-42 on page 956). The end of the table is indicated by a return value of all 0s in the Data register.
In the Power Budgeting Capability register (see Figure 24-43 on page 956), the System Allocated bit is automatically set to one if the device is integrated onto the system board and its power requirements are therefore already taken into account in the system's power supply budget. In that case, the device's power requirements should be ignored by software in making power budgeting decisions.
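
The select-then-read loop is simple enough to sketch in C. Hardware access is simulated with a small array standing in for the device's table; helper names and entry values are invented.

    #include <stdint.h>
    #include <stdio.h>

    /* Simulated device: the array stands in for the power budgeting table that
     * the Data Select / Data register pair exposes one entry at a time. */
    static const uint32_t sim_table[] = { 0x00001234, 0x00002345 }; /* invented entries */
    static uint8_t selected;

    static void write_data_select(uint8_t idx) { selected = idx; }

    static uint32_t read_data(void)
    {
        return selected < sizeof sim_table / sizeof sim_table[0]
             ? sim_table[selected] : 0;  /* past the end: all 0s */
    }

    int main(void)
    {
        for (uint8_t i = 0; ; i++) {
            write_data_select(i);          /* index the entry to read */
            uint32_t entry = read_data();  /* Power Budgeting Data register */
            if (entry == 0)                /* all 0s marks the end of the table */
                break;
            printf("entry %u: 0x%08X\n", i, entry);
        }
        return 0;
    }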
Figure 24-40: Power Budget Register Set
Figure 24-41: Power Budgeting Enhanced Capability Header

Figure 24-42: Power Budgeting Data Register
Figure 24-43: Power Budgeting Capability Register

RCRB

General

As mentioned in "Root Complex Register Blocks (RCRBs)" on page 765, a Root Port may optionally implement a Root Complex Register Block (RCRB) as a 4KB block of memory-mapped IO registers that can include one or more of the optional PCI Express extended capabilities and other implementation-specific registers that apply to the Root Complex. An RCRB must not reside in the same memory-mapped IO address space as that defined for normal PCI Express functions. Multiple Root Ports or internal devices may be associated with the same RCRB (see Figure 24-44 on page 958 for an example).

Firmware Gives OS Base Address of Each RCRB

The spec requires platform firmware to communicate the base address of the RCRB for each Root Port or internal device in the Root Complex to the OS. How this is accomplished is outside the scope of the spec.

Misaligned or Locked Accesses To an RCRB

A Root Complex is not required to support memory access requests to an RCRB that cross dword address boundaries or that are accomplished using a locked transaction series. Software should therefore not attempt an access of either type to an RCRB unless it has device-specific knowledge that the Root Complex supports this access type.

Extended Capabilities in an RCRB

Any Extended Capability register sets in an RCRB must always begin at offset 0h within the RCRB's 4KB memory-mapped IO address space. If the RCRB does not implement any of the optional extended capability register sets, this is indicated by an Enhanced Capability header with a Capability ID of FFFFh and a Next Capability Offset of 0h.
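
A minimal C sketch of walking that capability list, assuming the standard Enhanced Capability Header layout (Capability ID in bits 15:0, Next Capability Offset in bits 31:20) and simulating the memory-mapped RCRB with an array:

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t rcrb_read32(const uint32_t *rcrb, uint32_t offset)
    {
        return rcrb[offset / 4]; /* stand-in for a memory-mapped IO read */
    }

    static void walk_rcrb_caps(const uint32_t *rcrb)
    {
        uint32_t offset = 0;     /* first header always at offset 0h */
        do {
            uint32_t hdr  = rcrb_read32(rcrb, offset);
            uint16_t id   = hdr & 0xFFFF;
            uint32_t next = hdr >> 20;
            if (id == 0xFFFF && next == 0) {   /* "nothing implemented" marker */
                printf("no extended capabilities implemented\n");
                return;
            }
            printf("capability ID %04Xh at offset %03Xh\n",
                   (unsigned)id, (unsigned)offset);
            offset = next;
        } while (offset != 0);
    }

    int main(void)
    {
        /* Tiny simulated RCRB: one VC capability (ID 0002h) at offset 0. */
        static uint32_t rcrb[64] = { 0x00010002 };
        walk_rcrb_caps(rcrb);
        return 0;
    }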

The RCRB Missing Link

The 1.0a version of the spec does not identify any method for discovering the existence of RCRBs that may reside within a Root Complex, nor does it identify any method for associating an RCRB with one or more Root Ports. As of the time of this writing (6/6/03), there is a draft ECN (Engineering Change Notice) to the 1.0a spec that has not yet been approved that addresses this issue. As soon as it is approved, MindShare will immediately include this information in classes taught by MindShare and, of course, this information will be provided in the Second Edition of this book. It is not included in this edition because draft changes have a habit of mutating before they reach their finalized, approved form.
Figure 24-44: RCRB Example
Appendices

Appendix A Test, Debug and Verification of PCI Express™ Designs

by Gordon Getty, Agilent Technologies

Scope

The need for greater I/O bandwidth in the computer industry has caused designers to shift from using parallel buses like ISA, PCI™ and PCI-X™ to using multi-lane serial interconnects running at gigabit speeds. The industry has settled on PCI Express™ technology as the key I/O technology of the future, as it delivers on the higher bandwidth requirements, helps to reduce cost for silicon vendors and leverages the software environment from the pervasive PCI/PCI-X technology. While the change from parallel buses to multi-lane serial buses sounds like a small step, it presented a whole set of new debug and validation challenges to designers.
Serial technology requires a different approach to testing, starting from the physical layer and moving up through the transaction layer. In many cases, the parallel bus had several slots connected to the same physical lines, which allowed you to connect test equipment to the same bus and monitor other devices. With the point-to-point nature of serial technologies, this is no longer possible, and with the speed moving from the megahertz range to the gigahertz range, probing of the signal becomes a real challenge.
The second generation of PCI Express, known as PCI Express 2.0 (PCIe™ 2.0), is based on PCI Express 1.0 principles, but it supports speeds of up to 5 GT/s. Preserving backwards compatibility with PCI Express 1.0 presents its own set of challenges. Also, new and extended capabilities related to energy savings - including active state power management (ASPM) and dynamic link width negotiation - make achieving interoperability between devices more challenging, especially if these features are implemented incorrectly. Careful design and validation processes can help you avoid costly chip re-spins to fix interoperability issues.
This chapter will guide you through overcoming the challenges faced when you debug and validate your PCI Express devices.

Electrical Testing at the Physical Layer

The PCI Express specification requires devices to have a built-in mechanism for testing the electrical characteristics of devices such as those found on motherboards and in systems. When the transmit lanes of a device are terminated with a 50-ohm load, the transmit lanes are forced into a special mode known as compliance mode.
When a device is in compliance mode, it automatically generates a specific pattern known as the compliance pattern. Two different de-emphasis modes are introduced with the 5.0Gb/s transfer rate. All add-in cards should be tested at the 2.5Gb/s speed with 3.5dB de-emphasis (Figure A-1), and at 5.0Gb/s (Figure A-2) with both 3.5dB and 6dB de-emphasis.
Figure A-1: 2.5-GT/s PCIe Compliance Pattern

Figure A-2: 5-GT/s PCIe Compliance Pattern
The equipment required to carry out electrical testing on PCIe 2.0 devices includes a high-performance oscilloscope such as the Agilent Technologies DSO81304B 13-GHz Infiniium scope and a board into which you can plug an add-in card to provide a load on its transmitters. Alternatively, you can use a load board that plugs into a system and forces its transmitters into compliance mode, ensuring that the device is generating a measurable signal.
The PCI Express specifications (V1.1 and later) require you to capture and process one million unit intervals of data to be able to make a valid measurement. The Agilent 81304B scope has a "QuickMeas" (QM) function that provides user-defined macros and data-capture functionality intended to meet needs that may be very specific to a given application or measurement.
The PCI-SIG® provides a compliance base board and a compliance load board to help accomplish these tasks. These boards provide a consistent platform for making electrical measurements. Figures A-3 and A-4 show typical setups.

Figure A-3: Typical Setup for Testing an Add-In Card
Figure A-4: Typical Setup for Testing a Motherboard

With the setups shown in Figures A-3 and A-4, data is captured on the oscilloscope. Post-processing is used to measure jitter on the reference clock and to measure the random and deterministic jitter on the data lines. In electrical testing, you need to test each individual lane independently, as each lane is likely to have different electrical characteristics. The data is captured and then post-processed to form an eye diagram, such as the one shown in Figure A-5.
Figure A-5: Oscilloscope Eye Diagram
Using the eye diagram, you can measure the tolerances of voltage and jitter against the specification to determine if the device is compliant electrically. If you find the device is not compliant, you have an early indicator that interoper-ability is a potential issue.

Link Layer Testing

Once you have determined that the electrical characteristics of your device are within the specification, your device should be able to successfully establish a link with another device. For example, you should be able to plug an add-in card into a system motherboard and have the devices negotiate and complete link training. If you have followed the specification for the link training and status state machine (LTSSM) design, the two devices will be able to negotiate the number of lanes and the speed of the link. For PCI Express 1.0, the only speed required is 2.5Gb/s, whereas in PCI Express 2.0, speeds may go up to 5Gb/s. Speed negotiation occurs during link training.
Figure A-6 shows a test setup screen of an Agilent Protocol Exerciser, through which one can test the LTSSM.
Figure A-6: Testing the LTSSM with the Agilent N5309A Exerciser for PCIe 2.0
PCI Express has a more robust set of error checking mechanisms than conventional PCI. It is possible to recover from certain types of errors that may be caused by conditions such as a marginal signal quality at the electrical layer. There are cases where you will want to test the device against error conditions to ensure the robustness of your design and to ensure that your device will continue to operate correctly in a real operating environment. This is especially important for data integrity reasons. In conventional PCI, a simple parity check was applied to the data by using a dedicated line on the bus. Should a parity error occur, the system would typically signal it by asserting the PERR signal and potentially the SERR signal, which would most likely cause a "blue screen" scenario where the system would halt.
The data link layer (DLL) in PCI Express is designed to ensure that data transferred over the physical layer reaches its destination intact without errors. This is done using an acknowledge (ACK) or not acknowledge (NAK) message protocol. The DLL adds an LCRC (Link Cyclic Redundancy Check) code to the packet that is checked at the receiving end for integrity, and if the calculation turns out to be different, it is considered an error at the DLL. It potentially signals that the data being transferred was somehow changed and is now not valid. In conventional PCI, such an error would result in a system halt. In PCI Express, the correct behavior would be to send a NAK back to the originator, indicating that it should re-send the same packet since there was an error in the transmission of the packet the previous time it was sent. The originator would hold the packet in its replay buffer until it has received some kind of acknowledgement from the other end of the link.
PCI Express also has a message generation mechanism defined in the specification that allows the devices to report that an error has occurred but it has been noticed and fixed. A correctable error (ERR_COR) message is sent in this particular case and the relevant bit is set in the configuration space.
When a packet is sent, a timer is also started, and if no acknowledgement (ACK or NAK) is sent by the time this timer expires, the device re-sends the packet stored in the replay buffer. If this happens four times in a row, the DLL forces the physical layer to retrain the link, since there is potentially a problem that cannot be resolved without retraining or re-initializing the link.
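
The ACK/NAK and replay-limit behavior just described can be modeled as a tiny event handler. The C sketch below is a toy model, not a real DLL: the retrain threshold follows the rule stated above, and the event sequence is invented.

    #include <stdio.h>

    /* Toy model of the DLL replay rule described above: on NAK or replay-timer
     * timeout, the packet is re-sent from the replay buffer; after four
     * consecutive replays the DLL asks the physical layer to retrain the link. */
    enum dll_event { ACK, NAK, TIMEOUT };

    static int replay_count;

    static void on_tx_event(enum dll_event e)
    {
        switch (e) {
        case ACK:
            replay_count = 0;                 /* delivered: purge the replay buffer */
            printf("ACK: packet retired\n");
            break;
        case NAK:
        case TIMEOUT:
            if (++replay_count >= 4) {        /* fourth consecutive replay attempt */
                printf("replay limit reached: retrain the link\n");
                replay_count = 0;
            } else {
                printf("re-send from replay buffer (attempt %d)\n", replay_count);
            }
            break;
        }
    }

    int main(void)
    {
        enum dll_event events[] = { NAK, TIMEOUT, NAK, NAK, ACK }; /* invented trace */
        for (unsigned i = 0; i < sizeof events / sizeof events[0]; i++)
            on_tx_event(events[i]);
        return 0;
    }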
It is very difficult to create the scenarios described above using real devices since the condition technically is not built into the design. However, to ensure your devices behave properly if such a condition arises, you can use a protocol test tool such as the Agilent E2960B exerciser for PCI Express 2.0. This stimulus tool has the capability to generate traffic and non-standard conditions and test that the results are correct.
The protocol analyzer gives a view of the link at the data link and transaction layers while also providing logical physical layer information in the form of the 8b/10b symbols transmitted on the wire (See Figure A-7). No electrical signal information is provided. The protocol analyzer has the intelligence to identify and trigger on complex conditions on the bus consisting of combinations of symbols in the form of packets and, at a higher level, transactions. The protocol analyzer is connected between the two devices on the link and observes real traffic between two devices. You can probe the link, either using an interposer card that fits into a slot or a mid-bus probe that uses a predefined layout on the PCB and brings the signals to the surface of the board where they can be probed. Other probing options, especially for embedded PCI Express applications, include flying leads, which can be used when no slot or midbus footprint is available.
The exerciser and protocol analyzer tools are quite different from physical-layer tools, such as the oscilloscopes discussed earlier, in that they operate under the assumption that the physical layer is working properly. However, physical-layer errors may show up at the data link layer and transaction layer in the form of CRC or disparity errors. These types of errors can be emulated by the exerciser because it has a programmable PCI Express interface. The PCI Express exerciser is actually a fully functioning PCI Express device that can behave either as a root complex or an endpoint and has additional deterministic behavior capabilities on the link, including the injection of errors. The exerciser is designed to establish a link based on parameters such as link speed, link width, and scrambling enabled or disabled, among others. In addition, particularly for PCIe 2.0, the de-emphasis level can be set to off, 3.5dB or 6dB.
The PCIe 2.0 specification introduced additional features. Similar features have already shown interoperability problems with existing PCI Express 1.0a and 1.1 devices. During link training, a PCIe 2.0 device advertises its speed capability in the training control register of the training sequence. The bit used to indicate this was previously a reserved bit in the 1.0a and 1.1 specifications. When some devices that implement PCIe 1.0a and 1.1 are plugged into a 5-Gb/s capable system, these reserved bits are not set to zero, and you may not be able to successfully establish a link. Using the Agilent exerciser, it is possible to easily test the behavior of a PCIe 1.x device when the higher speed class is advertised.
PCIe 2.0 makes extensive use of the recovery state of the LTSSM, both to allow backwards compatibility with PCIe 1.0 and to allow the negotiation of the higher speed. Testing the LTSSM again provides assurance that devices will be able to link and operate properly in a real environment.

Figure A-7: Representation of Traffic on a PCIe Link Using an Agilent Protocol Analyzer

You would use an LTSSM test when one end of the link attempts to initiate a speed change to the higher speed and fails. For example, the LTSSM test can be used after exiting the Recovery.Speed substate when one device remains at the lower speed while the other is at the higher speed. This scenario is easily tested using the Agilent exerciser in combination with the Agilent protocol analyzer.
A very important role of the DLL is to manage flow control. This is a credit-based mechanism that allows each end of the link to determine the size of the buffer for receiving packets and data at the other end of the link. Immediately after the link has trained, the flow control credits are initialized. Each device advertises the number of headers for posted, non-posted and completion packets in addition to the amount of data payload it can handle. When transaction layer packets are being sent back and forth, the DLL is responsible for sending periodic updates of the available credits to ensure no deadlock condition happens on the link. Flow control has a huge impact on the performance of the link. The exerciser can again be used to emulate different flow control scenarios, as it allows you to manually set flow control parameters from advertising no credits to advertising unlimited credits. The exerciser allows you to arbitrarily send out a DLLP, potentially with incorrect flow control data to test the behavior of the device.
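
At the heart of this mechanism is a simple credit gate. The C sketch below is a simplification that tracks one counter pair with wrap-around arithmetic; real hardware keeps separate counters for posted, non-posted and completion headers and data, with field sizes defined by the spec.

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified credit gate in the spirit of PCIe flow control: a TLP may be
     * sent only if the credits it consumes do not pass the advertised limit. */
    typedef struct {
        uint16_t limit;     /* credit limit, learned from InitFC/UpdateFC DLLPs */
        uint16_t consumed;  /* credits consumed so far */
    } fc_counter;

    static int fc_can_send(const fc_counter *fc, uint16_t needed)
    {
        /* Counters wrap, so compare the difference modulo 2^16. */
        return (uint16_t)(fc->limit - (fc->consumed + needed)) <= 0x7FFF;
    }

    int main(void)
    {
        fc_counter hdr = { .limit = 32, .consumed = 30 };  /* invented state */
        printf("send 1-credit TLP: %s\n", fc_can_send(&hdr, 1) ? "yes" : "stall");
        printf("send 4-credit TLP: %s\n", fc_can_send(&hdr, 4) ? "yes" : "stall");
        return 0;
    }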

Transaction Layer Testing

The transaction layer in PCIe 2.0 is very similar to that of the 1.x specifications, which provides a great advantage in terms of software compatibility between the two specifications. The layered approach separates the physical connection from the upper-layer protocol. However, the new speed also presents different challenges related to performance at the DLL and transaction layers.
The protocol analyzer also allows you to measure characteristics such as actual throughput on the bus and transaction latencies. This information is extremely valuable for optimizing the performance of devices. The protocol analyzer is also a key instrument for finding errors and performing root cause analysis on error cases such as the one shown in Figure A-8. Unlike physical layer errors that may appear at the data link layer as disparity errors or CRC errors, errors at the transaction layer do not necessarily appear in the data link layer; they need to be dealt with by the receiver transaction layer. Figure A-9 shows an analyzer screen through which you can enable/disable error conditions to trigger on.
Figure A-8: Finding a Particular Condition or Sequence of Events Using a Trigger Sequencer

Figure A-9: Typical PCI Express Error Conditions Triggerable on Agilent Protocol Analyzer
It is strategically important for you to know the performance of a device during the design stages. It is common, especially in the early stages of a technology, to have performance testing capabilities.
Stressing the system by generating back-to-back packets, using different packet lengths and types and injecting errors, is an ideal way to identify potential problems early that could eventually lead to costly redesigns. Stressing the system also helps you avoid competitive disadvantages.
Testing the system's data integrity is also critical for the success of a device. Running write/read/compare tests with known data and deterministic patterns over a long period of time allows you to test corner cases thoroughly in a short timeframe, giving you confidence that interoperability will not be an issue for the device.
The Agilent exerciser and analyzer are invaluable tools for evaluating and optimizing PCI Express designs. In the past, validation tools were limited: you could use off-the-shelf devices, test tools that change static link parameters, and pattern generators. New test tools from Agilent integrate testing of the highly complex and dynamic nature of PCI Express devices in one platform driven by a user-friendly GUI, as shown in Figure A-10.

Figure A-10: Agilent PCIe 2.0 Exerciser Provides a Powerful and Flexible Validation Platform

The exerciser is a standard-size PCI Express card, as shown in Figure A-11. It is in fact fully programmable from an external host connected through USB. You can control the behavior of the tool independently of the system it is plugged into.
Figure A-11: The Agilent PCIe 2.0 N5309A Exerciser

Via a user-friendly GUI, it is possible to set up PCI Express traffic and have it downloaded via the USB interface to the PCI Express exerciser. The exerciser provides templates, as shown in Figure A-12, for creating different types of individual packets or requests that may contain multiple packets. Each of the requests has an associated behavior. This allows you to create traffic patterns via the exerciser deterministically and also create error cases that would not be possible in a real situation. The errors may occur, but it may be unpredictable as to when and why they happen.
Figure A-12: Templates for Creating Traffic Using the Exerciser

The exerciser allows you to recreate error scenarios and perform root cause analysis on them. As shown in Figure A-13, it is possible to edit a packet and inject error conditions. The erroneous packet is then generated by the exerciser and error analysis is performed with the aid of an Agilent Analyzer.
The exerciser will behave as a requester or master, and also as a target, depending on how you set it up. You can also control the behavior of completions. For example, when the exerciser receives a request in the form of a memory read, the default response would be to respond with a successful completion. However, you can also program it to respond in other ways, including an unsuccessful completion or a completer abort. Figure A-14 shows a completion packet editor via which you can inject errors into the completer-generated completion packet. In addition, it is possible to have the exerciser delay sending the completion by a specific amount of time, which facilitates testing of the completion timeout mechanism on the requester.
Figure A-13: Adding Single or Multiple Error Scenarios to each Request
(The Edit Packet dialog exposes fields for Priority, Automatic Tag, TLP Digest, LCRC, Disparity, Payload Size, TLP Poisoned, TLP Nullified, Replace STP, Replace END, and Offset Sequence Number.)

Figure A-14: Programming Completer Behaviors
(The completion editor exposes fields for Completion Status, Read Completion Boundary, Repeat, Priority, TLP Digest, LCRC, Disparity, Payload Size, TLP Poisoned, TLP Nullified, Replace STP, Replace END, Offset Sequence Number, and Discard Completion.)

To create framing errors, you can replace the start-of-transaction-layer-packet (STP) symbol and the end-of-packet (END) symbol with other arbitrary values. Since the exerciser is a fully programmable device, it is also possible to change values for the replay timer.
You can create another type of transaction layer error by changing the length field within the TLP so that it differs from the actual length of the data in the payload. Most likely, this would be treated as a malformed packet. You can easily set up these errors using the exerciser, as shown in Figure A-15.
Figure A-15: Inserting Request and Completion Errors Using the Exerciser
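
The receiver-side check being provoked here reduces to a comparison, sketched below with deliberately simplified structures (real hardware also validates format, type and other header rules):

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified view of a received TLP: the header's Length field versus the
     * payload dwords that actually arrived on the link. */
    typedef struct {
        uint16_t length_dw;   /* Length field from the TLP header, in dwords */
        uint16_t payload_dw;  /* dwords actually received in the payload */
    } rx_tlp;

    static int tlp_is_malformed(const rx_tlp *t)
    {
        return t->length_dw != t->payload_dw;  /* mismatch: malformed packet */
    }

    int main(void)
    {
        rx_tlp good = { 4, 4 }, bad = { 4, 6 };  /* invented examples */
        printf("good: %s\n", tlp_is_malformed(&good) ? "malformed" : "ok");
        printf("bad:  %s\n", tlp_is_malformed(&bad)  ? "malformed" : "ok");
        return 0;
    }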

In addition to generating error conditions, you can use the exerciser as a tool to stress the link. This is done by programming the exerciser to issue a block of requests, then repeat them continuously. It is possible to have multiple different requests repeated in continuous mode, as shown in Figure A-16, and also, if required, to add error-case scenarios.
Figure A-16: Using Continuous Mode on the Exerciser Running a Loop of Memory and I/O Transactions
You can program the configuration space of the exerciser to emulate different types of devices. PCIe devices may implement base address registers (BARs) and decoders as required, according to the resources needed by that particular device. The BIOS on the system then assigns resources to these devices at system startup time. It is important for BIOS engineers to be able to verify that they can provide the correct resources for any combination of devices that may be plugged into the system. It is critical that address ranges do not overlap.
The exerciser has more than one way of carrying out this testing. Since the exerciser is a real PCIe device, you can manually configure and program the decoders prior to starting up the system. You can also map these decoders to different completion behaviors and priorities and map them to specific areas in the internal data memory of the exerciser. You have complete control of the location, size and type of each of these decoders, as shown in Figure A-17.

Figure A-17: The Exerciser Has Fully Programmable Memory and I/O Decoders
(The decoder setup screen shows five BARs, each with Decoder enable, Location (32-bit Memory or I/O), Prefetchable, Size, Base Address, Data Memory Base Address, and Completion Queue settings.)
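
For context on what the BIOS does with such a decoder, the sketch below models the classic BAR-sizing probe (write all 1s, read back, derive the size from the writable bits) against a simulated 128-byte 32-bit memory BAR of the kind shown in Figure A-17. All names and values are illustrative.

    #include <stdint.h>
    #include <stdio.h>

    /* Simulated 128-byte, 32-bit memory BAR: the low address bits are not
     * writable, which is exactly what the sizing probe relies on. */
    #define BAR_SIZE      128u
    #define BAR_TYPE_BITS 0xFu   /* memory/IO type and prefetchable bits (read as 0 here) */

    static uint32_t bar_reg;
    static void     bar_write(uint32_t v) { bar_reg = v & ~(BAR_SIZE - 1); }
    static uint32_t bar_read(void)        { return bar_reg; }

    int main(void)
    {
        /* Sizing probe: write all 1s; the bits that stay 0 encode the size. */
        bar_write(0xFFFFFFFFu);
        uint32_t size = ~(bar_read() & ~BAR_TYPE_BITS) + 1;
        printf("BAR requests %u bytes\n", size);

        /* Assign a base address, as the BIOS would (value from Figure A-17). */
        bar_write(0xFB000000u);
        printf("BAR base = 0x%08X\n", bar_read());
        return 0;
    }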

A second method is available for testing system BIOS. This method also uses the exerciser card as an interface to the PCIe port on the system, but it provides device emulation in software via the USB port. The advantage of this approach is that you can emulate and test many different topologies, including multiple levels of switches or bridges. It can also test the BIOS against different types of devices and resource requirements, such as a bridge device that requests memory resources - a valid configuration, but one that certain BIOSes in the past would not support. It is important that the behavior be correct if an optional feature is not implemented. This method is also helpful if a device requires more resources than are available on a system; the BIOS should handle this properly and disable the device.
Topology testing has been used since the days of conventional PCI and PCI-X, where a PCI or PCI-X exerciser was used to emulate different devices and then check that the addresses assigned were correct and not overlapping. The same principle was applied to PCI Express testing from the outset, using the Agilent E2969A protocol test card for PCI Express, and this method of testing has been ported to the Gen 2 PCI Express exerciser card. The same set of tests used for Gen 1 PCI Express is used; the main difference is that the speed is now 5GT/s rather than 2.5GT/s. Since the BIOS operates at the application layer, above the transaction layer, the principles of testing it are independent of the physical link width and speed. The same test cases used in the original PCI and PCI-X testing are still quite valid.
Testing can be extended to cover many complex topologies. Figure A-18 is a screen through which you can set up topology tests.

Figure A-18: Topology Tests Using the Agilent E2969A Protocol Test Card
(The Compliance Test Suite screen lists Legacy test cases covering 64-bit, prefetchable, Mem32 and IO BAR/request variations, plus FunctionTopology cases such as a 9-port switch, eight functions of Type 0 and Type 1, and 5 levels of 4-port switches.)

Summary

When you are designing and validating PCI Express designs, it is important to cover all aspects of testing, as problems in the lower layers often result in problems at the upper layers, which ultimately lead to interoperability issues. By designing and testing to the PCIe specification, you can be assured that your devices will work properly and will not face real-world compatibility problems.

Contact Agilent Technologies

For more information on the complete set of test tools from Agilent Technologies, please visit our Web site at: www.agilent.com
For PCI Express applications, please visit www.agilent.com/find/pciexpress
For oscilloscopes, please visit www.agilent.com/find/oscilloscopes
For protocol test tools, please visit www.agilent.com/find/e2960_series
  • PCI Express and PCI-X are registered trademarks of PCI-SIG.
  • PCIe is a trademark of PCI-SIG.

Appendix B Markets & Applications for the PCI Express™ Architecture

By Larry Chisvin, Akber Kazmi, and Danny Chi (PLX Technology, Inc.)

Introduction

Since its definition in the early 1990s, PCI has become one of the most successful interconnect technologies ever used in computers. Originally intended for personal computer systems, the PCI architecture has penetrated into virtually every computing platform category, including servers, storage, communications, and a wide range of embedded control applications. From its early incarnation as a 32-bit 33MHz interconnect, it has been expanded to offer higher speeds (currently in widespread use at 64-bit 133MHz, with faster versions on the way). Most importantly, each advancement in PCI bus speed and width provided backward software compatibility, allowing designers to leverage the broad code base.
As successful as the PCI architecture has become, there is a limit to what can be accomplished with a multi-drop, parallel shared bus interconnect technology. Issues such as clock skew, high pin count, trace routing restrictions in printed circuit boards (PCB), bandwidth and latency requirements, physical scalability, and the need to support Quality of Service (QoS) within a system for a wide variety of applications led to the definition of the PCI Express™ architecture.
PCI Express is the natural successor to PCI, and was developed to provide the advantages of a state-of-the-art, high-speed serial interconnect technology and packet-based layered architecture, while maintaining backward compatibility with the large PCI software infrastructure. The key goal was to provide an optimized and universal interconnect solution for a great variety of future platforms, including desktop, server, workstation, storage, communications and embedded systems.
Figure B-1: Migration from PCI to PCI Express
This chapter provides an overview of the markets and applications that PCI Express is expected to serve, with an explanation of how the technology will be integrated into each application, and some exploration of the advantages that PCI Express brings to each usage.
Let's review the key benefits of the PCI Express architecture before we discuss its application in different markets. Some of the key features of the architecture we reviewed in this book are:
  • Packet-based layered architecture
  • Serial interconnection at 2.5GHz (5GHz being considered)
  • Link-to-link and end-to-end error detection (CRC check)
  • Point-to-point data flow
  • Differential low voltage signals for noise immunity
  • Quality of Service (QoS) and Virtual Channels (VC)
  • Scalable from 1x to 32x lanes
  • Software (backward) compatibility with legacy PCI systems

Enterprise Computing Systems

PCI Express is expected to be deployed initially in desktop and server systems. These computers typically utilize a chipset solution that includes one or more microprocessors and two types of special interconnect devices, called northbridges and southbridges. Northbridges connect the CPU with memory, graphics and I/O. Southbridges connect to standardized I/O devices such as hard disk drives, networking modules or devices, and often PCI expansion slots.

Desktop Systems

Typical use of PCI Express in a desktop application is shown in Figure B-2 on page 992. The PCI Express ports come directly out of the northbridge, and are bridged to PCI slots that are used for legacy plug-in cards. In some implementations the PCI Express interconnections will be completely hidden from the user behind PCI bridges, and in other implementations there will be PCI Express slots in a new PCI Express connector form factor.
The major benefit of using PCI Express in this application is the low pin count associated with serial interface technology, which will translate into lower cost. This low pin count provides the ability to create northbridges and I/O bridges with smaller footprints, and significantly fewer board traces between the components. This provides a major reduction in the area and complexity of the signal/trace routing in PCBs.

Server Systems

Figure B-3 on page 993 shows PCI Express used in an enterprise server system. This system has similarities to the desktop system, since there is a northbridge and southbridge providing functions that parallel their roles in the desktop system, and the form factor of the system is often similar. Servers, however, place a greater emphasis on performance than desktop systems do.
Figure B-2: PCI Express in a Desktop System
To achieve their performance and time-to-market objectives, server designers have adopted PCI-X. The primary attraction of PCI-X has been increased throughput, but with PCI code compatibility. PCI-X offers clear benefits compared to PCI, and will remain in server systems for a long while, but it suffers from the same shared bus limitations that have already been discussed. The high throughput of the PCI Express serial interconnection provides a measurable benefit versus legacy interconnect technologies, especially as the speed of the I/O interconnect and the number of high speed I/O ports on each card increases.
Some systems will only provide PCI-X slots, but many newer systems will also offer several PCI Express slots. The number of PCI Express slots will grow over time compared to the PCI-X slots, and eventually will become dominant in the same way that PCI did with previous interconnect technologies. Since bandwidth is a primary motivator for a server, typical PCI Express slots will be either x4 or x8 lanes.
In most low to midrange server systems, the PCI-X bridging and PCI Express slots will be provided by using the ports right off of the northbridge. However, high-end systems will require more I/O slots of both kinds. Since PCI Express is a point-to-point technology, the only way to provide additional connection links is through a device called a fan out switch. Specifically, the purpose of a fan out switch is to multiply the number of PCI Express lanes from an upstream host port to a higher number of downstream PCI Express devices. Figure B-3 shows a PCI Express switch used in the system for this purpose.
Figure B-3: PCI Express in a Server System

Embedded Control

One of the many areas that PCI has penetrated is embedded-control systems. This describes a wide range of applications that measure, test, monitor, or display data, and includes applications such as industrial control, office automation, test equipment, and imaging.
In these applications, system designers typically utilize embedded processors. In many instances, leading-edge companies will differentiate their products by utilizing some custom logic in the form of an ASIC or FPGA. A bridge is often used to translate the simple custom interface and connect it to the bus.
It is expected that the embedded-control market will quickly migrate to PCI Express, with a typical example shown in Figure B-4 on page 994. Applications such as imaging and video streaming are always hungry for bandwidth, and the additional throughput of ×4 or ×8 PCI Express links will translate into higher video resolution, or the handling of more video streams by the system. Others will implement PCI Express because of the noise resistance its LVDS traces provide, or because of its efficient routing and its ability to hook together subsystems through a standard cable. Still others will choose PCI Express simply because of its ubiquity.
Figure B-4: PCI Express in Embedded-Control Applications

Storage Systems

PCI has become a common backplane technology for mainstream storage systems. Although it provides a good mix of features, low cost, and throughput, the "bus" has become a performance bottleneck. Figure B-5 on page 995 shows the use of PCI Express in a storage system. Systems similar to the one shown in Figure B-5 on page 995 can be built on a motherboard, or as part of a backplane. The discussion in this section applies to both form factors.
We have highlighted increased bandwidth as one of the advantages of moving to PCI Express, and nowhere is it more beneficial and obvious than in storage. The bandwidth demanded by I/O connections such as Ethernet, Fibre Channel, SCSI, and InfiniBand is increasing rapidly, and the ability to move data between I/O modules and the host processor is critical to overall system performance.


In RAID-based storage systems, for example, data to be archived is distributed across several disk drives to provide faster data retrieval and fault tolerance. As performance and complexity increase in these systems, the need for faster read and write operations from multiple I/O locations (disk drives) becomes extremely important. PCI Express, with its high-performance, point-to-point architecture, becomes very desirable for this application.
PCI Express provides a key reliability benefit in storage applications as well. The specification provides two different error-checking (CRC) schemes: a link-level CRC on each link to ensure a reliable connection, and an optional end-to-end CRC that travels with the data from source to destination.
In High Availability (HA) applications, a separate host can reside in the system (as shown in Figure B-5 on page 995) for failover. If and when the primary host becomes unstable or non-operational, the secondary host will take over control of the system. This is an important feature for system level reliability when the designer is attempting to eliminate as many single points of failure as possible. This secondary host will be integrated into the system using non-transparent bridging (a detailed discussion of non-transparent bridging is provided in Appendix C).
Figure B-5: PCI Express in a Storage System

Communications Systems

The last application that we will explore is the use of PCI Express in communications systems. As with previous usage models, PCI technology has in the past made significant inroads into communication systems, but over time it has become less desirable due to the inherent limitations of a shared bus. In general, serial interconnects such as PCI Express have become attractive to backplane system designers by providing switch-based topologies that enable higher reliability, scalability, and robustness.
High-end communications systems are based on one or more racks, with a mid-plane or backplane chassis used to interconnect each subsystem. Many systems use the CompactPCI architecture for their backplane implementations, and in some cases proprietary bus solutions are used to interconnect line cards, the switch fabric, and the control modules. Some vendors are moving toward the AdvancedTCA™ (ATCA) architecture, which supports a variety of different fabrics based upon a standard chassis for communication applications. ATCA has a range of benefits, but the PCI Express version of ATCA provides a smooth migration to higher speeds and a set of features that fit well with the communications paradigm.
One feature common to many communications systems is the ability to assign priorities to different data streams based on Quality of Service (QoS). PCI Express offers Traffic Classes (TC) that can be used to differentiate types of data. These TCs are then mapped onto Virtual Channels (VC) within the hardware. Each VC has its own set of queues in the subsystem, providing a separate path through the switch or bridge. This mechanism can be used to provide separate channels for different types of traffic (I/O, data, special messages).
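As a rough illustration of the mapping step, the C sketch below indexes a TC-to-VC table. The table contents (which TCs share which VC) are hypothetical; in a real system the mapping is programmed by configuration software through the port's VC resource registers.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical TC-to-VC map: index = Traffic Class (0-7), value = the
 * Virtual Channel its packets use. Real mappings are set up by
 * configuration software, not fixed at compile time. */
static const uint8_t tc_to_vc[8] = {
    0, 0, 0, 0,   /* TC0-TC3: ordinary I/O and data share VC0  */
    1, 1, 1,      /* TC4-TC6: latency-sensitive traffic on VC1 */
    2             /* TC7: control and special messages on VC2  */
};

int main(void)
{
    for (int tc = 0; tc < 8; tc++)
        printf("TC%d -> VC%d\n", tc, (int)tc_to_vc[tc]);
    return 0;
}

Because each VC has its own queues through a switch or bridge, a mapping like this one keeps, say, control messages from queuing behind bulk data.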
Figure B-6 on page 997 shows a typical communication switch or router. Only 6 slots are shown for illustration purposes, but actual systems typically have 10+ slots. As shown in the figure, PCI Express technology can be used to support redundant switch fabrics and control modules, allowing communications equipment vendors to build high availability systems with a faster time to market.


Figure B-6: PCI Express in Communications Systems

Summary

PCI Express technology offers an improvement in performance and the promise of features beyond PCI, but does so in a way that preserves the investment made in PCI software over the last ten years. The combination of increased bandwidth, reduced cost, and extended capabilities with an easy migration path is likely to make the PCI Express architecture the next ubiquitous interconnection technology for a wide variety of applications.

Appendix C

Implementing Intelligent Adapters and Multi-Host Systems With PCI Express™ Technology
By Jack Regula, Danny Chi and Tim Canepa (PLX Technology, Inc.)

Introduction

Intelligent adapters, host failover mechanisms and multiprocessor systems are three usage models that are common today and expected to become even more prevalent in next-generation systems. Despite the fact that each of these was developed in response to completely different market demands, all share a common requirement: multiple processors must co-exist within the same system. This appendix outlines how PCI Express can address these needs through non-transparent bridging.
Because of the widespread popularity of systems using intelligent adapters, host failover and multihost technologies, PCI Express silicon vendors must provide a means to support them. This is actually a relatively low-risk endeavor, given that PCI Express is software compatible with PCI, and PCI systems have long implemented distributed processing. The most obvious approach, and the one that PLX espouses, is to emulate in PCI Express the most popular implementation used in the PCI space. This strategy allows system designers to use an implementation that is not only familiar but also a proven methodology, one that can provide significant software reuse as they migrate from PCI to PCI Express.
This paper outlines how multiprocessor PCI Express systems will be implemented using industry-standard practices established in the PCI paradigm. First, however, we will define the different usage models and review the successful efforts in the PCI community to develop mechanisms that accommodate these requirements. Finally, we will cover how PCI Express systems will utilize non-transparent bridging to provide the functionality needed for these types of systems.

Usage Models

Intelligent Adapters

Intelligent adapters are typically peripheral devices that use a local processor to offload tasks from the host. Examples of intelligent adapters include RAID controllers, modem cards, and content processing blades that perform tasks such as security and flow processing. Generally, these tasks are either computationally onerous or require significant I/O bandwidth if performed by the host. By adding a local processor to the endpoint, system designers can enjoy significant incremental performance. In the RAID market, a significant number of products utilize local intelligence for their I/O processing.
Another example of an intelligent adapter is an e-commerce blade. Because general-purpose host processors are not optimized for the exponential mathematics necessary for SSL, utilizing a host processor to perform an SSL handshake typically reduces system performance by over 90%. Furthermore, one of the requirements for the SSL handshake operation is a true random number generator. Many general-purpose processors do not have this feature, so it is actually difficult to perform SSL handshakes without dedicated hardware. Similar examples abound throughout the intelligent adapter marketplace; in fact, this usage model is so prevalent that for many applications it has become the de facto standard implementation.

Host Failover

Host failover capabilities are designed into systems that require high availability. High availability has become an increasingly important requirement, especially in storage and communication platforms. The only practical way to ensure that the overall system remains operational is to provide redundancy for all components. Host failover systems typically include a host based system attached to several endpoints. In addition, a backup host is attached to the system and is configured to monitor the system status. When the primary host fails, the backup host processor must not only recognize the failure, but then take steps to assume primary control, remove the failed host to prevent additional disruptions, reconstitute the system state, and continue the operation of the system without losing any data.


Multiprocessor Systems

Multiprocessor systems provide greater processing bandwidth by allowing multiple computational engines to simultaneously work on sections of a complex problem. Unlike systems utilizing host failover, where the backup processor is essentially idle, multiprocessor systems utilize all the engines to boost computational throughput. This enables a system to reach performance levels not possible by using only a single host processor. Multiprocessor systems typically consist of two or more complete sub-systems that can pass data between themselves via a special interconnect. A good example of a multihost system is a blade server chassis. Each blade is a complete subsystem, often replete with its own CPU, Direct Attached Storage, and I/O.

The History of Multi-Processor Implementations Using PCI

To better understand the implementation proposed for PCI Express, one needs to first understand the PCI implementation.
PCI was originally defined in 1992 for personal computers. Because of the nature of PCs at that time, the protocol architects did not anticipate the need for multiprocessors. Therefore, they designed the system assuming that the host processor would enumerate the entire memory space. Obviously, if another processor is added, the system operation would fail as both processors would attempt to service the system requests.
Several methodologies were subsequently invented to accommodate the requirement for multiprocessor capabilities using PCI. The most popular implementation, and the one discussed in this paper for PCI Express, is the use of non-transparent bridging between the processing subsystems to isolate their memory spaces.1

  1. Unless explicitly noted, the architecture for multiprocessor systems using PCI and PCI Express is similar, and the two may be discussed interchangeably.
Because the host does not know the system topology when it is first powered up or reset, it must perform discovery to learn what devices are present and then map them into the memory space. To support standard discovery and configuration software, the PCI specification defines a standard format for the Control and Status Registers (CSRs) of compliant devices. The standard PCI-to-PCI bridge CSR header, called a Type 1 header, includes primary, secondary, and subordinate bus number registers that, when written by the host, define the CSR addresses of devices on the other side of the bridge. Bridges that employ a Type 1 CSR header are called transparent bridges.
A Type 0 header is used for endpoints. A Type 0 CSR header includes base address registers (BARs) used to request memory or I/O apertures from the host. Both Type 1 and Type 0 headers include a class code register that indicates what kind of bridge or endpoint is represented, with further information available in a subclass field and in device ID and vendor ID registers. The CSR header format and addressing rules allow the processor to search all the branches of a PCI hierarchy, from the host bridge down to each of its leaves, reading the class code registers of each device it finds as it proceeds, and assigning bus numbers as appropriate as it discovers PCI-to-PCI bridges along the way. At the completion of discovery, the host knows which devices are present and the memory and I/O space each device requires to function. These concepts are illustrated in Figure C-1.
Figure C-1: Enumeration Using Transparent Bridges
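The discovery walk can be made concrete with a short C sketch. Everything below is illustrative: cfg_read32() is a hypothetical configuration-space accessor (stubbed so the sketch compiles and runs), and the bookkeeping a real enumerator performs (writing bridge bus-number registers, sizing BARs, checking multi-function bits) is reduced to comments.

#include <stdint.h>
#include <stdio.h>

/* Stub: a real implementation would issue a configuration read
 * (e.g., via the 0xCF8/0xCFC mechanism on x86). Returning all-ones
 * means "no device present", so the sketch runs standalone. */
static uint32_t cfg_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t off)
{
    (void)bus; (void)dev; (void)fn; (void)off;
    return 0xFFFFFFFFu;
}

static uint8_t next_bus = 1;            /* next bus number to hand out */

static void scan_bus(uint8_t bus)
{
    for (uint8_t dev = 0; dev < 32; dev++) {
        uint32_t id = cfg_read32(bus, dev, 0, 0x00);   /* Vendor/Device ID */
        if ((id & 0xFFFFu) == 0xFFFFu)
            continue;                                  /* empty slot */

        uint32_t class_rev = cfg_read32(bus, dev, 0, 0x08);
        uint8_t  hdr_type  = (uint8_t)((cfg_read32(bus, dev, 0, 0x0C) >> 16) & 0x7Fu);

        printf("bus %d dev %d: class code %06Xh\n",
               (int)bus, (int)dev, (unsigned)(class_rev >> 8));

        if (hdr_type == 0x01) {          /* Type 1 header: PCI-to-PCI bridge */
            uint8_t secondary = next_bus++;
            /* ...write primary/secondary/subordinate bus numbers here,
             * then recurse to discover the branch behind the bridge.  */
            scan_bus(secondary);
        }
        /* Type 0 headers (endpoints) terminate the branch; this is
         * where their BARs would be sized and assigned.              */
    }
}

int main(void)
{
    scan_bus(0);    /* begin at the host bridge's bus 0 */
    return 0;
}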

Implementing Multi-host/Intelligent Adapters in PCI Express Base Systems

Up to this point, our discussion has been limited to one processor with one memory space. As technology progressed, system designers began developing endpoints with their own native processors built in. The problem this caused was that both the host processor and the intelligent adapter would, upon power-up or reset, attempt to enumerate the entire system, causing conflicts and ultimately a non-functional system.2
To get around this, architects designed non-transparent bridges. A non-transparent PCI-to-PCI Bridge, or PCI Express-to-PCI Express Bridge, is a bridge that exposes a Type 0 CSR header on both sides and forwards transactions from one side to the other with address translation, through apertures created by the BARs of those CSR headers. Because it exposes a Type 0 CSR header, the bridge appears to be an endpoint to discovery and configuration software, eliminating potential discovery software conflicts. Each BAR on each side of the bridge creates a tunnel or window into the memory space on the other side of the bridge. To facilitate communication between the processing domains on each side, the non-transparent bridge also typically includes doorbell registers to send interrupts from each side of the bridge to the other, and scratchpad registers accessible from both sides.
A non-transparent bridge is functionally similar to a transparent bridge in that both provide a path between two independent PCI buses (or PCI Express links). The key difference is that when a non-transparent bridge is used, devices on the downstream side of the bridge (relative to the system host) are not visible from the upstream side. This allows an intelligent controller on the downstream side to manage the devices in its local domain, while at the same time making them appear as a single device to the upstream controller. The path between the two buses allows the devices on the downstream side to transfer data directly to the upstream side of the bus without directly involving the intelligent controller in the data movement. Thus transactions are forwarded across the bus unfettered just as in a PCI-to-PCI Bridge, but the resources responsible are hidden from the host, which sees a single device.

  2. While we are using an intelligent endpoint as the example, we should note that a similar problem exists for multi-host systems.


Because we now have two memory spaces, the PCI Express system needs to translate the addresses of transactions that cross from one memory space to the other. This is accomplished via Translation and Limit Registers associated with the BARs. See "Address Translation" on page 1013 for a detailed description; Figure C-2 on page 1004 provides a conceptual rendering of Direct Address Translation. Address translation can be done by Direct Address Translation (essentially replacement of the data under a mask), by table lookup, or by adding an offset to an address. Figure C-3 on page 1005 shows Table Lookup Translation used to create multiple windows spread across system memory space for packets originated in a local I/O processor's domain, as well as Direct Address Translation used to create a single window in the opposite direction.
Figure C-2: Direct Address Translation


Figure C-3: Look Up Table Translation Creates Multiple Windows

Example: Implementing Intelligent Adapters in a PCI Express Base System

Intelligent adapters will be pervasive in PCI Express systems, and will likely be the most widely used example of systems with "multiple processors".
Figure C-4 on page 1006 illustrates how PCI Express systems will implement intelligent adapters. The system diagram consists of a system host, a root complex (the PCI Express version of a Northbridge), a three-port switch, an example endpoint, and an intelligent add-in card. Similar to the system architecture, the add-in card contains a local host, a root complex, a three-port switch, and an example endpoint. However, we should note two significant differences: the intelligent add-in card contains an EEPROM, and one port of the switch contains a back-to-back non-transparent bridge.
Figure C-4: Intelligent Adapters in PCI and PCI Express Systems
Upon power up, the system host will begin enumerating to determine the topology. It will pass through the Root Complex and enter the first switch (Switch A). Upon entering the topmost port, it will see a transparent bridge, so it will know to continue to enumerate. The host will then poll the leftmost port and, upon finding a Type 0 CSR header, will consider it an endpoint and explore no deeper along that branch of the PCI hierarchy. The host will then use the information in the endpoint's CSR header to configure base and limit registers in bridges and BARs in endpoints to complete the memory map for this branch of the system.


The host will then explore the rightmost port of Switch A and read the CSR header registers associated with the top port of Switch B. Because this port is a non-transparent bridge, the host finds a Type 0 CSR header. The host processor therefore believes that this is an endpoint and explores no deeper along that branch of the PCI hierarchy. The host reads the BARs of the top port of Switch B to determine the memory requirements for windows into the memory space on the other side of the bridge. The memory space requirements can be preloaded from an EEPROM into the BAR Setup Registers of Switch B's non-transparent port or can be configured by the processor that is local to Switch B prior to allowing the system host to complete discovery.
Similar to the host processor power up sequence, the local host will also begin enumerating its own system. Like the system host processor, it will allocate memory for end points and continue to enumerate when it encounters a transparent bridge. When the host reaches the topmost port of Switch B, it sees a non-transparent bridge with a Type 0 CSR header. Accordingly, it reads the BARs of the CSR header to determine the memory aperture requirements, then terminates discovery along this branch of its PCI tree. Again, the memory aperture information can be supplied by an EEPROM, or by the system host.
Communication between the two processor domains is achieved via a mailbox system and doorbell interrupts. The doorbell facility allows each processor to send interrupts to the other. The mailbox facility is a set of dual-ported registers that are both readable and writable by both processors. Shared memory-mapped mechanisms via the BARs may also be used for inter-processor communication.
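A minimal sketch of that facility in C follows, assuming a purely hypothetical register layout (real NT bridges define their own offsets); in practice nt would point at a BAR mapped into the local address space.

#include <stdint.h>

/* Hypothetical doorbell/mailbox block of a non-transparent port. */
typedef struct {
    volatile uint32_t doorbell_set;    /* write 1s: assert doorbell bits to the far side */
    volatile uint32_t doorbell_clear;  /* write 1s: clear doorbell bits received here    */
    volatile uint32_t scratchpad[8];   /* dual-ported mailbox, readable/writable by both */
} nt_regs_t;

/* Post a message for the peer processor and ring its doorbell. */
void nt_send(nt_regs_t *nt, uint32_t msg)
{
    nt->scratchpad[0] = msg;    /* both domains see the mailbox contents    */
    nt->doorbell_set  = 1u;     /* raises an interrupt in the peer's domain */
}

int main(void)
{
    static nt_regs_t fake;      /* stand-in for the mapped register block */
    nt_send(&fake, 0x1234u);
    return 0;
}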

Example: Implementing Host Failover in a PCI Express System

Figure C-5 on page 1008 illustrates how most PCI Express systems will implement host failover. The primary host processor in this illustration is on the left side of the diagram, with the backup host on the right side. Like most systems with which we are familiar, the host processor connects to a root complex. In turn, the root complex routes its traffic to the switch. In this example, the switch has two ports to endpoints in addition to the upstream port for the primary host we have just described. Furthermore, this system also has another processor, which is connected to the switch via another root complex. The switch ports to both processors need to be configurable to behave either as a transparent bridge or as a non-transparent bridge. An EEPROM or strap pins on the switch can be used to initially bootstrap this configuration.
Figure C-5: Host Failover in PCI and PCI Express Systems
Under normal operation, upon power up, the primary host begins to enumerate the system. In our example, as the primary host processor begins its discovery protocol through the fabric, it discovers the two end points, and their memory requirements, by sizing their BARs. When it gets to the upper right port, it finds a Type 0 CSR header. This signifies to the primary host processor that it should not attempt discovery on the far side of the associated switch port. As in the previous example, the BARs associated with the non-transparent switch port may have been configured by EEPROM load prior to discovery or might be configured by software running on the local processor.


Again, similar to the previous example, the backup processor powers up and begins to enumerate. In this example, the backup processor chipset consists of the root complex and the backup processor only. It discovers the non-transparent switch port and terminates its discovery there. It is keyed by EEPROM-loaded Device ID and Vendor ID registers to load an appropriate driver.
During the course of normal operation, the host processor performs all of its normal duties as it actively manages the system. In addition, it sends the backup processor heartbeat messages, which indicate the continued good health of the originating processor. A heartbeat message might be as simple as a doorbell interrupt assertion, but typically includes some data to reduce the possibility of a false positive. Checkpoint and journal messages are alternative approaches to providing the backup processor with a starting point, should it need to take over. In the journal methodology, the backup is provided with a list, or journal, of completed transactions (in the application-specific sense, not in the sense of bus transactions). In the checkpoint methodology, the backup is periodically provided with a complete system state from which it can restart if necessary; typically this data captures the latest activities and the state of all the peripherals.
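The backup's monitoring loop can be sketched as follows; heartbeat_pending() and take_over() are hypothetical stand-ins (stubbed here) for reading the NT port's doorbell status and for the takeover sequence described in the next paragraph, and the timeout value is arbitrary.

#include <stdbool.h>
#include <time.h>

/* Stubs: a real implementation would poll (or be interrupted by) the
 * NT port's doorbell status register, and takeover would reprogram
 * the switch as described below. */
static bool heartbeat_pending(void) { return false; }
static void take_over(void)         { /* demote primary, assume control */ }

#define HEARTBEAT_TIMEOUT_S 3

int main(void)
{
    time_t last = time(NULL);
    for (;;) {
        if (heartbeat_pending()) {
            last = time(NULL);                  /* primary still healthy    */
        } else if (difftime(time(NULL), last) > HEARTBEAT_TIMEOUT_S) {
            take_over();                        /* heartbeat lost: failover */
            break;
        }
    }
    return 0;
}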
If the backup processor fails to receive timely heartbeat messages, it will begin assuming control. One of its first tasks is to demote the primary port to prevent the failed processor from interacting with the rest of the system. This is accomplished by reprogramming the CSRs of the switch using a memory mapped view of the switch's CSRs provided via a BAR in the non-transparent port. To take over, the backup processor reverses the transparent/non-transparent modes at both its port and the primary processor's port and takes down the link to the primary processor. After cleaning up any transactions left in the queues or left in an incomplete state as a result of the host failure, the backup processor reconfigures the system so that it can serve as the host. Finally, it uses the data in the checkpoint or journal messages to restart the system.

Example: Implementing Dual Host in a PCI Express Base System

Figure C-6 on page 1010 illustrates how PCI Express systems might implement a dual host system.3 In this example, the leftmost blocks are a typical complete system, with the rightmost blocks being a separate subsystem. As previously discussed, connecting the leftmost and rightmost blocks is a set of non-transparent bridges.
Figure C-6: Dual Host in a PCI and PCI Express System
  3. Back-to-back non-transparent (NT) ports are unnecessary but occur as a result of the use of identical single board computers for both hosts. A transparent backplane fabric would typically be interposed between the two NT ports.


Upon power up, both processors will begin enumerating. As before, the hosts will search out the endpoints by reading the CSR and then allocate memory appropriately. When the hosts encounter the non-transparent bridge port in each of their private switches, they will assume it is an endpoint and, using the data in the EEPROM, allocate resources. Both systems will use the doorbell and mailbox registers described above to communicate with each other.
The dual-host system model may be extended to a fully redundant dual-star system by using additional switches to dual-port the hosts and line cards into a redundant fabric, as shown in Figure C-7 on page 1012. This is particularly attractive to vendors who employ chassis-based systems for their flexibility, scalability and reliability.
Two host cards are shown. Host A is the primary host of Fabric A and the secondary host of Fabric B. Similarly, Host B is the primary host of Fabric B and the secondary host of Fabric A.
Each host is connected to the fabric it serves via a transparent bridge/switch port and to the fabric for which it provides only backup via a non-transparent bridge/switch port. These non-transparent ports are used for host-to-host communications and also support cross-domain peer-to-peer transfers where address maps do not allow a more direct connection.


Figure C-7: Dual-Star Fabric

Summary

Through non-transparent bridging, PCI Express Base offers vendors the ability to integrate intelligent adapters and multi-host systems into their next generation designs. This appendix demonstrated how these features will be deployed using de-facto standard techniques adopted in the PCI environment and showed how they would be utilized for various applications. Because of this, we can expect this methodology to become the industry standard in the PCI Express paradigm.


Address Translation

This section provides an in-depth description of how systems that use non-transparent bridges communicate using address translation. We provide details about the mechanism by which systems determine not only the size of the memory allocated, but also how memory pointers are employed. Implementations using both Direct Address Translation and Lookup Table Based Address Translation are discussed. By carrying the standardized architectural implementation of non-transparent bridging popularized in the PCI paradigm over into the PCI Express environment, interconnect vendors can speed market adoption of PCI Express in markets requiring intelligent adapters, host failover and multihost capabilities.
The transparent bridge uses base and limit registers in I/O space, non-prefetchable memory space, and prefetchable memory space to map transactions in the downstream direction across the bridge. All downstream devices are required to be mapped in contiguous address regions such that a single aperture in each space is sufficient. Upstream mapping is done via inverse decoding relative to the same registers. A transparent bridge does not translate the addresses of forwarded transactions/packets.
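The decode rule can be stated in a few lines of C; the structure below is illustrative, with one base/limit pair standing in for the bridge's three spaces.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One of the transparent bridge's apertures (say, non-prefetchable
 * memory). Downstream forwarding is a simple range check; upstream
 * forwarding is the inverse decode of the same registers. Note that
 * no address translation occurs in either direction. */
typedef struct {
    uint32_t base;     /* bottom of the downstream aperture */
    uint32_t limit;    /* top of the downstream aperture    */
} bridge_window_t;

static bool forward_downstream(const bridge_window_t *w, uint32_t addr)
{
    return addr >= w->base && addr <= w->limit;
}

static bool forward_upstream(const bridge_window_t *w, uint32_t addr)
{
    return !forward_downstream(w, addr);       /* inverse decode */
}

int main(void)
{
    bridge_window_t w = { 0xA0000000u, 0xBFFFFFFFu };
    printf("%d %d\n", forward_downstream(&w, 0xA0010000u),   /* 1: inside  */
                      forward_upstream(&w, 0x80000000u));    /* 1: outside */
    return 0;
}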
The non-transparent bridge uses the standard set of BARs in each of its Type 0 CSR headers to define apertures into the memory space on the other side of the bridge. There are two sets of BARs: one on the Primary side and one on the Secondary side. The BARs define resource apertures that allow the forwarding of transactions to the opposite-side interface.
For each bridge BAR there exists a set of associated control and setup registers, usually writable from the other side of the bridge. Each BAR has a "setup" register, which defines the size and type of its aperture, and an address translation register. Some BARs also have a limit register that can be used to restrict the aperture's size. These registers need to be programmed before access from outside the local subsystem is allowed. This is typically done by software running on a local processor or by loading the registers from EEPROM.
In PCI Express, the Transaction ID fields of packets passing through these apertures are also translated to support Device ID routing. These Device IDs are used to route the completions for non-posted requests, as well as ID-routed messages.
The transparent bridge forwards CSR transactions in the downstream direction according to the secondary and subordinate bus number registers, converting Type 1 CSRs to Type 0 CSRs as required. The non-transparent bridge accepts only those CSR transactions addressed to it and returns an unsupported request response to all others.

Direct Address Translation

The addresses of all upstream and downstream transactions are translated (except for those through BARs that access CSRs). With the exception of the cases in the following two sections, addresses that are forwarded from one interface to the other are translated by adding a Base Address to their offset within the BAR that they landed in, as seen in Figure C-8 on page 1014. The BAR Base Translation Registers are used to set up these base translations for the individual BARs.
Figure C-8: Direct Address Translation
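In code, Direct Address Translation reduces to re-basing an offset; the function below is a sketch with illustrative parameter names (the translation base models the BAR Base Translation Register mentioned above).

#include <stdint.h>
#include <stdio.h>

/* Re-base an address that hit a BAR aperture into the other domain:
 * the offset within the BAR is preserved and the BAR base is replaced
 * by the programmed translation base. */
uint32_t direct_translate(uint32_t addr, uint32_t bar_base, uint32_t xlat_base)
{
    uint32_t offset = addr - bar_base;   /* offset within the BAR aperture */
    return xlat_base + offset;           /* same offset, new base          */
}

int main(void)
{
    /* A packet to 90012000h hitting a BAR at 90000000h is re-based to
     * 40000000h in the other domain, yielding 40012000h. */
    printf("%08Xh\n", (unsigned)direct_translate(0x90012000u, 0x90000000u, 0x40000000u));
    return 0;
}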

Lookup Table Based Address Translation

Following the de facto standard adopted by the PCI community, a PCI Express non-transparent bridge provides several BARs for the purposes of allocating resources. All of the BARs request memory allocations; in accordance with PCI industry conventions, BAR 0 contains the CSR information and BAR 1 contains I/O information, while BAR 2 and BAR 3 are utilized for Lookup Table Based Translation and BAR 4 and BAR 5 are utilized for Direct Address Translation.
On the secondary side, BAR 3 uses a special lookup-table-based address translation for transactions that fall inside its window, as seen in Figure C-9 on page 1015. The lookup table provides more flexibility in mapping secondary bus local addresses to primary bus addresses. The location of the index field within the address is programmable to adjust aperture size.
Figure C-9: Lookup Table Based Translation
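A sketch of the lookup mechanism follows; the index position, table depth, and table contents are all hypothetical (in hardware the index field's location is programmable, as noted above).

#include <stdint.h>
#include <stdio.h>

#define INDEX_SHIFT 20                   /* illustrative: 1 MB windows    */
#define INDEX_MASK  0x7u                 /* illustrative: 8 table entries */

/* Hypothetical table contents: each entry is the translated base of
 * one window, so the windows land scattered across the other domain. */
static const uint32_t xlat_table[8] = {
    0x10000000u, 0x24000000u, 0x38000000u, 0x4C000000u,
    0x60000000u, 0x74000000u, 0x88000000u, 0x9C000000u,
};

uint32_t lut_translate(uint32_t addr)
{
    uint32_t index  = (addr >> INDEX_SHIFT) & INDEX_MASK;   /* selects window */
    uint32_t offset = addr & ((1u << INDEX_SHIFT) - 1u);    /* within window  */
    return xlat_table[index] + offset;
}

int main(void)
{
    /* Address 00234567h: index 2, offset 34567h -> 38034567h. */
    printf("%08Xh\n", (unsigned)lut_translate(0x00234567u));
    return 0;
}

Spreading the windows this way lets a local I/O processor reach several scattered regions of the system host's memory map without needing one huge contiguous aperture.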


Downstream BAR Limit Registers

The two downstream BARs on the primary side (BAR 2/3 and BAR 4/5) also have Limit registers, programmable from the local side, to further restrict the size of the window they expose, as seen in Figure C-10 on page 1016. BARs can only be assigned memory resources in "power of two" granularity. The Limit registers provide a means to obtain better granularity by "capping" the size of the BAR within that "power of two" granularity. Only transactions below the Limit registers are forwarded to the secondary bus. Transactions above the limit are discarded, or, on reads, return 0xFFFFFFFF (a master-abort-equivalent packet).


Figure C-10: Use of Limit Register
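The limit check itself is tiny; the sketch below caps a hypothetical 1 MB aperture at 640 KB and returns all-ones for reads beyond the cap, the master-abort-equivalent behavior described above.

#include <stdint.h>
#include <stdio.h>

/* Stub for the forwarded read; a real bridge would send the request
 * on to the secondary bus. */
static uint32_t secondary_read(uint32_t addr) { (void)addr; return 0; }

typedef struct {
    uint32_t base;     /* power-of-two aligned BAR base        */
    uint32_t limit;    /* caps the usable part of the aperture */
} nt_window_t;

uint32_t nt_read(const nt_window_t *w, uint32_t addr)
{
    if (addr > w->limit)
        return 0xFFFFFFFFu;        /* above the limit: not forwarded */
    return secondary_read(addr);   /* within the capped window       */
}

int main(void)
{
    nt_window_t w = { 0xC0000000u, 0xC009FFFFu };           /* 1 MB BAR capped at 640 KB */
    printf("%08Xh\n", (unsigned)nt_read(&w, 0xC00A0000u));  /* FFFFFFFFh */
    return 0;
}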

Forwarding 64-bit Address Memory Transactions

Certain BARs can be configured to work in pairs to provide the base address and translation for transactions containing 64-bit addresses. Transactions that hit within these 64-bit BARs are forwarded using Direct Address Translation. As in the case of 32-bit transactions, when a memory transaction is forwarded from the primary to the secondary bus, the primary address can be mapped to another address in the secondary bus domain. The mapping is performed by substituting a new base address for the base of the original address.


A 64-bit BAR pair on the system side of the bridge is used to translate a window of 64-bit addresses in packets originated on the system side of the bridge down below 2^32 in local space.

Appendix D

Class Codes

This appendix lists the class codes, sub-class codes, and programming interface byte definitions currently provided in the PCI 2.3 specification.
Figure D-1: Class Code Register
(The Class Code register is three bytes: bits 23:16 select the base class, bits 15:8 the sub-class, and bits 7:0 the programming interface.)
Table D-1: Defined Class Codes
Class    Description
00h      Function built before class codes were defined (in other words: before rev 2.0 of the PCI spec).
01h      Mass storage controller.
02h      Network controller.
03h      Display controller.
04h      Multimedia device.
05h      Memory controller.
06h      Bridge device.
07h      Simple communications controllers.
08h      Base system peripherals.
09h      Input devices.
0Ah      Docking stations.
0Bh      Processors.
0Ch      Serial bus controllers.
0Dh      Wireless controllers.
0Eh      Intelligent IO controllers.
0Fh      Satellite communications controllers.
10h      Encryption/Decryption controllers.
11h      Data acquisition and signal processing controllers.
12h-FEh  Reserved.
FFh      Device does not fit any of the defined class codes.
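Software typically decodes the Class Code register by splitting it into its three bytes and consulting these tables; the sketch below shows the split and names a few base classes from Table D-1 (the helper and its abbreviated switch are illustrative only).

#include <stdint.h>
#include <stdio.h>

/* Name a few of the base classes from Table D-1 (abbreviated). */
static const char *base_class_name(uint8_t base)
{
    switch (base) {
    case 0x01: return "Mass storage controller";
    case 0x02: return "Network controller";
    case 0x03: return "Display controller";
    case 0x06: return "Bridge device";
    case 0x0C: return "Serial bus controller";
    case 0xFF: return "Does not fit any defined class code";
    default:   return "(see Table D-1)";
    }
}

int main(void)
{
    uint32_t class_code = 0x060400;      /* example: a PCI/PCI bridge */
    uint8_t base = (uint8_t)(class_code >> 16);
    uint8_t sub  = (uint8_t)(class_code >> 8);
    uint8_t pif  = (uint8_t) class_code;
    printf("%02Xh/%02Xh/%02Xh: %s\n",
           (unsigned)base, (unsigned)sub, (unsigned)pif, base_class_name(base));
    return 0;
}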
Table D-2: Class Code 0 (PCI rev 1.0)
Sub-Class  Prog. I/F  Description
00h        00h        All devices other than VGA.
01h        01h        VGA-compatible device.
Table D-3: Class Code 1: Mass Storage Controllers
Sub-Class  Prog. I/F  Description
00h        00h        SCSI controller.
01h        xxh        IDE controller. See Table D-20 on page 1031 for definition of the Programming Interface byte.
02h        00h        Floppy disk controller.
03h        00h        IPI controller.
04h        00h        RAID controller.
05h        20h        ATA controller with single DMA.
           30h        ATA controller with chained DMA.
80h        00h        Other mass storage controller.
Table D-4: Class Code 2: Network Controllers
Sub-Class  Prog. I/F  Description
00h        00h        Ethernet controller.
01h        00h        Token ring controller.
02h        00h        FDDI controller.
03h        00h        ATM controller.
04h        00h        ISDN controller.
05h        00h        WorldFip controller.
06h        xxh        PICMG 2.14 Multi Computing. For information on the use of the Programming Interface byte, see the PICMG 2.14 Multi Computing Specification (http://www.picmg.com).
80h        00h        Other network controller.
Table D-5: Class Code 3: Display Controllers
Sub-Class  Prog. I/F  Description
00h        00h        VGA-compatible controller, responding to memory addresses 000A0000h through 000BFFFFh (Video Frame Buffer), IO addresses 03B0h through 03BBh and 03C0h through 03DFh, and all aliases of these addresses.
           01h        8514-compatible controller, responding to IO address 02E8h and its aliases, 02EAh and 02EFh.
01h        00h        XGA controller.
02h        00h        3D Controller.
80h        00h        Other display controller.
Table D-6: Class Code 4: Multimedia Devices
Sub-Class  Prog. I/F  Description
00h        00h        Video device.
01h        00h        Audio device.
02h        00h        Computer Telephony device.
80h        00h        Other multimedia device.
Table D-7: Class Code 5: Memory Controllers
Sub-Class  Prog. I/F  Description
00h        00h        RAM memory controller.
01h        00h        Flash memory controller.
80h        00h        Other memory controller.
Table D-8: Class Code 6: Bridge Devices
Sub-Class  Prog. I/F  Description
00h        00h        Host/PCI bridge.
01h        00h        PCI/ISA bridge.
02h        00h        PCI/EISA bridge.
03h        00h        PCI/Micro Channel bridge.
04h        00h        PCI/PCI bridge.
           01h        Subtractive decode PCI-to-PCI bridge. Supports subtractive decode in addition to normal PCI-to-PCI bridge functions. For a detailed discussion of this bridge type, refer to the MindShare PCI System Architecture book, Fourth Edition (published by Addison-Wesley).
05h        00h        PCI/PCMCIA bridge.
06h        00h        PCI/NuBus bridge.
07h        00h        PCI/CardBus bridge.
08h        xxh        RACEway bridge. RACEway is an ANSI standard (ANSI/VITA 5-1994) switching fabric. Bits 7:1 of the Interface bits are reserved, read-only, and return zeros. Bit 0 is read-only and, if 0, indicates that the bridge is in Transparent mode, while 1 indicates that it is in End-Point mode.
09h        40h        Semi-transparent PCI-to-PCI bridge with the primary PCI bus side facing the system host processor.
           80h        Semi-transparent PCI-to-PCI bridge with the secondary PCI bus side facing the system host processor.
0Ah        00h        InfiniBand-to-PCI host bridge.
80h        00h        Other bridge type.
Table D-9: Class Code 7: Simple Communications Controllers
Sub-Class  Prog. I/F  Description
00h        00h        Generic XT-compatible serial controller.
           01h        16450-compatible serial controller.
           02h        16550-compatible serial controller.
           03h        16650-compatible serial controller.
           04h        16750-compatible serial controller.
           05h        16850-compatible serial controller.
           06h        16950-compatible serial controller.
01h        00h        Parallel port.
           01h        Bi-directional parallel port.
           02h        ECP 1.X-compliant parallel port.
           03h        IEEE 1284 controller.
           FEh        IEEE 1284 target device (not a controller).
02h        00h        Multiport serial controller.
03h        00h        Generic modem.
           01h        Hayes-compatible modem, 16450-compatible interface. BAR 0 maps the modem's register set. The register set can be either memory- or IO-mapped (as indicated by the type of BAR).
           02h        Hayes-compatible modem, 16550-compatible interface. BAR 0 maps the modem's register set. The register set can be either memory- or IO-mapped (as indicated by the type of BAR).
           03h        Hayes-compatible modem, 16650-compatible interface. BAR 0 maps the modem's register set. The register set can be either memory- or IO-mapped (as indicated by the type of BAR).
           04h        Hayes-compatible modem, 16750-compatible interface. BAR 0 maps the modem's register set. The register set can be either memory- or IO-mapped (as indicated by the type of BAR).
04h        00h        GPIB (IEEE 488.1/2) controller.
05h        00h        Smart Card.
80h        00h        Other communications device.
Table D-10: Class Code 8: Base System Peripherals
Sub-Class  Prog. I/F  Description
00h        00h        Generic 8259 programmable interrupt controller (PIC).
           01h        ISA PIC.
           02h        EISA PIC.
           10h        IO APIC. Base Address Register 0 is used to request a minimum of 32 bytes of non-prefetchable memory. Two registers within that space are located at Base + 00h (IO Select Register) and Base + 10h (IO Window Register). For a full description of the use of these registers, refer to the data sheet for the Intel 8237EB in the 82420/82430 PCIset EISA Bridge Databook #290483-003.
           20h        IO(x) APIC interrupt controller.
01h        00h        Generic 8237 DMA controller.
           01h        ISA DMA controller.
           02h        EISA DMA controller.
02h        00h        Generic 8254 timer.
           01h        ISA system timers.
           02h        EISA system timers.
03h        00h        Generic RTC controller.
           01h        ISA RTC controller.
04h        00h        Generic PCI Hot-Plug controller.
80h        00h        Other system peripheral.
Table D-11: Class Code 9: Input Devices
Sub-Class  Prog. I/F  Description
00h        00h        Keyboard controller.
01h        00h        Digitizer (pen).
02h        00h        Mouse controller.
03h        00h        Scanner controller.
04h        00h        Generic gameport controller.
           10h        Gameport controller. A Programming Interface of 10h indicates that, for any Base Address registers in this function that request/assign IO address space, the registers in that IO space conform to the standard "legacy" game ports. The byte at offset 00h in an IO region behaves as a legacy gameport interface where reads to the byte return joystick/gamepad information and writes to the byte start the RC timer. The byte at offset 01h is an alias of the byte at offset 00h. All other bytes in an IO region are unspecified and can be used in vendor unique ways.
80h        00h        Other input controller.
Table D-12: Class Code A: Docking Stations
Sub-Class  Prog. I/F  Description
00h        00h        Generic docking station.
80h        00h        Other type of docking station.
Table D-13: Class Code B: Processors
Sub-Class  Prog. I/F  Description
00h        00h        386.
01h        00h        486.
02h        00h        Pentium.
10h        00h        Alpha.
20h        00h        PowerPC.
30h        00h        MIPS.
40h        00h        Co-processor.
Table D-14: Class Code C: Serial Bus Controllers
Sub-Class  Prog. I/F  Description
00h        00h        Firewire (IEEE 1394).
           10h        IEEE 1394 using 1394 OpenHCI spec.
01h        00h        ACCESS.bus.
02h        00h        SSA (Serial Storage Architecture).
03h        00h        USB (Universal Serial Bus) controller using Universal Host Controller spec.
           10h        USB (Universal Serial Bus) controller using Open Host Controller spec.
           80h        USB (Universal Serial Bus) controller with no specific programming interface.
           FEh        USB device (not Host Controller).
04h        00h        Fibre Channel.
05h        00h        SMBus (System Management Bus).
06h        00h        InfiniBand.
07h        00h        IPMI SMIC Interface. The register interface definitions for the Intelligent Platform Management Interface (Sub-Class 07h) are in the IPMI specification.
           01h        IPMI Kybd Controller Style Interface.
           02h        IPMI Block Transfer Interface.
08h        00h        SERCOS Interface Standard (IEC 61491). There is no register level definition for the SERCOS Interface standard. For more information see IEC 61491.
09h        00h        CANbus.
80h        00h        Other type of Serial Bus Controller.
Table D-15: Class Code D: Wireless Controllers
Sub-Class  Interface  Meaning
00h        00h        iRDA compatible controller.
01h        00h        Consumer IR controller.
10h        00h        RF controller.
11h        00h        Bluetooth.
12h        00h        Broadband.
80h        00h        Other type of wireless controller.
Table D-16: Class Code E: Intelligent IO Controllers
Sub-Class  Interface  Meaning
00h        xxh        Intelligent IO controller adhering to the I2O Architecture spec. The spec can be downloaded from ftp.intel.com/pub/IAL/i2o/.
           00h        Message FIFO at offset 40h.
80h        00h        Other type of Intelligent IO Controller.
Table D-17: Class Code F: Satellite Communications Controllers
Sub-Class  Interface  Meaning
01h        00h        TV.
02h        00h        Audio.
03h        00h        Voice.
04h        00h        Data.
80h        00h        Other type of Satellite Communications Controller.
Table D-18: Class Code 10h: Encryption/Decryption Controllers
Sub-Class  Interface  Meaning
00h        00h        Network and computing Encrypt/Decrypt.
10h        00h        Entertainment Encrypt/Decrypt.
80h        00h        Other Encrypt/Decrypt.
Table D-19: Class Code 11h: Data Acquisition and Signal Processing Controllers
Sub-Class  Interface  Meaning
00h        00h        DPIO modules.
01h        00h        Performance counters.
10h        00h        Communications synchronization plus time and frequency test/measurement.
20h        00h        Management card.
80h        00h        Other Data Acquisition and Signal Processing Controllers.
Table D-20: Definition of IDE Programmer's Interface Byte Encoding
Bit(s)  Description
0       Operating mode (primary).
1       Programmable indicator (primary).
2       Operating mode (secondary).
3       Programmable indicator (secondary).
6:4     Reserved. Hardwired to zero.
7       Master IDE device.
Note: The SIG document PCI IDE Controller Specification completely describes the layout and meaning of bits 0 through 3 in the Programming Interface byte. The document Bus Master Programming Interface for IDE ATA Controllers describes the meaning of bit 7 in the Programming Interface byte. While the PCI 2.1 spec stated that this document could be obtained via FAX by calling (408)741-1600 and requesting document 8038, that reference was removed from the 2.3 spec.
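As a quick illustration, the bit assignments in Table D-20 can be unpacked as follows (the example value is arbitrary):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t pif = 0x8A;   /* arbitrary example value */
    printf("operating mode (primary):           %u\n", (pif >> 0) & 1u);
    printf("programmable indicator (primary):   %u\n", (pif >> 1) & 1u);
    printf("operating mode (secondary):         %u\n", (pif >> 2) & 1u);
    printf("programmable indicator (secondary): %u\n", (pif >> 3) & 1u);
    printf("master IDE device (bit 7):          %u\n", (pif >> 7) & 1u);
    return 0;
}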

Appendix E

Locked Transactions Series

Introduction

Native PCI Express implementations do not support lock. Support for Locked transaction sequences exists solely to support legacy device software executing on the host processor that performs a locked RMW (read-modify-write) operation on a memory semaphore that may reside within the memory of a legacy PCI device. This appendix describes the protocol defined by PCI Express for supporting locked access sequences that target legacy devices. Failure to support lock may result in deadlocks.

Background

PCI Express continues the PCI 2.3 tradition of supporting locked transaction sequences (RMW, or read-modify-write) to support legacy device software. PCI Express devices and their software drivers are never allowed to use instructions that cause the CPU to generate locked operations that target memory residing beneath the Root Complex level.
Locked operations consist of the basic RMW sequence (sketched in code after this list), that is:
  1. One or more memory reads from the target location to obtain the semaphore value.
  2. The modification of the data within a processor register.
  3. One or more writes to write the modified semaphore value back to the target memory location.
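In driver terms the sequence looks like the sketch below; bus_lock() and bus_unlock() are hypothetical stand-ins (stubbed here), since the actual locking is carried out by the MRdLk/CplDLk packets and the Unlock message described later, not by driver-visible calls.

#include <stdint.h>
#include <stdio.h>

/* Stubs standing in for the fabric-level lock and its release. */
static void bus_lock(void)   { /* path locked: competing accesses blocked */ }
static void bus_unlock(void) { /* Unlock message releases the path        */ }

/* The locked RMW applied to a semaphore in legacy device memory
 * (modeled here as an ordinary variable). */
static uint32_t locked_test_and_set(volatile uint32_t *semaphore)
{
    bus_lock();
    uint32_t old = *semaphore;   /* step 1: locked read of the semaphore */
    *semaphore = old | 1u;       /* steps 2-3: modify, then write back   */
    bus_unlock();
    return old;
}

int main(void)
{
    volatile uint32_t sem = 0;
    uint32_t was = locked_test_and_set(&sem);
    printf("was %u, now %u\n", (unsigned)was, (unsigned)sem);
    return 0;
}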


This transaction sequence must be performed such that no other accesses are permitted to the target locations (or device) during the locked sequence. This requires blocking other transactions during the operation, which can result in deadlocks and poor performance.
The devices required to support locked sequences are:
  • The Root Complex.
  • Any Switches in the path leading to a legacy device that may be the target of a locked transaction series.
  • A PCI Express-to-PCI Bridge.
  • A PCI Express-to-PCI-X Bridge.
  • Any legacy devices whose device drivers issue locked transactions to memory residing within the legacy device.
No other devices are required to support locked transactions, and they must ignore any locked transactions that they receive.
Lock in the PCI environment is achieved, in part, via the use of the PCI LOCK# signal. The equivalent functionality in PCI Express is accomplished via a transaction that emulates the LOCK signal functionality.

The PCI Express Lock Protocol

The only source of lock supported by PCI Express is the system processor, and, as a consequence, the source of all locked operations in PCI Express is the Root Complex (acting as the processor's surrogate). A locked operation is performed between a Root Complex downstream port and the PCI Express downstream port to which the targeted legacy device is attached. In most systems, the legacy device is typically a PCI Express-to-PCI or PCI Express-to-PCI-X bridge. Only one locked sequence at a time is supported for a given hierarchical path.
PCI Express limits locked transactions to Traffic Class 0 and Virtual Channel 0. All transactions with TC values other than zero that are mapped to a VC other than zero are permitted to traverse the fabric without regard to the locked operation. All transactions that are mapped to VC0 are subject to the lock rules described in this appendix. The discussion of the locked protocol in this appendix presumes that all transactions have been assigned to TC0 (unless otherwise indicated).

Lock Messages — The Virtual Lock Signal

PCI Express defines the following transactions that, together, act as a virtual wire replacing the PCI LOCK# signal; a compact code restatement follows the list.
  • Memory Read Lock Request (MRdLk) - Originates a locked sequence. The first MRdLk transaction blocks other requests from reaching the target device. One or more of these locked read requests may be issued during the sequence.
  • Memory Read Lock Completion with Data (CplDLk) - Returns data and confirms that the path to the target is locked. A successful read Completion that returns data for the first Memory Read Lock request results in the path between the Root Complex and the target device being locked. That is, transactions traversing the same path from other ports are blocked from reaching either the root port or the target port. Transactions being routed in buffers for VC1-VC7 are unaffected by the lock.
  • Memory Read Lock Completion without Data (CplLk) - A Completion without a data payload indicates that the lock sequence cannot complete currently and the path remains unlocked.
  • Unlock Message - An unlock message is issued by the Root Complex from the locked root port. This message unlocks the path between the root port and the target port.
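For reference, the four packet types can be summarized in code form; the enum and strings below are merely a restatement of the list above (the names are illustrative, not spec-defined symbols).

#include <stdio.h>

typedef enum { TLP_MRDLK, TLP_CPLDLK, TLP_CPLLK, MSG_UNLOCK } lock_tlp_t;

static const char *lock_tlp_name(lock_tlp_t t)
{
    switch (t) {
    case TLP_MRDLK:  return "MRdLk: locked read request, begins the sequence";
    case TLP_CPLDLK: return "CplDLk: data returned, path now locked";
    case TLP_CPLLK:  return "CplLk: no data, lock not established";
    case MSG_UNLOCK: return "Unlock: releases the locked path";
    }
    return "?";
}

int main(void)
{
    for (int t = TLP_MRDLK; t <= MSG_UNLOCK; t++)
        puts(lock_tlp_name((lock_tlp_t)t));
    return 0;
}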

The Lock Protocol Sequence: an Example

This section explains the PCI Express lock protocol by example. The example includes the following devices:
  • The Root Complex that initiates the Locked transaction series on behalf of the host processor.
  • A Switch in the path between the root port and targeted legacy endpoint.
  • A PCI Express-to-PCI Bridge in the path to the target.
  • The target PCI device whose Device Driver initiated the locked RMW.
  • A PCI Express endpoint is included to describe Switch behavior during lock.
In this example, the locked operation completes normally. The steps that occur during the operation are described in the two sections that follow.

The Memory Read Lock Operation

Figure E-1 on page 1037 illustrates the first step in the Locked transaction series (i.e., the initial memory read to obtain the semaphore):
  1. The CPU initiates the locked sequence (a Locked Memory Read) as a result of a driver executing a locked RMW instruction that targets a PCI target.
  2. The Root Port issues a Memory Read Lock Request from port 2. The Root Complex is always the source of a locked sequence.
  3. The Switch receives the lock request on its upstream port and forwards the request to the target egress port (3). The Switch, upon forwarding the request to the egress port, must block all requests from ports other than the ingress port (1) from being sent from the egress port.
  4. A subsequent peer-to-peer transfer from the illustrated PCI Express endpoint to the PCI bus (switch port 2 to switch port 3) would be blocked until the lock is cleared. Note that the lock is not yet established in the other direction. Transactions from the PCI Express endpoint could be sent to the Root Complex.
  5. The Memory Read Lock Request is sent from the Switch's egress port to the PCI Express-to-PCI Bridge. This bridge will implement PCI lock semantics (see the MindShare book entitled PCI System Architecture, Fourth Edition, for details regarding PCI lock).
  6. The bridge performs the Memory Read transaction on the PCI bus with the PCI LOCK# signal asserted. The target memory device returns the requested semaphore data to the bridge.
  7. Read data is returned to the Bridge and is delivered back to the Switch via a Memory Read Lock Completion with Data (CplDLk).
  8. The Switch uses ID routing to return the packet upstream towards the host processor. When the CplDLk packet is forwarded to the upstream port of the Switch, it establishes a lock in the upstream direction to prevent traffic from other ports from being routed upstream. The PCI Express endpoint is completely blocked from sending any transaction to the Switch ports via the path of the locked operation. Note that transfers between Switch ports not involved in the locked operation would be permitted (not shown in this example).
  9. Upon detecting the CplDLk packet, the Root Complex knows that the lock has been established along the path between it and the target device, and the completion data is sent to the CPU.
Figure E-1: Lock Sequence Begins with Memory Read Lock Request

Read Data Modified and Written to Target and Lock Completes

The device driver receives the semaphore value, alters it, and then initiates a memory write to update the semaphore within the memory of the legacy PCI device. Figure E-2 on page 1038 illustrates the write sequence followed by the Root Complex's transmission of the Unlock message that releases the lock:
  1. The Root Complex issues the Memory Write Request across the locked path to the target device.
  2. The Switch forwards the transaction to the target egress port (3). The memory address of the Memory Write must be the same as that of the initial Memory Read request.
  3. The bridge forwards the transaction to the PCI bus.
  4. The target device receives the memory write data.
  5. Once the Root Complex has sent the Memory Write transaction, it sends an Unlock message to instruct the Switches and any PCI/PCI-X bridges in the locked path to release the lock. Note that the Root Complex presumes the operation has completed normally (because memory writes are posted and no Completion is returned to verify success).
  6. The Switch receives the Unlock message, unlocks its ports, and forwards the message to the egress port that was locked to notify any other Switches and/or bridges in the locked path that the lock must be cleared.
  7. Upon detecting the Unlock message, the bridge must also release the lock on the PCI bus.
Figure E-2: Lock Completes with Memory Write Followed by Unlock Message

Notification of an Unsuccessful Lock

A locked transaction series is aborted when the initial Memory Read Lock Request receives a Completion packet with no data (CplLk). This means that the locked sequence must terminate because no data was returned. This could result from an error associated with the memory read transaction, or perhaps the target device is busy and cannot respond at this time.

Summary of Locking Rules

Following is a list of the locking rules that apply to the Root Complex, Switches, and Bridges.

Rules Related To the Initiation and Propagation of Locked Transactions

  • Locked Requests which are completed with a status other than Successful Completion do not establish lock.
  • Regardless of the status of any of the Completions associated with a locked sequence, all locked sequences and attempted locked sequences must be terminated by the transmission of an Unlock Message.
  • MRdLk, CplDLk and Unlock semantics are allowed only for the default Traffic Class (TC0).
  • Only one locked transaction sequence attempt may be in progress at a given time within a single hierarchy domain.
  • Any device which is not involved in the locked sequence must ignore the Unlock Message.
The initiation and propagation of a locked transaction sequence through the PCI Express fabric is performed as follows:
  • A locked transaction sequence is started with a MRdLk Request:
  • Any successive reads associated with the locked transaction sequence must also use MRdLk Requests.
  • The Completions for any successful MRdLk Request use the CplDLk Completion type; unsuccessful Requests are completed with the CplLk Completion type.
  • If any read associated with a locked sequence is completed unsuccessfully, the Requester must assume that the atomicity of the lock is no longer assured, and that the path between the Requester and Completer is no longer locked.
  • All writes associated with a locked sequence must use MWr Requests.
  • The Unlock Message is used to indicate the end of a locked sequence. A Switch propagates Unlock Messages through the locked Egress Port.
  • Upon receiving an Unlock Message, a legacy Endpoint or Bridge must unlock itself if it is in a locked state. If it is not locked, or if the Receiver is a PCI Express Endpoint or Bridge which does not support lock, the Unlock Message is ignored and discarded.

Rules Related to Switches

Switches must distinguish transactions associated with locked sequences from other transactions, to prevent other transactions from interfering with the lock and potentially causing deadlock. The following rules cover how this is done; a code sketch of the resulting forwarding decision follows the list. Note that locked accesses are limited to TC0, which is always mapped to VC0.
  • When a Switch propagates a MRdLk Request from an Ingress Port to the Egress Port, it must block all Requests which map to the default Virtual Channel (VC0) from being propagated to the Egress Port. If a subsequent MRdLk Request is received at this Ingress Port addressing a different Egress Port, the behavior of the Switch is undefined. Note that this sort of split-lock access is not supported by PCI Express and software must not cause such a locked access. System deadlock may result from such accesses.
  • When the CplDLk for the first MRdLk Request is returned, if the Completion indicates a Successful Completion status, the Switch must block all Requests from all other Ports from being propagated to either of the Ports involved in the locked access, except for Requests which map to channels other than VC0 on the Egress Port.
  • The two Ports involved in the locked sequence must remain blocked until the Switch receives the Unlock Message (at the Ingress Port which received the initial MRdLk Request).
  • The Unlock Message must be forwarded to the locked Egress Port.
  • The Unlock Message may be broadcast to all other Ports.
  • The Ingress Port is unblocked once the Unlock Message arrives, and the Egress Port(s) which were blocked are unblocked following the transmission of the Unlock Message out of the Egress Port(s). Ports that were not involved in the locked access are unaffected by the Unlock Message.
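A compact way to see these rules is as a per-request forwarding predicate; the model below is a sketch under simplified assumptions (a single lock, ports as integers), not a description of any real switch implementation.

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool locked;         /* set between a successful CplDLk and the Unlock */
    int  lock_ingress;   /* port that received the initial MRdLk           */
    int  lock_egress;    /* port leading toward the locked target          */
} switch_lock_state_t;

/* May a request from src_port be forwarded to dst_port on channel vc? */
bool may_forward(const switch_lock_state_t *s, int src_port, int dst_port, int vc)
{
    if (!s->locked || vc != 0)
        return true;     /* no lock in effect, or non-VC0 traffic: unaffected */
    if (dst_port != s->lock_ingress && dst_port != s->lock_egress)
        return true;     /* traffic between uninvolved ports is permitted */
    /* Only the locked flow itself may target the two locked ports. */
    return (src_port == s->lock_ingress && dst_port == s->lock_egress) ||
           (src_port == s->lock_egress  && dst_port == s->lock_ingress);
}

int main(void)
{
    switch_lock_state_t s = { true, 0, 2 };         /* locked: ingress 0, egress 2 */
    printf("%d\n", may_forward(&s, 1, 2, 0));       /* 0: uninvolved port blocked  */
    return 0;
}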

Rules Related To PCI Express/PCI Bridges

The requirements for PCI Express/PCI Bridges are similar to those for Switches, except that, because these Bridges only use TC0 and VC0, all other traffic is blocked during the locked access. The requirements on the PCI bus side are described in the MindShare book entitled PCI System Architecture, Fourth Edition (published by Addison-Wesley).

Rules Related To the Root Complex

A Root Complex is permitted to support locked transactions as a Requester. If locked transactions are supported, a Root Complex must follow the rules already described to perform a locked access. The mechanism(s) used by the Root Complex to interface to the host processor's FSB (Front-Side Bus) are outside the scope of the spec.

Rules Related To Legacy Endpoints

Legacy Endpoints are permitted to support locked accesses, although their use is discouraged. If locked accesses are supported, legacy Endpoints must handle them as follows:
  • The legacy Endpoint becomes locked when it transmits the first Completion for the first read request of the locked transaction series with a Successful Completion status:
  • If the completion status is not Successful Completion, the legacy Endpoint does not become locked.
  • Once locked, the legacy Endpoint must remain locked until it receives the Unlock Message.
  • While locked, a legacy Endpoint must not issue any Requests using Traffic Classes which map to the default Virtual Channel (VC0). Note that this requirement applies to all possible sources of Requests within the Endpoint, in the case where there is more than one possible source of Requests. Requests may be issued using Traffic Classes which map to VCs other than VC0.

Rules Related To PCI Express Endpoints

Native PCI Express Endpoints do not support lock. A PCI Express Endpoint must treat a MRdLk Request as an Unsupported Request.

Index

Numerics

12x Packet Format 413
1x Packet Format 412
4x Packet Format 412
8b/10b Decoder 402
8b/10b Encoder 400, 424
A
ACK 211
ACK DLLP 91, 92, 202, 219
ACK/NAK Latency 217, 237
ACK/NAK Protocol 90, 211, 212, 220
ACKD_SEQ Count 214
ACKNAK_Latency_Timer 217, 237
ACPI 577
ACPI Driver 570, 579
ACPI Machine Language 578, 580
ACPI Source Language 578, 580
ACPI spec 569
ACPI tables 577
Active State Power Management 46, 87, 403, 608
Advanced Configuration and Power Interface 569, 577
Advanced Correctable Error Reporting 385
Advanced Correctable Error Status 385
Advanced Correctable Errors 384
Advanced Error Capabilities and Control Register 935
Advanced Error Correctable Error Mask Register 935
Advanced Error Correctable Error Status Register 936
Advanced Error Reporting 382
Advanced Error Reporting Capability Register Set 931
Advanced Error Root Error Command Register 938
Advanced Error Root Error Status Register 938
Advanced Error Uncorrectable and Uncorrectable Error Source ID Register 938
Advanced Error Uncorrectable Error Mask Register 936
Advanced Error Uncorrectable Error Severity Register 937
Advanced Error Uncorrectable Error Status Register 937
Advanced Source ID Register 391
Advanced Uncorrectable Error Handling 386
Advanced Uncorrectable Error Status 387
AGP Capability 845
AGP Command Register 846
AGP Command register 846
AGP Status and AGP Command registers 845
AGP Status Register 845
AGP Status register 845
AML 578, 580
AML token interpreter 578
APIC 16, 25, 353
ASL 578, 580
ASPM 568
ASPM Exit Latency 628
Assert_INTx messages 348
Assigning VC Numbers 260
Async Notice of Slot Status Change 683
Attention Button Pressed Message 679
Attention Indicator 657, 664
Attention_Indicator_Blink Message 679
Attention_Indicator_Off Message 679
Attention_Indicator_On Message 679
Aux_Current field 598
B
BARs 793
Base Address Registers 792, 811
Beacon 469, 497, 642, 643
BER 455, 466
BIOS 577, 656, 886, 890
BIST 778
BIST register 778
Bit Error Rate 455
Bit Lock 94, 440, 441, 465
Bridge Control Register 835
Built-In Self-Test 778
Bus Enumerator 890
Bus Master 21, 833
Bus Number register 726, 805
Byte Count Modified 188
byte merging 801
Byte Striping 408
Byte Striping logic 400
C
Capabilities List bit 336, 585, 779, 837, 840
Capabilities Pointer register 585, 779, 780
Capability ID 332, 585, 780, 859
Card Connector Power Switching Logic 657
Card Information Structure 782
Card Insertion 658
Card Insertion Procedure 661
Card Present 657
Card Removal 658
Card Removal Procedure 659
Card Reset Logic 657
Cardbus 770, 777, 782
Character 72, 77, 400
Characters 405
Chassis and Slot Number Assignment 861
Chassis Number 860
Chassis, Expansion 862
Chassis, main 862
Chassis/Slot Numbering Registers 859, 863
CIS 782
Class Code 775, 875, 876, 882, 884, 1019
class code 0 1020
class code 1 1020
class code 10h 1030
class code 11h 1031
class code 2 1021
class code 3 1022
class code 4 1022
class code 5 1022
class code 6 1023
class code 7 1024
class code 8 1026
class code 9 1027
class code A 1027
class code B 1028
class code C 1028
class code D 1029
class code E 1030
class code F 1030
Class Code register 774
Class driver 570, 774
code image 875, 878
Code Type 883, 885
Cold Reset 95, 488
Collapsing INTx Signals 349
Command register 832
company ID 953
Completer 37, 49, 50
Completer Abort 366
Completion 160
Completion Packet 184
Completion Status 187, 371
Completion Time-out 367
Completion W/Data 160
Completion-Locked 160
Completions 183
Config Type 0 Read Request 160
Config Type 0 Write Request 160
Config Type 1 Read Request 160
Config Type 1 Write Request 160
Configuration Address Port 724, 725, 726
Configuration Command Register 373
Configuration Data Port 724, 725
Configuration Request Packet 180
Configuration Requests 179
Configuration Space Layout 895
Configuration Status Register 374
Control Character Encoding 430
Control Method 578, 579
Correctable Errors 369
CRD 423
Credit Allocated Count 291
CREDIT_ALLOCATED 292
Credits Received Counter 291
CREDITS_CONSUMED 292
CREDITS_RECEIVED 291
Current Running Disparity 423
Cut-Through 102, 248
D
D characters 405
D0 573, 576, 586
D0 Active 587
D0 Uninitialized 586
D1 574, 576, 587
D1_Support bit 598
D2 574, 576, 589
D2_Support bit 597
D3 574, 576, 590
D3cold 592
D3hot 591
Data Link Layer Packet 71, 74
Data Poisoning 362
Data Register 603
Data_Scale field 601
Data_Select field 602
DDIM 887
Deassert_INTx messages 348
decoders 792
De-emphasis 455, 466
Default Device Class Power Management spec 576
Definition of On and Off 658
De-Scrambler 402
Device Capabilities Register 900
Device Class Power Management specs 576
Device Context 574
Device Control Register 905
Device Driver 656, 774, 791, 844, 872, 888, 891, 905
Device Driver Initialization Model 887
Device ID 773, 876, 882, 883
Device PM States 573, 586
Device ROM 783, 872
Device Serial Number Capability 952
Device Status Register 378, 909
Device-Specific Initialization (DSI) bit 599
Differential Receiver 439
Digest 166
Discard Timer SERR# Enable 837
Discard Timer Status 837
Discard unused prefetch data 801
Disparity 423
DLLP 71, 74, 75, 111, 154, 198, 201
Downstream 805
Downstream Port 50
Driver 681
DSI bit 599
Dual Simplex 41, 399
E
ECRC 166, 167
ECRC Generation and Checking 361, 383
EDB 412
Egress Port 44, 50
EISA 724
Elastic Buffer 402
Electrical Idle 41, 77, 108, 109, 432, 434, 454, 464
Enabling Error Reporting 377
END 412
End Tag descriptor 851
Endpoint 44, 48, 49, 51, 55
End-to-End CRC 166
Error Classifications 368
Error Handling 393
Error Handling Mechanisms 360
Error Logging 389
Error Messages 370
Error Reporting Mechanisms 359
Error Severity 388
EUI-64 953
Expansion ROM 872
Expansion ROM Base Address Register 783, 811, 872
Expansion ROM Enable bit 784
Expansion Slot 860
Extension ID 953
F
Fast Back-to-Back Enable 834, 836
FC Initialization Sequence 305
Fcode device driver 889
Fcode interpreter 889
First DW Byte Enables 164, 167
First-In-Chassis bit 864
Flag 317
Flow Control Buffer Size (max) 297
Flow Control Buffers 288
Flow Control Credits 286, 289
Flow Control Elements 290, 295
Flow Control Initialization 294, 304
Flow Control Packet Format 205
Flow Control Packets 293
Flow Control Update Frequency 310
Flow Control Updates 308
Forth 889
Framing Symbols 156, 400
FTS 109, 434
Function PM State Transitions 593
Function State Transition Delays 596
Fundamental Reset 95, 487, 488
G
General Purpose Event 579
GPE 579
GPE handler 579
H
Hardware Fixed VC Arbitration 269
Hardware-Fixed Port Arbitration 278
Header space 779
Header Type One 777
Header Type register 777
Header Type Two 777
Header Type Zero 777
Header Type/Format Field 165
Hierarchy 49
Hierarchy Domain 49
Host/PCI bridge 727
Hot Plug Elements 655
Hot Plug Messages 197
Hot Reset 95, 487, 491
Hot-Plug Controller 656
Hot-Plug primitives 682
Hot-Plug Service 655
Hot-Plug System Driver 655
Hub Link 32, 33, 35, 51
I
IDE 774, 872, 1031
Identifier String descriptor 851
IEEE 953
IEEE 1394 Bus Driver 577
IEEE standard 1275-1994 888
In-band Reset 491
Indicator Byte 883, 885
Infinite Flow Control Credits 301
Ingress Port 44, 50
InitFC1-P DLLP 201
Initial Program Load 872
Initialization code 885
Initialization code image 876
Initiator 118
input device 872
Interrupt Disable 346
Interrupt Latency 341
interrupt latency 341
Interrupt Line Register 345, 791
Interrupt Pin Register 343, 792
Interrupt Service Routine 886
Interrupt Status 346
Interrupt-Related Registers 844
Inter-symbol Interference 466, 467
INTx Message 193
INTx Message Format 351
INTx# Pins 342
INTx# Signaling 345
IO Base Address Register 797
IO Base and IO Limit registers 812
IO Decoder 797
IO decoder, Legacy 798
IO Extension registers 812
IO Read Request 160
IO Request Packet 172, 579
IO Requests 66, 171
IO Write Request 160
IPL 872
IRP 579
ISA Enable bit 836
ISA Plug-and-Play specification 887
Isochronous Transactions 252
K
K character 405
keywords 851, 853
L
L0 State 46, 403, 482
L0s State 611
L1 ASPM 606, 609, 614
L1 ASPM Negotiation 616
L1 State 629
L2 State 637
L2/L3 Ready state 633, 634
Lane 94, 95, 400, 408, 411, 415, 444
Lane Reversal 95
Last DW Byte Enables 165, 167
Latency Timer Registers 843
LCRC 72, 213, 216, 221
Legacy Endpoint 49, 330, 332, 335, 352
Link 13, 14, 41, 94, 101
Link Capabilities Register 609, 912
Link Control Register 915
Link Errors 379
Link Flow Control-Related Errors 363
Link Power Management 606
Link Status Register 918
Link Training and Initialization 94, 403, 496
Link Width 14, 41, 94, 913
Low-priority VC Arbitration 267
LTSSM 213
M
Malformed TLP 364
Master Abort Mode 836
MCH 28, 33
Memory Base Address Register 794
Memory Base and Limit registers 823, 830
Memory Read Lock Request 160
Memory Read Request 160
Memory Request Packet 175
Memory Requests 64, 68, 174
Memory Space bit 784
Memory Write and Invalidate Enable 834
Memory Write Request 160
Memory-Mapped IO 793, 823, 830
Message Address Register 335, 336
Message Control Register 333, 336
Message Data register 335, 336
Message Request Packet 190
Message Request W/Data 160
Message Requests 63, 160, 190
Message Signaled Interrupts 331
Miniport Driver 570
MSI 331, 791
MSI Capability Register 332
MSI Configuration 336
Multiple Message Capable field 336
Multiple Messages 339
N
NAK DLLP 87, 90, 202, 219
NAK Scheduling 236
NAK_SCHEDULED Flag 217, 233
Namespace 577
New Capabilities list 837
Next Capability Pointer 859
NEXT_RCV_SEQ 203, 216, 219, 230
Non-Prefetchable Memory 796
North Bridge 16, 23, 29
Nullified Packet 384, 431
Number of Expansion Slots 864
O
OnNow Design Initiative 571
Open Firmware 888
OpenBoot 885, 888
Order Management 324
Ordered-Sets 405
Ordering Rules Summary 327
OS boot process 888
Output device 872
P
PA/RISC executable code 885
Parity Error Response 834, 836
Pause command 656, 681
Pausing a Driver 681
PCI Bus Driver 570, 571, 577
PCI Bus PM Interface Specification 569
PCI Data Structure 880
PCI Express Capability ID Register 898
PCI Express Capability Register Set 897
PCI Express Endpoint 49
PCI Interrupt Signaling 342
PCI PM 569
PCI power management 557, 567, 649
PCI-Compatible Error Reporting 372
PCI-to-PCI Address Decode-Related Registers 809
PCI-to-PCI bridge 727, 770, 777
PCI-to-PCI bridge terminology 805
Physical Slot ID 681
PM Capabilities (PMC) Register 597
PM Capability Registers 585
PM Control/Status (PMCSR) Register 599
PM Event (PME) Context 575
PM Registers 596
PM_Active_State_Request_L1 201
PM_Enter_L1 DLLP 201
PM_Enter_L23 201
PM_Request_Ack 201
PMC Register 597
PMCSR Register 599
PME Clock bit 599
PME Context 575
PME# 575
PME_En bit 602
PME_Status bit 601
PME_Support field 597
Polarity Inversion 94, 95
Port 42, 44, 50
Port Arbitration 45, 84, 85, 86, 263, 274, 277, 939
Port Arbitration Table 276, 280, 952
Port VC Capability Register 1 941
Port VC Control Register 944
Port VC Status Register 945
POST 881
Power Budget Register Set 955
Power Budgeting Capability Register 956
Power Budgeting Data Register 956
Power Budgeting Enhanced Capability Header 955
Power Indicator 665
Power IRP 579
power management 557, 567, 649
Power Management DLLP Packet 204
Power Management Messages 194
Power Management Policy Owner 577
Power Management Register Set 586, 596
Power_Indicator_Blink Message 679
Power_Indicator_Off Message 679
Power_Indicator_On Message 679
PowerPC 727
PowerState field 603
Prefetchable Attribute bit 795, 824
Prefetchable Memory 801, 829
Prefetchable Memory Base and Limit registers 823
Primary bus 805
Primary Bus Number register 806
Primary Discard Timeout 837
Primitives, hot-plug 655, 682
Producer/Consumer Model 317
Programming Interface byte 774
Q
QoS 11, 80, 82, 83, 264
Query Hot-Plug System Driver 682
Query Slot Status 683
Quiesce 681
Quiesce command 656
Quiescing Card and Driver 681
R
RCRB 275, 957, 958
Read/Write VPD Keywords 856
Relaxed Ordering 319
Replay 88
Replay Buffer 88
Replay Timer 384
Requester 37, 49, 50
Resume command 656
Retention Latch 666
Retention Latch Sensor 666
Revision ID 773
ROM Data Structure 878, 881
ROM Detection 872
ROM Header 878, 879
ROM shadowing 875
Root Complex 42, 48, 49, 107, 131, 330, 352, 370, 390, 714, 718, 722, 727, 742, 753, 757, 761, 765
Root Complex Error Status 390
Root Complex Register Block 957
Root Control Register 926
Root Error Command Register 392
Root Status Register 928
Round Robin VC Arbitration 270
RST# 657
Run time code image 876
Rx Clock Recovery 440
S
SCI 579
Scrambler 400, 416
SCSI 872
SDP 411
Secondary Bus 805
Secondary Bus Number register 806
Secondary Bus Reset bit 836
Secondary Discard Timeout 837
Secondary Latency Timer Register 843
Secondary Status Register 840
Sequence Number 213, 216, 234
SERR# Enable 834, 836
Set Slot Status 683
Severity of Error 388
shadow RAM 875
SKIP 431, 432, 434
Slot Capabilities Register 670, 920
Slot Control 672
Slot Control Register 923
Slot Number Assignment 861
Slot Numbering Identification 668
Slot Numbering Registers 859, 863
Slot Power Limit Control 672
Slot Power Limit Message 196
Slot Status Register 925
Soft Off 573
South Bridge 16, 32
Special Cycle 834
Split Completion, bridge claiming of 807
Start command 656
Status Register (Primary Bus) 837
Stepping Control bit 834
Sticky Bits 383
STP 411
Strict Priority VC Arbitration 265
String Identifier descriptor 855
Strong Ordering 321
Sub Class 774
Subordinate bus 805
Subordinate Bus Number register 726, 807
Subsystem ID 776
Subsystem Vendor ID 776
Surprise Removal Notification 652
Switch 11, 42, 48, 50, 86, 282
Symbol 43, 93, 400, 405, 421
Symbol Lock 94, 405, 441
System Control Interrupt 579
System PM States 572
T
Target 20, 23, 24
TC 44
TC filtering 363
TC/VC Mapping 262
Time-Based, Weighted Round Robin Arbitration 279
TLP 55, 57, 71, 154, 156
Token 578, 580, 889
Traffic Class 44, 81, 87, 161, 164, 252, 256, 262, 318, 321, 363
Training Sequence 1 405
Training Sequence 2 405
Transaction Descriptor 169
Transaction ID 169
Transaction Layer Packet 55
Transaction Types 113
Transactions 43
Translating Slot IDs 681
TS1 109, 405, 434
TS2 109, 405
TSIZ 727
Turning Slot On 659
Tx Buffer 404
Type 0 configuration transaction 727
Type 1 configuration transaction 727
U
UDF Supported bit 885
Uncorrectable Error Reporting 388
Uncorrectable Error Severity 388
Uncorrectable Fatal Errors 369
Uncorrectable Non-Fatal Errors 369
Unexpected Completion 367
Universal Device Driver 889
Universal/Local Bit 953
Unlock Message 196
Unsupported Request 365
Upstream 805
Upstream Port 50
USB Bus Driver 577
V
VC 44
VC Arbitration 44, 50, 84, 85, 86, 264, 267, 270, 765, 939, 944
VC Arbitration Table 951
VC Resource Capability Register 946
VC Resource Control Register 948
VC Resource Status Register 950
Vendor ID 773, 876, 882, 883
VGA 774
VGA device ROM 875
VGA Enable bit 836
VGA Palette Snoop bit 834
Virtual Channel 4, 83, 256, 260, 263, 270, 286, 288, 323, 324
Vital Product Data 848, 882, 884
VPD 848, 882, 884
VPD Checksum 855
VPD data structure 857
VPD-R descriptor 851, 853
VPD-W descriptor 852, 856
W
WAKE# Signal 642, 696
Warm Reset 95, 488
WDM Device Driver 570, 577, 579
Weak Ordering 324
Weighted Round Robin Port Arbitration 279
Weighted Round Robin VC Arbitration 269
Windows 95/98/NT/2000 569
Windows Driver Model 579
Working state 572

PC Programming/Hardware

"PCI Express System Architecture is a high-quality and comprehensive must-have reference for any engineer working with PCI Express. Highly recommended."
-David Churchill, Agilent Technologies
PCI Express is the third-generation Peripheral Component Interconnect technology for a wide range of systems and peripheral devices. Incorporating recent advances in high-speed, point-to-point interconnects, PCI Express provides significantly higher performance, reliability, and enhanced capabilities, at a lower cost, than the previous PCI and PCI-X standards. Therefore, anyone working on next-generation PC systems, BIOS and device driver development, and peripheral device design will need a thorough understanding of PCI Express.
PCI Express System Architecture provides an in-depth description and comprehensive reference to the PCI Express standard. The book contains information needed for design, verification, and test, as well as background information essential for writing low-level BIOS and device drivers. In addition, it offers valuable insight into the technology's evolution and cutting-edge features.
Following an overview of the PCI Express architecture, the book moves on to cover transaction protocols, the physical/electrical layer, power management, configuration, and more. Specific topics covered include:
  • Split transaction protocol
  • Packet format and definition, including use of each field
  • ACK/NAK protocol
  • Traffic Class and Virtual Channel applications and use
  • Flow control initialization and operation
  • Error checking mechanisms and reporting options
  • Switch design issues
  • Advanced Power Management mechanisms and use
  • Active State Link power management
  • Hot Plug design and operation
  • Message transactions
  • Physical layer functions
  • Electrical signaling characteristics and issues
  • PCI Express enumeration procedures
  • Configuration register definitions
Thoughtfully organized, featuring a plethora of illustrations, and comprehensive in scope, PCI Express System Architecture is an essential resource for anyone working with this important technology.
MindShare's PC System Architecture Series is a crisply written and comprehensive set of guides to the most important PC hardware standards. Books in the series are intended for use by hardware and software designers, programmers, and support personnel.
MindShare, Inc., is one of the leading technical training companies in the hardware industry, providing innovative courses for dozens of companies, including IBM, HP, PLX, Sun, and Texas Instruments.
Ravi Budruk is a senior staff engineer and instructor with MindShare, Inc., where he has trained hundreds of engineers. He is an industry expert on such topics as Intel Processor and PC architecture, as well as such bus architectures as PCI Express, PCI, PCI-X, HyperTransport, IEEE 1394, and ISA. Before working at MindShare, Mr. Budruk was a PC chipset architect and designer at VLSI Technology, Inc.
Don Anderson is an expert on digital electronics and system design. He passes on his wealth of experience by training engineers, programmers, and technicians at MindShare, Inc., and is the author of numerous MindShare books.
Tom Shanley is President of MindShare, Inc., and one of the world's foremost authorities on computer system architecture.
www.informit.com/aw
www.mindshare.com
Cover design by Barbara T. Advisson
Cover photograph by Tassahiko Shimada/Photonics
Contact www.mindshare.com for Training on This Subject