6

Recently, a graphics card with an NVMe slot has been released. In a discussion about this device and how the SSD could be connected to the system, the following question emerged:

Given a GPU with 16 PCIe ports and an SSD with 4 PCIe ports, which are together connected in some way to the system via a PCIe slot with 16 PCIe lanes, would it be possible to connect 12 ports from the GPU directly to the slot and the remaining 4 ports of the GPU and the 4 ports of the SSD to a PCIe switch, which in turn is connected to the remaining 4 PCIe lanes of the PCIe slot?

Or more generally: Can only part of the lanes of a PCIe link be switched?

If so, and the SSD produced a lot of traffic, would that slow down only the 4 lanes of the GPU that are shared with the SSD, or all 16 lanes?

Or more generally: Are PCIe packets in a multi-lane link distributed to the lanes in a simple round-robin-like manner or are they balanced based on the readiness of each lane?

Empy

3 Answers

6
  1. For this part of the question:

    Are PCIe packets in a multi-lane link distributed to the lanes in a simple round-robin-like manner or are they balanced based on the readiness of each lane?

    No, in a multi-lane configuration PCIe packets are striped across the lanes, rather than each packet being sent on a single lane (see the sketch after this list). From the Wikipedia PCI Express page:

    The PCI Express link between two devices can vary in size from one to 16 lanes. In a multi-lane link, the packet data is striped across lanes, and peak data throughput scales with the overall link width. The lane count is automatically negotiated during device initialization and can be restricted by either endpoint.

  2. For this part of the question:

    Would it be possible to connect 12 ports from the GPU directly to the slot and the remaining 4 ports of the GPU and the 4 ports of the SSD to a PCIe switch, which in turn is connected to the remaining 4 PCIe lanes of the PCIe slot?

    As above, since for a single PCIe endpoint (the GPU) the packets are striped across all of its lanes, it isn't possible to split the 16 lanes into a group of 12 and a group of 4 connected to different PCIe root ports.

    While there is PCIe Bifurcation, which allows division of the lanes of a PCIe slot:

    • As far as I am aware, PCIe Bifurcation isn't available on a GPU endpoint.
    • While the PCIe specification describes operation at link widths of x1, x2, x4, x8, x12, x16 and x32 (e.g. see section 1.2. PCI Express Link in PCI Express® Base Specification Revision 4.0 Version 0.3), the above Wikipedia page says the x12 and x32 links were defined but virtually never used, i.e. it is unclear how many root ports support a x12 link width.
    • Looking at the BIOS for an HP Z4, the Bifurcation options for a x16 slot on the Sky Lake-E PCI Express Root are only:
      • Auto
      • x8x8
      • x4x4x4x4
    • Looking at the UltraScale+ Devices Integrated Block for PCI Express, which is used in an FPGA and can be configured as either an endpoint or a root port, it only supports 1-lane, 2-lane, 4-lane, 8-lane, and 16-lane configurations, i.e. no x12 support even in a configurable PCIe block.
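
To make the striping point concrete, here is a minimal Python sketch (an illustration only, not actual PCIe link-layer behaviour) of how the bytes of a single packet are spread over all lanes of a link, rather than whole packets going down individual lanes:

```python
# Illustration only, not real PCIe link-layer logic: the bytes of a single
# packet are striped across every lane of the link.

def stripe_packet(packet: bytes, lane_count: int) -> list[list[int]]:
    """Distribute the bytes of one packet over `lane_count` lanes:
    byte 0 on lane 0, byte 1 on lane 1, ... wrapping around."""
    lanes = [[] for _ in range(lane_count)]
    for i, byte in enumerate(packet):
        lanes[i % lane_count].append(byte)
    return lanes

packet = bytes(range(32))                             # a toy 32-byte "packet"
for n, data in enumerate(stripe_packet(packet, 16)):  # an x16 link
    print(f"lane {n:2d}: {data}")

# Every lane ends up carrying a slice of the same packet, which is why a
# subset of the lanes cannot simply be split off and routed elsewhere.
```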

On the subject of PCIe Bifurcation on a PCIe card, I remember one unusual case of an Intel® Ethernet Network Adapter E810-2CQDA2, which has dual 100/50/25/10GbE ports.

The specifications contain:

Speed & Slot Width : 16 GT/s x16 lanes

Yet, when it was fitted to a PCIe 4.0 x16 slot:

  1. Initially with PCIe Bifurcation disabled for the x16 slot, which was the default in the BIOS, only a single port of the E810 was found.
  2. After changing the BIOS settings to enable PCIe Bifurcation as x8x8 for the slot, both ports of the E810 were found.

The Intel site has Only One Port of the Intel® Ethernet Network Adapter E810-2CQDA2 Is Working on a Dell PowerEdge R740, which is a bit vague about why PCIe Bifurcation was required.

I think that particular E810-2CQDA2 may have two Intel® Ethernet Controller E810-CAM1 ICs, each connected to 8 lanes of the x16-width card (looking at a physical card, there seemed to be two ICs under the heatsink).
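
As an aside, Linux exposes the width a device actually negotiated in sysfs, which is a quick way to confirm what bifurcation did. A minimal sketch, assuming a Linux host; the PCI address used here is only a placeholder:

```python
# Minimal sketch: read the negotiated vs. maximum link width of a PCIe
# device from Linux sysfs. The PCI address below is only a placeholder.
from pathlib import Path

def link_width(pci_address: str) -> tuple[str, str]:
    dev = Path("/sys/bus/pci/devices") / pci_address
    current = (dev / "current_link_width").read_text().strip()
    maximum = (dev / "max_link_width").read_text().strip()
    return current, maximum

current, maximum = link_width("0000:01:00.0")   # placeholder address
print(f"negotiated x{current} of a possible x{maximum}")

# With the x8x8 bifurcation in the E810 example above, each of the two
# controllers would be expected to report its own x8 link.
```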

6

GPUs often have a PCIe switch at the top (e.g. Intel B580, AMD RX 9070 XT).

This allows a number of things:

  • reusing an existing sound card IC for the audio channels
  • almost guaranteeing compatibility without extensive verification -- the switch is responsible for all the link negotiation, and provides a predictable interface to the GPU

I think it also helps with generating hotplug events for Virtual Functions if the GPU supports SR-IOV. The only drawback is that the host needs to reconfigure a lot more devices for the resizable BAR, but with a modern BIOS, that is usually already handled during initial operation, and it's a software issue.

If you have a switch anyway that has an x16 downlink to the GPU and an x1 downlink to the audio interface, chances are good that there is another x4 link available that can be connected to an SSD.

The big advantage of connecting the SSD to this switch is that (unlike many root complexes) the switch will support device-to-device transfers between its downstream ports, so you can copy data directly, for example for uncompressed video capture when you want to run a high-quality encode.
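
To check whether the GPU, its audio function and such an SSD really share a switch, the PCI topology can be walked in Linux sysfs. A rough sketch, assuming a Linux host; the upstream-port address is a placeholder:

```python
# Rough sketch: list every device that sits below a given PCIe bridge or
# switch port by following the nested sysfs device tree.
# The bridge address used below is a placeholder.
import re
from pathlib import Path

BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

def devices_below(bridge_address: str) -> list[str]:
    """Return the PCI addresses of all devices behind `bridge_address`."""
    bridge = Path("/sys/bus/pci/devices") / bridge_address
    found = []
    for entry in bridge.iterdir():
        # Child PCI devices appear as nested directories named by their
        # bus:device.function address; other entries are attribute files.
        if entry.is_dir() and BDF.match(entry.name):
            found.append(entry.name)
            found.extend(devices_below(entry.name))
    return found

# Placeholder: the upstream port of the on-card switch.
for child in devices_below("0000:01:00.0"):
    print(child)

# If the GPU, its audio function and the SSD all show up here, they sit
# behind the same switch and peer-to-peer copies never touch the root complex.
```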

4

First some terminology correction:

  • A 'Lane' is a single full-duplex path consisting of a single Tx and single Rx pair.
  • A 'Link' is a single point-to-point connection between exactly two devices, no more, no less. A link may consist of one or more lanes.

A link is point to point, and therefore it is not possible to split it up and recombine it.

It is possible in certain situations for a device to support "bifurcation", whereby it takes its available lanes and splits them up into multiple links; however, this is not a requirement of the specification, nor is it always available. It is a hardware-specific feature which involves a device being able to reconfigure itself to act as a virtual switch.

It is not possible to take the lanes from multiple links and combine them into a single link (the reverse of bifurcation). This would be called aggregation and there is no defined support for it.
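
To put those rules in concrete, if simplified, terms, here is a small illustrative Python model (nothing from the PCIe specification itself, just the constraints described above): a link always joins exactly two devices, bifurcation divides one port's lanes into several links, and there is no aggregation operation that merges lanes from different links.

```python
# Illustrative model of the rules above: a link is strictly point-to-point,
# bifurcation divides one port's lanes into several links, and there is no
# aggregation operation that merges lanes from different links.
from dataclasses import dataclass

@dataclass
class Link:
    device_a: str   # exactly two devices per link...
    device_b: str   # ...no more, no less
    lanes: int      # x1, x2, x4, x8, x16 (x12 and x32 exist but are rare)

def bifurcate(port: str, total_lanes: int,
              widths: list[int], devices: list[str]) -> list[Link]:
    """Split one port's lanes into several independent links,
    e.g. 16 lanes -> x8 + x4 (+ 4 lanes left idle)."""
    if sum(widths) > total_lanes:
        raise ValueError("bifurcation cannot create extra lanes")
    return [Link(port, dev, w) for w, dev in zip(widths, devices)]

for link in bifurcate("root port", 16, [8, 4], ["GPU", "SSD"]):
    print(f"{link.device_a} <-x{link.lanes}-> {link.device_b}")

# The reverse -- presenting lanes taken from two different links to a device
# as one wider link -- has no equivalent here or in the specification.
```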


From my understanding of your question, what you are asking is whether you can do this:

Arrangement of PCIe switch

If that is the case, the simple answer is no, you cannot. This will not work the way you expect, if it even works at all.

Wiring things up like this will produce one of the following results depending on which lanes the "4" you split out are:

  1. If the four lanes split off are 0-3 on both the RC and GPU, then either:

    • The GPU and SSD will link up as either x1 or x4 devices behind the PCIe switch, and the RC will see the switch on a x1 or x4 link.
    • Nothing will be detected because the 12 remaining lanes being connected confuse the upstream port and prevent a link being established.
  2. If the four lanes are any others (e.g. upper four, or random selection), then either:

    • The GPU will be detected and link up as a x1, x4, x8 (or, highly unlikely, x12) device depending on what link widths both it and the RC support. The SSD and PCIe switch will not be detected.
    • Nothing will be detected because the 4 remaining lanes confuse the upstream port, since they are connected to some other, unrelated device.
  3. If you were able to configure bifurcation on the RC (for example turning the port into a x8,x4,x4 mode), then assuming the PCIe switch is connected to the lanes of one of the x4 links:

    • the GPU links up at x1, x4, or x8 directly from the RC, and the SSD appears as a x1 or x4 device behind the PCIe switch.

In any of these cases the four lanes from the PCIe switch to the GPU do nothing at all other than potentially prevent a link forming.


The correct way to wire this up would be to use either a PCIe switch or bifurcation alone.

  1. Using a switch with at least 36 lanes (16 upstream + 16 + 4 downstream; see the sketch below), whereby the upstream port connects to the RC with a x16 link, and the switch is configured to have two downstream ports, one for each of the GPU (x16) and the SSD (x4).

    Switch between the two devices

  2. Using bifurcation alone if supported by your RC. Here you could have a x4 link for the SSD, and a x8 link for the GPU:

    Bifurcated connection for the two devices

Notice how in both cases each link connects to exactly two devices. These are therefore valid links.
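
For reference, a back-of-the-envelope sketch of the lane budgets behind the two options (the figures are taken from the arrangements above, nothing vendor-specific):

```python
# Back-of-the-envelope check of the lane budgets in the two arrangements
# above; the figures come straight from the text, nothing vendor-specific.

# Option 1: a switch needs dedicated lanes for its upstream link plus every
# downstream link.
upstream   = {"root complex": 16}
downstream = {"GPU": 16, "SSD": 4}
print("switch lanes required:",
      sum(upstream.values()) + sum(downstream.values()))   # 16 + 16 + 4 = 36

# Option 2: bifurcation just divides the root port's own 16 lanes.
print("lanes left unused after x8 (GPU) + x4 (SSD):", 16 - 8 - 4)   # 4
```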

Depending on your GPU, you may not be able to get a x8 link; in that case, in the second (bifurcated) arrangement, it will form a x1 link instead. This is because PCIe devices are only required to support a x1 link (needed for establishing contact) and their native link width (e.g. x16). Intermediate widths such as x4 or x8 on a x16 device are not necessarily supported by the device. For many consumer desktop GPUs, both x8 and x16 are generally supported.
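
As a rough illustration of that negotiation, the link trains at the widest width both ends support; the supported-width sets in this sketch are assumptions for the example, not values read from real hardware:

```python
# Illustration: a link trains at the widest width BOTH ends support. The
# supported-width sets below are assumptions for the example, not values
# read from real hardware.

def negotiate(widths_a: set[int], widths_b: set[int]) -> int:
    common = widths_a & widths_b
    if not common:
        raise RuntimeError("no common width -- the link cannot train")
    return max(common)

root_port_x16 = {1, 2, 4, 8, 16}   # full x16 root port
root_port_x8  = {1, 2, 4, 8}       # the same port after x8/x4/x4 bifurcation
gpu_minimal   = {1, 16}            # GPU implementing only the required widths
gpu_typical   = {1, 8, 16}         # typical consumer GPU

print(negotiate(root_port_x16, gpu_minimal))   # 16 -- full width
print(negotiate(root_port_x8,  gpu_typical))   # 8  -- behind bifurcation
print(negotiate(root_port_x8,  gpu_minimal))   # 1  -- falls back to x1
```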