- .. SPDX-License-Identifier: GPL-2.0
- =================================
- NVMe PCI Endpoint Function Target
- =================================
- :Author: Damien Le Moal <dlemoal@kernel.org>
- The NVMe PCI endpoint function target driver implements an NVMe PCIe controller
- using an NVMe fabrics target controller configured with the PCI transport type.
- Overview
- ========
- The NVMe PCI endpoint function target driver allows exposing an NVMe target
- controller over a PCIe link, thus implementing an NVMe PCIe device similar to a
- regular M.2 SSD. The target controller is created in the same manner as when
- using NVMe over fabrics: the controller represents the interface to an NVMe
- subsystem using a port. The port transport type must be configured to be
- "pci". The subsystem can be configured to have namespaces backed by regular
- files or block devices, or can use NVMe passthrough to expose to the PCI host an
- existing physical NVMe device or an NVMe fabrics host controller (e.g. an NVMe
- TCP host controller).
- The NVMe PCI endpoint function target driver relies as much as possible on the
- NVMe target core code to parse and execute NVMe commands submitted by the PCIe
- host. However, using the PCI endpoint framework API and DMA API, the driver is
- also responsible for managing all data transfers over the PCIe link. This
- implies that the NVMe PCI endpoint function target driver implements the
- management of several NVMe data structures and some NVMe command parsing.
- 1) The driver manages retrieval of NVMe commands in submission queues using DMA
- if supported, or MMIO otherwise. Each command retrieved is then executed
- using a work item to maximize performance with the parallel execution of
- multiple commands on different CPUs. The driver uses a work item to
- constantly poll the doorbell of all submission queues to detect command
- submissions from the PCIe host.
- 2) The driver transfers completion queue entries of completed commands to the
- PCIe host by copying the entries into the host completion queue using MMIO.
- After posting completion entries in a completion queue, the driver uses the
- PCI endpoint framework API to raise an interrupt to the host to signal
- command completion.
- 3) For any command that has a data buffer, the NVMe PCI endpoint target driver
- parses the command PRP or SGL lists to create a list of PCI address
- segments representing the mapping of the command data buffer on the host.
- The command data buffer is transferred over the PCIe link according to this
- list of PCI address segments, using DMA if supported. If DMA is not supported,
- MMIO is used, which results in poor performance. For write commands, the command
- data buffer is transferred from the host into a local memory buffer before
- executing the command using the target core code. For read commands, a local
- memory buffer is allocated to execute the command and the content of that
- buffer is transferred to the host once the command completes.
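To give a feel for the PRP case in item 3, the arithmetic below counts how many host memory pages a data buffer spans, which is also the number of PRP data entries describing it. This is only a sketch and not the driver's code: the helper name `prp_pages` is hypothetical, and a 4 KiB host memory page size is assumed unless another size is passed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the driver): count the host memory pages
# spanned by a data buffer starting at address $1 with a length of $2 bytes.
# Assumption: 4 KiB memory pages unless a page size is given as $3.
# Each PRP data entry describes at most one memory page, and only the first
# entry may point into the middle of a page.
prp_pages() {
    addr=$1
    len=$2
    page=${3:-4096}
    off=$(( addr % page ))                      # offset of the buffer in its first page
    echo $(( (off + len + page - 1) / page ))   # round up to whole pages
}
```

For instance, `prp_pages 0x1200 4096` reports two pages because the buffer does not start on a page boundary, while a page-aligned 512 KiB buffer spans 128 pages.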
- Controller Capabilities
- -----------------------
- The NVMe capabilities exposed to the PCIe host through the BAR 0 registers
- are almost identical to the capabilities of the NVMe target controller
- implemented by the target core code. There are some exceptions.
- 1) The NVMe PCI endpoint target driver always sets the controller capability
- CQR bit to request "Contiguous Queues Required". This is to facilitate the
- mapping of a queue PCI address range to the local CPU address space.
- 2) The doorbell stride (DSTRD) is always set to 4 bytes.
- 3) Since the PCI endpoint framework does not provide a way to handle PCI level
- resets, the controller capability NSSR bit (NVM Subsystem Reset Supported)
- is always cleared.
- 4) The Boot Partition Support (BPS), Persistent Memory Region Supported (PMRS)
- and Controller Memory Buffer Supported (CMBS) capabilities are never
- reported.
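As an illustration of what the fixed 4-byte doorbell stride in item 2 means for the host, the sketch below computes doorbell register offsets within BAR 0. It assumes the standard NVMe register layout, where doorbells start at offset 0x1000 and each queue ID owns a submission queue tail doorbell followed by a completion queue head doorbell. The function names are hypothetical.

```shell
#!/bin/sh
# Doorbell register offsets in BAR 0 for a doorbell stride of 4 bytes.
# Assumption: standard NVMe layout with the doorbell area starting at 0x1000.
# Function names are illustrative only.

# Submission queue tail doorbell of queue ID $1
sq_tail_doorbell() { printf '0x%x\n' $(( 0x1000 + (2 * $1) * 4 )); }

# Completion queue head doorbell of queue ID $1
cq_head_doorbell() { printf '0x%x\n' $(( 0x1000 + (2 * $1 + 1) * 4 )); }
```

With this layout, the admin queue (queue ID 0) uses offsets 0x1000 and 0x1004, and I/O queue 4 uses 0x1020 and 0x1024.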
- Supported Features
- ------------------
- The NVMe PCI endpoint target driver implements support for both PRPs and SGLs.
- The driver also implements IRQ vector coalescing and submission queue
- arbitration burst.
- The maximum number of queues and the maximum data transfer size (MDTS) are
- configurable through configfs before starting the controller. To avoid issues
- with excessive local memory usage for executing commands, MDTS defaults to 512
- KiB and is limited to a maximum of 2 MiB (arbitrary limit).
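The host sees MDTS as a power-of-two exponent in units of the minimum memory page size rather than as a byte count. The sketch below converts a size in KiB to that exponent. It is a hypothetical helper, not a driver interface, and it assumes a 4 KiB minimum memory page size.

```shell
#!/bin/sh
# Hypothetical helper: convert a maximum data transfer size in KiB to the
# exponent reported to the host in the Identify Controller MDTS field.
# Assumption: the minimum memory page size is 4 KiB.
mdts_exponent() {
    kb=$1
    # 2048 KiB is the documented maximum; 4 KiB is the smallest value this
    # arithmetic can represent.
    if [ "$kb" -lt 4 ] || [ "$kb" -gt 2048 ]; then
        echo "size out of range: $kb KiB" >&2
        return 1
    fi
    pages=$(( kb / 4 ))
    exp=0
    # Integer log2, rounding down for sizes that are not a power of two
    while [ "$pages" -gt 1 ]; do
        pages=$(( pages / 2 ))
        exp=$(( exp + 1 ))
    done
    echo "$exp"
}
```

Under these assumptions the default of 512 KiB yields an exponent of 7, and the 2 MiB maximum yields 9.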
- Minimum number of PCI Address Mapping Windows Required
- ------------------------------------------------------
- Most PCI endpoint controllers provide a limited number of mapping windows for
- mapping a PCI address range to local CPU memory addresses. The NVMe PCI
- endpoint target controller uses mapping windows for the following.
- 1) One memory window for raising MSI or MSI-X interrupts
- 2) One memory window for MMIO transfers
- 3) One memory window for each completion queue
- Given the highly asynchronous nature of the NVMe PCI endpoint target driver
- operation, the memory windows as described above will generally not be used
- simultaneously, but that may happen. So a safe maximum number of completion
- queues that can be supported is equal to the total number of memory mapping
- windows of the PCI endpoint controller minus two. E.g. for an endpoint PCI
- controller with 32 outbound memory windows available, up to 30 completion
- queues can be safely operated without any risk of getting PCI address mapping
- errors due to the lack of memory windows.
- Maximum Number of Queue Pairs
- -----------------------------
- Upon binding of the NVMe PCI endpoint target driver to the PCI endpoint
- controller, BAR 0 is allocated with enough space to accommodate the admin queue
- and multiple I/O queues. The maximum number of I/O queue pairs that can be
- supported is limited by several factors.
- 1) The NVMe target core code limits the maximum number of I/O queues to the
- number of online CPUs.
- 2) The total number of queue pairs, including the admin queue, cannot exceed
- the number of MSI-X or MSI vectors available.
- 3) The total number of completion queues must not exceed the total number of
- PCI mapping windows minus 2 (see above).
- The NVMe endpoint function driver allows configuring the maximum number of
- queue pairs through configfs.
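The three limits above can be combined into a single bound. The sketch below is one way to do the arithmetic; the helper name is hypothetical, and it assumes that the admin queue consumes one interrupt vector and one completion queue mapping window.

```shell
#!/bin/sh
# Hypothetical helper: largest number of I/O queue pairs that satisfies the
# three limits listed above.
#   $1: number of online CPUs
#   $2: number of MSI-X or MSI vectors available
#   $3: number of PCI address mapping windows of the endpoint controller
max_io_queue_pairs() {
    cpus=$1
    vectors=$2
    windows=$3
    max=$cpus
    # One vector is taken by the admin queue pair.
    by_vectors=$(( vectors - 1 ))
    # Two windows are kept for interrupts and MMIO transfers, and one of the
    # remaining completion queues is the admin completion queue.
    by_windows=$(( windows - 2 - 1 ))
    if [ "$by_vectors" -lt "$max" ]; then max=$by_vectors; fi
    if [ "$by_windows" -lt "$max" ]; then max=$by_windows; fi
    if [ "$max" -lt 0 ]; then max=0; fi
    echo "$max"
}
```

For example, `max_io_queue_pairs "$(nproc)" 32 32` gives the bound for the local machine with 32 vectors and 32 mapping windows, using `nproc` as an approximation of the number of online CPUs.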
- Limitations and NVMe Specification Non-Compliance
- -------------------------------------------------
- Similar to the NVMe target core code, the NVMe PCI endpoint target driver does
- not support multiple submission queues using the same completion queue. All
- submission queues must specify a unique completion queue.
- User Guide
- ==========
- This section describes the hardware requirements and how to set up an NVMe PCI
- endpoint target device.
- Kernel Requirements
- -------------------
- The kernel must be compiled with the configuration options CONFIG_PCI_ENDPOINT,
- CONFIG_PCI_ENDPOINT_CONFIGFS, and CONFIG_NVME_TARGET_PCI_EPF enabled.
- CONFIG_PCI, CONFIG_BLK_DEV_NVME and CONFIG_NVME_TARGET must also be enabled
- (obviously).
- In addition to this, at least one PCI endpoint controller driver should be
- available for the endpoint hardware used.
- To facilitate testing, enabling the null-blk driver (CONFIG_BLK_DEV_NULL_BLK)
- is also recommended. With this, a simple setup using a null_blk block device
- as a subsystem namespace can be used.
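A quick way to check these options is to scan the kernel configuration file. The helper below is a sketch with a hypothetical name; where the configuration file lives depends on the distribution, so its path is passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: print the required options that are neither built in
# (=y) nor modular (=m) in the kernel configuration file given as $1.
missing_kconfig() {
    cfg=$1
    for opt in CONFIG_PCI CONFIG_PCI_ENDPOINT CONFIG_PCI_ENDPOINT_CONFIGFS \
               CONFIG_BLK_DEV_NVME CONFIG_NVME_TARGET \
               CONFIG_NVME_TARGET_PCI_EPF; do
        # Anchor the match so CONFIG_PCI does not match CONFIG_PCI_ENDPOINT.
        grep -Eq "^${opt}=(y|m)\$" "$cfg" || echo "$opt"
    done
}
```

On many systems the file is available as `/boot/config-$(uname -r)`, in which case `missing_kconfig "/boot/config-$(uname -r)"` prints nothing when all options are enabled. That path is an assumption and may not exist on an embedded endpoint board.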
- Hardware Requirements
- ---------------------
- To use the NVMe PCI endpoint target driver, at least one endpoint controller
- device is required.
- To find the list of endpoint controller devices in the system::
- # ls /sys/class/pci_epc/
- a40000000.pcie-ep
- If PCI_ENDPOINT_CONFIGFS is enabled::
- # ls /sys/kernel/config/pci_ep/controllers
- a40000000.pcie-ep
- The endpoint board must also be connected to a host with a PCI cable
- with RX-TX signals swapped. If the host PCI slot used does not have
- plug-and-play capabilities, the host should be powered off when the NVMe PCI
- endpoint device is configured.
- NVMe Endpoint Device
- --------------------
- Creating an NVMe endpoint device is a two-step process. First, an NVMe target
- subsystem and port must be defined. Second, the NVMe PCI endpoint device must
- be set up and bound to the subsystem and port created.
- Creating an NVMe Subsystem and Port
- -----------------------------------
- Details about how to configure an NVMe target subsystem and port are outside the
- scope of this document. The following only provides a simple example of a port
- and subsystem with a single namespace backed by a null_blk device.
- First, make sure that configfs is enabled::
- # mount -t configfs none /sys/kernel/config
- Next, create a null_blk device (default settings give a 250 GB device without
- memory backing). The block device created will be /dev/nullb0 by default::
- # modprobe null_blk
- # ls /dev/nullb0
- /dev/nullb0
- The NVMe PCI endpoint function target driver must be loaded::
- # modprobe nvmet_pci_epf
- # lsmod | grep nvmet
- nvmet_pci_epf 32768 0
- nvmet 118784 1 nvmet_pci_epf
- nvme_core 131072 2 nvmet_pci_epf,nvmet
- Now, create a subsystem and a port that we will use to create a PCI target
- controller when setting up the NVMe PCI endpoint target device. In this
- example, the port is created with a maximum of 4 I/O queue pairs::
- # cd /sys/kernel/config/nvmet/subsystems
- # mkdir nvmepf.0.nqn
- # echo -n "Linux-pci-epf" > nvmepf.0.nqn/attr_model
- # echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id
- # echo "0x1b96" > nvmepf.0.nqn/attr_subsys_vendor_id
- # echo 1 > nvmepf.0.nqn/attr_allow_any_host
- # echo 4 > nvmepf.0.nqn/attr_qid_max
- Next, create and enable the subsystem namespace using the null_blk block
- device::
- # mkdir nvmepf.0.nqn/namespaces/1
- # echo -n "/dev/nullb0" > nvmepf.0.nqn/namespaces/1/device_path
- # echo 1 > "nvmepf.0.nqn/namespaces/1/enable"
- Finally, create the target port and link it to the subsystem::
- # cd /sys/kernel/config/nvmet/ports
- # mkdir 1
- # echo -n "pci" > 1/addr_trtype
- # ln -s /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn \
- /sys/kernel/config/nvmet/ports/1/subsystems/nvmepf.0.nqn
- Creating an NVMe PCI Endpoint Device
- ------------------------------------
- With the NVMe target subsystem and port ready for use, the NVMe PCI endpoint
- device can now be created and enabled. The NVMe PCI endpoint target driver
- should already be loaded (that is done automatically when the port is created)::
- # ls /sys/kernel/config/pci_ep/functions
- nvmet_pci_epf
- Next, create function 0::
- # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
- # mkdir nvmepf.0
- # ls nvmepf.0/
- baseclass_code msix_interrupts secondary
- cache_line_size nvme subclass_code
- deviceid primary subsys_id
- interrupt_pin progif_code subsys_vendor_id
- msi_interrupts revid vendorid
- Configure the function using any device ID (the vendor ID for the device will
- be automatically set to the same value as the NVMe target subsystem vendor
- ID)::
- # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
- # echo 0xBEEF > nvmepf.0/deviceid
- # echo 32 > nvmepf.0/msix_interrupts
- If the PCI endpoint controller used does not support MSI-X, MSI can be
- configured instead::
- # echo 32 > nvmepf.0/msi_interrupts
- Next, let's bind our endpoint device to the target subsystem and port that we
- created::
- # echo 1 > nvmepf.0/nvme/portid
- # echo "nvmepf.0.nqn" > nvmepf.0/nvme/subsysnqn
- The endpoint function can then be bound to the endpoint controller and the
- controller started::
- # cd /sys/kernel/config/pci_ep
- # ln -s functions/nvmet_pci_epf/nvmepf.0 controllers/a40000000.pcie-ep/
- # echo 1 > controllers/a40000000.pcie-ep/start
- On the endpoint machine, kernel messages will show information as the NVMe
- target device and endpoint device are created and connected.
- .. code-block:: text
- null_blk: disk nullb0 created
- null_blk: module loaded
- nvmet: adding nsid 1 to subsystem nvmepf.0.nqn
- nvmet_pci_epf nvmet_pci_epf.0: PCI endpoint controller supports MSI-X, 32 vectors
- nvmet: Created nvm controller 1 for subsystem nvmepf.0.nqn for NQN nqn.2014-08.org.nvmexpress:uuid:2ab90791-2246-4fbb-961d-4c3d5a5a0176.
- nvmet_pci_epf nvmet_pci_epf.0: New PCI ctrl "nvmepf.0.nqn", 4 I/O queues, mdts 524288 B
- PCI Root-Complex Host
- ---------------------
- Booting the PCI host will result in the initialization of the PCIe link (this
- may be signaled by the PCI endpoint driver with a kernel message). A kernel
- message on the endpoint will also signal when the host NVMe driver enables the
- device controller::
- nvmet_pci_epf nvmet_pci_epf.0: Enabling controller
- On the host side, the NVMe PCI endpoint function target device is
- discoverable as a PCI device, with the vendor ID and device ID as configured::
- # lspci -n
- 0000:01:00.0 0108: 1b96:beef
- This device will be recognized as an NVMe device with a single namespace::
- # lsblk
- NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
- nvme0n1 259:0 0 250G 0 disk
- The NVMe endpoint block device can then be used as any other regular NVMe
- namespace block device. The *nvme* command line utility can be used to get more
- detailed information about the endpoint device::
- # nvme id-ctrl /dev/nvme0
- NVME Identify Controller:
- vid : 0x1b96
- ssvid : 0x1b96
- sn : 94993c85650ef7bcd625
- mn : Linux-pci-epf
- fr : 6.13.0-r
- rab : 6
- ieee : 000000
- cmic : 0xb
- mdts : 7
- cntlid : 0x1
- ver : 0x20100
- ...
- Endpoint Bindings
- =================
- The NVMe PCI endpoint target driver uses the PCI endpoint configfs device
- attributes as follows.
- ================  ============================================================
- vendorid          Ignored (the vendor id of the NVMe target subsystem is used)
- deviceid          Anything is OK (e.g. PCI_ANY_ID)
- revid             Do not care
- progif_code       Must be 0x02 (NVM Express)
- baseclass_code    Must be 0x01 (PCI_BASE_CLASS_STORAGE)
- subclass_code     Must be 0x08 (Non-Volatile Memory controller)
- cache_line_size   Do not care
- subsys_vendor_id  Ignored (the subsystem vendor id of the NVMe target subsystem
-                   is used)
- subsys_id         Anything is OK (e.g. PCI_ANY_ID)
- msi_interrupts    At least equal to the number of queue pairs desired
- msix_interrupts   At least equal to the number of queue pairs desired
- interrupt_pin     Interrupt PIN to use if MSI and MSI-X are not supported
- ================  ============================================================
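The three class-related fields in the table combine into the 24-bit PCI class code that the host reads from the device. The short sketch below composes that value; the function name is hypothetical.

```shell
#!/bin/sh
# Compose the 24-bit PCI class code from the base class, subclass and
# programming interface values. The function name is illustrative only.
pci_class_code() {
    printf '0x%02x%02x%02x\n' "$1" "$2" "$3"
}

# Values required by the table above: storage, NVM controller, NVM Express
pci_class_code 0x01 0x08 0x02
```

The first two bytes, 0x0108, are what `lspci -n` shows on the host as the device class.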
- The NVMe PCI endpoint target function also has some specific configurable
- fields defined in the *nvme* subdirectory of the function directory. These
- fields are as follows.
- ================  ============================================================
- mdts_kb           Maximum data transfer size in KiB (default: 512)
- portid            The ID of the target port to use
- subsysnqn         The NQN of the target subsystem to use
- ================  ============================================================