.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)

====================
Asynchronous VM_BIND
====================

Nomenclature:
=============

* ``VRAM``: On-device memory. Sometimes referred to as device local memory.

* ``gpu_vm``: A virtual GPU address space. Typically per process, but
  can be shared by multiple processes.

* ``VM_BIND``: An operation or a list of operations to modify a gpu_vm using
  an IOCTL. The operations include mapping and unmapping system- or
  VRAM memory.

* ``syncobj``: A container that abstracts synchronization objects. The
  synchronization objects can be either generic, like dma-fences or
  driver specific. A syncobj typically indicates the type of the
  underlying synchronization object.

* ``in-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation waits
  for these before starting.

* ``out-syncobj``: Argument to a VM_BIND IOCTL, the VM_BIND operation
  signals these when the bind operation is complete.

* ``dma-fence``: A cross-driver synchronization object. A basic
  understanding of dma-fences is required to digest this
  document. Please refer to the ``DMA Fences`` section of the
  :doc:`dma-buf doc </driver-api/dma-buf>`.

* ``memory fence``: A synchronization object, different from a dma-fence.
  A memory fence uses the value of a specified memory location to determine
  signaled status. A memory fence can be awaited and signaled by both
  the GPU and CPU. Memory fences are sometimes referred to as
  user-fences, userspace-fences or gpu futexes and do not necessarily obey
  the dma-fence rule of signaling within a "reasonable amount of time".
  The kernel should thus avoid waiting for memory fences with locks held.

* ``long-running workload``: A workload that may take more than the
  current stipulated dma-fence maximum signal delay to complete and
  which therefore needs to set the gpu_vm or the GPU execution context in
  a certain mode that disallows completion dma-fences.

* ``exec function``: An exec function is a function that revalidates all
  affected gpu_vmas, submits a GPU command batch and registers the
  dma_fence representing the GPU command's activity with all affected
  dma_resvs. For completeness, although not covered by this document,
  it's worth mentioning that an exec function may also be the
  revalidation worker that is used by some drivers in compute /
  long-running mode.

* ``bind context``: A context identifier used for the VM_BIND
  operation. VM_BIND operations that use the same bind context can be
  assumed, where it matters, to complete in order of submission. No such
  assumptions can be made for VM_BIND operations using separate bind contexts.

* ``UMD``: User-mode driver.

* ``KMD``: Kernel-mode driver.
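
The ``memory fence`` entry above can be illustrated with a minimal CPU-side
sketch. It assumes a fence counts as signaled once the value at its memory
location reaches a target value. That comparison, and the names
``struct mem_fence``, ``mem_fence_signaled()`` and ``mem_fence_signal()``, are
hypothetical and not part of any uAPI; a real driver may define the signaled
condition differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory fence: signaled once *addr >= target. */
struct mem_fence {
	_Atomic uint64_t *addr;	/* memory location holding the fence value */
	uint64_t target;	/* value that marks the fence as signaled */
};

/* Non-blocking check, usable by any party that can read the location. */
static bool mem_fence_signaled(const struct mem_fence *f)
{
	assert(f->addr);
	return atomic_load_explicit(f->addr, memory_order_acquire) >= f->target;
}

/* Signal from the CPU side by publishing the target value. */
static void mem_fence_signal(const struct mem_fence *f)
{
	assert(f->addr);
	atomic_store_explicit(f->addr, f->target, memory_order_release);
}
```

Nothing in this sketch bounds how long the location may stay below the target,
which is why such a fence cannot stand in for a dma-fence.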

Synchronous / Asynchronous VM_BIND operation
============================================

Synchronous VM_BIND
___________________
With Synchronous VM_BIND, the VM_BIND operations all complete before the
IOCTL returns. A synchronous VM_BIND takes neither in-fences nor
out-fences. Synchronous VM_BIND may block and wait for GPU operations;
for example swap-in or clearing, or even previous binds.

Asynchronous VM_BIND
____________________
Asynchronous VM_BIND accepts both in-syncobjs and out-syncobjs. While the
IOCTL may return immediately, the VM_BIND operations wait for the in-syncobjs
before modifying the GPU page-tables, and signal the out-syncobjs when
the modification is done in the sense that the next exec function that
awaits the out-syncobjs will see the change. Errors are reported
synchronously.

In low-memory situations the implementation may block, performing the
VM_BIND synchronously, because there might not be enough memory
immediately available for preparing the asynchronous operation.

If the VM_BIND IOCTL takes a list or an array of operations as an argument,
the in-syncobjs need to signal before the first operation starts to
execute, and the out-syncobjs signal after the last operation
completes. Operations in the operation list can be assumed, where it
matters, to complete in order.

Asynchronous VM_BIND operations may use dma-fences, embedded in
out-syncobjs and internally in KMD, to signal bind completion. Dma-fences
are required to signal in a reasonable amount of time and can therefore
never be made to depend on memory fences, which have no such
restriction. Consequently, any memory fences given as VM_BIND in-fences
need to be awaited synchronously before the VM_BIND IOCTL returns.

The purpose of an Asynchronous VM_BIND operation is for user-mode
drivers to be able to pipeline interleaved gpu_vm modifications and
exec functions. For long-running workloads, such pipelining of a bind
operation is not allowed and any in-fences need to be awaited
synchronously. The reason for this is twofold. First, any memory
fences gated by a long-running workload and used as in-syncobjs for the
VM_BIND operation will need to be awaited synchronously anyway (see
above). Second, any dma-fences used as in-syncobjs for VM_BIND
operations for long-running workloads will not allow for pipelining
anyway since long-running workloads don't allow for dma-fences as
out-syncobjs, so while theoretically possible the use of them is
questionable and should be rejected until there is a valuable use-case.
Note that this is not a limitation imposed by dma-fence rules, but
rather a limitation imposed to keep the KMD implementation simple. It
applies to the VM_BIND operation only and does not affect using
dma-fences as dependencies for the long-running workload itself, which
is allowed by dma-fence rules.

An asynchronous VM_BIND operation may take substantial time to
complete and signal the out-fence, in particular if the operation is
deeply pipelined behind other VM_BIND operations and workloads
submitted using exec functions. In that case, UMD might want to avoid
having a subsequent VM_BIND operation queued behind the first one if
there are no explicit dependencies. In order to circumvent such a queue-up, a
VM_BIND implementation may allow for VM_BIND contexts to be
created. For each context, VM_BIND operations will be guaranteed to
complete in the order they were submitted, but that is not the case
for VM_BIND operations executing on separate VM_BIND contexts. Instead
KMD will attempt to execute such VM_BIND operations in parallel, with
no guarantee that they will actually be executed in
parallel. There may be internal implicit dependencies that only KMD knows
about, for example page-table structure changes. A way to attempt
to avoid such internal dependencies is to have different VM_BIND
contexts use separate regions of a VM.
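
The ordering rule for VM_BIND contexts can be modelled with a small sketch. It
is a toy model rather than KMD code: ``struct bind_ctx`` and the helper
functions are hypothetical names, and the model deliberately ignores the
internal implicit dependencies mentioned above. It only shows that an
operation is ordered against earlier operations in its own context and never
against another context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one bind context. */
struct bind_ctx {
	uint64_t next_seqno;	/* sequence number for the next submission */
	uint64_t completed;	/* number of operations completed so far */
};

/* Submitting hands out increasing per-context sequence numbers. */
static uint64_t bind_ctx_submit(struct bind_ctx *ctx)
{
	return ctx->next_seqno++;
}

/*
 * An operation may complete once every earlier operation in the same
 * context has completed. No other context is consulted.
 */
static bool bind_ctx_may_complete(const struct bind_ctx *ctx, uint64_t seqno)
{
	return seqno == ctx->completed;
}

static void bind_ctx_complete(struct bind_ctx *ctx, uint64_t seqno)
{
	assert(bind_ctx_may_complete(ctx, seqno));
	ctx->completed++;
}
```

With two contexts, an operation submitted to the second one may complete while
the first context still has work pending, which is the queue-up avoidance the
text describes.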

Also, for VM_BINDs on long-running gpu_vms the user-mode driver should
typically select memory fences as out-fences, since that gives the
kernel-mode driver greater flexibility to inject other operations into
the bind / unbind operations, for example inserting breakpoints into
batch buffers. The workload execution can then easily be pipelined behind
the bind completion using the memory out-fence as the signal condition
for a GPU semaphore embedded by UMD in the workload.

There is no difference in the operations supported or in
multi-operation support between asynchronous VM_BIND and synchronous VM_BIND.

Multi-operation VM_BIND IOCTL error handling and interrupts
===========================================================

The VM_BIND operations of the IOCTL may error for various reasons, for
example due to lack of resources to complete or due to interrupted
waits.
In these situations UMD should preferably restart the IOCTL after
taking suitable action.
If UMD has over-committed a memory resource, an -ENOSPC error will be
returned, and UMD may then unbind resources that are not used at the
moment and rerun the IOCTL. On -EINTR, UMD should simply rerun the
IOCTL, and on -ENOMEM user-space may either attempt to free known
system memory resources or fail. If UMD decides to fail a
bind operation because of an error return, no additional action is needed
to clean up the failed operation, and the VM is left in the same state
as it was before the failing IOCTL.
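
The restart policy described above could look roughly like the following on
the UMD side. This is a sketch under stated assumptions: ``submit`` stands in
for whatever wrapper issues the VM_BIND IOCTL and returns 0 or a negative
errno, and ``evict`` stands in for UMD logic that unbinds currently unused
resources. Both callbacks, the ``max_evictions`` bound and the ``fake_*``
helpers are hypothetical and exist only to make the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

typedef int (*bind_submit_fn)(void *arg);	/* 0 or negative errno */
typedef bool (*bind_evict_fn)(void *arg);	/* true if room was made */

static int vm_bind_with_retry(bind_submit_fn submit, bind_evict_fn evict,
			      void *arg, int max_evictions)
{
	assert(submit && evict);

	for (;;) {
		int ret = submit(arg);

		if (ret == -EINTR)
			continue;	/* interrupted wait: simply rerun */
		if (ret == -ENOSPC && max_evictions-- > 0 && evict(arg))
			continue;	/* unbound unused resources: rerun */
		/*
		 * Success, -ENOMEM or any other error goes to the caller.
		 * A failed IOCTL leaves the VM unchanged, so there is
		 * nothing to clean up here.
		 */
		return ret;
	}
}

/* Demo-only fake that replays a scripted list of results. */
struct fake_bind {
	const int *results;	/* one entry per submit() call */
	int pos;		/* number of submit() calls so far */
	int evictions;		/* number of evict() calls so far */
};

static int fake_submit(void *arg)
{
	struct fake_bind *fb = arg;

	return fb->results[fb->pos++];
}

static bool fake_evict(void *arg)
{
	struct fake_bind *fb = arg;

	fb->evictions++;
	return true;
}
```

Bounding the number of eviction attempts is a design choice of this sketch so
that a persistent -ENOSPC cannot loop forever; the text itself does not
prescribe a limit. -ENOMEM is passed to the caller, which may free system
memory and call again or give up.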

Unbind operations are guaranteed not to return any errors due to
resource constraints, but may return errors due to, for example,
invalid arguments or the gpu_vm being banned.
In the case an unexpected error happens during the asynchronous bind
process, the gpu_vm will be banned, and attempts to use it after banning
will return -ENOENT.

Example: The Xe VM_BIND uAPI
============================

Starting with the VM_BIND operation struct, the IOCTL call can take
zero, one or many such operations. Zero operations means that only the
synchronization part of the IOCTL is carried out: an asynchronous
VM_BIND updates the syncobjs, whereas a sync VM_BIND waits for the
implicit dependencies to be fulfilled.

.. code-block:: c

   struct drm_xe_vm_bind_op {
       /**
        * @obj: GEM object to operate on, MBZ for MAP_USERPTR, MBZ for UNMAP
        */
       __u32 obj;

       /** @pad: MBZ */
       __u32 pad;

       union {
           /**
            * @obj_offset: Offset into the object for MAP.
            */
           __u64 obj_offset;

           /** @userptr: user virtual address for MAP_USERPTR */
           __u64 userptr;
       };

       /**
        * @range: Number of bytes from the object to bind to addr, MBZ for UNMAP_ALL
        */
       __u64 range;

       /** @addr: Address to operate on, MBZ for UNMAP_ALL */
       __u64 addr;

       /**
        * @tile_mask: Mask for which tiles to create binds for, 0 == All tiles,
        * only applies to creating new VMAs
        */
       __u64 tile_mask;

   /* Map (parts of) an object into the GPU virtual address range. */
   #define XE_VM_BIND_OP_MAP 0x0
   /* Unmap a GPU virtual address range */
   #define XE_VM_BIND_OP_UNMAP 0x1
   /*
    * Map a CPU virtual address range into a GPU virtual
    * address range.
    */
   #define XE_VM_BIND_OP_MAP_USERPTR 0x2
   /* Unmap a gem object from the VM. */
   #define XE_VM_BIND_OP_UNMAP_ALL 0x3
   /*
    * Make the backing memory of an address range resident if
    * possible. Note that this doesn't pin backing memory.
    */
   #define XE_VM_BIND_OP_PREFETCH 0x4

   /* Make the GPU map readonly. */
   #define XE_VM_BIND_FLAG_READONLY (0x1 << 16)
   /*
    * Valid on a faulting VM only, do the MAP operation immediately rather
    * than deferring the MAP to the page fault handler.
    */
   #define XE_VM_BIND_FLAG_IMMEDIATE (0x1 << 17)
   /*
    * When the NULL flag is set, the page tables are setup with a special
    * bit which indicates writes are dropped and all reads return zero. In
    * the future, the NULL flags will only be valid for XE_VM_BIND_OP_MAP
    * operations, the BO handle MBZ, and the BO offset MBZ. This flag is
    * intended to implement VK sparse bindings.
    */
   #define XE_VM_BIND_FLAG_NULL (0x1 << 18)
       /** @op: Operation to perform (lower 16 bits) and flags (upper 16 bits) */
       __u32 op;

       /** @region: Memory region to prefetch VMA to, instance not a mask */
       __u32 region;

       /** @reserved: Reserved */
       __u64 reserved[2];
   };
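
Since ``op`` carries the operation in its lower 16 bits and the flags in its
upper 16 bits, a UMD might wrap the packing in small helpers. The sketch below
copies three of the constants from the struct above; ``BIND_OP_MASK``,
``bind_op_pack()``, ``bind_op_code()`` and ``bind_op_flags()`` are hypothetical
names and not part of the uAPI.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from struct drm_xe_vm_bind_op above. */
#define XE_VM_BIND_OP_MAP		0x0
#define XE_VM_BIND_OP_MAP_USERPTR	0x2
#define XE_VM_BIND_FLAG_READONLY	(0x1 << 16)

/* Hypothetical helper mask: the operation lives in the lower 16 bits. */
#define BIND_OP_MASK	0xffffu

/* Combine an operation code and flags into the value for the op field. */
static uint32_t bind_op_pack(uint32_t op, uint32_t flags)
{
	assert((op & ~BIND_OP_MASK) == 0);	/* operation fits the low half */
	assert((flags & BIND_OP_MASK) == 0);	/* flags stay in the high half */
	return op | flags;
}

/* Extract the operation code from a packed op value. */
static uint32_t bind_op_code(uint32_t packed)
{
	return packed & BIND_OP_MASK;
}

/* Extract the flags from a packed op value. */
static uint32_t bind_op_flags(uint32_t packed)
{
	return packed & ~BIND_OP_MASK;
}
```

For example, ``bind_op_pack(XE_VM_BIND_OP_MAP_USERPTR, XE_VM_BIND_FLAG_READONLY)``
yields ``0x10002``.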

The VM_BIND IOCTL argument itself looks as follows. Note that for
synchronous VM_BIND, the ``num_syncs`` and ``syncs`` fields must be zero. Here
the ``exec_queue_id`` field is the VM_BIND context discussed previously
that is used to facilitate out-of-order VM_BINDs.

.. code-block:: c

   struct drm_xe_vm_bind {
       /** @extensions: Pointer to the first extension struct, if any */
       __u64 extensions;

       /** @vm_id: The ID of the VM to bind to */
       __u32 vm_id;

       /**
        * @exec_queue_id: exec_queue_id, must be of class DRM_XE_ENGINE_CLASS_VM_BIND
        * and exec queue must have same vm_id. If zero, the default VM bind engine
        * is used.
        */
       __u32 exec_queue_id;

       /** @num_binds: number of binds in this IOCTL */
       __u32 num_binds;

   /* If set, perform an async VM_BIND, if clear a sync VM_BIND */
   #define XE_VM_BIND_IOCTL_FLAG_ASYNC (0x1 << 0)

       /** @flags: Flags controlling all operations in this ioctl. */
       __u32 flags;

       union {
           /** @bind: used if num_binds == 1 */
           struct drm_xe_vm_bind_op bind;

           /**
            * @vector_of_binds: userptr to array of struct
            * drm_xe_vm_bind_op if num_binds > 1
            */
           __u64 vector_of_binds;
       };

       /** @num_syncs: amount of syncs to wait for or to signal on completion. */
       __u32 num_syncs;

       /** @pad2: MBZ */
       __u32 pad2;

       /** @syncs: pointer to struct drm_xe_sync array */
       __u64 syncs;

       /** @reserved: Reserved */
       __u64 reserved[2];
   };
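
A minimal sketch of how a UMD might fill in this argument is given below. To
compile without kernel headers it uses trimmed local mirrors of the two
structs with ``uint32_t`` / ``uint64_t`` fields; ``struct bind_op``,
``struct vm_bind`` and ``vm_bind_fill()`` are hypothetical names, and real
code would include the driver's uAPI header instead. The sketch shows the
choice between the inline ``bind`` member and ``vector_of_binds``, and leaves
the sync fields zeroed, as required for a synchronous VM_BIND.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Trimmed local mirror of struct drm_xe_vm_bind_op (assumption). */
struct bind_op {
	uint32_t obj, pad;
	uint64_t obj_offset;	/* doubles as userptr for MAP_USERPTR */
	uint64_t range, addr, tile_mask;
	uint32_t op, region;
	uint64_t reserved[2];
};

/* Trimmed local mirror of struct drm_xe_vm_bind (assumption). */
struct vm_bind {
	uint64_t extensions;
	uint32_t vm_id, exec_queue_id, num_binds, flags;
	union {
		struct bind_op bind;		/* used if num_binds == 1 */
		uint64_t vector_of_binds;	/* used if num_binds > 1 */
	};
	uint32_t num_syncs, pad2;
	uint64_t syncs;
	uint64_t reserved[2];
};

#define XE_VM_BIND_IOCTL_FLAG_ASYNC (0x1 << 0)

/* Hypothetical helper: describe n operations on vm_id in *args. */
static void vm_bind_fill(struct vm_bind *args, uint32_t vm_id,
			 uint32_t exec_queue_id,
			 const struct bind_op *ops, uint32_t n, int async)
{
	assert(args && (n == 0 || ops));

	memset(args, 0, sizeof(*args));	/* pad and reserved fields MBZ */
	args->vm_id = vm_id;
	args->exec_queue_id = exec_queue_id;	/* 0: default bind engine */
	args->num_binds = n;
	args->flags = async ? XE_VM_BIND_IOCTL_FLAG_ASYNC : 0;

	if (n == 1)
		args->bind = ops[0];
	else if (n > 1)
		args->vector_of_binds = (uint64_t)(uintptr_t)ops;
	/* n == 0: only the synchronization part is carried out. */
}
```

An asynchronous caller would additionally set ``num_syncs`` and ``syncs``
before issuing the IOCTL; the layout of the sync array entries is not covered
by this document and is therefore left out of the sketch.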