.. SPDX-License-Identifier: GPL-2.0

===================================
Running BPF programs from userspace
===================================

This document describes the ``BPF_PROG_RUN`` facility for running BPF programs
from userspace.

.. contents::
    :local:
    :depth: 2


Overview
--------

The ``BPF_PROG_RUN`` command can be used through the ``bpf()`` syscall to
execute a BPF program in the kernel and return the results to userspace. This
can be used to unit test BPF programs against user-supplied context objects, and
as a way to explicitly execute programs in the kernel for their side effects. The
command was previously named ``BPF_PROG_TEST_RUN``, and both constants continue
to be defined in the UAPI header, aliased to the same value.

The ``BPF_PROG_RUN`` command can be used to execute BPF programs of the
following types:

- ``BPF_PROG_TYPE_SOCKET_FILTER``
- ``BPF_PROG_TYPE_SCHED_CLS``
- ``BPF_PROG_TYPE_SCHED_ACT``
- ``BPF_PROG_TYPE_XDP``
- ``BPF_PROG_TYPE_SK_LOOKUP``
- ``BPF_PROG_TYPE_CGROUP_SKB``
- ``BPF_PROG_TYPE_LWT_IN``
- ``BPF_PROG_TYPE_LWT_OUT``
- ``BPF_PROG_TYPE_LWT_XMIT``
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
- ``BPF_PROG_TYPE_STRUCT_OPS``
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
- ``BPF_PROG_TYPE_SYSCALL``
- ``BPF_PROG_TYPE_TRACING``
- ``BPF_PROG_TYPE_NETFILTER``

When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
object and (for program types operating on network packets) a buffer containing
the packet data that the BPF program will operate on. The kernel will then
execute the program and return the results to userspace. Note that programs will
not have any side effects while being run in this mode; in particular, packets
will not actually be redirected or dropped, and the program return code is
simply returned to userspace. A separate mode for live execution of XDP programs
is provided, documented separately below.

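
As an illustration of the regular test run mode, the following is a minimal
sketch of filling in the ``test`` member of ``union bpf_attr`` and issuing the
command through the raw ``bpf()`` syscall. The helper names (``ptr_to_u64``,
``fill_test_run_attr`` and ``prog_test_run``) are hypothetical, and ``prog_fd``
is assumed to refer to a packet-processing program that was loaded beforehand;
loading is not shown.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Convert a userspace pointer to the __u64 form used in union bpf_attr. */
static inline __u64 ptr_to_u64(const void *ptr)
{
	return (__u64)(unsigned long)ptr;
}

/*
 * Fill in the "test" member of union bpf_attr for a regular (non-live) run.
 * Hypothetical helper; prog_fd is assumed to be an already loaded program.
 */
static inline void fill_test_run_attr(union bpf_attr *attr, int prog_fd,
				      const void *pkt, __u32 pkt_len,
				      void *out, __u32 out_len, __u32 repeat)
{
	assert(attr);
	memset(attr, 0, sizeof(*attr));
	attr->test.prog_fd = prog_fd;
	attr->test.data_in = ptr_to_u64(pkt);	/* packet data to run on */
	attr->test.data_size_in = pkt_len;
	attr->test.data_out = ptr_to_u64(out);	/* buffer for the result */
	attr->test.data_size_out = out_len;
	attr->test.repeat = repeat;
}

/*
 * Issue the command. BPF_PROG_TEST_RUN is used because older UAPI headers
 * may not define the BPF_PROG_RUN alias. Returns 0 on success, or -1 with
 * errno set on failure.
 */
static inline int prog_test_run(union bpf_attr *attr)
{
	return (int)syscall(__NR_bpf, BPF_PROG_TEST_RUN, attr, sizeof(*attr));
}
```

A caller would check the return value of ``prog_test_run()`` and, on success,
read the program's return code from ``attr.test.retval`` and the resulting
packet data from the ``out`` buffer.
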
Running XDP programs in "live frame mode"
-----------------------------------------

The ``BPF_PROG_RUN`` command has a separate mode for running live XDP programs,
which can be used to execute XDP programs in a way where packets will actually
be processed by the kernel after the execution of the XDP program as if they
arrived on a physical interface. This mode is activated by setting the
``BPF_F_TEST_XDP_LIVE_FRAMES`` flag when supplying an XDP program to
``BPF_PROG_RUN``.

The live frame mode is optimised for high performance execution of the supplied
XDP program many times (suitable for, e.g., running as a traffic generator),
which means the semantics are not quite as straightforward as the regular test
run mode. Specifically:

- When executing an XDP program in live frame mode, the result of the execution
  will not be returned to userspace; instead, the kernel will perform the
  operation indicated by the program's return code (drop the packet, redirect
  it, etc.). For this reason, setting the ``data_out`` or ``ctx_out`` attributes
  in the syscall parameters when running in this mode will be rejected. In
  addition, not all failures will be reported back to userspace directly;
  specifically, only fatal errors in setup or during execution (like memory
  allocation errors) will halt execution and return an error. If an error occurs
  in packet processing, like a failure to redirect to a given interface,
  execution will continue with the next repetition; these errors can be detected
  via the same trace points as for regular XDP programs.

- Userspace can supply an ifindex as part of the context object, just like in
  the regular (non-live) mode. The XDP program will be executed as though the
  packet arrived on this interface; i.e., the ``ingress_ifindex`` of the context
  object will point to that interface. Furthermore, if the XDP program returns
  ``XDP_PASS``, the packet will be injected into the kernel networking stack as
  though it arrived on that ifindex, and if it returns ``XDP_TX``, the packet
  will be transmitted *out* of that same interface. Do note, though, that
  because the program execution is not happening in driver context, an
  ``XDP_TX`` is actually turned into the same action as an ``XDP_REDIRECT`` to
  that same interface (i.e., it will only work if the driver has support for the
  ``ndo_xdp_xmit`` driver op).

- When running the program with multiple repetitions, the execution will happen
  in batches. The batch size defaults to 64 packets (which is the same as the
  maximum NAPI receive batch size), but can be specified by userspace through
  the ``batch_size`` parameter, up to a maximum of 256 packets. For each batch,
  the kernel executes the XDP program repeatedly, each invocation getting a
  separate copy of the packet data. For each repetition, if the program drops
  the packet, the data page is immediately recycled (see below). Otherwise, the
  packet is buffered until the end of the batch, at which point all packets
  buffered this way during the batch are transmitted at once.

- When setting up the test run, the kernel will initialise a pool of memory
  pages of the same size as the batch size. Each memory page will be initialised
  with the initial packet data supplied by userspace at ``BPF_PROG_RUN``
  invocation. When possible, the pages will be recycled on future program
  invocations, to improve performance. Pages will generally be recycled a full
  batch at a time, except when a packet is dropped (by return code or because
  of, say, a redirection error), in which case that page will be recycled
  immediately. If a packet ends up being passed to the regular networking stack
  (because the XDP program returns ``XDP_PASS``, or because it ends up being
  redirected to an interface that injects it into the stack), the page will be
  released and a new one will be allocated when the pool is empty.

  When recycling, the page content is not rewritten; only the packet boundary
  pointers (``data``, ``data_end`` and ``data_meta``) in the context object will
  be reset to the original values. This means that if a program rewrites the
  packet contents, it has to be prepared to see either the original content or
  the modified version on subsequent invocations.
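
The points above can be sketched in userspace code as follows. This is an
illustration under stated assumptions, not a reference implementation: the
helper names (``live_batch_size``, ``live_batch_count`` and
``fill_live_attr``) are hypothetical, a requested batch size of zero is assumed
to select the default, the last batch is assumed to hold the remaining
repetitions, and ``data_end`` in the context object is assumed to have to match
the input packet length. How the kernel reacts to a batch size above the maximum
is not stated above, so the sketch simply refuses such values.

```c
#include <assert.h>
#include <linux/bpf.h>
#include <string.h>

/*
 * The UAPI header defines this flag as an enum constant, which the
 * preprocessor cannot see, so this define always takes effect. It exists
 * only so the sketch builds against older headers; the value is assumed
 * to match the kernel's definition.
 */
#ifndef BPF_F_TEST_XDP_LIVE_FRAMES
#define BPF_F_TEST_XDP_LIVE_FRAMES (1U << 1)
#endif

#define LIVE_DEFAULT_BATCH 64U	/* default batch size described above */
#define LIVE_MAX_BATCH 256U	/* maximum batch size described above */

/* Batch size in effect for a request; returns 0 if the request is too large. */
static inline __u32 live_batch_size(__u32 requested)
{
	if (requested == 0)		/* assumption: zero selects the default */
		return LIVE_DEFAULT_BATCH;
	if (requested > LIVE_MAX_BATCH)
		return 0;
	return requested;
}

/* Number of batches for 'repeat' runs; the last one may be partial. */
static inline __u32 live_batch_count(__u32 repeat, __u32 batch)
{
	if (batch == 0)
		return 0;
	return repeat / batch + (repeat % batch != 0);
}

/*
 * Fill in union bpf_attr for live frame mode. Output buffers (data_out and
 * ctx_out) are deliberately left unset, since setting them is rejected in
 * this mode. The context object carries the ifindex the packet should
 * appear to arrive on.
 */
static inline void fill_live_attr(union bpf_attr *attr, int prog_fd,
				  const void *pkt, __u32 pkt_len,
				  struct xdp_md *ctx, __u32 ifindex,
				  __u32 repeat)
{
	assert(attr && ctx);
	memset(ctx, 0, sizeof(*ctx));
	ctx->data_end = pkt_len;	/* assumption: must equal input size */
	ctx->ingress_ifindex = ifindex;

	memset(attr, 0, sizeof(*attr));
	attr->test.prog_fd = prog_fd;
	attr->test.flags = BPF_F_TEST_XDP_LIVE_FRAMES;
	attr->test.data_in = (__u64)(unsigned long)pkt;
	attr->test.data_size_in = pkt_len;
	attr->test.ctx_in = (__u64)(unsigned long)ctx;
	attr->test.ctx_size_in = sizeof(*ctx);
	attr->test.repeat = repeat;
}
```

With these helpers, 1000 repetitions at the default batch size would run as 16
batches, the last one partial. A batch size chosen this way would be passed in
the ``batch_size`` parameter, which the sketch leaves unset so that it also
builds against headers that predate that field. Because recycled pages keep
whatever the program wrote to them, a program that modifies the packet should
not assume it starts from the original contents on every invocation.
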