.. contents::
.. sectnum::

======================================
BPF Instruction Set Architecture (ISA)
======================================

eBPF, also commonly referred to as BPF, is a technology with origins in
the Linux kernel that can run untrusted programs in a privileged context
such as an operating system kernel. This document specifies the BPF
instruction set architecture (ISA).

As a historical note, BPF originally stood for Berkeley Packet Filter,
but now that it can do so much more than packet filtering, the acronym
no longer makes sense. BPF is now considered a standalone term that
does not stand for anything. The original BPF is sometimes referred to
as cBPF (classic BPF) to distinguish it from the now widely deployed
eBPF (extended BPF).

Documentation conventions
=========================

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 `<https://www.rfc-editor.org/info/rfc2119>`_
`<https://www.rfc-editor.org/info/rfc8174>`_
when, and only when, they appear in all capitals, as shown here.

For brevity and consistency, this document refers to families
of types using a shorthand syntax and refers to several expository,
mnemonic functions when describing the semantics of instructions.
The range of valid values for those types and the semantics of those
functions are defined in the following subsections.

Types
-----

This document refers to integer types with the notation `SN` to specify
a type's signedness (`S`) and bit width (`N`), respectively.

.. table:: Meaning of signedness notation

  ==== =========
  S    Meaning
  ==== =========
  u    unsigned
  s    signed
  ==== =========

.. table:: Meaning of bit-width notation

  ===== =========
  N     Bit width
  ===== =========
  8     8 bits
  16    16 bits
  32    32 bits
  64    64 bits
  128   128 bits
  ===== =========

For example, `u32` is a type whose valid values are all the 32-bit unsigned
numbers and `s16` is a type whose valid values are all the 16-bit signed
numbers.

Functions
---------

The following byteswap functions are direction-agnostic. That is,
the same function is used for conversion in either direction discussed
below.

* be16: Takes an unsigned 16-bit number and converts it between
  host byte order and big-endian
  (`IEN137 <https://www.rfc-editor.org/ien/ien137.txt>`_) byte order.
* be32: Takes an unsigned 32-bit number and converts it between
  host byte order and big-endian byte order.
* be64: Takes an unsigned 64-bit number and converts it between
  host byte order and big-endian byte order.
* bswap16: Takes an unsigned 16-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* bswap32: Takes an unsigned 32-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* bswap64: Takes an unsigned 64-bit number in either big- or little-endian
  format and returns the equivalent number with the same bit width but
  opposite endianness.
* le16: Takes an unsigned 16-bit number and converts it between
  host byte order and little-endian byte order.
* le32: Takes an unsigned 32-bit number and converts it between
  host byte order and little-endian byte order.
* le64: Takes an unsigned 64-bit number and converts it between
  host byte order and little-endian byte order.

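
As an informal illustration of the functions listed above, the following Python sketch models them on plain integers. The names ``bswap``, ``be`` and ``le`` and the ``host`` parameter are hypothetical helpers chosen for this example; they are not part of the specification.

```python
import sys


def bswap(value: int, bits: int) -> int:
    """Reverse the byte order of an unsigned integer that is `bits` wide."""
    nbytes = bits // 8
    return int.from_bytes(value.to_bytes(nbytes, "little"), "big")


def be(value: int, bits: int, host: str = sys.byteorder) -> int:
    """Model of be16/be32/be64: a swap is only needed on a little-endian host."""
    return value if host == "big" else bswap(value, bits)


def le(value: int, bits: int, host: str = sys.byteorder) -> int:
    """Model of le16/le32/le64: a swap is only needed on a big-endian host."""
    return value if host == "little" else bswap(value, bits)
```

Because each function is its own inverse, applying it twice returns the original value, which is why the same function serves both conversion directions.
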
Definitions
-----------

.. glossary::

  Sign Extend
    To *sign extend* an ``X``-bit number, ``A``, to a ``Y``-bit number, ``B``, means to:

    #. Copy all ``X`` bits from ``A`` to the lower ``X`` bits of ``B``.
    #. Set the value of the remaining ``Y`` - ``X`` bits of ``B`` to the value of
       the most-significant bit of ``A``.

.. admonition:: Example

  Sign extend an 8-bit number ``A`` to a 16-bit number ``B`` on a big-endian platform::

    A:          10000110
    B: 11111111 10000110

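
The two steps of the sign-extension definition can be sketched in Python as follows. The function name ``sign_extend`` and its argument order are assumptions made for this illustration; values are treated as raw bit patterns held in non-negative integers.

```python
def sign_extend(a: int, x: int, y: int) -> int:
    """Sign extend the x-bit pattern `a` to a y-bit pattern (requires x <= y)."""
    b = a & ((1 << x) - 1)           # step 1: copy the lower x bits of A into B
    if (a >> (x - 1)) & 1:           # most-significant bit of A
        b |= ((1 << (y - x)) - 1) << x   # step 2: fill the remaining y - x bits
    return b
```

For the example above, ``sign_extend(0b10000110, 8, 16)`` yields the bit pattern ``11111111 10000110``.
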
Conformance groups
------------------

An implementation does not need to support all instructions specified in this
document (e.g., deprecated instructions). Instead, a number of conformance
groups are specified. An implementation MUST support the base32 conformance
group and MAY support additional conformance groups, where supporting a
conformance group means it MUST support all instructions in that conformance
group.

The use of named conformance groups enables interoperability between a runtime
that executes instructions, and tools such as compilers that generate
instructions for the runtime. Thus, capability discovery in terms of
conformance groups might be done manually by users or automatically by tools.

Each conformance group has a short ASCII label (e.g., "base32") that
corresponds to a set of instructions that are mandatory. That is, each
instruction has one or more conformance groups of which it is a member.

This document defines the following conformance groups:

* base32: includes all instructions defined in this
  specification unless otherwise noted.
* base64: includes base32, plus instructions explicitly noted
  as being in the base64 conformance group.
* atomic32: includes 32-bit atomic operation instructions (see `Atomic operations`_).
* atomic64: includes atomic32, plus 64-bit atomic operation instructions.
* divmul32: includes 32-bit division, multiplication, and modulo instructions.
* divmul64: includes divmul32, plus 64-bit division, multiplication,
  and modulo instructions.
* packet: deprecated packet access instructions.

Instruction encoding
====================

BPF has two instruction encodings:

* the basic instruction encoding, which uses 64 bits to encode an instruction
* the wide instruction encoding, which appends a second 64 bits
  after the basic instruction for a total of 128 bits.

Basic instruction encoding
--------------------------

A basic instruction is encoded as follows::

  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    opcode     |     regs      |            offset             |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                              imm                              |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

**opcode**
  operation to perform, encoded as follows::

    +-+-+-+-+-+-+-+-+
    |specific |class|
    +-+-+-+-+-+-+-+-+

  **specific**
    The format of these bits varies by instruction class

  **class**
    The instruction class (see `Instruction classes`_)

**regs**
  The source and destination register numbers, encoded as follows
  on a little-endian host::

    +-+-+-+-+-+-+-+-+
    |src_reg|dst_reg|
    +-+-+-+-+-+-+-+-+

  and as follows on a big-endian host::

    +-+-+-+-+-+-+-+-+
    |dst_reg|src_reg|
    +-+-+-+-+-+-+-+-+

  **src_reg**
    the source register number (0-10), except where otherwise specified
    (`64-bit immediate instructions`_ reuse this field for other purposes)

  **dst_reg**
    destination register number (0-10), unless otherwise specified
    (future instructions might reuse this field for other purposes)

**offset**
  signed integer offset used with pointer arithmetic, except where
  otherwise specified (some arithmetic instructions reuse this field
  for other purposes)

**imm**
  signed integer immediate value

Note that the contents of multi-byte fields ('offset' and 'imm') are
stored using big-endian byte ordering on big-endian hosts and
little-endian byte ordering on little-endian hosts.

For example::

  opcode                  offset imm          assembly
         src_reg dst_reg
  07     0       1        00 00  44 33 22 11  r1 += 0x11223344 // little
         dst_reg src_reg
  07     1       0        00 00  11 22 33 44  r1 += 0x11223344 // big

Note that most instructions do not use all of the fields.
Unused fields SHALL be cleared to zero.

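
The layout above can be sketched with Python's ``struct`` module. The helper name ``encode_basic`` and its ``byteorder`` parameter are hypothetical; the sketch only packs the four fields and does not validate register numbers or opcodes.

```python
import struct


def encode_basic(opcode: int, dst_reg: int, src_reg: int,
                 offset: int, imm: int, byteorder: str = "little") -> bytes:
    """Pack one 64-bit basic instruction for a host of the given byte order."""
    if byteorder == "little":
        regs = (src_reg << 4) | dst_reg   # |src_reg|dst_reg|
        fmt = "<BBhi"                     # u8 opcode, u8 regs, s16 offset, s32 imm
    else:
        regs = (dst_reg << 4) | src_reg   # |dst_reg|src_reg|
        fmt = ">BBhi"
    return struct.pack(fmt, opcode, regs, offset, imm)


# r1 += 0x11223344, i.e. opcode 0x07 with dst_reg = 1 and unused fields zero
little = encode_basic(0x07, dst_reg=1, src_reg=0, offset=0, imm=0x11223344)
big = encode_basic(0x07, dst_reg=1, src_reg=0, offset=0, imm=0x11223344,
                   byteorder="big")
```

The two results correspond to the "little" and "big" rows of the example table: the register nibbles swap places and the bytes of 'imm' are reversed, while the single-byte opcode is unaffected.
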
Wide instruction encoding
-------------------------

Some instructions are defined to use the wide instruction encoding,
which uses two 32-bit immediate values. The 64 bits following
the basic instruction format contain a pseudo instruction
with 'opcode', 'dst_reg', 'src_reg', and 'offset' all set to zero.

This is depicted in the following figure::

  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    opcode     |     regs      |            offset             |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                              imm                              |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                           reserved                            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                           next_imm                            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

**opcode**
  operation to perform, encoded as explained above

**regs**
  The source and destination register numbers (unless otherwise
  specified), encoded as explained above

**offset**
  signed integer offset used with pointer arithmetic, unless
  otherwise specified

**imm**
  signed integer immediate value

**reserved**
  unused, set to zero

**next_imm**
  second signed integer immediate value

Instruction classes
-------------------

The three least significant bits of the 'opcode' field store the instruction class:

.. table:: Instruction class

  =====  =====  ===============================  ===================================
  class  value  description                      reference
  =====  =====  ===============================  ===================================
  LD     0x0    non-standard load operations     `Load and store instructions`_
  LDX    0x1    load into register operations    `Load and store instructions`_
  ST     0x2    store from immediate operations  `Load and store instructions`_
  STX    0x3    store from register operations   `Load and store instructions`_
  ALU    0x4    32-bit arithmetic operations     `Arithmetic and jump instructions`_
  JMP    0x5    64-bit jump operations           `Arithmetic and jump instructions`_
  JMP32  0x6    32-bit jump operations           `Arithmetic and jump instructions`_
  ALU64  0x7    64-bit arithmetic operations     `Arithmetic and jump instructions`_
  =====  =====  ===============================  ===================================

Arithmetic and jump instructions
================================

For arithmetic and jump instructions (``ALU``, ``ALU64``, ``JMP`` and
``JMP32``), the 8-bit 'opcode' field is divided into three parts::

  +-+-+-+-+-+-+-+-+
  |  code |s|class|
  +-+-+-+-+-+-+-+-+

**code**
  the operation code, whose meaning varies by instruction class

**s (source)**
  the source operand location, which unless otherwise specified is one of:

  .. table:: Source operand location

    ======  =====  ==============================================
    source  value  description
    ======  =====  ==============================================
    K       0      use 32-bit 'imm' value as source operand
    X       1      use 'src_reg' register value as source operand
    ======  =====  ==============================================

**instruction class**
  the instruction class (see `Instruction classes`_)

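
A decoder might split the opcode byte of an arithmetic or jump instruction as in the following Python sketch. The function name ``split_alu_jmp_opcode`` is hypothetical; the bit positions follow the diagram above (4-bit 'code', 1-bit 's', 3-bit 'class').

```python
def split_alu_jmp_opcode(opcode: int) -> tuple[int, int, int]:
    """Return (code, source, class) for an ALU/ALU64/JMP/JMP32 opcode byte."""
    code = (opcode >> 4) & 0xF    # upper four bits
    source = (opcode >> 3) & 0x1  # 0 = K (use 'imm'), 1 = X (use 'src_reg')
    cls = opcode & 0x7            # three least significant bits
    return code, source, cls
```

For instance, opcode ``0x07`` splits into code ``0x0`` (``ADD``), source ``K``, and class ``0x7`` (``ALU64``).
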
Arithmetic instructions
-----------------------

``ALU`` uses 32-bit wide operands while ``ALU64`` uses 64-bit wide operands for
otherwise identical operations. ``ALU64`` instructions belong to the
base64 conformance group unless noted otherwise.
The 'code' field encodes the operation as below, where 'src' refers to
the source operand and 'dst' refers to the value of the destination
register.

.. table:: Arithmetic instructions

  =====  =====  =======  ===================================================================================
  name   code   offset   description
  =====  =====  =======  ===================================================================================
  ADD    0x0    0        dst += src
  SUB    0x1    0        dst -= src
  MUL    0x2    0        dst \*= src
  DIV    0x3    0        dst = (src != 0) ? (dst / src) : 0
  SDIV   0x3    1        dst = (src == 0) ? 0 : ((src == -1 && dst == LLONG_MIN) ? LLONG_MIN : (dst s/ src))
  OR     0x4    0        dst \|= src
  AND    0x5    0        dst &= src
  LSH    0x6    0        dst <<= (src & mask)
  RSH    0x7    0        dst >>= (src & mask)
  NEG    0x8    0        dst = -dst
  MOD    0x9    0        dst = (src != 0) ? (dst % src) : dst
  SMOD   0x9    1        dst = (src == 0) ? dst : ((src == -1 && dst == LLONG_MIN) ? 0 : (dst s% src))
  XOR    0xa    0        dst ^= src
  MOV    0xb    0        dst = src
  MOVSX  0xb    8/16/32  dst = (s8,s16,s32)src
  ARSH   0xc    0        :term:`sign extending<Sign Extend>` dst >>= (src & mask)
  END    0xd    0        byte swap operations (see `Byte swap instructions`_ below)
  =====  =====  =======  ===================================================================================

Underflow and overflow are allowed during arithmetic operations, meaning
the 64-bit or 32-bit value will wrap. If BPF program execution would
result in division by zero, the destination register is instead set to zero.
Otherwise, for ``ALU64``, if execution would result in ``LLONG_MIN``
divided by -1, the destination register is instead set to ``LLONG_MIN``. For
``ALU``, if execution would result in ``INT_MIN`` divided by -1, the
destination register is instead set to ``INT_MIN``.

If execution would result in modulo by zero, for ``ALU64`` the value of
the destination register is unchanged whereas for ``ALU`` the upper
32 bits of the destination register are zeroed. Otherwise, for ``ALU64``,
if execution would result in ``LLONG_MIN`` modulo -1, the destination
register is instead set to 0. For ``ALU``, if execution would result in
``INT_MIN`` modulo -1, the destination register is instead set to 0.

``{ADD, X, ALU}``, where 'code' = ``ADD``, 'source' = ``X``, and 'class' = ``ALU``, means::

  dst = (u32) ((u32) dst + (u32) src)

where '(u32)' indicates that the upper 32 bits are zeroed.

``{ADD, X, ALU64}`` means::

  dst = dst + src

``{XOR, K, ALU}`` means::

  dst = (u32) dst ^ (u32) imm

``{XOR, K, ALU64}`` means::

  dst = dst ^ imm

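
The four examples above can be modelled on Python integers as a rough sketch. Registers are represented as unsigned 64-bit values; the function names are hypothetical, and the sign extension of 'imm' to 64 bits in the ``ALU64`` case is this sketch's reading of how a signed 32-bit immediate combines with a 64-bit register.

```python
U32 = 0xFFFF_FFFF
U64 = 0xFFFF_FFFF_FFFF_FFFF


def add_x_alu(dst: int, src: int) -> int:
    """{ADD, X, ALU}: 32-bit wrapping add, upper 32 bits of the result zeroed."""
    return ((dst & U32) + (src & U32)) & U32


def add_x_alu64(dst: int, src: int) -> int:
    """{ADD, X, ALU64}: 64-bit wrapping add."""
    return (dst + src) & U64


def xor_k_alu(dst: int, imm: int) -> int:
    """{XOR, K, ALU}: both operands truncated to 32 bits."""
    return (dst & U32) ^ (imm & U32)


def xor_k_alu64(dst: int, imm: int) -> int:
    """{XOR, K, ALU64}: `imm` is a signed Python int, sign extended by the mask."""
    return (dst ^ (imm & U64)) & U64
```

Note how ``add_x_alu(0xFFFFFFFF, 1)`` wraps to 0, while the same operands under ``add_x_alu64`` carry into bit 32.
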
Note that most arithmetic instructions have 'offset' set to 0. Only three instructions
(``SDIV``, ``SMOD``, ``MOVSX``) have a non-zero 'offset'.

Division, multiplication, and modulo operations for ``ALU`` are part
of the "divmul32" conformance group, and division, multiplication, and
modulo operations for ``ALU64`` are part of the "divmul64" conformance
group.
The division and modulo operations support both unsigned and signed flavors.

For unsigned operations (``DIV`` and ``MOD``), for ``ALU``,
'imm' is interpreted as a 32-bit unsigned value. For ``ALU64``,
'imm' is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
interpreted as a 64-bit unsigned value.

For signed operations (``SDIV`` and ``SMOD``), for ``ALU``,
'imm' is interpreted as a 32-bit signed value. For ``ALU64``, 'imm'
is first :term:`sign extended<Sign Extend>` from 32 to 64 bits, and then
interpreted as a 64-bit signed value.

Note that there are varying definitions of the signed modulo operation
when the dividend or divisor is negative, where implementations often
vary by language such that Python, Ruby, etc. differ from C, Go, Java,
etc. This specification requires that signed modulo MUST use truncated division
(where -13 % 3 == -1) as implemented in C, Go, etc.::

  a % n = a - n * trunc(a / n)

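
Because Python's own ``//`` and ``%`` operators use floored division, they cannot be used directly to model ``SDIV`` and ``SMOD``. The following sketch, with hypothetical function names and operands given as signed Python integers in the 64-bit range, applies the truncated-division formula above together with the special cases for a zero divisor and for ``LLONG_MIN`` with -1.

```python
LLONG_MIN = -(1 << 63)


def trunc_div(a: int, n: int) -> int:
    """Integer division that rounds toward zero, as in C."""
    q = abs(a) // abs(n)
    return q if (a < 0) == (n < 0) else -q


def sdiv64(dst: int, src: int) -> int:
    """Model of {SDIV, X, ALU64} on signed 64-bit values."""
    if src == 0:
        return 0
    if src == -1 and dst == LLONG_MIN:
        return LLONG_MIN
    return trunc_div(dst, src)


def smod64(dst: int, src: int) -> int:
    """Model of {SMOD, X, ALU64}: a % n = a - n * trunc(a / n)."""
    if src == 0:
        return dst
    if src == -1 and dst == LLONG_MIN:
        return 0
    return dst - src * trunc_div(dst, src)
```

With these definitions ``smod64(-13, 3)`` gives -1 as the specification requires, whereas Python's built-in ``-13 % 3`` gives 2.
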
The ``MOVSX`` instruction does a move operation with sign extension.
``{MOVSX, X, ALU}`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into
32-bit operands, and zeroes the remaining upper 32 bits.
``{MOVSX, X, ALU64}`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit
operands into 64-bit operands. Unlike other arithmetic instructions,
``MOVSX`` is only defined for register source operands (``X``).

``{MOV, K, ALU64}`` means::

  dst = (s64)imm

``{MOV, X, ALU}`` means::

  dst = (u32)src

``{MOVSX, X, ALU}`` with 'offset' 8 means::

  dst = (u32)(s32)(s8)src

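
The difference between the ``ALU`` and ``ALU64`` forms of ``MOVSX`` can be illustrated with the following Python sketch. The function names and the ``width`` parameter (standing in for the 'offset' field) are hypothetical; registers are modelled as unsigned 64-bit integers.

```python
def _as_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a signed integer."""
    v = value & ((1 << width) - 1)
    return v - (1 << width) if v >> (width - 1) else v


def movsx_alu(src: int, width: int) -> int:
    """{MOVSX, X, ALU}: width is 8 or 16; upper 32 bits of dst are zeroed."""
    return _as_signed(src, width) & 0xFFFF_FFFF


def movsx_alu64(src: int, width: int) -> int:
    """{MOVSX, X, ALU64}: width is 8, 16, or 32; result fills all 64 bits."""
    return _as_signed(src, width) & 0xFFFF_FFFF_FFFF_FFFF
```

For a source byte of ``0x86``, the ``ALU`` form produces ``0x00000000FFFFFF86`` while the ``ALU64`` form produces ``0xFFFFFFFFFFFFFF86``.
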
The ``NEG`` instruction is only defined when the source bit is clear
(``K``).

Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31)
for 32-bit operations.

Byte swap instructions
----------------------

The byte swap instructions use instruction classes of ``ALU`` and ``ALU64``
and a 4-bit 'code' field of ``END``.

The byte swap instructions operate on the destination register
only and do not use a separate source register or immediate value.

For ``ALU``, the 1-bit source operand field in the opcode is used to
select what byte order the operation converts from or to. For
``ALU64``, the 1-bit source operand field in the opcode is reserved
and MUST be set to 0.

.. table:: Byte swap instructions

  =====  ========  =====  =================================================
  class  source    value  description
  =====  ========  =====  =================================================
  ALU    LE        0      convert between host byte order and little endian
  ALU    BE        1      convert between host byte order and big endian
  ALU64  Reserved  0      do byte swap unconditionally
  =====  ========  =====  =================================================

The 'imm' field encodes the width of the swap operations. The following widths
are supported: 16, 32 and 64. Width 64 operations belong to the base64
conformance group and other swap operations belong to the base32
conformance group.

Examples:

``{END, LE, ALU}`` with 'imm' = 16/32/64 means::

  dst = le16(dst)
  dst = le32(dst)
  dst = le64(dst)

``{END, BE, ALU}`` with 'imm' = 16/32/64 means::

  dst = be16(dst)
  dst = be32(dst)
  dst = be64(dst)

``{END, TO, ALU64}`` with 'imm' = 16/32/64 means::

  dst = bswap16(dst)
  dst = bswap32(dst)
  dst = bswap64(dst)

Jump instructions
-----------------

``JMP32`` uses 32-bit wide operands and indicates the base32
conformance group, while ``JMP`` uses 64-bit wide operands for
otherwise identical operations, and indicates the base64 conformance
group unless otherwise specified.
The 'code' field encodes the operation as below:

.. table:: Jump instructions

  ========  =====  =======  =================================  ===================================================
  code      value  src_reg  description                        notes
  ========  =====  =======  =================================  ===================================================
  JA        0x0    0x0      PC += offset                       {JA, K, JMP} only
  JA        0x0    0x0      PC += imm                          {JA, K, JMP32} only
  JEQ       0x1    any      PC += offset if dst == src
  JGT       0x2    any      PC += offset if dst > src          unsigned
  JGE       0x3    any      PC += offset if dst >= src         unsigned
  JSET      0x4    any      PC += offset if dst & src
  JNE       0x5    any      PC += offset if dst != src
  JSGT      0x6    any      PC += offset if dst > src          signed
  JSGE      0x7    any      PC += offset if dst >= src         signed
  CALL      0x8    0x0      call helper function by static ID  {CALL, K, JMP} only, see `Helper functions`_
  CALL      0x8    0x1      call PC += imm                     {CALL, K, JMP} only, see `Program-local functions`_
  CALL      0x8    0x2      call helper function by BTF ID     {CALL, K, JMP} only, see `Helper functions`_
  EXIT      0x9    0x0      return                             {EXIT, K, JMP} only
  JLT       0xa    any      PC += offset if dst < src          unsigned
  JLE       0xb    any      PC += offset if dst <= src         unsigned
  JSLT      0xc    any      PC += offset if dst < src          signed
  JSLE      0xd    any      PC += offset if dst <= src         signed
  ========  =====  =======  =================================  ===================================================

where 'PC' denotes the program counter, and the offset to increment by
is in units of 64-bit instructions relative to the instruction following
the jump instruction. Thus 'PC += 1' skips execution of the next
instruction if it's a basic instruction or results in undefined behavior
if the next instruction is a 128-bit wide instruction.

Example:

``{JSGE, X, JMP32}`` means::

  if (s32)dst s>= (s32)src goto +offset

where 's>=' indicates a signed '>=' comparison.

``{JLE, K, JMP}`` means::

  if dst <= (u64)(s64)imm goto +offset

``{JA, K, JMP32}`` means::

  gotol +imm

where 'imm' means the branch offset comes from the 'imm' field.

Note that there are two flavors of ``JA`` instructions. The
``JMP`` class permits a 16-bit jump offset specified by the 'offset'
field, whereas the ``JMP32`` class permits a 32-bit jump offset
specified by the 'imm' field. A conditional jump whose offset does not
fit in 16 bits may be converted to a conditional jump with an offset
that does fit in 16 bits plus a 32-bit unconditional jump.

All ``CALL`` and ``JA`` instructions belong to the
base32 conformance group.

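
The following Python sketch illustrates how a jump target and the two example comparisons could be evaluated. It assumes a hypothetical interpreter whose program counter ``pc`` is an index in units of 64-bit instruction slots; the function names are likewise illustrative only.

```python
U64 = 0xFFFF_FFFF_FFFF_FFFF


def jump_target(pc: int, offset: int) -> int:
    """Slot index reached by a taken jump located at slot `pc`."""
    return pc + 1 + offset   # relative to the instruction following the jump


def s32(value: int) -> int:
    """Interpret the low 32 bits of a register as a signed integer."""
    v = value & 0xFFFF_FFFF
    return v - (1 << 32) if v & 0x8000_0000 else v


def jsge_x_jmp32(dst: int, src: int) -> bool:
    """{JSGE, X, JMP32}: signed comparison of the low 32 bits."""
    return s32(dst) >= s32(src)


def jle_k_jmp(dst: int, imm: int) -> bool:
    """{JLE, K, JMP}: unsigned compare against (u64)(s64)imm."""
    return (dst & U64) <= (imm & U64)
```

An offset of 0 therefore falls through to the next instruction, and an offset of -1 jumps back to the jump instruction itself.
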
Helper functions
~~~~~~~~~~~~~~~~

Helper functions are a concept whereby BPF programs can call into a
set of functions exposed by the underlying platform.

Historically, each helper function was identified by a static ID
encoded in the 'imm' field. Further documentation of helper functions
is outside the scope of this document and standardization is left for
future work, but use is widely deployed and more information can be
found in platform-specific documentation (e.g., Linux kernel documentation).

Platforms that support the BPF Type Format (BTF) support identifying
a helper function by a BTF ID encoded in the 'imm' field, where the BTF ID
identifies the helper name and type. Further documentation of BTF
is outside the scope of this document and standardization is left for
future work, but use is widely deployed and more information can be
found in platform-specific documentation (e.g., Linux kernel documentation).

Program-local functions
~~~~~~~~~~~~~~~~~~~~~~~

Program-local functions are functions exposed by the same BPF program as the
caller, and are referenced by offset from the instruction following the call
instruction, similar to ``JA``. The offset is encoded in the 'imm' field of
the call instruction. An ``EXIT`` within the program-local function will
return to the caller.

Load and store instructions
===========================

For load and store instructions (``LD``, ``LDX``, ``ST``, and ``STX``), the
8-bit 'opcode' field is divided as follows::

  +-+-+-+-+-+-+-+-+
  |mode |sz |class|
  +-+-+-+-+-+-+-+-+

**mode**
  The mode modifier is one of:

  .. table:: Mode modifier

    =============  =====  ====================================  =============
    mode modifier  value  description                           reference
    =============  =====  ====================================  =============
    IMM            0      64-bit immediate instructions         `64-bit immediate instructions`_
    ABS            1      legacy BPF packet access (absolute)   `Legacy BPF Packet access instructions`_
    IND            2      legacy BPF packet access (indirect)   `Legacy BPF Packet access instructions`_
    MEM            3      regular load and store operations     `Regular load and store operations`_
    MEMSX          4      sign-extension load operations        `Sign-extension load operations`_
    ATOMIC         6      atomic operations                     `Atomic operations`_
    =============  =====  ====================================  =============

**sz (size)**
  The size modifier is one of:

  .. table:: Size modifier

    ====  =====  =====================
    size  value  description
    ====  =====  =====================
    W     0      word        (4 bytes)
    H     1      half word   (2 bytes)
    B     2      byte
    DW    3      double word (8 bytes)
    ====  =====  =====================

  Instructions using ``DW`` belong to the base64 conformance group.

**class**
  The instruction class (see `Instruction classes`_)

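
As a rough sketch, a disassembler could split a load or store opcode byte using the tables above. The function name ``split_ldst_opcode`` and the lookup dictionaries are hypothetical; mode values not listed in the mode modifier table are reported as ``"reserved"``.

```python
MODES = {0: "IMM", 1: "ABS", 2: "IND", 3: "MEM", 4: "MEMSX", 6: "ATOMIC"}
SIZES = {0: "W", 1: "H", 2: "B", 3: "DW"}
CLASSES = {0: "LD", 1: "LDX", 2: "ST", 3: "STX"}


def split_ldst_opcode(opcode: int) -> tuple[str, str, str]:
    """Return (mode, size, class) names for a load/store opcode byte."""
    cls = opcode & 0x7            # three least significant bits
    if cls not in CLASSES:
        raise ValueError("not a load/store instruction class")
    size = (opcode >> 3) & 0x3    # two-bit 'sz' field
    mode = (opcode >> 5) & 0x7    # three-bit 'mode' field
    return MODES.get(mode, "reserved"), SIZES[size], CLASSES[cls]
```

For example, opcode ``0x18`` decodes to ``{IMM, DW, LD}``, the 64-bit immediate load, and ``0xdb`` decodes to ``{ATOMIC, DW, STX}``.
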
Regular load and store operations
---------------------------------

The ``MEM`` mode modifier is used to encode regular load and store
instructions that transfer data between a register and memory.

``{MEM, <size>, STX}`` means::

  *(size *) (dst + offset) = src

``{MEM, <size>, ST}`` means::

  *(size *) (dst + offset) = imm

``{MEM, <size>, LDX}`` means::

  dst = *(unsigned size *) (src + offset)

Where '<size>' is one of: ``B``, ``H``, ``W``, or ``DW``, and
'unsigned size' is one of: u8, u16, u32, or u64.

Sign-extension load operations
------------------------------

The ``MEMSX`` mode modifier is used to encode :term:`sign-extension<Sign Extend>` load
instructions that transfer data between a register and memory.

``{MEMSX, <size>, LDX}`` means::

  dst = *(signed size *) (src + offset)

Where '<size>' is one of: ``B``, ``H``, or ``W``, and
'signed size' is one of: s8, s16, or s32.

Atomic operations
-----------------

Atomic operations are operations that operate on memory and cannot be
interrupted or corrupted by other access to the same memory region
by other BPF programs or means outside of this specification.

All atomic operations supported by BPF are encoded as store operations
that use the ``ATOMIC`` mode modifier as follows:

* ``{ATOMIC, W, STX}`` for 32-bit operations, which are
  part of the "atomic32" conformance group.
* ``{ATOMIC, DW, STX}`` for 64-bit operations, which are
  part of the "atomic64" conformance group.
* 8-bit and 16-bit wide atomic operations are not supported.

The 'imm' field is used to encode the actual atomic operation.
Simple atomic operations use a subset of the values defined to encode
arithmetic operations in the 'imm' field to encode the atomic operation:

.. table:: Simple atomic operations

  ========  =====  ===========
  imm       value  description
  ========  =====  ===========
  ADD       0x00   atomic add
  OR        0x40   atomic or
  AND       0x50   atomic and
  XOR       0xa0   atomic xor
  ========  =====  ===========

``{ATOMIC, W, STX}`` with 'imm' = ADD means::

  *(u32 *)(dst + offset) += src

``{ATOMIC, DW, STX}`` with 'imm' = ADD means::

  *(u64 *)(dst + offset) += src

In addition to the simple atomic operations, there is also a modifier and
two complex atomic operations:

.. table:: Complex atomic operations

  ===========  ================  ===========================
  imm          value             description
  ===========  ================  ===========================
  FETCH        0x01              modifier: return old value
  XCHG         0xe0 | FETCH      atomic exchange
  CMPXCHG      0xf0 | FETCH      atomic compare and exchange
  ===========  ================  ===========================

The ``FETCH`` modifier is optional for simple atomic operations, and
always set for the complex atomic operations. If the ``FETCH`` flag
is set, then the operation also overwrites ``src`` with the value that
was in memory before it was modified.

The ``XCHG`` operation atomically exchanges ``src`` with the value
addressed by ``dst + offset``.

The ``CMPXCHG`` operation atomically compares the value addressed by
``dst + offset`` with ``R0``. If they match, the value addressed by
``dst + offset`` is replaced with ``src``. In either case, the
value that was at ``dst + offset`` before the operation is zero-extended
and loaded back to ``R0``.

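
The effect of each atomic operation on memory, ``src``, and ``R0`` can be summarized with the following single-threaded Python sketch. It does not model atomicity itself. The function ``atomic_step`` and its tuple return value are hypothetical, and comparing only the low ``bits`` of ``R0`` for 32-bit ``CMPXCHG`` is an assumption of this sketch rather than something stated above.

```python
FETCH = 0x01
ADD, OR, AND, XOR = 0x00, 0x40, 0x50, 0xA0
XCHG = 0xE0 | FETCH
CMPXCHG = 0xF0 | FETCH


def atomic_step(imm: int, mem: int, src: int, r0: int, bits: int = 64):
    """Return (new_mem, new_src, new_r0) after one atomic instruction."""
    mask = (1 << bits) - 1
    old = mem & mask
    if imm == CMPXCHG:
        new = (src & mask) if old == (r0 & mask) else old
        return new, src, old             # R0 always receives the old value
    if imm == XCHG:
        return src & mask, old, r0       # src and memory trade places
    op = imm & ~FETCH
    if op == ADD:
        new = (old + src) & mask
    elif op == OR:
        new = old | (src & mask)
    elif op == AND:
        new = old & src & mask
    elif op == XOR:
        new = old ^ (src & mask)
    else:
        raise ValueError("unknown atomic operation")
    return new, (old if imm & FETCH else src), r0
```

For example, ``atomic_step(ADD | FETCH, 5, 3, 0)`` leaves 8 in memory and places the previous value 5 in ``src``.
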
64-bit immediate instructions
-----------------------------

Instructions with the ``IMM`` 'mode' modifier use the wide instruction
encoding defined in `Instruction encoding`_, and use the 'src_reg' field of the
basic instruction to hold an opcode subtype.

The following table defines a set of ``{IMM, DW, LD}`` instructions
with opcode subtypes in the 'src_reg' field, using new terms such as "map"
defined further below:

.. table:: 64-bit immediate instructions

  =======  =========================================  ===========  ==============
  src_reg  pseudocode                                 imm type     dst type
  =======  =========================================  ===========  ==============
  0x0      dst = (next_imm << 32) | imm               integer      integer
  0x1      dst = map_by_fd(imm)                       map fd       map
  0x2      dst = map_val(map_by_fd(imm)) + next_imm   map fd       data address
  0x3      dst = var_addr(imm)                        variable id  data address
  0x4      dst = code_addr(imm)                       integer      code address
  0x5      dst = map_by_idx(imm)                      map index    map
  0x6      dst = map_val(map_by_idx(imm)) + next_imm  map index    data address
  =======  =========================================  ===========  ==============

where

* map_by_fd(imm) means to convert a 32-bit file descriptor into an address of a map (see `Maps`_)
* map_by_idx(imm) means to convert a 32-bit index into an address of a map
* map_val(map) gets the address of the first value in a given map
* var_addr(imm) gets the address of a platform variable (see `Platform Variables`_) with a given id
* code_addr(imm) gets the address of the instruction at a specified relative offset in number of (64-bit) instructions
* the 'imm type' can be used by disassemblers for display
* the 'dst type' can be used for verification and JIT compilation purposes

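
For the 'src_reg' = 0x0 case, the 64-bit value is assembled from the two 32-bit immediates. The sketch below uses hypothetical function names and treats both immediates as raw 32-bit patterns before combining them; that masking is this example's reading of the pseudocode, chosen so that a negative 'imm' does not disturb the upper half.

```python
U32 = 0xFFFF_FFFF


def ld_imm64_value(imm: int, next_imm: int) -> int:
    """dst = (next_imm << 32) | imm, with both halves taken as 32-bit patterns."""
    return ((next_imm & U32) << 32) | (imm & U32)


def split_imm64(value: int) -> tuple[int, int]:
    """Inverse helper: return (imm, next_imm) bit patterns for a 64-bit constant."""
    return value & U32, (value >> 32) & U32
```

An assembler would use something like ``split_imm64`` to fill the 'imm' field of the basic instruction and the 'next_imm' field of the pseudo instruction that follows it.
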
Maps
~~~~

Maps are shared memory regions accessible by BPF programs on some platforms.
A map can have various semantics as defined in a separate document, and may or
may not have a single contiguous memory region, but the 'map_val(map)' is
currently only defined for maps that do have a single contiguous memory region.

Each map can have a file descriptor (fd) if supported by the platform, where
'map_by_fd(imm)' means to get the map with the specified file descriptor. Each
BPF program can also be defined to use a set of maps associated with the
program at load time, and 'map_by_idx(imm)' means to get the map with the given
index in the set associated with the BPF program containing the instruction.

Platform Variables
~~~~~~~~~~~~~~~~~~

Platform variables are memory regions, identified by integer ids, exposed by
the runtime and accessible by BPF programs on some platforms. The
'var_addr(imm)' operation means to get the address of the memory region
identified by the given id.

Legacy BPF Packet access instructions
-------------------------------------

BPF previously introduced special instructions for access to packet data that were
carried over from classic BPF. These instructions used an instruction
class of ``LD``, a size modifier of ``W``, ``H``, or ``B``, and a
mode modifier of ``ABS`` or ``IND``. The 'dst_reg' and 'offset' fields were
set to zero, and 'src_reg' was set to zero for ``ABS``. However, these
instructions are deprecated and SHOULD no longer be used. All legacy packet
access instructions belong to the "packet" conformance group.