  1. xdrgen - Linux Kernel XDR code generator
  2. Introduction
  3. ------------
  4. SunRPC programs are typically specified using a language defined by
  5. RFC 4506. In fact, all IETF-published NFS specifications provide a
  6. description of the specified protocol using this language.
  7. Since the 1990s, user space consumers of SunRPC have had access to
  8. a tool that could read such XDR specifications and then generate C
  9. code that implements the RPC portions of that protocol. This tool is
  10. called rpcgen.
  11. This RPC-level code is code that handles input directly from the
  12. network, and thus a high degree of memory safety and sanity checking
  13. is needed to help ensure proper levels of security. Bugs in this
  14. code can have significant impact on security and performance.
  15. However, it is code that is repetitive and tedious to write by hand.
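To illustrate the kind of checking this code has to repeat for every field, here is a minimal Python sketch of decoding one variable-length opaque item as laid out in RFC 4506 (a 4-byte big-endian length, then data padded to a multiple of four bytes). The names decode_opaque and XdrDecodeError are hypothetical and exist only for this example; xdrgen itself emits C, not Python.

```python
import struct
from typing import Tuple


class XdrDecodeError(ValueError):
    """Raised when the input buffer is malformed or truncated (hypothetical name)."""


def decode_opaque(buf: bytes, offset: int, maxlen: int) -> Tuple[bytes, int]:
    """Decode a variable-length opaque<maxlen>; return (data, new offset)."""
    # The 4-byte big-endian length word must be fully present.
    if offset < 0 or offset + 4 > len(buf):
        raise XdrDecodeError("truncated length word")
    (length,) = struct.unpack_from(">I", buf, offset)
    offset += 4
    # Reject lengths larger than the specification allows.
    if length > maxlen:
        raise XdrDecodeError("length exceeds declared maximum")
    # XDR pads opaque data to a multiple of four bytes.
    padded = (length + 3) & ~3
    if offset + padded > len(buf):
        raise XdrDecodeError("truncated opaque data")
    return buf[offset:offset + length], offset + padded
```

Every check above is mechanical and is determined by the specification alone, which is why this kind of code is a natural candidate for generation.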
  16. The C code generated by rpcgen makes extensive use of the facilities
  17. of the user space TI-RPC library and libc. Furthermore, the dialect
  18. of the generated code is very traditional K&R C.
  19. The Linux kernel's implementations of SunRPC-based protocols hand-roll
  20. their XDR implementations. There are two main reasons for this:
  21. 1. libtirpc (and its predecessors) operate only in user space. The
  22. kernel's RPC implementation and its API are significantly
  23. different from libtirpc's.
  24. 2. rpcgen-generated code is believed to be less efficient than code
  25. that is hand-written.
  26. These days, gcc and its kin are capable of optimizing code better
  27. than human authors. There are only a few instances where writing
  28. XDR code by hand will make a measurable performance difference.
  29. In addition, the current hand-written code in the Linux kernel is
  30. difficult to audit, and it is hard to prove that it implements
  31. exactly what is in the protocol specification.
  32. In order to accrue the benefits of machine-generated XDR code in the
  33. kernel, a tool is needed that will output C code that works against
  34. the kernel's SunRPC implementation rather than libtirpc.
  35. Enter xdrgen.
  36. Dependencies
  37. ------------
  38. These dependencies are typically packaged by Linux distributions:
  39. - python3
  40. - python3-lark
  41. - python3-jinja2
  42. These dependencies are available via PyPI:
  43. - pip install 'lark[interegular]'
  44. XDR Specifications
  45. ------------------
  46. When adding a new protocol implementation to the kernel, the XDR
  47. specification can be derived by feeding a .txt copy of the RFC to
  48. the script located in tools/net/sunrpc/extract.sh.
  49. $ extract.sh < rfc0001.txt > new2.x
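As a rough illustration of what such an extraction step does, the Python sketch below pulls marked code lines out of RFC text. It assumes the convention used by several NFS RFCs, where each line of embedded XDR is prefixed with "///"; that convention and the function name extract_xdr are assumptions made for this example, not a description of extract.sh itself.

```python
import re

# Assumption: embedded code lines start with optional spaces, then "///",
# then either nothing or a single space followed by the code text.
_MARKER = re.compile(r"^ *///(?: (.*))?$")


def extract_xdr(rfc_text: str) -> str:
    """Return only the marked code lines, with the marker removed."""
    lines = []
    for line in rfc_text.splitlines():
        match = _MARKER.match(line)
        if match:
            # A bare "///" marks an empty line in the embedded code.
            lines.append(match.group(1) or "")
    return "".join(line + "\n" for line in lines)
```

Unmarked prose is dropped entirely, so the result can be fed to an XDR parser without further cleanup.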
  50. Operation
  51. ---------
  52. Once a .x file is available, use xdrgen to generate source and
  53. header files containing an implementation of XDR encoding and
  54. decoding functions for the specified protocol.
  55. $ ./xdrgen definitions new2.x > include/linux/sunrpc/xdrgen/new2.h
  56. $ ./xdrgen declarations new2.x > new2xdr_gen.h
  57. and
  58. $ ./xdrgen source new2.x > new2xdr_gen.c
  59. The files are ready to use for a server-side protocol implementation,
  60. or may be used as a guide for implementing these routines by hand.
  61. By default, the only comments added to this code are kdoc comments
  62. that appear directly in front of the public per-procedure APIs. For
  63. deeper introspection, specifying the "--annotate" flag will insert
  64. additional comments in the generated code to help readers match the
  65. generated code to specific parts of the XDR specification.
  66. Because the generated code is targeted for the Linux kernel, it
  67. is tagged with a GPLv2-only license.
  68. The xdrgen tool can also provide lexical and syntax checking of
  69. an XDR specification:
  70. $ ./xdrgen lint xdr/new.x
  71. How It Works
  72. ------------
  73. xdrgen does not use machine learning to generate source code. The
  74. translation is entirely deterministic.
  75. RFC 4506 Section 6 contains a BNF grammar of the XDR specification
  76. language. The grammar has been adapted for use by the Python Lark
  77. module.
  78. The xdr.ebnf file in this directory contains the grammar used to
  79. parse XDR specifications. xdrgen configures Lark using the grammar
  80. in xdr.ebnf. Lark parses the target XDR specification using this
  81. grammar, creating a parse tree.
  82. xdrgen then transforms the parse tree into an abstract syntax tree.
  83. This tree is passed to a series of code generators.
  84. The generators are implemented as Python classes residing in the
  85. generators/ directory. Each generator emits code created from Jinja2
  86. templates stored in the templates/ directory.
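The parse, transform, and generate stages described above can be sketched in a few lines of Python. This is only an outline under stated assumptions: a regular expression stands in for the Lark grammar, string.Template stands in for Jinja2, only simple enums with decimal values are handled, and the names EnumNode, parse_enums, and generate are hypothetical rather than xdrgen's real classes.

```python
import re
from dataclasses import dataclass
from string import Template
from typing import List, Tuple


@dataclass
class EnumNode:
    """AST node for an XDR enum (hypothetical; not xdrgen's real class)."""
    name: str
    members: List[Tuple[str, int]]


_ENUM = re.compile(r"enum\s+(\w+)\s*\{(.*?)\}\s*;", re.S)
_MEMBER = re.compile(r"(\w+)\s*=\s*(\d+)")


def parse_enums(spec: str) -> List[EnumNode]:
    """Stand-in for the Lark parse plus the parse-tree-to-AST transform."""
    return [
        EnumNode(name, [(m, int(v)) for m, v in _MEMBER.findall(body)])
        for name, body in _ENUM.findall(spec)
    ]


# Stand-in for a Jinja2 template kept in a templates/ directory.
_DEFINITION = Template("enum $name {\n$members\n};\n")


def generate_definition(node: EnumNode) -> str:
    """Render one AST node into a C enum definition."""
    members = ",\n".join(f"\t{m} = {v}" for m, v in node.members)
    return _DEFINITION.substitute(name=node.name, members=members)


def generate(spec: str) -> str:
    """Emit definitions in the order the items appear in the specification."""
    return "\n".join(generate_definition(n) for n in parse_enums(spec))
```

Because each stage is a pure function of its input, running the sketch twice on the same specification produces identical output.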
  87. Source code for data items is generated in the same order in which
  88. the items appear in the specification to ensure the generated code
  89. compiles. This conforms with the behavior of rpcgen.
  90. xdrgen assumes that the generated source code is further compiled by
  91. a compiler that can optimize in a number of ways, including:
  92. - Unused functions are discarded (i.e., not added to the executable)
  93. - Aggressive function inlining removes unnecessary stack frames
  94. - Single-arm switch statements are replaced by a single conditional
  95. branch
  96. And so on.
  97. Pragmas
  98. -------
  99. Pragma directives specify exceptions to the normal generation of
  100. encoding and decoding functions. The directives described below
  101. are "big_endian", "exclude", "header", and "public".
  102. Pragma big_endian
  103. ------ ----------
  104. pragma big_endian <enum> ;
  105. For variables that might contain only a small number of values, it
  106. is more efficient to avoid the byte-swap when encoding or decoding
  107. on little-endian machines. Such is often the case with error status
  108. codes. For example:
  109. pragma big_endian nfsstat3;
  110. In this case, when generating an XDR struct or union containing a
  111. field of type "nfsstat3", xdrgen will make the type of that field
  112. "__be32" instead of "enum nfsstat3". XDR unions then switch on the
  113. non-byte-swapped value of that field.
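The saving can be illustrated with a small Python model of a little-endian machine. The constants NFS3_OK (0) and NFS3ERR_NOENT (2) are from the NFSv3 protocol; the helper names are hypothetical. The idea is that the constant is converted to wire order once, ahead of time, so the value loaded from the buffer can be compared directly without a swap on each decode.

```python
import struct

NFS3_OK = 0
NFS3ERR_NOENT = 2


def be32_constant(value: int) -> int:
    """What a little-endian CPU holds after loading the big-endian
    encoding of value without swapping (done once, ahead of time)."""
    return int.from_bytes(struct.pack(">I", value), "little")


# Pre-swapped constant, analogous to a __be32 compile-time constant.
BE_NFS3ERR_NOENT = be32_constant(NFS3ERR_NOENT)


def raw_load(wire: bytes, offset: int = 0) -> int:
    """Model a plain 32-bit load on a little-endian machine (no swap)."""
    return int.from_bytes(wire[offset:offset + 4], "little")


def status_is_noent(wire: bytes) -> bool:
    """Compare the unswapped wire value against the pre-swapped constant."""
    return raw_load(wire) == BE_NFS3ERR_NOENT
```

The comparison gives the same answer as swapping the wire value and comparing against the host-order constant, but the per-message swap is gone.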
  114. Pragma exclude
  115. ------ -------
  116. pragma exclude <RPC procedure> ;
  117. In some cases, a procedure encoder or decoder function might need
  118. special processing that cannot be automatically generated. The
  119. automatically-generated functions might conflict or interfere with
  120. the hand-rolled function. To avoid editing the generated source code
  121. by hand, a pragma can specify that the procedure's encoder and
  122. decoder functions are not included in the generated header and
  123. source.
  124. For example:
  125. pragma exclude NFSPROC3_READDIRPLUS;
  126. Excludes the decoder function for the READDIRPLUS argument and the
  127. encoder function for the READDIRPLUS result.
  128. Note that because data item encoder and decoder functions are
  129. defined "static __maybe_unused", subsequent compilation
  130. automatically excludes data item encoder and decoder functions that
  131. are used only by excluded procedures.
  132. Pragma header
  133. ------ ------
  134. pragma header <string> ;
  135. Provide a name to use for the header file. For example:
  136. pragma header nlm4;
  137. Adds
  138. #include "nlm4xdr_gen.h"
  139. to the generated source file.
  140. Pragma public
  141. ------ ------
  142. pragma public <XDR data item> ;
  143. Normally XDR encoder and decoder functions are "static". In case an
  144. implementer wants to call these functions from other source code,
  145. they can add a public pragma in the input .x file to indicate a set
  146. of functions that should get a prototype in the generated header,
  147. and the function definitions will not be declared static.
  148. For example:
  149. pragma public nfsstat3;
  150. Adds these prototypes in the generated header:
  151. bool xdrgen_decode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 *ptr);
  152. bool xdrgen_encode_nfsstat3(struct xdr_stream *xdr, enum nfsstat3 value);
  153. And, in the generated source code, both of these functions appear
  154. without the "static __maybe_unused" modifiers.
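One way a generator could act on this pragma is sketched below in Python. The function names and the (name, C type) pair representation are hypothetical and chosen for the example; only the prototype text and the "static __maybe_unused" modifiers come from the description above.

```python
from typing import Iterable, List, Set, Tuple


def storage_class(type_name: str, public: Set[str]) -> str:
    """Modifiers placed in front of a generated function definition."""
    return "" if type_name in public else "static __maybe_unused "


def header_prototypes(items: Iterable[Tuple[str, str]],
                      public: Set[str]) -> List[str]:
    """Emit decoder and encoder prototypes only for public data items.

    items holds (XDR type name, C type) pairs, e.g.
    ("nfsstat3", "enum nfsstat3").
    """
    lines = []
    for name, c_type in items:
        if name not in public:
            continue  # static functions get no prototype in the header
        lines.append(
            f"bool xdrgen_decode_{name}(struct xdr_stream *xdr, {c_type} *ptr);")
        lines.append(
            f"bool xdrgen_encode_{name}(struct xdr_stream *xdr, {c_type} value);")
    return lines
```

For instance, with items [("nfsstat3", "enum nfsstat3"), ("fattr3", "struct fattr3")] and a public set containing only "nfsstat3", the header receives the two nfsstat3 prototypes, and the fattr3 functions remain static.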
  155. Future Work
  156. -----------
  157. Finish implementing XDR pointer and list types.
  158. Generate client-side procedure functions
  159. Expand the README into a user guide similar to rpcgen(1)
  160. Add more pragma directives:
  161. * @pages -- use xdr_read/write_pages() for the specified opaque
  162. field
  163. * @skip -- do not decode, but rather skip, the specified argument
  164. field
  165. Enable something like a #include to dynamically insert the content
  166. of other specification files
  167. Build a unit test suite for verifying translation of XDR language
  168. into compilable code
  169. Add a command-line option to insert trace_printk call sites in the
  170. generated source code, for improved (temporary) observability
  171. Generate kernel Rust code as well as C code