.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and once the connection is established set the
TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  connect(sock, addr, addrlen);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

Setting the TLS ULP allows us to set/get TLS socket options. Currently
only the symmetric encryption is handled in the kernel. After the TLS
handshake is complete, we have all the parameters required to move the
data-path to the kernel. There is a separate socket option for moving
the transmit and the receive into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };

  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
         TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.

Sending TLS application data
----------------------------

After setting the TLS_TX socket option all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

send() data is directly encrypted from the userspace buffer provided
to the encrypted kernel send buffer if possible.

The sendfile system call will send the file's data over TLS records of maximum
length (2^14).

.. code-block:: c

  file = open(filename, O_RDONLY);
  fstat(file, &stat);
  sendfile(sock, file, &offset, stat.st_size);

TLS records are created and sent after each send() call, unless
MSG_MORE is passed. MSG_MORE will delay creation of a record until
MSG_MORE is not passed, or the maximum record size is reached.

The kernel will need to allocate a buffer for the encrypted data.
This buffer is allocated at the time send() is called, such that
either the entire send() call will return -ENOMEM (or block waiting
for memory), or the encryption will always succeed. If send() returns
-ENOMEM and some data was left on the socket buffer from a previous
call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
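
The MSG_MORE behaviour described above can be used to coalesce two buffers,
for example a small header and its body, into a single TLS record. The sketch
below assumes the combined length does not exceed the maximum record size;
``send_as_one_record()`` is a hypothetical helper name, and short writes are
simply treated as errors to keep the example brief.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Send hdr followed by body. MSG_MORE on the first call asks the kernel
 * to hold the record open; the second call (without MSG_MORE) closes it.
 * Returns 0 on success, -1 on error or short write.
 */
static int send_as_one_record(int sock, const void *hdr, size_t hdr_len,
                              const void *body, size_t body_len)
{
    if (send(sock, hdr, hdr_len, MSG_MORE) != (ssize_t)hdr_len)
        return -1;
    if (send(sock, body, body_len, 0) != (ssize_t)body_len)
        return -1;
    return 0;
}
```

A production caller would normally loop on short writes and handle ``ENOMEM``
as described above instead of returning immediately.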

Receiving TLS application data
------------------------------

After setting the TLS_RX socket option, all recv family socket calls
are decrypted using TLS parameters provided. A full TLS record must
be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, sizeof(buffer), 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur. If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.
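
A userspace library typically has to turn these errors into a TLS alert for
the peer. The kernel does not prescribe any mapping; the sketch below shows
one plausible choice using the standard TLS alert descriptions
(``bad_record_mac`` = 20, ``record_overflow`` = 22, ``protocol_version`` = 70).
``ktls_alert_for_errno()`` is a hypothetical helper, and since ``EINVAL`` can
also have unrelated causes the caller should treat the result as a hint.

```c
#include <errno.h>

/* TLS alert descriptions as defined by the TLS specifications. */
enum {
    TLS_ALERT_BAD_RECORD_MAC   = 20,
    TLS_ALERT_RECORD_OVERFLOW  = 22,
    TLS_ALERT_PROTOCOL_VERSION = 70,
};

/*
 * Map an errno value from a failed recv() on a kTLS socket to an alert
 * description. Returns -1 for errors that are not TLS record failures
 * (EAGAIN, EINTR, ...), which the caller should handle as usual.
 */
static int ktls_alert_for_errno(int err)
{
    switch (err) {
    case EINVAL:   /* record version does not match the configured one */
        return TLS_ALERT_PROTOCOL_VERSION;
    case EMSGSIZE: /* received record is too big */
        return TLS_ALERT_RECORD_OVERFLOW;
    case EBADMSG:  /* decryption failed for any other reason */
        return TLS_ALERT_BAD_RECORD_MAC;
    default:
        return -1;
    }
}
```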

Send TLS control messages
-------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22), etc.
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int ktls_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
        struct msghdr msg = {0};
        int cmsg_len = sizeof(record_type);
        struct cmsghdr *cmsg;
        char buf[CMSG_SPACE(cmsg_len)];
        struct iovec msg_iov;   /* Vector of data to send/receive into.  */

        msg.msg_control = buf;
        msg.msg_controllen = sizeof(buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_TLS;
        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
        cmsg->cmsg_len = CMSG_LEN(cmsg_len);
        *CMSG_DATA(cmsg) = record_type;
        msg.msg_controllen = cmsg->cmsg_len;

        msg_iov.iov_base = data;
        msg_iov.iov_len = length;
        msg.msg_iov = &msg_iov;
        msg.msg_iovlen = 1;

        return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.

Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with message
type passed via cmsg. If no cmsg buffer is provided, an error is
returned if a control message is received. Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = sizeof(buffer);

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);

  /* CMSG_FIRSTHDR() returns NULL when no control data was delivered. */
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
      int record_type = *((unsigned char *)CMSG_DATA(cmsg));
      // Do something with record_type, and control message data in
      // buffer.
      //
      // Note that record_type may be == to application data (23).
  } else {
      // Buffer contains application data.
  }

recv will never return data from mixed types of TLS records.

TLS 1.3 Key Updates
-------------------

In TLS 1.3, KeyUpdate handshake messages signal that the sender is
updating its TX key. Any message sent after a KeyUpdate will be
encrypted using the new key. The userspace library can pass the new
key to the kernel using the TLS_TX and TLS_RX socket options, as for
the initial keys. TLS version and cipher cannot be changed.

To prevent attempting to decrypt incoming records using the wrong key,
decryption will be paused when a KeyUpdate message is received by the
kernel, until the new key has been provided using the TLS_RX socket
option. Any read occurring after the KeyUpdate has been read and
before the new key is provided will fail with EKEYEXPIRED. poll() will
not report any read events from the socket until the new key is
provided. There is no pausing on the transmit side.

Userspace should make sure that the crypto_info provided has been set
properly. In particular, the kernel will not check for key/nonce
reuse.
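
A read path can treat EKEYEXPIRED as a signal that the new RX key has not been
installed yet. The following sketch assumes the TLS library supplies a
callback that derives the next key and installs it with the TLS_RX socket
option; ``recv_with_rekey()`` and ``ktls_rekey_fn`` are hypothetical names
used only for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical callback: install the next RX key, return 0 on success. */
typedef int (*ktls_rekey_fn)(int sock, void *ctx);

/*
 * recv() wrapper that installs a new RX key once if the kernel reports
 * that decryption is paused after a KeyUpdate, then retries the read.
 */
static ssize_t recv_with_rekey(int sock, void *buf, size_t len,
                               ktls_rekey_fn install_rx_key, void *ctx)
{
    ssize_t n = recv(sock, buf, len, 0);

    if (n < 0 && errno == EKEYEXPIRED) {
        if (install_rx_key(sock, ctx) != 0)
            return -1;
        n = recv(sock, buf, len, 0);
    }
    return n;
}
```

In most libraries the new key is installed as soon as the KeyUpdate message
itself has been processed, so this retry path is a fallback rather than the
normal flow.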

The number of successful and failed key updates is tracked in the
``TlsTxRekeyOk``, ``TlsRxRekeyOk``, ``TlsTxRekeyError``,
``TlsRxRekeyError`` statistics. The ``TlsRxRekeyReceived`` statistic
counts KeyUpdate handshake messages that have been received.

Integrating into a userspace TLS library
----------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
of calling send directly after a handshake using gnutls.
Since it doesn't implement a full record layer, control
messages are not supported.

Optional optimizations
----------------------

There are certain condition-specific optimizations the TLS ULP can make,
if requested. Those optimizations are either not universally beneficial
or may impact correctness, hence they require an opt-in.
All options are set per-socket using setsockopt(), and their
state can be checked using getsockopt() and via socket diag (``ss``).
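
Setting and querying such an opt-in follows the usual socket option pattern.
The sketch below assumes the option takes a 0/1 integer value, which holds for
the boolean options described in this section; ``ktls_set_flag()`` and
``ktls_get_flag()`` are hypothetical helper names, and the fallback
``SOL_TLS`` definition is only for C libraries that lack it.

```c
#include <sys/socket.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* from linux/socket.h; absent in some libc headers */
#endif

/* Enable or disable a boolean TLS option. Returns 0, or -1 with errno set. */
static int ktls_set_flag(int sock, int optname, int enable)
{
    unsigned int val = enable ? 1 : 0;

    return setsockopt(sock, SOL_TLS, optname, &val, sizeof(val));
}

/* Read back a boolean TLS option. Returns 0 or 1, or -1 with errno set. */
static int ktls_get_flag(int sock, int optname)
{
    unsigned int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(sock, SOL_TLS, optname, &val, &len) < 0)
        return -1;
    return val != 0;
}
```

With ``linux/tls.h`` included, a call such as
``ktls_set_flag(sock, TLS_RX_EXPECT_NO_PAD, 1)`` would request the
corresponding optimization on a socket that already has the TLS ULP attached.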

TLS_TX_ZEROCOPY_RO
~~~~~~~~~~~~~~~~~~

For device offload only. Allow sendfile() data to be transmitted directly
to the NIC without making an in-kernel copy. This allows true zero-copy
behavior when device offload is enabled.

The application must make sure that the data is not modified between being
submitted and transmission completing. In other words this is mostly
applicable if the data sent on a socket via sendfile() is read-only.

Modifying the data may result in different versions of the data being used
for the original TCP transmission and TCP retransmissions. To the receiver
this will look like TLS records had been tampered with and will result
in record authentication failures.

TLS_RX_EXPECT_NO_PAD
~~~~~~~~~~~~~~~~~~~~

TLS 1.3 only. Expect the sender to not pad records. This allows the data
to be decrypted directly into user space buffers with TLS 1.3.

This optimization is safe to enable only if the remote end is trusted,
otherwise it is an attack vector for doubling the TLS processing cost.

If the decrypted record turns out to have been padded or is not a data
record, it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.

TLS_TX_MAX_PAYLOAD_LEN
~~~~~~~~~~~~~~~~~~~~~~

Specifies the maximum size of the plaintext payload for transmitted TLS records.

When this option is set, the kernel enforces the specified limit on all outgoing
TLS records. No plaintext fragment will exceed this size. This option can be used
to implement the TLS Record Size Limit extension [1].

* For TLS 1.2, the value corresponds directly to the record size limit.
* For TLS 1.3, the value should be set to record_size_limit - 1, since
  the record size limit includes one additional byte for the ContentType
  field.

The valid range for this option is 64 to 16384 bytes for TLS 1.2, and 63 to
16384 bytes for TLS 1.3. The lower minimum for TLS 1.3 accounts for the
extra byte used by the ContentType field.

[1] https://datatracker.ietf.org/doc/html/rfc8449
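
The conversion from a negotiated record size limit to the option value, along
with the range check described above, can be sketched as a small pure
function. ``ktls_payload_len_for_limit()`` is a hypothetical helper; it only
computes the value and leaves the setsockopt() call to the caller.

```c
#include <stdbool.h>

#define KTLS_MAX_PAYLOAD_LEN 16384u /* upper bound for both TLS versions */

/*
 * Translate the peer's record_size_limit into a TLS_TX_MAX_PAYLOAD_LEN
 * value. TLS 1.3 subtracts one byte for the ContentType field.
 * Returns the value to set, or -1 if it falls outside the valid range.
 */
static int ktls_payload_len_for_limit(unsigned int record_size_limit,
                                      bool tls13)
{
    unsigned int overhead = tls13 ? 1u : 0u;
    unsigned int min_len = 64u - overhead; /* 64 for 1.2, 63 for 1.3 */
    unsigned int value;

    if (record_size_limit < overhead)
        return -1;
    value = record_size_limit - overhead;
    if (value < min_len || value > KTLS_MAX_PAYLOAD_LEN)
        return -1;
    return (int)value;
}
```

For example, a TLS 1.3 peer advertising a limit of 1024 leads to a value of
1023, while the same limit under TLS 1.2 is used unchanged.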

Statistics
==========

TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  record decryption failed (e.g. due to incorrect authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography

- ``TlsDecryptRetry`` -
  number of RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. Note that this counter will
  also increment for non-data records.

- ``TlsRxNoPadViolation`` -
  number of data RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction.

- ``TlsTxRekeyOk``, ``TlsRxRekeyOk`` -
  number of successful rekeys on existing sessions for TX and RX

- ``TlsTxRekeyError``, ``TlsRxRekeyError`` -
  number of failed rekeys on existing sessions for TX and RX

- ``TlsRxRekeyReceived`` -
  number of received KeyUpdate handshake messages, requiring userspace
  to provide a new RX key
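
A monitoring tool can read individual counters from this file. The sketch
below assumes each line consists of a counter name followed by whitespace and
an unsigned decimal value; ``tls_stat_read()`` is a hypothetical helper that
works on any seekable ``FILE *`` so it can be exercised without kernel TLS
support.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter by name in a tls_stat style stream.
 * Returns 0 and stores the value in *out, or -1 if the name is absent.
 */
static int tls_stat_read(FILE *f, const char *name, unsigned long long *out)
{
    char key[64];
    unsigned long long val;

    rewind(f); /* always scan from the beginning */
    while (fscanf(f, "%63s %llu", key, &val) == 2) {
        if (strcmp(key, name) == 0) {
            *out = val;
            return 0;
        }
    }
    return -1;
}
```

A caller would open ``/proc/net/tls_stat`` with fopen() and, for instance,
sample ``TlsDecryptRetry`` periodically to judge whether
``TLS_RX_EXPECT_NO_PAD`` is paying off.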