.. SPDX-License-Identifier: GPL-2.0

=====================
Multipath TCP (MPTCP)
=====================

Introduction
============

Multipath TCP or MPTCP is an extension to the standard TCP and is described in
`RFC 8684 (MPTCPv1) <https://www.rfc-editor.org/rfc/rfc8684.html>`_. It allows a
device to make use of multiple interfaces at once to send and receive TCP
packets over a single MPTCP connection. MPTCP can aggregate the bandwidth of
multiple interfaces or prefer the one with the lowest latency. It also allows
failover: if one path goes down, the traffic is seamlessly reinjected on the
other paths.

For more details about Multipath TCP in the Linux kernel, please see the
official website: `mptcp.dev <https://www.mptcp.dev>`_.

Use cases
=========

With MPTCP, the ability to use multiple paths in parallel enables new use
cases compared to TCP:

- Seamless handovers: switching from one path to another while preserving
  established connections, e.g. to be used in mobility use cases, like on
  smartphones.

- Best network selection: using the "best" available path depending on some
  conditions, e.g. latency, losses, cost, bandwidth, etc.

- Network aggregation: using multiple paths at the same time to have a higher
  throughput, e.g. to combine fixed and mobile networks to send files faster.

Concepts
========

Technically, when a new socket is created with the ``IPPROTO_MPTCP`` protocol
(Linux-specific), a *subflow* (or *path*) is created. This *subflow* consists of
a regular TCP connection that is used to transmit data through one interface.
Additional *subflows* can be negotiated later between the hosts. For the remote
host to be able to detect the use of MPTCP, a new field is added to the TCP
*option* field of the underlying TCP *subflow*. This field contains, amongst
other things, an ``MP_CAPABLE`` option that tells the other host to use MPTCP if
it is supported. If the remote host or any middlebox in between does not support
it, the returned ``SYN+ACK`` packet will not contain MPTCP options in the TCP
*option* field. In that case, the connection will be "downgraded" to plain TCP,
and it will continue with a single path.

This behavior is made possible by two internal components: the path manager and
the packet scheduler.

Path Manager
------------

The Path Manager is in charge of *subflows*, from creation to deletion, and also
address announcements. Typically, it is the client side that initiates subflows,
and the server side that announces additional addresses via the ``ADD_ADDR`` and
``REMOVE_ADDR`` options.

Path managers are controlled by the ``net.mptcp.path_manager`` sysctl knob --
see mptcp-sysctl.rst. There are two types: the in-kernel one (``kernel``), where
the same rules are applied to all connections (see: ``ip mptcp``); and the
userspace one (``userspace``), controlled by a userspace daemon (e.g. `mptcpd
<https://mptcpd.mptcp.dev/>`_), where different rules can be applied to each
connection. The path managers can be controlled via a Netlink API; see
../netlink/specs/mptcp_pm.rst.

To be able to use multiple IP addresses on a host to create multiple *subflows*
(paths), the default in-kernel MPTCP path manager needs to know which IP
addresses can be used. This can be configured with ``ip mptcp endpoint``, for
example.

Packet Scheduler
----------------

The Packet Scheduler is in charge of selecting which available *subflow(s)* to
use to send the next data packet. It can decide to maximize the use of the
available bandwidth, to pick only the path with the lowest latency, or to apply
any other policy depending on the configuration.

Packet schedulers are controlled by the ``net.mptcp.scheduler`` sysctl knob --
see mptcp-sysctl.rst.

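
The sysctl knobs mentioned above are exposed as files under
``/proc/sys/net/mptcp/``, so a program can inspect them without spawning the
``sysctl`` tool. The following is a minimal sketch, assuming procfs is mounted
on ``/proc``; the helper name ``read_mptcp_sysctl`` is hypothetical and not part
of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read the net.mptcp.<knob> sysctl into buf as a
 * NUL-terminated string without the trailing newline.
 * Returns 0 on success, -1 if the knob cannot be read (e.g. MPTCP is not
 * built in, or the knob does not exist on the running kernel).
 */
static int read_mptcp_sysctl(const char *knob, char *buf, size_t len)
{
    char path[128];
    FILE *f;
    int n;

    if (len == 0)
        return -1;

    n = snprintf(path, sizeof(path), "/proc/sys/net/mptcp/%s", knob);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;  /* knob name too long */

    f = fopen(path, "r");
    if (!f)
        return -1;

    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';  /* drop the trailing newline */
    return 0;
}
```

Calling it with ``"scheduler"`` or ``"path_manager"`` returns the currently
selected packet scheduler or path manager type, when the running kernel
provides that knob.
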
Sockets API
===========

Creating MPTCP sockets
----------------------

On Linux, MPTCP can be used by selecting MPTCP instead of TCP when creating the
``socket``:

.. code-block:: C

    /* Use AF_INET6 instead of AF_INET for IPv6. */
    int sd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);

Note that ``IPPROTO_MPTCP`` is defined as ``262``.

If MPTCP is not supported, ``errno`` will be set to:

- ``EINVAL`` (*Invalid argument*): MPTCP is not available, on kernels < 5.6.

- ``EPROTONOSUPPORT`` (*Protocol not supported*): MPTCP has not been compiled
  in, on kernels >= v5.6.

- ``ENOPROTOOPT`` (*Protocol not available*): MPTCP has been disabled using the
  ``net.mptcp.enabled`` sysctl knob; see mptcp-sysctl.rst.

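
The error cases above suggest a simple fallback pattern: try MPTCP first, and
use plain TCP only when ``errno`` reports that MPTCP is unavailable. The
following is a minimal sketch of that idea; the helper names
``mptcp_unavailable`` and ``stream_socket_prefer_mptcp`` are hypothetical, and
the fallback definition of ``IPPROTO_MPTCP`` only applies when the C library
headers do not provide it.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262  /* value given in this document */
#endif

/* Returns 1 if err is one of the "MPTCP is not supported" errno values. */
static int mptcp_unavailable(int err)
{
    return err == EINVAL || err == EPROTONOSUPPORT || err == ENOPROTOOPT;
}

/*
 * Hypothetical helper: create a stream socket, preferring MPTCP.
 * family is AF_INET or AF_INET6. On success, *is_mptcp is set to 1 if an
 * MPTCP socket was created and to 0 if plain TCP was used instead.
 * Returns the socket descriptor, or -1 with errno set.
 */
static int stream_socket_prefer_mptcp(int family, int *is_mptcp)
{
    int sd = socket(family, SOCK_STREAM, IPPROTO_MPTCP);

    if (sd >= 0) {
        *is_mptcp = 1;
        return sd;
    }

    if (!mptcp_unavailable(errno))
        return -1;  /* unrelated failure, e.g. EMFILE */

    *is_mptcp = 0;
    return socket(family, SOCK_STREAM, IPPROTO_TCP);
}
```

The rest of the application (``connect()``, ``bind()``, ``send()``, etc.) can
then use the returned descriptor in the same way for both protocols.
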
MPTCP is then opt-in: applications need to explicitly request it. Note that
applications can be forced to use MPTCP with different techniques, e.g.
``LD_PRELOAD`` (see ``mptcpize``), eBPF (see ``mptcpify``), SystemTap,
``GODEBUG`` (``GODEBUG=multipathtcp=1``), etc.

Switching to ``IPPROTO_MPTCP`` instead of ``IPPROTO_TCP`` should be as
transparent as possible for the userspace applications.

Socket options
--------------

MPTCP supports most socket options handled by TCP. It is possible some less
common options are not supported, but contributions are welcome.

Generally, the same value is propagated to all subflows, including the ones
created after the calls to ``setsockopt()``. eBPF can be used to set different
values per subflow.

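
As an illustration of the propagation described above, a TCP-level option is
set on the MPTCP socket exactly as it would be on a TCP socket, and the kernel
takes care of the subflows. The sketch below uses ``TCP_NODELAY`` as the
example, assuming the running kernel supports this option on MPTCP sockets;
the helper name ``set_nodelay`` is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: disable Nagle's algorithm on sd, which may be an
 * MPTCP or a plain TCP socket. On an MPTCP socket, the value is expected
 * to be applied to existing and future subflows.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_nodelay(int sd)
{
    int one = 1;

    return setsockopt(sd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```

If the call fails on an MPTCP socket, the option may simply not be supported
for MPTCP on that kernel, in which case the application can choose to carry on
without it.
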
There are some MPTCP-specific socket options at the ``SOL_MPTCP`` (284) level to
retrieve info. They fill the ``optval`` buffer of the ``getsockopt()`` system
call:

- ``MPTCP_INFO``: Uses ``struct mptcp_info``.

- ``MPTCP_TCPINFO``: Uses ``struct mptcp_subflow_data``, followed by an array of
  ``struct tcp_info``.

- ``MPTCP_SUBFLOW_ADDRS``: Uses ``struct mptcp_subflow_data``, followed by an
  array of ``struct mptcp_subflow_addrs``.

- ``MPTCP_FULL_INFO``: Uses ``struct mptcp_full_info``, with one pointer to an
  array of ``struct mptcp_subflow_info`` (including the
  ``struct mptcp_subflow_addrs``), and one pointer to an array of
  ``struct tcp_info``, followed by the content of ``struct mptcp_info``.

Note that at the TCP level, the ``TCP_IS_MPTCP`` socket option can be used to
know if MPTCP is currently being used: the value will be set to 1 if it is.

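
A check based on ``TCP_IS_MPTCP`` could look like the sketch below. The helper
name ``socket_is_mptcp`` is hypothetical, and the fallback definition of
``TCP_IS_MPTCP`` (43) is an assumption for C library headers that do not define
it yet; kernels that do not know this option are expected to make
``getsockopt()`` fail, which the helper reports as -1.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_IS_MPTCP
#define TCP_IS_MPTCP 43  /* assumed value, for headers that lack it */
#endif

/*
 * Hypothetical helper: report whether sd is currently using MPTCP.
 * Returns 1 if MPTCP is in use, 0 if it is not, and -1 if the kernel
 * could not answer (option not supported, invalid descriptor, etc.).
 */
static int socket_is_mptcp(int sd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(sd, IPPROTO_TCP, TCP_IS_MPTCP, &val, &len) < 0)
        return -1;

    return val != 0;
}
```

This can be useful on sockets returned by ``accept()``, or after ``connect()``,
to find out whether the peer and the network path allowed MPTCP to be used.
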
Design choices
==============

A new socket type has been added for MPTCP for the userspace-facing socket. The
kernel is in charge of creating subflow sockets: they are TCP sockets where the
behavior is modified using TCP-ULP.

MPTCP listen sockets will create "plain" *accepted* TCP sockets if the
connection request from the client didn't ask for MPTCP, making the performance
impact minimal when MPTCP is enabled by default.