.. SPDX-License-Identifier: GPL-2.0

====================================
Virtual Routing and Forwarding (VRF)
====================================

The VRF Device
==============

The VRF device combined with ip rules provides the ability to create virtual
routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
Linux network stack. One use case is the multi-tenancy problem where each
tenant has their own unique routing tables and at the very least needs a
different default gateway.

Processes can be "VRF aware" by binding a socket to the VRF device. Packets
through the socket then use the routing table associated with the VRF
device. An important feature of the VRF device implementation is that it
impacts only Layer 3 and above, so L2 tools (e.g., LLDP) are not affected
(i.e., they do not need to be run in each VRF). The design also allows
the use of higher priority ip rules (Policy Based Routing, PBR) to take
precedence over the VRF device rules, directing specific traffic as desired.

In addition, VRF devices allow VRFs to be nested within namespaces. For
example, network namespaces provide separation of network interfaces at the
device layer, VLANs on the interfaces within a namespace provide L2 separation,
and then VRF devices provide L3 separation.


Design
------
A VRF device is created with an associated route table. Network interfaces
are then enslaved to a VRF device::

    +-----------------------------+
    |           vrf-blue          |  ===> route table 10
    +-----------------------------+
       |        |            |
    +------+ +------+     +-------------+
    | eth1 | | eth2 | ... |    bond1    |
    +------+ +------+     +-------------+
                             |       |
                         +------+ +------+
                         | eth8 | | eth9 |
                         +------+ +------+

Packets received on an enslaved device are switched to the VRF device
in the IPv4 and IPv6 processing stacks, giving the impression that packets
flow through the VRF device. Similarly, on egress, routing rules are used to
send packets to the VRF device driver before they are sent out the actual
interface. This allows tcpdump on a VRF device to capture all packets into
and out of the VRF as a whole\ [1]_. Similarly, netfilter\ [2]_ and tc rules
can be applied using the VRF device to specify rules that apply to the VRF
domain as a whole.

.. [1] Packets in the forwarded state do not flow through the device, so those
       packets are not seen by tcpdump. This limitation will be revisited in a
       future release.

.. [2] Iptables on ingress supports PREROUTING with skb->dev set to the real
       ingress device and both INPUT and PREROUTING rules with skb->dev set to
       the VRF device. For egress, POSTROUTING and OUTPUT rules can be written
       using either the VRF device or real egress device.


Setup
-----
1. A VRF device is created with an association to a FIB table, e.g.::

       ip link add vrf-blue type vrf table 10
       ip link set dev vrf-blue up

2. An l3mdev FIB rule directs lookups to the table associated with the device.
   A single l3mdev rule is sufficient for all VRFs. When the first VRF device
   is created, the VRF driver adds the l3mdev rule for IPv4 and IPv6 with a
   default preference of 1000. Users may delete the rule if desired and add
   it back with a different priority or install per-VRF rules.

   Prior to the v4.8 kernel, iif and oif rules are needed for each VRF device::

       ip ru add oif vrf-blue table 10
       ip ru add iif vrf-blue table 10

3. Set the default route for the table (and hence default route for the VRF)::

       ip route add table 10 unreachable default metric 4278198272

   This high metric value ensures that the default unreachable route can
   be overridden by a routing protocol suite. FRRouting interprets
   kernel metrics as a combined admin distance (upper byte) and priority
   (lower 3 bytes). Thus the above metric translates to [255/8192].

4. Enslave L3 interfaces to a VRF device::

       ip link set dev eth1 master vrf-blue

   Local and connected routes for enslaved devices are automatically moved to
   the table associated with the VRF device. Any additional routes depending on
   the enslaved device are dropped and will need to be reinserted into the VRF
   FIB table following the enslavement.

   The IPv6 sysctl option keep_addr_on_down can be enabled to keep IPv6 global
   addresses as VRF enslavement changes::

       sysctl -w net.ipv6.conf.all.keep_addr_on_down=1

5. Additional VRF routes are added to the associated table::

       ip route add table 10 ...

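
The metric arithmetic in step 3 can be checked with plain shell arithmetic.
This is a minimal sketch that assumes only the split described above (upper
byte is the admin distance, lower 3 bytes are the priority); the variable
names are chosen here for illustration.

```shell
# Split a kernel route metric into the two fields FRRouting reads from it:
# admin distance (upper byte) and priority (lower 3 bytes).
metric=4278198272

distance=$(( metric >> 24 ))
priority=$(( metric & 0xFFFFFF ))
echo "[$distance/$priority]"

# Going the other way: rebuild the metric from a distance and a priority.
echo $(( (distance << 24) | priority ))
```

The first line printed is ``[255/8192]`` and the second is the original
metric, 4278198272.
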

Applications
------------
Applications that are to work within a VRF need to bind their socket to the
VRF device::

    setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);

or to specify the output device using cmsg and IP_PKTINFO.

By default the scope of the port bindings for unbound sockets is
limited to the default VRF. That is, an unbound socket will not be matched by
packets arriving on interfaces enslaved to an l3mdev, and processes may bind to
the same port if they bind to an l3mdev.

TCP & UDP services running in the default VRF context (i.e., not bound
to any VRF device) can work across all VRF domains by enabling the
tcp_l3mdev_accept and udp_l3mdev_accept sysctl options::

    sysctl -w net.ipv4.tcp_l3mdev_accept=1
    sysctl -w net.ipv4.udp_l3mdev_accept=1

These options are disabled by default so that a socket in a VRF is only
selected for packets in that VRF. There is a similar option for RAW
sockets, which is enabled by default for reasons of backwards compatibility.
It makes it possible to specify the output device with cmsg and IP_PKTINFO
while using a socket that is not bound to the corresponding VRF. This allows,
e.g., older ping implementations to be run by specifying the device without
executing them in the VRF. This option can be disabled so that packets received
in a VRF context are only handled by a raw socket bound to the VRF, and packets
in the default VRF are only handled by a socket not bound to any VRF::

    sysctl -w net.ipv4.raw_l3mdev_accept=0

Netfilter rules on the VRF device can be used to limit access to services
running in the default VRF context as well.

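
As an illustration of limiting access with netfilter, the sketch below drops
TCP connections to one service port when they arrive through a VRF. The port
number (22) and the ``run`` dry-run helper are assumptions made for this
example, not part of the VRF implementation; the VRF name is the one used in
the Setup section. By default the commands are only printed so they can be
reviewed; applying them requires root.

```shell
# Echo commands unless APPLY=1 is set; applying them requires root.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

vrf=vrf-blue   # VRF device name from the Setup section
port=22        # hypothetical service port, chosen for illustration

# INPUT rules see the VRF device as the input interface for packets that
# arrived on one of its enslaved interfaces, so -i matches the whole VRF.
run iptables -A INPUT -i "$vrf" -p tcp --dport "$port" -j DROP
run ip6tables -A INPUT -i "$vrf" -p tcp --dport "$port" -j DROP
```
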
Using VRF-aware applications (applications which simultaneously create sockets
outside and inside VRFs) in conjunction with ``net.ipv4.tcp_l3mdev_accept=1``
is possible but may lead to problems in some situations. With that sysctl
value, it is unspecified which listening socket will be selected to handle
connections for VRF traffic; i.e., either a socket bound to the VRF or an
unbound socket may be used to accept new connections from a VRF. This somewhat
unexpected behavior can lead to problems if sockets are configured with extra
options (e.g., TCP MD5 keys) with the expectation that VRF traffic will
exclusively be handled by sockets bound to VRFs, as would be the case with
``net.ipv4.tcp_l3mdev_accept=0``. Finally and as a reminder, regardless of
which listening socket is selected, established sockets will be created in the
VRF based on the ingress interface, as documented earlier.

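
Because socket selection depends on these sysctls, it can help to check how
they are currently set before deploying a VRF-aware application. The helper
below is a sketch: the function name and its optional directory argument are
made up for this example, and it assumes the sysctls are exposed under
``/proc/sys/net/ipv4`` when the kernel provides them.

```shell
# Print the current value of each l3mdev accept sysctl.
# The optional argument is the directory holding the sysctl files; it
# defaults to /proc/sys/net/ipv4 and exists mainly to ease testing.
l3mdev_accept_report() {
    dir=${1:-/proc/sys/net/ipv4}
    for name in tcp_l3mdev_accept udp_l3mdev_accept raw_l3mdev_accept; do
        if [ -r "$dir/$name" ]; then
            echo "net.ipv4.$name = $(cat "$dir/$name")"
        else
            # The file is missing, e.g. on a kernel without this option.
            echo "net.ipv4.$name = (not available)"
        fi
    done
}

l3mdev_accept_report
```
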

--------------------------------------------------------------------------------

Using iproute2 for VRFs
=======================
iproute2 supports the vrf keyword as of v4.7. For backwards compatibility this
section lists both commands where appropriate -- with the vrf keyword and the
older form without it.

1. Create a VRF

   To instantiate a VRF device and associate it with a table::

       $ ip link add dev NAME type vrf table ID

   As of v4.8 the kernel supports the l3mdev FIB rule where a single rule
   covers all VRFs. The l3mdev rule is created for IPv4 and IPv6 on first
   device create.

2. List VRFs

   To list VRFs that have been created::

       $ ip [-d] link show type vrf
         NOTE: The -d option is needed to show the table id

   For example::

       $ ip -d link show type vrf
       11: mgmt: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
           link/ether 72:b3:ba:91:e2:24 brd ff:ff:ff:ff:ff:ff promiscuity 0
           vrf table 1 addrgenmode eui64
       12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
           link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0
           vrf table 10 addrgenmode eui64
       13: blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
           link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0
           vrf table 66 addrgenmode eui64
       14: green: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
           link/ether e6:28:b8:63:70:bb brd ff:ff:ff:ff:ff:ff promiscuity 0
           vrf table 81 addrgenmode eui64

   Or in brief output::

       $ ip -br link show type vrf
       mgmt    UP    72:b3:ba:91:e2:24 <NOARP,MASTER,UP,LOWER_UP>
       red     UP    b6:6f:6e:f6:da:73 <NOARP,MASTER,UP,LOWER_UP>
       blue    UP    36:62:e8:7d:bb:8c <NOARP,MASTER,UP,LOWER_UP>
       green   UP    e6:28:b8:63:70:bb <NOARP,MASTER,UP,LOWER_UP>

3. Assign a Network Interface to a VRF

   Network interfaces are assigned to a VRF by enslaving the netdevice to a
   VRF device::

       $ ip link set dev NAME master NAME

   On enslavement, connected and local routes are automatically moved to the
   table associated with the VRF device.

   For example::

       $ ip link set dev eth0 master mgmt

4. Show Devices Assigned to a VRF

   To show devices that have been assigned to a specific VRF, add the vrf
   (or the older master) option to the ip command::

       $ ip link show vrf NAME
       $ ip link show master NAME

   For example::

       $ ip link show vrf red
       3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000
           link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff
       4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP mode DEFAULT group default qlen 1000
           link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff
       7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN mode DEFAULT group default qlen 1000
           link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff

   Or using the brief output::

       $ ip -br link show vrf red
       eth1    UP      02:00:00:00:02:02 <BROADCAST,MULTICAST,UP,LOWER_UP>
       eth2    UP      02:00:00:00:02:03 <BROADCAST,MULTICAST,UP,LOWER_UP>
       eth5    DOWN    02:00:00:00:02:06 <BROADCAST,MULTICAST>

5. Show Neighbor Entries for a VRF

   To list neighbor entries associated with devices enslaved to a VRF device,
   add the vrf (or the older master) option to the ip command::

       $ ip [-6] neigh show vrf NAME
       $ ip [-6] neigh show master NAME

   For example::

       $ ip neigh show vrf red
       10.2.1.254 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE
       10.2.2.254 dev eth2 lladdr 5e:54:01:6a:ee:80 REACHABLE

       $ ip -6 neigh show vrf red
       2002:1::64 dev eth1 lladdr a6:d9:c7:4f:06:23 REACHABLE

6. Show Addresses for a VRF

   To show addresses for interfaces associated with a VRF, add the vrf
   (or the older master) option to the ip command::

       $ ip addr show vrf NAME
       $ ip addr show master NAME

   For example::

       $ ip addr show vrf red
       3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
           link/ether 02:00:00:00:02:02 brd ff:ff:ff:ff:ff:ff
           inet 10.2.1.2/24 brd 10.2.1.255 scope global eth1
              valid_lft forever preferred_lft forever
           inet6 2002:1::2/120 scope global
              valid_lft forever preferred_lft forever
           inet6 fe80::ff:fe00:202/64 scope link
              valid_lft forever preferred_lft forever
       4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
           link/ether 02:00:00:00:02:03 brd ff:ff:ff:ff:ff:ff
           inet 10.2.2.2/24 brd 10.2.2.255 scope global eth2
              valid_lft forever preferred_lft forever
           inet6 2002:2::2/120 scope global
              valid_lft forever preferred_lft forever
           inet6 fe80::ff:fe00:203/64 scope link
              valid_lft forever preferred_lft forever
       7: eth5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master red state DOWN group default qlen 1000
           link/ether 02:00:00:00:02:06 brd ff:ff:ff:ff:ff:ff

   Or in brief format::

       $ ip -br addr show vrf red
       eth1    UP      10.2.1.2/24 2002:1::2/120 fe80::ff:fe00:202/64
       eth2    UP      10.2.2.2/24 2002:2::2/120 fe80::ff:fe00:203/64
       eth5    DOWN

7. Show Routes for a VRF

   To show routes for a VRF use the ip command to display the table associated
   with the VRF device::

       $ ip [-6] route show vrf NAME
       $ ip [-6] route show table ID

   For example::

       $ ip route show vrf red
       unreachable default  metric 4278198272
       broadcast 10.2.1.0 dev eth1  proto kernel  scope link  src 10.2.1.2
       10.2.1.0/24 dev eth1  proto kernel  scope link  src 10.2.1.2
       local 10.2.1.2 dev eth1  proto kernel  scope host  src 10.2.1.2
       broadcast 10.2.1.255 dev eth1  proto kernel  scope link  src 10.2.1.2
       broadcast 10.2.2.0 dev eth2  proto kernel  scope link  src 10.2.2.2
       10.2.2.0/24 dev eth2  proto kernel  scope link  src 10.2.2.2
       local 10.2.2.2 dev eth2  proto kernel  scope host  src 10.2.2.2
       broadcast 10.2.2.255 dev eth2  proto kernel  scope link  src 10.2.2.2

       $ ip -6 route show vrf red
       local 2002:1:: dev lo  proto none  metric 0  pref medium
       local 2002:1::2 dev lo  proto none  metric 0  pref medium
       2002:1::/120 dev eth1  proto kernel  metric 256  pref medium
       local 2002:2:: dev lo  proto none  metric 0  pref medium
       local 2002:2::2 dev lo  proto none  metric 0  pref medium
       2002:2::/120 dev eth2  proto kernel  metric 256  pref medium
       local fe80:: dev lo  proto none  metric 0  pref medium
       local fe80:: dev lo  proto none  metric 0  pref medium
       local fe80::ff:fe00:202 dev lo  proto none  metric 0  pref medium
       local fe80::ff:fe00:203 dev lo  proto none  metric 0  pref medium
       fe80::/64 dev eth1  proto kernel  metric 256  pref medium
       fe80::/64 dev eth2  proto kernel  metric 256  pref medium
       ff00::/8 dev red  metric 256  pref medium
       ff00::/8 dev eth1  metric 256  pref medium
       ff00::/8 dev eth2  metric 256  pref medium
       unreachable default dev lo  metric 4278198272  error -101 pref medium

8. Route Lookup for a VRF

   A test route lookup can be done for a VRF::

       $ ip [-6] route get vrf NAME ADDRESS
       $ ip [-6] route get oif NAME ADDRESS

   For example::

       $ ip route get 10.2.1.40 vrf red
       10.2.1.40 dev eth1  table red  src 10.2.1.2
           cache

       $ ip -6 route get 2002:1::32 vrf red
       2002:1::32 from :: dev eth1  table red  proto kernel  src 2002:1::2  metric 256  pref medium

9. Removing Network Interface from a VRF

   Network interfaces are removed from a VRF by breaking the enslavement to
   the VRF device::

       $ ip link set dev NAME nomaster

   Connected routes are moved back to the default table and local entries are
   moved to the local table.

   For example::

       $ ip link set dev eth0 nomaster

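
When scripting against these commands, the VRF-to-table mapping from item 2
can be extracted from the ``ip -d link show type vrf`` output. The function
below is a rough sketch: ``vrf_tables`` is a name invented here, and it assumes
the text layout shown in this document, which may differ between iproute2
versions. The sample input is copied from the listing in item 2.

```shell
# Read `ip -d link show type vrf` output on stdin and print "NAME TABLE"
# for each VRF device.
vrf_tables() {
    awk '
        # Device header line, e.g. "12: red: <NOARP,MASTER,UP,LOWER_UP> ..."
        /^[0-9]+: / { name = $2; sub(/:$/, "", name) }
        # Detail line, e.g. "vrf table 10 addrgenmode eui64"
        $1 == "vrf" && $2 == "table" { print name, $3 }
    '
}

# Sample input taken from the example output in this document.
vrf_tables <<'EOF'
12: red: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether b6:6f:6e:f6:da:73 brd ff:ff:ff:ff:ff:ff promiscuity 0
    vrf table 10 addrgenmode eui64
13: blue: <NOARP,MASTER,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 36:62:e8:7d:bb:8c brd ff:ff:ff:ff:ff:ff promiscuity 0
    vrf table 66 addrgenmode eui64
EOF
```

For the sample input this prints ``red 10`` and ``blue 66``, one pair per line.
On a live system the function would be fed with
``ip -d link show type vrf | vrf_tables``.
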

--------------------------------------------------------------------------------

Commands used in this example::

    cat >> /etc/iproute2/rt_tables.d/vrf.conf <<EOF
    1  mgmt
    10 red
    66 blue
    81 green
    EOF

    function vrf_create
    {
        VRF=$1
        TBID=$2

        # create VRF device
        ip link add ${VRF} type vrf table ${TBID}

        if [ "${VRF}" != "mgmt" ]; then
            ip route add table ${TBID} unreachable default metric 4278198272
        fi
        ip link set dev ${VRF} up
    }

    vrf_create mgmt 1
    ip link set dev eth0 master mgmt

    vrf_create red 10
    ip link set dev eth1 master red
    ip link set dev eth2 master red
    ip link set dev eth5 master red

    vrf_create blue 66
    ip link set dev eth3 master blue

    vrf_create green 81
    ip link set dev eth4 master green

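
A matching teardown can be sketched along the same lines. The ``vrf_destroy``
function and the ``run`` dry-run helper below are hypothetical names introduced
for this example; the sequence (release the member interfaces, flush the VRF
table, delete the device) is one reasonable order, not a required one. By
default the commands are only printed; set ``APPLY=1`` and run as root to
execute them.

```shell
# Print each command instead of running it unless APPLY=1 is set, so the
# sequence can be reviewed first. Running for real requires root.
run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# vrf_destroy VRF TABLE-ID [INTERFACE...]
vrf_destroy() {
    vrf=$1
    tbid=$2
    shift 2

    # release the member interfaces from the VRF
    for dev in "$@"; do
        run ip link set dev "$dev" nomaster
    done

    # drop whatever is left in the VRF table, e.g. the unreachable default
    run ip route flush table "$tbid"
    run ip -6 route flush table "$tbid"

    # finally remove the VRF device itself
    run ip link del dev "$vrf"
}

vrf_destroy red 10 eth1 eth2 eth5
```
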
Interface addresses from /etc/network/interfaces::

    auto eth0
    iface eth0 inet static
          address 10.0.0.2
          netmask 255.255.255.0
          gateway 10.0.0.254

    iface eth0 inet6 static
          address 2000:1::2
          netmask 120

    auto eth1
    iface eth1 inet static
          address 10.2.1.2
          netmask 255.255.255.0

    iface eth1 inet6 static
          address 2002:1::2
          netmask 120

    auto eth2
    iface eth2 inet static
          address 10.2.2.2
          netmask 255.255.255.0

    iface eth2 inet6 static
          address 2002:2::2
          netmask 120

    auto eth3
    iface eth3 inet static
          address 10.2.3.2
          netmask 255.255.255.0

    iface eth3 inet6 static
          address 2002:3::2
          netmask 120

    auto eth4
    iface eth4 inet static
          address 10.2.4.2
          netmask 255.255.255.0

    iface eth4 inet6 static
          address 2002:4::2
          netmask 120