=========================
NXP SJA1105 switch driver
=========================

Overview
========

The NXP SJA1105 is a family of 10 SPI-managed automotive switches:

- SJA1105E: First generation, no TTEthernet
- SJA1105T: First generation, TTEthernet
- SJA1105P: Second generation, no TTEthernet, no SGMII
- SJA1105Q: Second generation, TTEthernet, no SGMII
- SJA1105R: Second generation, no TTEthernet, SGMII
- SJA1105S: Second generation, TTEthernet, SGMII
- SJA1110A: Third generation, TTEthernet, SGMII, integrated 100base-T1 and
  100base-TX PHYs
- SJA1110B: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX
- SJA1110C: Third generation, TTEthernet, SGMII, 100base-T1, 100base-TX
- SJA1110D: Third generation, TTEthernet, SGMII, 100base-T1

Being automotive parts, their configuration interface is geared towards
set-and-forget use, with minimal dynamic interaction at runtime. They
require a static configuration to be composed by software and packed
with CRC and table headers, and sent over SPI.

The static configuration is composed of several configuration tables. Each
table takes a number of entries. Some configuration tables can be (partially)
reconfigured at runtime, some not. Some tables are mandatory, some not:

============================= ================== =============================
Table                         Mandatory          Reconfigurable
============================= ================== =============================
Schedule                      no                 no
Schedule entry points         if Scheduling      no
VL Lookup                     no                 no
VL Policing                   if VL Lookup       no
VL Forwarding                 if VL Lookup       no
L2 Lookup                     no                 no
L2 Policing                   yes                no
VLAN Lookup                   yes                yes
L2 Forwarding                 yes                partially (fully on P/Q/R/S)
MAC Config                    yes                partially (fully on P/Q/R/S)
Schedule Params               if Scheduling      no
Schedule Entry Points Params  if Scheduling      no
VL Forwarding Params          if VL Forwarding   no
L2 Lookup Params              no                 partially (fully on P/Q/R/S)
L2 Forwarding Params          yes                no
Clock Sync Params             no                 no
AVB Params                    no                 no
General Params                yes                partially
Retagging                     no                 yes
xMII Params                   yes                no
SGMII                         no                 yes
============================= ================== =============================

The configuration is also write-only: with very few exceptions, software
cannot read it back from the switch.

The driver creates a static configuration at probe time, and keeps it at
all times in memory, as a shadow for the hardware state. When required to
change a hardware setting, the static configuration is also updated.
If that changed setting can be transmitted to the switch through the dynamic
reconfiguration interface, it is; otherwise the switch is reset and
reprogrammed with the updated static configuration.

Switching features
==================

The driver supports the configuration of L2 forwarding rules in hardware for
port bridging. The forwarding, broadcast and flooding domain between ports can
be restricted through two methods: either at the L2 forwarding level (isolate
one bridge's ports from another's) or at the VLAN port membership level
(isolate ports within the same bridge). The final forwarding decision taken by
the hardware is a logical AND of these two sets of rules.

The hardware tags all traffic internally with a port-based VLAN (pvid), or it
decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification
is not possible. Once attributed a VLAN tag, frames are checked against the
port's membership rules and dropped at ingress if they don't match any VLAN.
This behavior is available when switch ports join a bridge with
``vlan_filtering 1``.

Normally the hardware is not configurable with respect to VLAN awareness, but
by changing what TPID the switch searches 802.1Q tags for, the semantics of a
bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or
untagged), and therefore this mode is also supported.

Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but
all bridges should have the same level of VLAN awareness (either both have
``vlan_filtering`` 0, or both 1).

Topology and loop detection through STP is supported.

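As a rough sketch of the 2 + 2 split mentioned above, the commands below create
two bridges that share the same VLAN awareness. The bridge names (``br0``,
``br1``), the port names (``swp0`` to ``swp3``) and the ``RUN`` dry-run helper
are assumptions made for illustration; by default the script only prints the
commands it would run.

```shell
#!/bin/bash
set -e -u

# Dry run by default: print each command instead of executing it.
# Export RUN="" to apply the commands for real (requires root).
RUN="${RUN-echo}"

# Both bridges must use the same vlan_filtering value.
vlan_filtering=1

# $1: bridge name (hypothetical), remaining args: ports to enslave.
make_bridge() {
        local br="$1"; shift
        local port

        ${RUN} ip link add name "${br}" type bridge \
                vlan_filtering "${vlan_filtering}"
        for port in "$@"; do
                ${RUN} ip link set dev "${port}" master "${br}"
        done
        ${RUN} ip link set dev "${br}" up
}

make_bridge br0 swp0 swp1
make_bridge br1 swp2 swp3
```

Keeping ``vlan_filtering`` in a single variable is one simple way of making
sure the two bridges cannot end up with different levels of VLAN awareness.
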
Offloads
========

Time-aware scheduling
---------------------

The switch supports a variation of the enhancements for scheduled traffic
specified in IEEE 802.1Q-2018 (formerly 802.1Qbv). This means it can be used to
ensure deterministic latency for priority traffic that is sent in-band with its
gate-open event in the network schedule.

This capability can be managed through the tc-taprio offload ('flags 2'). The
difference compared to the software implementation of taprio is that the latter
would only be able to shape traffic originated from the CPU, but not
autonomously forwarded flows.

The device has 8 traffic classes, and maps incoming frames to one of them based
on the VLAN PCP bits (if no VLAN is present, the port-based default is used).
As described in the previous sections, depending on the value of
``vlan_filtering``, the EtherType recognized by the switch as being VLAN can
either be the typical 0x8100 or a custom value used internally by the driver
for tagging. Therefore, the switch ignores the VLAN PCP if used in standalone
or bridge mode with ``vlan_filtering=0``, as it will not recognize the 0x8100
EtherType. In these modes, injecting into a particular TX queue can only be
done by the DSA net devices, which populate the PCP field of the tagging header
on egress. Using ``vlan_filtering=1``, the behavior is the other way around:
offloaded flows can be steered to TX queues based on the VLAN PCP, but the DSA
net devices are no longer able to do that. To inject frames into a hardware TX
queue with VLAN awareness active, it is necessary to create a VLAN
sub-interface on the DSA conduit port, and send normal (0x8100) VLAN-tagged
frames towards the switch, with the VLAN PCP bits set appropriately.

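A minimal sketch of the VLAN sub-interface approach described above, assuming
that the DSA conduit is called ``eth0`` and that VLAN 100 is in use; both
names, as well as the ``RUN`` dry-run helper, are illustrative assumptions. The
``egress-qos-map`` option of ``ip link`` translates the socket buffer priority
into the VLAN PCP of transmitted frames.

```shell
#!/bin/bash
set -e -u

# Dry run by default: print each command instead of executing it.
# Export RUN="" to apply the commands for real (requires root).
RUN="${RUN-echo}"

conduit="eth0"   # assumed name of the DSA conduit interface
vid=100          # assumed VLAN ID

# Build an identity egress-qos-map: skb priority N -> VLAN PCP N.
qos_map=""
for prio in 0 1 2 3 4 5 6 7; do
        qos_map="${qos_map}${qos_map:+ }${prio}:${prio}"
done

# qos_map is left unquoted on purpose so that it expands to 8 arguments.
${RUN} ip link add link "${conduit}" name "${conduit}.${vid}" \
        type vlan id "${vid}" egress-qos-map ${qos_map}
${RUN} ip link set dev "${conduit}.${vid}" up
```

With such a mapping, an application would typically select the PCP by setting
the priority of its socket (for example through ``SO_PRIORITY``).
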
Management traffic (having DMAC 01-80-C2-xx-xx-xx or 01-19-1B-xx-xx-xx) is the
notable exception: the switch always treats it with a fixed priority and
disregards any VLAN PCP bits even if present. The traffic class for management
traffic has a value of 7 (highest priority) at the moment, which is not
configurable in the driver.

Below is an example of configuring a 500 us cyclic schedule on egress port
``swp5``. The traffic class gate for management traffic (7) is open for 100 us,
and the gates for all other traffic classes are open for 400 us::

  #!/bin/bash

  set -e -u -o pipefail

  NSEC_PER_SEC="1000000000"

  # Build a hexadecimal gate mask with one bit set per listed traffic class.
  gatemask() {
          local tc_list="$1"
          local mask=0
          local tc

          for tc in ${tc_list}; do
                  mask=$((${mask} | (1 << ${tc})))
          done

          printf "%02x" ${mask}
  }

  # The schedule is only meaningful if the PTP clock is synchronized.
  if ! systemctl is-active --quiet ptp4l; then
          echo "Please start the ptp4l service"
          exit 1
  fi

  now=$(phc_ctl /dev/ptp1 get | gawk '/clock time is/ { print $5; }')
  # Phase-align the base time to the start of the next second.
  sec=$(echo "${now}" | gawk -F. '{ print $1; }')
  base_time="$(((${sec} + 1) * ${NSEC_PER_SEC}))"

  tc qdisc add dev swp5 parent root handle 100 taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S $(gatemask 7) 100000 \
          sched-entry S $(gatemask "0 1 2 3 4 5 6") 400000 \
          flags 2

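To make the numbers in the schedule above easier to verify, the sketch below
recomputes the two gate masks and the cycle time. It reuses the ``gatemask``
helper from the example; the variable names are illustrative only.

```shell
#!/bin/bash
set -e -u

# Same helper as in the example: OR together one bit per traffic class.
gatemask() {
        local tc_list="$1"
        local mask=0
        local tc

        for tc in ${tc_list}; do
                mask=$((mask | (1 << tc)))
        done

        printf "%02x" "${mask}"
}

mgmt_mask=$(gatemask 7)
other_mask=$(gatemask "0 1 2 3 4 5 6")
# The cycle time is the sum of the intervals of all schedule entries.
cycle_ns=$((100000 + 400000))

echo "management gate mask: 0x${mgmt_mask}"
echo "other classes mask:   0x${other_mask}"
echo "cycle time:           ${cycle_ns} ns"
```

The two masks are complementary (0x80 and 0x7f), so at any point in the 500 us
cycle each traffic class gate is open in exactly one of the two entries.
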
It is possible to apply the tc-taprio offload on multiple egress ports. There
are hardware restrictions related to the fact that no gate event may trigger
simultaneously on two ports. The driver checks the consistency of the schedules
against this restriction and errors out when appropriate. Schedule analysis is
needed to avoid this, which is outside the scope of this document.

Routing actions (redirect, trap, drop)
--------------------------------------

The switch is able to offload flow-based redirection of packets to a set of
destination ports specified by the user. Internally, this is implemented by
making use of Virtual Links, a TTEthernet concept.

The driver supports 2 types of keys for Virtual Links:

- VLAN-aware virtual links: these match on destination MAC address, VLAN ID and
  VLAN PCP.
- VLAN-unaware virtual links: these match on destination MAC address only.

The VLAN awareness state of the bridge (vlan_filtering) cannot be changed while
there are virtual link rules installed.

Composing multiple actions inside the same rule is supported. When only routing
actions are requested, the driver creates a "non-critical" virtual link. When
the action list also contains tc-gate (more details below), the virtual link
becomes "time-critical" (draws frame buffers from a reserved memory partition,
etc).

The 3 routing actions that are supported are "trap", "drop" and "redirect".

Example 1: send frames received on swp2 with a DA of 42:be:24:9b:76:20 to the
CPU and to swp3. This type of key (DA only) is used when the port's VLAN
awareness state is off::

  tc qdisc add dev swp2 clsact
  tc filter add dev swp2 ingress flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action mirred egress redirect dev swp3 \
          action trap

Example 2: drop frames received on swp2 with a DA of 42:be:24:9b:76:20, a VID
of 100 and a PCP of 0::

  tc filter add dev swp2 ingress protocol 802.1Q flower skip_sw \
          dst_mac 42:be:24:9b:76:20 vlan_id 100 vlan_prio 0 action drop

Time-based ingress policing
---------------------------

The TTEthernet hardware abilities of the switch can be constrained to act
similarly to the Per-Stream Filtering and Policing (PSFP) clause specified in
IEEE 802.1Q-2018 (formerly 802.1Qci). This means it can be used to perform
tight timing-based admission control for up to 1024 flows (identified by a
tuple composed of destination MAC address, VLAN ID and VLAN PCP). Packets which
are received outside their expected reception window are dropped.

This capability can be managed through the offload of the tc-gate action. As
routing actions are intrinsic to virtual links in TTEthernet (which performs
explicit routing of time-critical traffic and does not leave that in the hands
of the FDB, flooding etc), the tc-gate action may never appear alone when
asking sja1105 to offload it. One (or more) redirect or trap actions must also
follow along.

Example: create a tc-taprio schedule that is phase-aligned with a tc-gate
schedule (the clocks must be synchronized by a 1588 application stack, which is
outside the scope of this document). No packet delivered by the sender will be
dropped. Note that the reception window is larger than the transmission window
(and much more so, in this example) to compensate for the packet propagation
delay of the link (which can be determined by the 1588 application stack).

Receiver (sja1105)::

  tc qdisc add dev swp2 clsact
  now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo $now | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc filter add dev swp2 ingress flower skip_sw \
          dst_mac 42:be:24:9b:76:20 \
          action gate base-time ${base_time} \
          sched-entry OPEN  60000 -1 -1 \
          sched-entry CLOSE 40000 -1 -1 \
          action trap

Sender::

  now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
          sec=$(echo $now | awk -F. '{print $1}') && \
          base_time="$(((sec + 2) * 1000000000))" && \
          echo "base time ${base_time}"
  tc qdisc add dev eno0 parent root taprio \
          num_tc 8 \
          map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time ${base_time} \
          sched-entry S 01  50000 \
          sched-entry S 00  50000 \
          flags 2

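The relationship between the two schedules above can be checked with a few
lines of arithmetic. This is only a sketch of the reasoning, using the
interval values from the example; the variable names are illustrative.

```shell
#!/bin/bash
set -e -u

rx_open_ns=60000; rx_close_ns=40000    # tc-gate entries on the receiver
tx_open_ns=50000; tx_close_ns=50000    # tc-taprio entries on the sender

rx_cycle_ns=$((rx_open_ns + rx_close_ns))
tx_cycle_ns=$((tx_open_ns + tx_close_ns))
# Extra time the receiver stays open after the sender's gate has closed.
margin_ns=$((rx_open_ns - tx_open_ns))

# The two schedules only stay phase-aligned if their cycles are equal.
if [ "${rx_cycle_ns}" -ne "${tx_cycle_ns}" ]; then
        echo "cycle mismatch: the two schedules would drift apart" >&2
        exit 1
fi

echo "cycle: ${rx_cycle_ns} ns, reception margin: ${margin_ns} ns"
```

Both schedules repeat every 100 us and start at the same base time, and the
receiver stays open 10 us longer than the sender transmits, which is the slack
available to absorb the propagation delay of the link.
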
The engine used to schedule the ingress gate operations is the same as the
one used for the tc-taprio offload. Therefore, the restrictions regarding the
fact that no two gate actions (either tc-gate or tc-taprio gates) may fire at
the same time (during the same 200 ns slot) still apply.

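The restriction can be illustrated with a simplified model, which is not the
driver's actual algorithm: it assumes that both schedules have the same cycle
length, that a gate event fires at the start of every schedule entry, and that
two events collide when they fall into the same 200 ns slot. The function
names and the example schedules are hypothetical.

```shell
#!/bin/bash
set -e -u

SLOT_NS=200

# Print the slot index of every gate event within one cycle.
# $1: offset of the first entry in ns, remaining args: entry intervals in ns.
event_slots() {
        local t="$1"; shift
        local interval

        for interval in "$@"; do
                echo $((t / SLOT_NS))
                t=$((t + interval))
        done
}

# Return 0 if the two whitespace-separated slot lists share a slot.
slots_collide() {
        local a b

        for a in $1; do
                for b in $2; do
                        if [ "${a}" -eq "${b}" ]; then
                                return 0
                        fi
                done
        done
        return 1
}

port_a=$(event_slots 0 100000 400000)
port_b_same=$(event_slots 0 250000 250000)
port_b_shifted=$(event_slots 1000 250000 250000)

if slots_collide "${port_a}" "${port_b_same}"; then
        echo "same base time: collision"
fi
if ! slots_collide "${port_a}" "${port_b_shifted}"; then
        echo "shifted by 1000 ns: no collision"
fi
```

In this model, two schedules that start at the same instant always collide on
their first entry, which is why offsetting the base time of one of them is a
common first step of schedule analysis.
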
Conveniently, it is possible to share time-triggered virtual links across
more than 1 ingress port, via flow blocks. In this case, the restriction of
firing at the same time does not apply because there is a single schedule in
the system, that of the shared virtual link::

  tc qdisc add dev swp2 ingress_block 1 clsact
  tc qdisc add dev swp3 ingress_block 1 clsact
  tc filter add block 1 flower skip_sw dst_mac 42:be:24:9b:76:20 \
          action gate index 2 \
          base-time 0 \
          sched-entry OPEN 50000000 -1 -1 \
          sched-entry CLOSE 50000000 -1 -1 \
          action trap

Hardware statistics for each flow are also available ("pkts" counts the number
of dropped frames, which is a sum of frames dropped due to timing violations,
lack of destination ports and MTU enforcement checks). Byte-level counters are
not available.

Limitations
===========

The SJA1105 switch family always performs VLAN processing. When configured as
VLAN-unaware, frames carry a different VLAN tag internally, depending on
whether the port is standalone or under a VLAN-unaware bridge.

The virtual link keys are always fixed at {MAC DA, VLAN ID, VLAN PCP}, but the
driver asks for the VLAN ID and VLAN PCP when the port is under a VLAN-aware
bridge. Otherwise, it fills in the VLAN ID and PCP automatically, based on
whether the port is standalone or in a VLAN-unaware bridge, and accepts only
"VLAN-unaware" tc-flower keys (MAC DA).

The existing tc-flower keys that are offloaded using virtual links are no
longer operational after one of the following happens:

- port was standalone and joins a bridge (VLAN-aware or VLAN-unaware)
- port is part of a bridge whose VLAN awareness state changes
- port was part of a bridge and becomes standalone
- port was standalone, but another port joins a VLAN-aware bridge and this
  changes the global VLAN awareness state of the bridge

The driver cannot veto all these operations, and it cannot update/remove the
existing tc-flower filters either. So for proper operation, the tc-flower
filters should be installed only after the forwarding configuration of the port
has been made, and removed by user space before making any changes to it.

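The ordering recommended above could look like the following sketch. The port
and bridge names, the ``reconfigure_port`` function and the ``RUN`` dry-run
helper are assumptions for illustration, and the DA-only key assumes that the
target bridge is VLAN-unaware; by default the commands are only printed.

```shell
#!/bin/bash
set -e -u

# Dry run by default: print each command instead of executing it.
# Export RUN="" to apply the commands for real (requires root).
RUN="${RUN-echo}"

# $1: switch port (hypothetical name), $2: bridge to join (hypothetical name).
reconfigure_port() {
        local port="$1"
        local bridge="$2"

        # 1. Remove the offloaded tc-flower filters before touching the port.
        ${RUN} tc filter del dev "${port}" ingress

        # 2. Change the forwarding configuration of the port.
        ${RUN} ip link set dev "${port}" master "${bridge}"

        # 3. Reinstall the filters with keys that match the new state.
        ${RUN} tc filter add dev "${port}" ingress flower skip_sw \
                dst_mac 42:be:24:9b:76:20 action drop
}

reconfigure_port swp2 br0
```
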
Device Tree bindings and board design
=====================================

This section references ``Documentation/devicetree/bindings/net/dsa/nxp,sja1105.yaml``
and aims to showcase some potential switch caveats.

RMII PHY role and out-of-band signaling
---------------------------------------

In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by
an external oscillator (but not by the PHY).
But the spec is rather loose and devices go outside it in several ways.
Some PHYs go against the spec and may provide an output pin where they source
the 50 MHz clock themselves, in an attempt to be helpful.
On the other hand, the SJA1105 is only binary configurable - when in the RMII
MAC role it will also attempt to drive the clock signal. To prevent this from
happening it must be put in RMII PHY role.
But doing so has some unintended consequences.

In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0].
These are practically some extra code words (/J/ and /K/) sent prior to the
preamble of each frame. The MAC does not have this out-of-band signaling
mechanism defined by the RMII spec.

So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the
clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105
emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to
frame preambles, which the real PHY is not expected to understand. So the PHY
simply encodes the extra symbols received from the SJA1105-as-PHY onto the
100base-TX wire.

On the other side of the wire, some link partners might discard these extra
symbols, while others might choke on them and discard the entire Ethernet
frames that follow along. This looks like packet loss with some link partners
but not with others.

The take-away is that in RMII mode, the SJA1105 must be allowed to drive the
reference clock if connected to a PHY.

RGMII fixed-link and internal delays
------------------------------------

As mentioned in the bindings document, the second generation of devices has
tunable delay lines as part of the MAC, which can be used to establish the
correct RGMII timing budget.
When powered up, these can shift the Rx and Tx clocks with a phase difference
between 73.8 and 101.7 degrees.
The catch is that the delay lines need to lock onto a clock signal with a
stable frequency. This means that there must be at least 2 microseconds of
silence between the clock at the old vs at the new frequency. Otherwise the
lock is lost and the delay lines must be reset (powered down and back up).
In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25
MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the
AN process.

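To relate the figures above, the sketch below converts the phase shift range
into a time delay at the gigabit clock frequency. The function names are
illustrative, and the phase is passed in tenths of a degree so that plain
integer arithmetic can be used.

```shell
#!/bin/bash
set -e -u

# RGMII clock period in picoseconds for a given link speed in Mbps.
rgmii_period_ps() {
        case "$1" in
        1000) echo 8000 ;;      # 125 MHz
        100)  echo 40000 ;;     # 25 MHz
        10)   echo 400000 ;;    # 2.5 MHz
        *)    echo "unsupported speed: $1" >&2; return 1 ;;
        esac
}

# Convert a phase shift (in tenths of a degree) into picoseconds.
# $1: phase shift, $2: clock period in picoseconds.
phase_to_ps() {
        local tenths_deg="$1"
        local period_ps="$2"

        echo $((tenths_deg * period_ps / 3600))
}

period=$(rgmii_period_ps 1000)
echo "min delay at 1000 Mbps: $(phase_to_ps 738 "${period}") ps"
echo "max delay at 1000 Mbps: $(phase_to_ps 1017 "${period}") ps"
```

At 125 MHz the clock period is 8 ns, so the 73.8 to 101.7 degree range
corresponds to a delay of roughly 1.64 ns to 2.26 ns.
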
In the situation where the switch port is connected through an RGMII fixed-link
to a link partner whose link state life cycle is outside the control of Linux
(such as a different SoC), the delay lines would remain unlocked (and
inactive) until there is manual intervention (ifdown/ifup on the switch port).
The take-away is that in RGMII mode, the switch's internal delays are only
reliable if the link partner never changes link speeds, or if it does, it does
so in a way that is coordinated with the switch port (practically, both ends of
the fixed-link are under control of the same Linux system).
As to why a fixed-link interface would ever change link speeds: there are
Ethernet controllers out there which come out of reset in 100 Mbps mode, and
their driver inevitably needs to change the speed and clock frequency if it's
required to work at gigabit.

MDIO bus and PHY management
---------------------------

The SJA1105 does not have an MDIO bus and does not perform in-band AN either.
Therefore there is no link state notification coming from the switch device.
A board would need to hook up the PHYs connected to the switch to any other
MDIO bus available to Linux within the system (e.g. to the DSA conduit's MDIO
bus). Link state management then works by the driver manually keeping in sync
(over SPI commands) the MAC link speed with the settings negotiated by the PHY.

By comparison, the SJA1110 supports an MDIO slave access point over which its
internal 100base-T1 PHYs can be accessed from the host. This is, however, not
used by the driver; instead, the internal 100base-T1 and 100base-TX PHYs are
accessed through SPI commands, modeled in Linux as virtual MDIO buses.

The microcontroller attached to the SJA1110 port 0 also has an MDIO controller
operating in master mode; however, the driver does not support this either,
since the microcontroller gets disabled when the Linux driver operates.
Discrete PHYs connected to the switch ports should have their MDIO interface
attached to an MDIO controller from the host system and not to the switch,
similar to SJA1105.

Port compatibility matrix
-------------------------

The SJA1105 port compatibility matrix is:

===== ============== ============== ==============
Port  SJA1105E/T     SJA1105P/Q     SJA1105R/S
===== ============== ============== ==============
0     xMII           xMII           xMII
1     xMII           xMII           xMII
2     xMII           xMII           xMII
3     xMII           xMII           xMII
4     xMII           xMII           SGMII
===== ============== ============== ==============

The SJA1110 port compatibility matrix is:

===== ============== ============== ============== ==============
Port  SJA1110A       SJA1110B       SJA1110C       SJA1110D
===== ============== ============== ============== ==============
0     RevMII (uC)    RevMII (uC)    RevMII (uC)    RevMII (uC)
1     100base-TX     100base-TX     100base-TX
      or SGMII                                     SGMII
2     xMII           xMII           xMII           xMII
      or SGMII                                     or SGMII
3     xMII           xMII           xMII
      or SGMII       or SGMII                      SGMII
      or 2500base-X  or 2500base-X                 or 2500base-X
4     SGMII          SGMII          SGMII          SGMII
      or 2500base-X  or 2500base-X  or 2500base-X  or 2500base-X
5     100base-T1     100base-T1     100base-T1     100base-T1
6     100base-T1     100base-T1     100base-T1     100base-T1
7     100base-T1     100base-T1     100base-T1     100base-T1
8     100base-T1     100base-T1     n/a            n/a
9     100base-T1     100base-T1     n/a            n/a
10    100base-T1     n/a            n/a            n/a
===== ============== ============== ============== ==============