whatisRCU.rst 52 KB

12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091929394959697989910010110210310410510610710810911011111211311411511611711811912012112212312412512612712812913013113213313413513613713813914014114214314414514614714814915015115215315415515615715815916016116216316416516616716816917017117217317417517617717817918018118218318418518618718818919019119219319419519619719819920020120220320420520620720820921021121221321421521621721821922022122222322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133233333433533633733833934034134234334434534634734834935035135235335435535635735835936036136236336436536636736836937037137237337437537637737837938038138238338438538638738838939039139239339439539639739839940040140240340440540640740840941041141241341441541641741841942042142242342442542642742842943043143243343443543643743843944044144244344444544644744844945045145245345445545645745845946046146246346446546646746846947047147247347447547647747847948048148248348448548648748848949049149249349449549649749849950050150250350450550650750850951051151251351451551651751851952052152252352452552652752852953053153253353453553653753853954054154254354454554654754854955055155255355455555655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466566666766866967067167267367467567667767867968068168268368468568668768868969069169269369469569669769869970070170270370470570670770870971071171271371471571671771871972072172272372472572672772872973073173273373473573673773873974074174274374474574674774874975075175275375475575675775875976076176276376476576676776876977077177277377477577677777877978078178278378478578678778878979079179279379479579679779879980080180280380480580680780880981081181281381481581681781881982082182282382482582682782882983083183283383483583683783883984084184284384484584684784884985085185285385485585685785885986086186286386486586686786886987087187287387487587687787887988088188288388488588688788888989089189289389489589689789889990090190290390490590690790890991091191291391491591691791891992092192292392492592692792892993093193293393493593693793893994094194294394494594694794894995095195295395495595695795895996096196296396496596696796896997097197297397497597697797897998098198298398498598698798898999099199299399499599699799899910001001100210031004100510061007100810091010101110121013101410151016101710181019102010211022102310241025102610271028102910301031103210331034103510361037103810391040104110421043104410451046104710481049105010511052105310541055105610571058105910601061106210631064106510661067106810691070107110721073107410751076107710781079108010811082108310841085108610871088108910901091109210931094109510961097109810991100110111021103110411051106110711081109111011111112111311141115111611171118111911201121112211231124112511261127112811291130113111321133113411351136113711381139114011411142114311441145114611471148114911501151115211531154115511561157115811591160116111621163116411651166116711681169117011711172117311741175117611771178117911801181118211831184118511861187118811891190119111921193119411951196119711981199120012011202120312041205120612071208120912101211121212131214121512161217121812191220122112221223122412251226122712281229123012311232123312341235123612371238123912401241124212431244124512461247124812491250125112521253125412551256125712581259126012611262126312641265126612671268126912701271127212731274127512761277127812791280128112821283128412851286128712881289129012911292129312941295129612971298129913001301130213031304130513061307130813091310131113121313131413151316131713181319132013211322132313241325132613271328132913301331133213331334133513361337133813391340134113421343134413451346134713481349135013511352135313541355135613571358135913601361136213631364136513661367136813691370137113721373137413751376137713781379138013811382138313841385138613871388138913901391139213931394139513961397139813991400140114021403140414051406140714081409141014111412141314141415
  1. .. _whatisrcu_doc:
  2. What is RCU? -- "Read, Copy, Update"
  3. ======================================
  4. Please note that the "What is RCU?" LWN series is an excellent place
  5. to start learning about RCU:
  6. | 1. What is RCU, Fundamentally? https://lwn.net/Articles/262464/
  7. | 2. What is RCU? Part 2: Usage https://lwn.net/Articles/263130/
  8. | 3. RCU part 3: the RCU API https://lwn.net/Articles/264090/
  9. | 4. The RCU API, 2010 Edition https://lwn.net/Articles/418853/
  10. | 2010 Big API Table https://lwn.net/Articles/419086/
  11. | 5. The RCU API, 2014 Edition https://lwn.net/Articles/609904/
  12. | 2014 Big API Table https://lwn.net/Articles/609973/
  13. | 6. The RCU API, 2019 Edition https://lwn.net/Articles/777036/
  14. | 2019 Big API Table https://lwn.net/Articles/777165/
  15. | 7. The RCU API, 2024 Edition https://lwn.net/Articles/988638/
  16. | 2024 Background Information https://lwn.net/Articles/988641/
  17. | 2024 Big API Table https://lwn.net/Articles/988666/
  18. For those preferring video:
  19. | 1. Unraveling RCU Mysteries: Fundamentals https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
  20. | 2. Unraveling RCU Mysteries: Additional Use Cases https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases
  21. What is RCU?
  22. RCU is a synchronization mechanism that was added to the Linux kernel
  23. during the 2.5 development effort that is optimized for read-mostly
  24. situations. Although RCU is actually quite simple, making effective use
  25. of it requires you to think differently about your code. Another part
  26. of the problem is the mistaken assumption that there is "one true way" to
  27. describe and to use RCU. Instead, the experience has been that different
  28. people must take different paths to arrive at an understanding of RCU,
  29. depending on their experiences and use cases. This document provides
  30. several different paths, as follows:
  31. :ref:`1. RCU OVERVIEW <1_whatisRCU>`
  32. :ref:`2. WHAT IS RCU'S CORE API? <2_whatisRCU>`
  33. :ref:`3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`
  34. :ref:`4. WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`
  35. :ref:`5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`
  36. :ref:`6. ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`
  37. :ref:`7. ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`
  38. :ref:`8. FULL LIST OF RCU APIs <8_whatisRCU>`
  39. :ref:`9. ANSWERS TO QUICK QUIZZES <9_whatisRCU>`
  40. People who prefer starting with a conceptual overview should focus on
  41. Section 1, though most readers will profit by reading this section at
  42. some point. People who prefer to start with an API that they can then
  43. experiment with should focus on Section 2. People who prefer to start
  44. with example uses should focus on Sections 3 and 4. People who need to
  45. understand the RCU implementation should focus on Section 5, then dive
  46. into the kernel source code. People who reason best by analogy should
  47. focus on Section 6 and 7. Section 8 serves as an index to the docbook
  48. API documentation, and Section 9 is the traditional answer key.
  49. So, start with the section that makes the most sense to you and your
  50. preferred method of learning. If you need to know everything about
  51. everything, feel free to read the whole thing -- but if you are really
  52. that type of person, you have perused the source code and will therefore
  53. never need this document anyway. ;-)
  54. .. _1_whatisRCU:
  55. 1. RCU OVERVIEW
  56. ----------------
  57. The basic idea behind RCU is to split updates into "removal" and
  58. "reclamation" phases. The removal phase removes references to data items
  59. within a data structure (possibly by replacing them with references to
  60. new versions of these data items), and can run concurrently with readers.
  61. The reason that it is safe to run the removal phase concurrently with
  62. readers is the semantics of modern CPUs guarantee that readers will see
  63. either the old or the new version of the data structure rather than a
  64. partially updated reference. The reclamation phase does the work of reclaiming
  65. (e.g., freeing) the data items removed from the data structure during the
  66. removal phase. Because reclaiming data items can disrupt any readers
  67. concurrently referencing those data items, the reclamation phase must
  68. not start until readers no longer hold references to those data items.
  69. Splitting the update into removal and reclamation phases permits the
  70. updater to perform the removal phase immediately, and to defer the
  71. reclamation phase until all readers active during the removal phase have
  72. completed, either by blocking until they finish or by registering a
  73. callback that is invoked after they finish. Only readers that are active
  74. during the removal phase need be considered, because any reader starting
  75. after the removal phase will be unable to gain a reference to the removed
  76. data items, and therefore cannot be disrupted by the reclamation phase.
  77. So the typical RCU update sequence goes something like the following:
  78. a. Remove pointers to a data structure, so that subsequent
  79. readers cannot gain a reference to it.
  80. b. Wait for all previous readers to complete their RCU read-side
  81. critical sections.
  82. c. At this point, there cannot be any readers who hold references
  83. to the data structure, so it now may safely be reclaimed
  84. (e.g., kfree()d).
  85. Step (b) above is the key idea underlying RCU's deferred destruction.
  86. The ability to wait until all readers are done allows RCU readers to
  87. use much lighter-weight synchronization, in some cases, absolutely no
  88. synchronization at all. In contrast, in more conventional lock-based
  89. schemes, readers must use heavy-weight synchronization in order to
  90. prevent an updater from deleting the data structure out from under them.
  91. This is because lock-based updaters typically update data items in place,
  92. and must therefore exclude readers. In contrast, RCU-based updaters
  93. typically take advantage of the fact that writes to single aligned
  94. pointers are atomic on modern CPUs, allowing atomic insertion, removal,
  95. and replacement of data items in a linked structure without disrupting
  96. readers. Concurrent RCU readers can then continue accessing the old
  97. versions, and can dispense with the atomic operations, memory barriers,
  98. and communications cache misses that are so expensive on present-day
  99. SMP computer systems, even in absence of lock contention.
  100. In the three-step procedure shown above, the updater is performing both
  101. the removal and the reclamation step, but it is often helpful for an
  102. entirely different thread to do the reclamation, as is in fact the case
  103. in the Linux kernel's directory-entry cache (dcache). Even if the same
  104. thread performs both the update step (step (a) above) and the reclamation
  105. step (step (c) above), it is often helpful to think of them separately.
  106. For example, RCU readers and updaters need not communicate at all,
  107. but RCU provides implicit low-overhead communication between readers
  108. and reclaimers, namely, in step (b) above.
  109. So how the heck can a reclaimer tell when a reader is done, given
  110. that readers are not doing any sort of synchronization operations???
  111. Read on to learn about how RCU's API makes this easy.
  112. .. _2_whatisRCU:
  113. 2. WHAT IS RCU'S CORE API?
  114. ---------------------------
  115. The core RCU API is quite small:
  116. a. rcu_read_lock()
  117. b. rcu_read_unlock()
  118. c. synchronize_rcu() / call_rcu()
  119. d. rcu_assign_pointer()
  120. e. rcu_dereference()
  121. There are many other members of the RCU API, but the rest can be
  122. expressed in terms of these five, though most implementations instead
  123. express synchronize_rcu() in terms of the call_rcu() callback API.
  124. The five core RCU APIs are described below, the other 18 will be enumerated
  125. later. See the kernel docbook documentation for more info, or look directly
  126. at the function header comments.
  127. rcu_read_lock()
  128. ^^^^^^^^^^^^^^^
  129. void rcu_read_lock(void);
  130. This temporal primitive is used by a reader to inform the
  131. reclaimer that the reader is entering an RCU read-side critical
  132. section. It is illegal to block while in an RCU read-side
  133. critical section, though kernels built with CONFIG_PREEMPT_RCU
  134. can preempt RCU read-side critical sections. Any RCU-protected
  135. data structure accessed during an RCU read-side critical section
  136. is guaranteed to remain unreclaimed for the full duration of that
  137. critical section. Reference counts may be used in conjunction
  138. with RCU to maintain longer-term references to data structures.
  139. Note that anything that disables bottom halves, preemption,
  140. or interrupts also enters an RCU read-side critical section.
  141. Acquiring a spinlock also enters an RCU read-side critical
  142. sections, even for spinlocks that do not disable preemption,
  143. as is the case in kernels built with CONFIG_PREEMPT_RT=y.
  144. Sleeplocks do *not* enter RCU read-side critical sections.
  145. rcu_read_unlock()
  146. ^^^^^^^^^^^^^^^^^
  147. void rcu_read_unlock(void);
  148. This temporal primitives is used by a reader to inform the
  149. reclaimer that the reader is exiting an RCU read-side critical
  150. section. Anything that enables bottom halves, preemption,
  151. or interrupts also exits an RCU read-side critical section.
  152. Releasing a spinlock also exits an RCU read-side critical section.
  153. Note that RCU read-side critical sections may be nested and/or
  154. overlapping.
  155. synchronize_rcu()
  156. ^^^^^^^^^^^^^^^^^
  157. void synchronize_rcu(void);
  158. This temporal primitive marks the end of updater code and the
  159. beginning of reclaimer code. It does this by blocking until
  160. all pre-existing RCU read-side critical sections on all CPUs
  161. have completed. Note that synchronize_rcu() will **not**
  162. necessarily wait for any subsequent RCU read-side critical
  163. sections to complete. For example, consider the following
  164. sequence of events::
  165. CPU 0 CPU 1 CPU 2
  166. ----------------- ------------------------- ---------------
  167. 1. rcu_read_lock()
  168. 2. enters synchronize_rcu()
  169. 3. rcu_read_lock()
  170. 4. rcu_read_unlock()
  171. 5. exits synchronize_rcu()
  172. 6. rcu_read_unlock()
  173. To reiterate, synchronize_rcu() waits only for ongoing RCU
  174. read-side critical sections to complete, not necessarily for
  175. any that begin after synchronize_rcu() is invoked.
  176. Of course, synchronize_rcu() does not necessarily return
  177. **immediately** after the last pre-existing RCU read-side critical
  178. section completes. For one thing, there might well be scheduling
  179. delays. For another thing, many RCU implementations process
  180. requests in batches in order to improve efficiencies, which can
  181. further delay synchronize_rcu().
  182. Since synchronize_rcu() is the API that must figure out when
  183. readers are done, its implementation is key to RCU. For RCU
  184. to be useful in all but the most read-intensive situations,
  185. synchronize_rcu()'s overhead must also be quite small.
  186. The call_rcu() API is an asynchronous callback form of
  187. synchronize_rcu(), and is described in more detail in a later
  188. section. Instead of blocking, it registers a function and
  189. argument which are invoked after all ongoing RCU read-side
  190. critical sections have completed. This callback variant is
  191. particularly useful in situations where it is illegal to block
  192. or where update-side performance is critically important.
  193. However, the call_rcu() API should not be used lightly, as use
  194. of the synchronize_rcu() API generally results in simpler code.
  195. In addition, the synchronize_rcu() API has the nice property
  196. of automatically limiting update rate should grace periods
  197. be delayed. This property results in system resilience in face
  198. of denial-of-service attacks. Code using call_rcu() should limit
  199. update rate in order to gain this same sort of resilience. See
  200. checklist.rst for some approaches to limiting the update rate.
  201. rcu_assign_pointer()
  202. ^^^^^^^^^^^^^^^^^^^^
  203. void rcu_assign_pointer(p, typeof(p) v);
  204. Yes, rcu_assign_pointer() **is** implemented as a macro, though
  205. it would be cool to be able to declare a function in this manner.
  206. (And there has been some discussion of adding overloaded functions
  207. to the C language, so who knows?)
  208. The updater uses this spatial macro to assign a new value to an
  209. RCU-protected pointer, in order to safely communicate the change
  210. in value from the updater to the reader. This is a spatial (as
  211. opposed to temporal) macro. It does not evaluate to an rvalue,
  212. but it does provide any compiler directives and memory-barrier
  213. instructions required for a given compile or CPU architecture.
  214. Its ordering properties are that of a store-release operation,
  215. that is, any prior loads and stores required to initialize the
  216. structure are ordered before the store that publishes the pointer
  217. to that structure.
  218. Perhaps just as important, rcu_assign_pointer() serves to document
  219. (1) which pointers are protected by RCU and (2) the point at which
  220. a given structure becomes accessible to other CPUs. That said,
  221. rcu_assign_pointer() is most frequently used indirectly, via
  222. the _rcu list-manipulation primitives such as list_add_rcu().
  223. rcu_dereference()
  224. ^^^^^^^^^^^^^^^^^
  225. typeof(p) rcu_dereference(p);
  226. Like rcu_assign_pointer(), rcu_dereference() must be implemented
  227. as a macro.
  228. The reader uses the spatial rcu_dereference() macro to fetch
  229. an RCU-protected pointer, which returns a value that may
  230. then be safely dereferenced. Note that rcu_dereference()
  231. does not actually dereference the pointer, instead, it
  232. protects the pointer for later dereferencing. It also
  233. executes any needed memory-barrier instructions for a given
  234. CPU architecture. Currently, only Alpha needs memory barriers
  235. within rcu_dereference() -- on other CPUs, it compiles to a
  236. volatile load. However, no mainstream C compilers respect
  237. address dependencies, so rcu_dereference() uses volatile casts,
  238. which, in combination with the coding guidelines listed in
  239. rcu_dereference.rst, prevent current compilers from breaking
  240. these dependencies.
  241. Common coding practice uses rcu_dereference() to copy an
  242. RCU-protected pointer to a local variable, then dereferences
  243. this local variable, for example as follows::
  244. p = rcu_dereference(head.next);
  245. return p->data;
  246. However, in this case, one could just as easily combine these
  247. into one statement::
  248. return rcu_dereference(head.next)->data;
  249. If you are going to be fetching multiple fields from the
  250. RCU-protected structure, using the local variable is of
  251. course preferred. Repeated rcu_dereference() calls look
  252. ugly, do not guarantee that the same pointer will be returned
  253. if an update happened while in the critical section, and incur
  254. unnecessary overhead on Alpha CPUs.
  255. Note that the value returned by rcu_dereference() is valid
  256. only within the enclosing RCU read-side critical section [1]_.
  257. For example, the following is **not** legal::
  258. rcu_read_lock();
  259. p = rcu_dereference(head.next);
  260. rcu_read_unlock();
  261. x = p->address; /* BUG!!! */
  262. rcu_read_lock();
  263. y = p->data; /* BUG!!! */
  264. rcu_read_unlock();
  265. Holding a reference from one RCU read-side critical section
  266. to another is just as illegal as holding a reference from
  267. one lock-based critical section to another! Similarly,
  268. using a reference outside of the critical section in which
  269. it was acquired is just as illegal as doing so with normal
  270. locking.
  271. As with rcu_assign_pointer(), an important function of
  272. rcu_dereference() is to document which pointers are protected by
  273. RCU, in particular, flagging a pointer that is subject to changing
  274. at any time, including immediately after the rcu_dereference().
  275. And, again like rcu_assign_pointer(), rcu_dereference() is
  276. typically used indirectly, via the _rcu list-manipulation
  277. primitives, such as list_for_each_entry_rcu() [2]_.
  278. .. [1] The variant rcu_dereference_protected() can be used outside
  279. of an RCU read-side critical section as long as the usage is
  280. protected by locks acquired by the update-side code. This variant
  281. avoids the lockdep warning that would happen when using (for
  282. example) rcu_dereference() without rcu_read_lock() protection.
  283. Using rcu_dereference_protected() also has the advantage
  284. of permitting compiler optimizations that rcu_dereference()
  285. must prohibit. The rcu_dereference_protected() variant takes
  286. a lockdep expression to indicate which locks must be acquired
  287. by the caller. If the indicated protection is not provided,
  288. a lockdep splat is emitted. See Design/Requirements/Requirements.rst
  289. and the API's code comments for more details and example usage.
  290. .. [2] If the list_for_each_entry_rcu() instance might be used by
  291. update-side code as well as by RCU readers, then an additional
  292. lockdep expression can be added to its list of arguments.
  293. For example, given an additional "lock_is_held(&mylock)" argument,
  294. the RCU lockdep code would complain only if this instance was
  295. invoked outside of an RCU read-side critical section and without
  296. the protection of mylock.
  297. The following diagram shows how each API communicates among the
  298. reader, updater, and reclaimer.
  299. ::
  300. rcu_assign_pointer()
  301. +--------+
  302. +---------------------->| reader |---------+
  303. | +--------+ |
  304. | | |
  305. | | | Protect:
  306. | | | rcu_read_lock()
  307. | | | rcu_read_unlock()
  308. | rcu_dereference() | |
  309. +---------+ | |
  310. | updater |<----------------+ |
  311. +---------+ V
  312. | +-----------+
  313. +----------------------------------->| reclaimer |
  314. +-----------+
  315. Defer:
  316. synchronize_rcu() & call_rcu()
  317. The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
  318. rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
  319. order to determine when (1) synchronize_rcu() invocations may return
  320. to their callers and (2) call_rcu() callbacks may be invoked. Efficient
  321. implementations of the RCU infrastructure make heavy use of batching in
  322. order to amortize their overhead over many uses of the corresponding APIs.
  323. The rcu_assign_pointer() and rcu_dereference() invocations communicate
  324. spatial changes via stores to and loads from the RCU-protected pointer in
  325. question.
  326. There are at least three flavors of RCU usage in the Linux kernel. The diagram
  327. above shows the most common one. On the updater side, the rcu_assign_pointer(),
  328. synchronize_rcu() and call_rcu() primitives used are the same for all three
  329. flavors. However for protection (on the reader side), the primitives used vary
  330. depending on the flavor:
  331. a. rcu_read_lock() / rcu_read_unlock()
  332. rcu_dereference()
  333. b. rcu_read_lock_bh() / rcu_read_unlock_bh()
  334. local_bh_disable() / local_bh_enable()
  335. rcu_dereference_bh()
  336. c. rcu_read_lock_sched() / rcu_read_unlock_sched()
  337. preempt_disable() / preempt_enable()
  338. local_irq_save() / local_irq_restore()
  339. hardirq enter / hardirq exit
  340. NMI enter / NMI exit
  341. rcu_dereference_sched()
  342. These three flavors are used as follows:
  343. a. RCU applied to normal data structures.
  344. b. RCU applied to networking data structures that may be subjected
  345. to remote denial-of-service attacks.
  346. c. RCU applied to scheduler and interrupt/NMI-handler tasks.
  347. Again, most uses will be of (a). The (b) and (c) cases are important
  348. for specialized uses, but are relatively uncommon. The SRCU, RCU-Tasks,
  349. RCU-Tasks-Rude, and RCU-Tasks-Trace have similar relationships among
  350. their assorted primitives.
  351. .. _3_whatisRCU:
  352. 3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
  353. -----------------------------------------------
  354. This section shows a simple use of the core RCU API to protect a
  355. global pointer to a dynamically allocated structure. More-typical
  356. uses of RCU may be found in listRCU.rst and NMI-RCU.rst.
  357. ::
  358. struct foo {
  359. int a;
  360. char b;
  361. long c;
  362. };
  363. DEFINE_SPINLOCK(foo_mutex);
  364. struct foo __rcu *gbl_foo;
  365. /*
  366. * Create a new struct foo that is the same as the one currently
  367. * pointed to by gbl_foo, except that field "a" is replaced
  368. * with "new_a". Points gbl_foo to the new structure, and
  369. * frees up the old structure after a grace period.
  370. *
  371. * Uses rcu_assign_pointer() to ensure that concurrent readers
  372. * see the initialized version of the new structure.
  373. *
  374. * Uses synchronize_rcu() to ensure that any readers that might
  375. * have references to the old structure complete before freeing
  376. * the old structure.
  377. */
  378. void foo_update_a(int new_a)
  379. {
  380. struct foo *new_fp;
  381. struct foo *old_fp;
  382. new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
  383. spin_lock(&foo_mutex);
  384. old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
  385. *new_fp = *old_fp;
  386. new_fp->a = new_a;
  387. rcu_assign_pointer(gbl_foo, new_fp);
  388. spin_unlock(&foo_mutex);
  389. synchronize_rcu();
  390. kfree(old_fp);
  391. }
  392. /*
  393. * Return the value of field "a" of the current gbl_foo
  394. * structure. Use rcu_read_lock() and rcu_read_unlock()
  395. * to ensure that the structure does not get deleted out
  396. * from under us, and use rcu_dereference() to ensure that
  397. * we see the initialized version of the structure (important
  398. * for DEC Alpha and for people reading the code).
  399. */
  400. int foo_get_a(void)
  401. {
  402. int retval;
  403. rcu_read_lock();
  404. retval = rcu_dereference(gbl_foo)->a;
  405. rcu_read_unlock();
  406. return retval;
  407. }
  408. So, to sum up:
  409. - Use rcu_read_lock() and rcu_read_unlock() to guard RCU
  410. read-side critical sections.
  411. - Within an RCU read-side critical section, use rcu_dereference()
  412. to dereference RCU-protected pointers.
  413. - Use some solid design (such as locks or semaphores) to
  414. keep concurrent updates from interfering with each other.
  415. - Use rcu_assign_pointer() to update an RCU-protected pointer.
  416. This primitive protects concurrent readers from the updater,
  417. **not** concurrent updates from each other! You therefore still
  418. need to use locking (or something similar) to keep concurrent
  419. rcu_assign_pointer() primitives from interfering with each other.
  420. - Use synchronize_rcu() **after** removing a data element from an
  421. RCU-protected data structure, but **before** reclaiming/freeing
  422. the data element, in order to wait for the completion of all
  423. RCU read-side critical sections that might be referencing that
  424. data item.
  425. See checklist.rst for additional rules to follow when using RCU.
  426. And again, more-typical uses of RCU may be found in listRCU.rst
  427. and NMI-RCU.rst.
  428. .. _4_whatisRCU:
  429. 4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
  430. --------------------------------------------
  431. In the example above, foo_update_a() blocks until a grace period elapses.
  432. This is quite simple, but in some cases one cannot afford to wait so
  433. long -- there might be other high-priority work to be done.
  434. In such cases, one uses call_rcu() rather than synchronize_rcu().
  435. The call_rcu() API is as follows::
  436. void call_rcu(struct rcu_head *head, rcu_callback_t func);
  437. This function invokes func(head) after a grace period has elapsed.
  438. This invocation might happen from either softirq or process context,
  439. so the function is not permitted to block. The foo struct needs to
  440. have an rcu_head structure added, perhaps as follows::
  441. struct foo {
  442. int a;
  443. char b;
  444. long c;
  445. struct rcu_head rcu;
  446. };
  447. The foo_update_a() function might then be written as follows::
  448. /*
  449. * Create a new struct foo that is the same as the one currently
  450. * pointed to by gbl_foo, except that field "a" is replaced
  451. * with "new_a". Points gbl_foo to the new structure, and
  452. * frees up the old structure after a grace period.
  453. *
  454. * Uses rcu_assign_pointer() to ensure that concurrent readers
  455. * see the initialized version of the new structure.
  456. *
  457. * Uses call_rcu() to ensure that any readers that might have
  458. * references to the old structure complete before freeing the
  459. * old structure.
  460. */
  461. void foo_update_a(int new_a)
  462. {
  463. struct foo *new_fp;
  464. struct foo *old_fp;
  465. new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
  466. spin_lock(&foo_mutex);
  467. old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
  468. *new_fp = *old_fp;
  469. new_fp->a = new_a;
  470. rcu_assign_pointer(gbl_foo, new_fp);
  471. spin_unlock(&foo_mutex);
  472. call_rcu(&old_fp->rcu, foo_reclaim);
  473. }
  474. The foo_reclaim() function might appear as follows::
  475. void foo_reclaim(struct rcu_head *rp)
  476. {
  477. struct foo *fp = container_of(rp, struct foo, rcu);
  478. foo_cleanup(fp->a);
  479. kfree(fp);
  480. }
  481. The container_of() primitive is a macro that, given a pointer into a
  482. struct, the type of the struct, and the pointed-to field within the
  483. struct, returns a pointer to the beginning of the struct.
  484. The use of call_rcu() permits the caller of foo_update_a() to
  485. immediately regain control, without needing to worry further about the
  486. old version of the newly updated element. It also clearly shows the
  487. RCU distinction between updater, namely foo_update_a(), and reclaimer,
  488. namely foo_reclaim().
  489. The summary of advice is the same as for the previous section, except
  490. that we are now using call_rcu() rather than synchronize_rcu():
  491. - Use call_rcu() **after** removing a data element from an
  492. RCU-protected data structure in order to register a callback
  493. function that will be invoked after the completion of all RCU
  494. read-side critical sections that might be referencing that
  495. data item.
  496. If the callback for call_rcu() is not doing anything more than calling
  497. kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
  498. to avoid having to write your own callback::
  499. kfree_rcu(old_fp, rcu);
  500. If the occasional sleep is permitted, the single-argument form may
  501. be used, omitting the rcu_head structure from struct foo.
  502. kfree_rcu_mightsleep(old_fp);
  503. This variant almost never blocks, but might do so by invoking
  504. synchronize_rcu() in response to memory-allocation failure.
  505. Again, see checklist.rst for additional rules governing the use of RCU.
  506. .. _5_whatisRCU:
  507. 5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
  508. ------------------------------------------------
  509. One of the nice things about RCU is that it has extremely simple "toy"
  510. implementations that are a good first step towards understanding the
  511. production-quality implementations in the Linux kernel. This section
  512. presents two such "toy" implementations of RCU, one that is implemented
  513. in terms of familiar locking primitives, and another that more closely
  514. resembles "classic" RCU. Both are way too simple for real-world use,
  515. lacking both functionality and performance. However, they are useful
  516. in getting a feel for how RCU works. See kernel/rcu/update.c for a
  517. production-quality implementation, and see:
  518. https://docs.google.com/document/d/1X0lThx8OK0ZgLMqVoXiR4ZrGURHrXK6NyLRbeXe3Xac/edit
  519. for papers describing the Linux kernel RCU implementation. The OLS'01
  520. and OLS'02 papers are a good introduction, and the dissertation provides
  521. more details on the current implementation as of early 2004.
  522. 5A. "TOY" IMPLEMENTATION #1: LOCKING
  523. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  524. This section presents a "toy" RCU implementation that is based on
  525. familiar locking primitives. Its overhead makes it a non-starter for
  526. real-life use, as does its lack of scalability. It is also unsuitable
  527. for realtime use, since it allows scheduling latency to "bleed" from
  528. one read-side critical section to another. It also assumes recursive
  529. reader-writer locks: If you try this with non-recursive locks, and
  530. you allow nested rcu_read_lock() calls, you can deadlock.
  531. However, it is probably the easiest implementation to relate to, so is
  532. a good starting point.
  533. It is extremely simple::
  534. static DEFINE_RWLOCK(rcu_gp_mutex);
  535. void rcu_read_lock(void)
  536. {
  537. read_lock(&rcu_gp_mutex);
  538. }
  539. void rcu_read_unlock(void)
  540. {
  541. read_unlock(&rcu_gp_mutex);
  542. }
  543. void synchronize_rcu(void)
  544. {
  545. write_lock(&rcu_gp_mutex);
  546. smp_mb__after_spinlock();
  547. write_unlock(&rcu_gp_mutex);
  548. }
  549. [You can ignore rcu_assign_pointer() and rcu_dereference() without missing
  550. much. But here are simplified versions anyway. And whatever you do,
  551. don't forget about them when submitting patches making use of RCU!]::
  552. #define rcu_assign_pointer(p, v) \
  553. ({ \
  554. smp_store_release(&(p), (v)); \
  555. })
  556. #define rcu_dereference(p) \
  557. ({ \
  558. typeof(p) _________p1 = READ_ONCE(p); \
  559. (_________p1); \
  560. })
  561. The rcu_read_lock() and rcu_read_unlock() primitive read-acquire
  562. and release a global reader-writer lock. The synchronize_rcu()
  563. primitive write-acquires this same lock, then releases it. This means
  564. that once synchronize_rcu() exits, all RCU read-side critical sections
  565. that were in progress before synchronize_rcu() was called are guaranteed
  566. to have completed -- there is no way that synchronize_rcu() would have
  567. been able to write-acquire the lock otherwise. The smp_mb__after_spinlock()
  568. promotes synchronize_rcu() to a full memory barrier in compliance with
  569. the "Memory-Barrier Guarantees" listed in:
  570. Design/Requirements/Requirements.rst
  571. It is possible to nest rcu_read_lock(), since reader-writer locks may
  572. be recursively acquired. Note also that rcu_read_lock() is immune
  573. from deadlock (an important property of RCU). The reason for this is
  574. that the only thing that can block rcu_read_lock() is a synchronize_rcu().
  575. But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
  576. so there can be no deadlock cycle.
  577. .. _quiz_1:
  578. Quick Quiz #1:
  579. Why is this argument naive? How could a deadlock
  580. occur when using this algorithm in a real-world Linux
  581. kernel? How could this deadlock be avoided?
  582. :ref:`Answers to Quick Quiz <9_whatisRCU>`
  583. 5B. "TOY" EXAMPLE #2: CLASSIC RCU
  584. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  585. This section presents a "toy" RCU implementation that is based on
  586. "classic RCU". It is also short on performance (but only for updates) and
  587. on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
  588. kernels. The definitions of rcu_dereference() and rcu_assign_pointer()
  589. are the same as those shown in the preceding section, so they are omitted.
  590. ::
  591. void rcu_read_lock(void) { }
  592. void rcu_read_unlock(void) { }
  593. void synchronize_rcu(void)
  594. {
  595. int cpu;
  596. for_each_possible_cpu(cpu)
  597. run_on(cpu);
  598. }
  599. Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
  600. This is the great strength of classic RCU in a non-preemptive kernel:
  601. read-side overhead is precisely zero, at least on non-Alpha CPUs.
  602. And there is absolutely no way that rcu_read_lock() can possibly
  603. participate in a deadlock cycle!
  604. The implementation of synchronize_rcu() simply schedules itself on each
  605. CPU in turn. The run_on() primitive can be implemented straightforwardly
  606. in terms of the sched_setaffinity() primitive. Of course, a somewhat less
  607. "toy" implementation would restore the affinity upon completion rather
  608. than just leaving all tasks running on the last CPU, but when I said
  609. "toy", I meant **toy**!
  610. So how the heck is this supposed to work???
  611. Remember that it is illegal to block while in an RCU read-side critical
  612. section. Therefore, if a given CPU executes a context switch, we know
  613. that it must have completed all preceding RCU read-side critical sections.
  614. Once **all** CPUs have executed a context switch, then **all** preceding
  615. RCU read-side critical sections will have completed.
  616. So, suppose that we remove a data item from its structure and then invoke
  617. synchronize_rcu(). Once synchronize_rcu() returns, we are guaranteed
  618. that there are no RCU read-side critical sections holding a reference
  619. to that data item, so we can safely reclaim it.
  620. .. _quiz_2:
  621. Quick Quiz #2:
  622. Give an example where Classic RCU's read-side
  623. overhead is **negative**.
  624. :ref:`Answers to Quick Quiz <9_whatisRCU>`
  625. .. _quiz_3:
  626. Quick Quiz #3:
  627. If it is illegal to block in an RCU read-side
  628. critical section, what the heck do you do in
  629. CONFIG_PREEMPT_RT, where normal spinlocks can block???
  630. :ref:`Answers to Quick Quiz <9_whatisRCU>`
  631. .. _6_whatisRCU:
  632. 6. ANALOGY WITH READER-WRITER LOCKING
  633. --------------------------------------
  634. Although RCU can be used in many different ways, a very common use of
  635. RCU is analogous to reader-writer locking. The following unified
  636. diff shows how closely related RCU and reader-writer locking can be.
  637. ::
  638. @@ -5,5 +5,5 @@ struct el {
  639. int data;
  640. /* Other data fields */
  641. };
  642. -rwlock_t listmutex;
  643. +spinlock_t listmutex;
  644. struct el head;
  645. @@ -13,15 +14,15 @@
  646. struct list_head *lp;
  647. struct el *p;
  648. - read_lock(&listmutex);
  649. - list_for_each_entry(p, head, lp) {
  650. + rcu_read_lock();
  651. + list_for_each_entry_rcu(p, head, lp) {
  652. if (p->key == key) {
  653. *result = p->data;
  654. - read_unlock(&listmutex);
  655. + rcu_read_unlock();
  656. return 1;
  657. }
  658. }
  659. - read_unlock(&listmutex);
  660. + rcu_read_unlock();
  661. return 0;
  662. }
  663. @@ -29,15 +30,16 @@
  664. {
  665. struct el *p;
  666. - write_lock(&listmutex);
  667. + spin_lock(&listmutex);
  668. list_for_each_entry(p, head, lp) {
  669. if (p->key == key) {
  670. - list_del(&p->list);
  671. - write_unlock(&listmutex);
  672. + list_del_rcu(&p->list);
  673. + spin_unlock(&listmutex);
  674. + synchronize_rcu();
  675. kfree(p);
  676. return 1;
  677. }
  678. }
  679. - write_unlock(&listmutex);
  680. + spin_unlock(&listmutex);
  681. return 0;
  682. }
  683. Or, for those who prefer a side-by-side listing::
  684. 1 struct el { 1 struct el {
  685. 2 struct list_head list; 2 struct list_head list;
  686. 3 long key; 3 long key;
  687. 4 spinlock_t mutex; 4 spinlock_t mutex;
  688. 5 int data; 5 int data;
  689. 6 /* Other data fields */ 6 /* Other data fields */
  690. 7 }; 7 };
  691. 8 rwlock_t listmutex; 8 spinlock_t listmutex;
  692. 9 struct el head; 9 struct el head;
  693. ::
  694. 1 int search(long key, int *result) 1 int search(long key, int *result)
  695. 2 { 2 {
  696. 3 struct list_head *lp; 3 struct list_head *lp;
  697. 4 struct el *p; 4 struct el *p;
  698. 5 5
  699. 6 read_lock(&listmutex); 6 rcu_read_lock();
  700. 7 list_for_each_entry(p, head, lp) { 7 list_for_each_entry_rcu(p, head, lp) {
  701. 8 if (p->key == key) { 8 if (p->key == key) {
  702. 9 *result = p->data; 9 *result = p->data;
  703. 10 read_unlock(&listmutex); 10 rcu_read_unlock();
  704. 11 return 1; 11 return 1;
  705. 12 } 12 }
  706. 13 } 13 }
  707. 14 read_unlock(&listmutex); 14 rcu_read_unlock();
  708. 15 return 0; 15 return 0;
  709. 16 } 16 }
  710. ::
  711. 1 int delete(long key) 1 int delete(long key)
  712. 2 { 2 {
  713. 3 struct el *p; 3 struct el *p;
  714. 4 4
  715. 5 write_lock(&listmutex); 5 spin_lock(&listmutex);
  716. 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
  717. 7 if (p->key == key) { 7 if (p->key == key) {
  718. 8 list_del(&p->list); 8 list_del_rcu(&p->list);
  719. 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
  720. 10 synchronize_rcu();
  721. 10 kfree(p); 11 kfree(p);
  722. 11 return 1; 12 return 1;
  723. 12 } 13 }
  724. 13 } 14 }
  725. 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex);
  726. 15 return 0; 16 return 0;
  727. 16 } 17 }
  728. Either way, the differences are quite small. Read-side locking moves
  729. to rcu_read_lock() and rcu_read_unlock, update-side locking moves from
  730. a reader-writer lock to a simple spinlock, and a synchronize_rcu()
  731. precedes the kfree().
  732. However, there is one potential catch: the read-side and update-side
  733. critical sections can now run concurrently. In many cases, this will
  734. not be a problem, but it is necessary to check carefully regardless.
  735. For example, if multiple independent list updates must be seen as
  736. a single atomic update, converting to RCU will require special care.
  737. Also, the presence of synchronize_rcu() means that the RCU version of
  738. delete() can now block. If this is a problem, there is a callback-based
  739. mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
  740. be used in place of synchronize_rcu().
  741. .. _7_whatisRCU:
  742. 7. ANALOGY WITH REFERENCE COUNTING
  743. -----------------------------------
  744. The reader-writer analogy (illustrated by the previous section) is not
  745. always the best way to think about using RCU. Another helpful analogy
  746. considers RCU an effective reference count on everything which is
  747. protected by RCU.
  748. A reference count typically does not prevent the referenced object's
  749. values from changing, but does prevent changes to type -- particularly the
  750. gross change of type that happens when that object's memory is freed and
  751. re-allocated for some other purpose. Once a type-safe reference to the
  752. object is obtained, some other mechanism is needed to ensure consistent
  753. access to the data in the object. This could involve taking a spinlock,
  754. but with RCU the typical approach is to perform reads with SMP-aware
  755. operations such as smp_load_acquire(), to perform updates with atomic
  756. read-modify-write operations, and to provide the necessary ordering.
  757. RCU provides a number of support functions that embed the required
  758. operations and ordering, such as the list_for_each_entry_rcu() macro
  759. used in the previous section.
  760. A more focused view of the reference counting behavior is that,
  761. between rcu_read_lock() and rcu_read_unlock(), any reference taken with
  762. rcu_dereference() on a pointer marked as ``__rcu`` can be treated as
  763. though a reference-count on that object has been temporarily increased.
  764. This prevents the object from changing type. Exactly what this means
  765. will depend on normal expectations of objects of that type, but it
  766. typically includes that spinlocks can still be safely locked, normal
  767. reference counters can be safely manipulated, and ``__rcu`` pointers
  768. can be safely dereferenced.
  769. Some operations that one might expect to see on an object for
  770. which an RCU reference is held include:
  771. - Copying out data that is guaranteed to be stable by the object's type.
  772. - Using kref_get_unless_zero() or similar to get a longer-term
  773. reference. This may fail of course.
  774. - Acquiring a spinlock in the object, and checking if the object still
  775. is the expected object and if so, manipulating it freely.
  776. The understanding that RCU provides a reference that only prevents a
  777. change of type is particularly visible with objects allocated from a
  778. slab cache marked ``SLAB_TYPESAFE_BY_RCU``. RCU operations may yield a
  779. reference to an object from such a cache that has been concurrently freed
  780. and the memory reallocated to a completely different object, though of
  781. the same type. In this case RCU doesn't even protect the identity of the
  782. object from changing, only its type. So the object found may not be the
  783. one expected, but it will be one where it is safe to take a reference
  784. (and then potentially acquiring a spinlock), allowing subsequent code
  785. to check whether the identity matches expectations. It is tempting
  786. to simply acquire the spinlock without first taking the reference, but
  787. unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
  788. initialized after each and every call to kmem_cache_alloc(), which renders
  789. reference-free spinlock acquisition completely unsafe. Therefore, when
  790. using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
  791. If using refcount_t, the specialized refcount_{add|inc}_not_zero_acquire()
  792. and refcount_set_release() APIs should be used to ensure correct operation
  793. ordering when verifying object identity and when initializing newly
  794. allocated objects. Acquire fence in refcount_{add|inc}_not_zero_acquire()
  795. ensures that identity checks happen *after* reference count is taken.
  796. refcount_set_release() should be called after a newly allocated object is
  797. fully initialized and release fence ensures that new values are visible
  798. *before* refcount can be successfully taken by other users. Once
  799. refcount_set_release() is called, the object should be considered visible
  800. by other tasks.
  801. (Those willing to initialize their locks in a kmem_cache constructor
  802. may also use locking, including cache-friendly sequence locking.)
  803. With traditional reference counting -- such as that implemented by the
  804. kref library in Linux -- there is typically code that runs when the last
  805. reference to an object is dropped. With kref, this is the function
  806. passed to kref_put(). When RCU is being used, such finalization code
  807. must not be run until all ``__rcu`` pointers referencing the object have
  808. been updated, and then a grace period has passed. Every remaining
  809. globally visible pointer to the object must be considered to be a
  810. potential counted reference, and the finalization code is typically run
  811. using call_rcu() only after all those pointers have been changed.
  812. To see how to choose between these two analogies -- of RCU as a
  813. reader-writer lock and RCU as a reference counting system -- it is useful
  814. to reflect on the scale of the thing being protected. The reader-writer
  815. lock analogy looks at larger multi-part objects such as a linked list
  816. and shows how RCU can facilitate concurrency while elements are added
  817. to, and removed from, the list. The reference-count analogy looks at
  818. the individual objects and looks at how they can be accessed safely
  819. within whatever whole they are a part of.
  820. .. _8_whatisRCU:
  821. 8. FULL LIST OF RCU APIs
  822. -------------------------
  823. The RCU APIs are documented in docbook-format header comments in the
  824. Linux-kernel source code, but it helps to have a full list of the
  825. APIs, since there does not appear to be a way to categorize them
  826. in docbook. Here is the list, by category.
  827. RCU list traversal::
  828. list_entry_rcu
  829. list_entry_lockless
  830. list_first_entry_rcu
  831. list_first_or_null_rcu
  832. list_tail_rcu
  833. list_next_rcu
  834. list_next_or_null_rcu
  835. list_for_each_entry_rcu
  836. list_for_each_entry_continue_rcu
  837. list_for_each_entry_from_rcu
  838. list_for_each_entry_lockless
  839. hlist_first_rcu
  840. hlist_next_rcu
  841. hlist_pprev_rcu
  842. hlist_for_each_entry_rcu
  843. hlist_for_each_entry_rcu_notrace
  844. hlist_for_each_entry_rcu_bh
  845. hlist_for_each_entry_from_rcu
  846. hlist_for_each_entry_continue_rcu
  847. hlist_for_each_entry_continue_rcu_bh
  848. hlist_nulls_first_rcu
  849. hlist_nulls_next_rcu
  850. hlist_nulls_for_each_entry_rcu
  851. hlist_nulls_for_each_entry_safe
  852. hlist_bl_first_rcu
  853. hlist_bl_for_each_entry_rcu
  854. RCU pointer/list update::
  855. rcu_assign_pointer
  856. rcu_replace_pointer
  857. INIT_LIST_HEAD_RCU
  858. list_add_rcu
  859. list_add_tail_rcu
  860. list_del_rcu
  861. list_replace_rcu
  862. list_splice_init_rcu
  863. list_splice_tail_init_rcu
  864. hlist_add_behind_rcu
  865. hlist_add_before_rcu
  866. hlist_add_head_rcu
  867. hlist_add_tail_rcu
  868. hlist_del_rcu
  869. hlist_del_init_rcu
  870. hlist_replace_rcu
  871. hlist_nulls_del_init_rcu
  872. hlist_nulls_del_rcu
  873. hlist_nulls_add_head_rcu
  874. hlist_nulls_add_tail_rcu
  875. hlist_nulls_add_fake
  876. hlists_swap_heads_rcu
  877. hlist_bl_add_head_rcu
  878. hlist_bl_del_rcu
  879. hlist_bl_set_first_rcu
  880. RCU::
  881. Critical sections Grace period Barrier
  882. rcu_read_lock synchronize_net rcu_barrier
  883. rcu_read_unlock synchronize_rcu
  884. guard(rcu)() synchronize_rcu_expedited
  885. scoped_guard(rcu) synchronize_rcu_mult
  886. rcu_dereference call_rcu
  887. rcu_dereference_check call_rcu_hurry
  888. rcu_dereference_protected kfree_rcu
  889. rcu_read_lock_held kvfree_rcu
  890. rcu_read_lock_any_held kfree_rcu_mightsleep
  891. rcu_pointer_handoff cond_synchronize_rcu
  892. unrcu_pointer cond_synchronize_rcu_full
  893. cond_synchronize_rcu_expedited
  894. cond_synchronize_rcu_expedited_full
  895. get_completed_synchronize_rcu
  896. get_completed_synchronize_rcu_full
  897. get_state_synchronize_rcu
  898. get_state_synchronize_rcu_full
  899. poll_state_synchronize_rcu
  900. poll_state_synchronize_rcu_full
  901. same_state_synchronize_rcu
  902. same_state_synchronize_rcu_full
  903. start_poll_synchronize_rcu
  904. start_poll_synchronize_rcu_full
  905. start_poll_synchronize_rcu_expedited
  906. start_poll_synchronize_rcu_expedited_full
  907. bh::
  908. Critical sections Grace period Barrier
  909. rcu_read_lock_bh [Same as RCU] [Same as RCU]
  910. rcu_read_unlock_bh
  911. [local_bh_disable]
  912. [and friends]
  913. rcu_dereference_bh
  914. rcu_dereference_bh_check
  915. rcu_dereference_bh_protected
  916. rcu_read_lock_bh_held
  917. sched::
  918. Critical sections Grace period Barrier
  919. rcu_read_lock_sched [Same as RCU] [Same as RCU]
  920. rcu_read_unlock_sched
  921. [preempt_disable]
  922. [and friends]
  923. rcu_read_lock_sched_notrace
  924. rcu_read_unlock_sched_notrace
  925. rcu_dereference_sched
  926. rcu_dereference_sched_check
  927. rcu_dereference_sched_protected
  928. rcu_read_lock_sched_held
  929. RCU: Initialization/cleanup/ordering::
  930. RCU_INIT_POINTER
  931. RCU_INITIALIZER
  932. RCU_POINTER_INITIALIZER
  933. init_rcu_head
  934. destroy_rcu_head
  935. init_rcu_head_on_stack
  936. destroy_rcu_head_on_stack
  937. SLAB_TYPESAFE_BY_RCU
  938. RCU: Quiescents states and control::
  939. cond_resched_tasks_rcu_qs
  940. rcu_all_qs
  941. rcu_softirq_qs_periodic
  942. rcu_end_inkernel_boot
  943. rcu_expedite_gp
  944. rcu_gp_is_expedited
  945. rcu_unexpedite_gp
  946. rcu_cpu_stall_reset
  947. rcu_head_after_call_rcu
  948. rcu_is_watching
  949. RCU-sync primitive::
  950. rcu_sync_is_idle
  951. rcu_sync_init
  952. rcu_sync_enter
  953. rcu_sync_exit
  954. rcu_sync_dtor
  955. RCU-Tasks::
  956. Critical sections Grace period Barrier
  957. N/A call_rcu_tasks rcu_barrier_tasks
  958. synchronize_rcu_tasks
  959. RCU-Tasks-Rude::
  960. Critical sections Grace period Barrier
  961. N/A synchronize_rcu_tasks_rude rcu_barrier_tasks_rude
  962. call_rcu_tasks_rude
  963. RCU-Tasks-Trace::
  964. Critical sections Grace period Barrier
  965. rcu_read_lock_trace call_rcu_tasks_trace rcu_barrier_tasks_trace
  966. rcu_read_unlock_trace synchronize_rcu_tasks_trace
  967. guard(rcu_tasks_trace)()
  968. scoped_guard(rcu_tasks_trace)
  969. SRCU list traversal::
  970. list_for_each_entry_srcu
  971. hlist_for_each_entry_srcu
  972. SRCU::
  973. Critical sections Grace period Barrier
  974. srcu_read_lock call_srcu srcu_barrier
  975. srcu_read_unlock synchronize_srcu
  976. srcu_read_lock_fast synchronize_srcu_expedited
  977. srcu_read_unlock_fast get_state_synchronize_srcu
  978. srcu_read_lock_nmisafe start_poll_synchronize_srcu
  979. srcu_read_unlock_nmisafe start_poll_synchronize_srcu_expedited
  980. srcu_read_lock_notrace poll_state_synchronize_srcu
  981. srcu_read_unlock_notrace
  982. srcu_down_read
  983. srcu_up_read
  984. srcu_down_read_fast
  985. srcu_up_read_fast
  986. guard(srcu)()
  987. scoped_guard(srcu)
  988. srcu_read_lock_held
  989. srcu_dereference
  990. srcu_dereference_check
  991. srcu_dereference_notrace
  992. srcu_read_lock_held
  993. SRCU: Initialization/cleanup/ordering::
  994. DEFINE_SRCU
  995. DEFINE_STATIC_SRCU
  996. DEFINE_SRCU_FAST // for srcu_read_lock_fast() and friends
  997. DEFINE_STATIC_SRCU_FAST // for srcu_read_lock_fast() and friends
  998. init_srcu_struct
  999. init_srcu_struct_fast
  1000. cleanup_srcu_struct
  1001. smp_mb__after_srcu_read_unlock
  1002. All: lockdep-checked RCU utility APIs::
  1003. RCU_LOCKDEP_WARN
  1004. rcu_sleep_check
  1005. All: Unchecked RCU-protected pointer access::
  1006. rcu_dereference_raw
  1007. All: Unchecked RCU-protected pointer access with dereferencing prohibited::
  1008. rcu_access_pointer
  1009. See the comment headers in the source code (or the docbook generated
  1010. from them) for more information.
  1011. However, given that there are no fewer than four families of RCU APIs
  1012. in the Linux kernel, how do you choose which one to use? The following
  1013. list can be helpful:
  1014. a. Will readers need to block? If so, you need SRCU.
  1015. b. Will readers need to block and are you doing tracing, for
  1016. example, ftrace or BPF? If so, you need RCU-tasks,
  1017. RCU-tasks-rude, and/or RCU-tasks-trace.
  1018. c. What about the -rt patchset? If readers would need to block in
  1019. an non-rt kernel, you need SRCU. If readers would block when
  1020. acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
  1021. SRCU is not necessary. (The -rt patchset turns spinlocks into
  1022. sleeplocks, hence this distinction.)
  1023. d. Do you need to treat NMI handlers, hardirq handlers,
  1024. and code segments with preemption disabled (whether
  1025. via preempt_disable(), local_irq_save(), local_bh_disable(),
  1026. or some other mechanism) as if they were explicit RCU readers?
  1027. If so, RCU-sched readers are the only choice that will work
  1028. for you, but since about v4.20 you use can use the vanilla RCU
  1029. update primitives.
  1030. e. Do you need RCU grace periods to complete even in the face of
  1031. softirq monopolization of one or more of the CPUs? For example,
  1032. is your code subject to network-based denial-of-service attacks?
  1033. If so, you should disable softirq across your readers, for
  1034. example, by using rcu_read_lock_bh(). Since about v4.20 you
  1035. use can use the vanilla RCU update primitives.
  1036. f. Is your workload too update-intensive for normal use of
  1037. RCU, but inappropriate for other synchronization mechanisms?
  1038. If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
  1039. named SLAB_DESTROY_BY_RCU). But please be careful!
  1040. g. Do you need read-side critical sections that are respected even
  1041. on CPUs that are deep in the idle loop, during entry to or exit
  1042. from user-mode execution, or on an offlined CPU? If so, SRCU
  1043. and RCU Tasks Trace are the only choices that will work for you,
  1044. with SRCU being strongly preferred in almost all cases.
  1045. h. Otherwise, use RCU.
  1046. Of course, this all assumes that you have determined that RCU is in fact
  1047. the right tool for your job.
  1048. .. _9_whatisRCU:
  1049. 9. ANSWERS TO QUICK QUIZZES
  1050. ----------------------------
  1051. Quick Quiz #1:
  1052. Why is this argument naive? How could a deadlock
  1053. occur when using this algorithm in a real-world Linux
  1054. kernel? [Referring to the lock-based "toy" RCU
  1055. algorithm.]
  1056. Answer:
  1057. Consider the following sequence of events:
  1058. 1. CPU 0 acquires some unrelated lock, call it
  1059. "problematic_lock", disabling irq via
  1060. spin_lock_irqsave().
  1061. 2. CPU 1 enters synchronize_rcu(), write-acquiring
  1062. rcu_gp_mutex.
  1063. 3. CPU 0 enters rcu_read_lock(), but must wait
  1064. because CPU 1 holds rcu_gp_mutex.
  1065. 4. CPU 1 is interrupted, and the irq handler
  1066. attempts to acquire problematic_lock.
  1067. The system is now deadlocked.
  1068. One way to avoid this deadlock is to use an approach like
  1069. that of CONFIG_PREEMPT_RT, where all normal spinlocks
  1070. become blocking locks, and all irq handlers execute in
  1071. the context of special tasks. In this case, in step 4
  1072. above, the irq handler would block, allowing CPU 1 to
  1073. release rcu_gp_mutex, avoiding the deadlock.
  1074. Even in the absence of deadlock, this RCU implementation
  1075. allows latency to "bleed" from readers to other
  1076. readers through synchronize_rcu(). To see this,
  1077. consider task A in an RCU read-side critical section
  1078. (thus read-holding rcu_gp_mutex), task B blocked
  1079. attempting to write-acquire rcu_gp_mutex, and
  1080. task C blocked in rcu_read_lock() attempting to
  1081. read_acquire rcu_gp_mutex. Task A's RCU read-side
  1082. latency is holding up task C, albeit indirectly via
  1083. task B.
  1084. Realtime RCU implementations therefore use a counter-based
  1085. approach where tasks in RCU read-side critical sections
  1086. cannot be blocked by tasks executing synchronize_rcu().
  1087. :ref:`Back to Quick Quiz #1 <quiz_1>`
  1088. Quick Quiz #2:
  1089. Give an example where Classic RCU's read-side
  1090. overhead is **negative**.
  1091. Answer:
  1092. Imagine a single-CPU system with a non-CONFIG_PREEMPTION
  1093. kernel where a routing table is used by process-context
  1094. code, but can be updated by irq-context code (for example,
  1095. by an "ICMP REDIRECT" packet). The usual way of handling
  1096. this would be to have the process-context code disable
  1097. interrupts while searching the routing table. Use of
  1098. RCU allows such interrupt-disabling to be dispensed with.
  1099. Thus, without RCU, you pay the cost of disabling interrupts,
  1100. and with RCU you don't.
  1101. One can argue that the overhead of RCU in this
  1102. case is negative with respect to the single-CPU
  1103. interrupt-disabling approach. Others might argue that
  1104. the overhead of RCU is merely zero, and that replacing
  1105. the positive overhead of the interrupt-disabling scheme
  1106. with the zero-overhead RCU scheme does not constitute
  1107. negative overhead.
  1108. In real life, of course, things are more complex. But
  1109. even the theoretical possibility of negative overhead for
  1110. a synchronization primitive is a bit unexpected. ;-)
  1111. :ref:`Back to Quick Quiz #2 <quiz_2>`
  1112. Quick Quiz #3:
  1113. If it is illegal to block in an RCU read-side
  1114. critical section, what the heck do you do in
  1115. CONFIG_PREEMPT_RT, where normal spinlocks can block???
  1116. Answer:
  1117. Just as CONFIG_PREEMPT_RT permits preemption of spinlock
  1118. critical sections, it permits preemption of RCU
  1119. read-side critical sections. It also permits
  1120. spinlocks blocking while in RCU read-side critical
  1121. sections.
  1122. Why the apparent inconsistency? Because it is
  1123. possible to use priority boosting to keep the RCU
  1124. grace periods short if need be (for example, if running
  1125. short of memory). In contrast, if blocking waiting
  1126. for (say) network reception, there is no way to know
  1127. what should be boosted. Especially given that the
  1128. process we need to boost might well be a human being
  1129. who just went out for a pizza or something. And although
  1130. a computer-operated cattle prod might arouse serious
  1131. interest, it might also provoke serious objections.
  1132. Besides, how does the computer know what pizza parlor
  1133. the human being went to???
  1134. :ref:`Back to Quick Quiz #3 <quiz_3>`
  1135. ACKNOWLEDGEMENTS
  1136. My thanks to the people who helped make this human-readable, including
  1137. Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
  1138. For more information, see http://www.rdrop.com/users/paulmck/RCU.