| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303 |
- [!] Note:
- This file expands on the "Locking" section of recipes.txt,
- focusing on locklessly accessing shared variables that are
- otherwise protected by a lock.
- Locking
- =======
- Locking is well-known and the common use cases are straightforward: Any
- CPU holding a given lock sees any changes previously seen or made by any
- CPU before it previously released that same lock. This last sentence
- is the only part of this document that most developers will need to read.
- However, developers who would like to also access lock-protected shared
- variables outside of their corresponding locks should continue reading.
- Locking and Prior Accesses
- --------------------------
- The basic rule of locking is worth repeating:
- Any CPU holding a given lock sees any changes previously seen
- or made by any CPU before it previously released that same lock.
- Note that this statement is a bit stronger than "Any CPU holding a
- given lock sees all changes made by any CPU during the time that CPU was
- previously holding this same lock". For example, consider the following
- pair of code fragments:
- /* See MP+polocks.litmus. */
- void CPU0(void)
- {
- WRITE_ONCE(x, 1);
- spin_lock(&mylock);
- WRITE_ONCE(y, 1);
- spin_unlock(&mylock);
- }
- void CPU1(void)
- {
- spin_lock(&mylock);
- r0 = READ_ONCE(y);
- spin_unlock(&mylock);
- r1 = READ_ONCE(x);
- }
- The basic rule guarantees that if CPU0() acquires mylock before CPU1(),
- then both r0 and r1 must be set to the value 1. This also has the
- consequence that if the final value of r0 is equal to 1, then the final
- value of r1 must also be equal to 1. In contrast, the weaker rule would
- say nothing about the final value of r1.
- Locking and Subsequent Accesses
- -------------------------------
- The converse to the basic rule also holds: Any CPU holding a given
- lock will not see any changes that will be made by any CPU after it
- subsequently acquires this same lock. This converse statement is
- illustrated by the following litmus test:
- /* See MP+porevlocks.litmus. */
- void CPU0(void)
- {
- r0 = READ_ONCE(y);
- spin_lock(&mylock);
- r1 = READ_ONCE(x);
- spin_unlock(&mylock);
- }
- void CPU1(void)
- {
- spin_lock(&mylock);
- WRITE_ONCE(x, 1);
- spin_unlock(&mylock);
- WRITE_ONCE(y, 1);
- }
- This converse to the basic rule guarantees that if CPU0() acquires
- mylock before CPU1(), then both r0 and r1 must be set to the value 0.
- This also has the consequence that if the final value of r1 is equal
- to 0, then the final value of r0 must also be equal to 0. In contrast,
- the weaker rule would say nothing about the final value of r0.
- These examples show only a single pair of CPUs, but the effects of the
- locking basic rule extend across multiple acquisitions of a given lock
- across multiple CPUs.
- Double-Checked Locking
- ----------------------
- It is well known that more than just a lock is required to make
- double-checked locking work correctly, This litmus test illustrates
- one incorrect approach:
- /* See Documentation/litmus-tests/locking/DCL-broken.litmus. */
- void CPU0(void)
- {
- r0 = READ_ONCE(flag);
- if (r0 == 0) {
- spin_lock(&lck);
- r1 = READ_ONCE(flag);
- if (r1 == 0) {
- WRITE_ONCE(data, 1);
- WRITE_ONCE(flag, 1);
- }
- spin_unlock(&lck);
- }
- r2 = READ_ONCE(data);
- }
- /* CPU1() is the exactly the same as CPU0(). */
- There are two problems. First, there is no ordering between the first
- READ_ONCE() of "flag" and the READ_ONCE() of "data". Second, there is
- no ordering between the two WRITE_ONCE() calls. It should therefore be
- no surprise that "r2" can be zero, and a quick herd7 run confirms this.
- One way to fix this is to use smp_load_acquire() and smp_store_release()
- as shown in this corrected version:
- /* See Documentation/litmus-tests/locking/DCL-fixed.litmus. */
- void CPU0(void)
- {
- r0 = smp_load_acquire(&flag);
- if (r0 == 0) {
- spin_lock(&lck);
- r1 = READ_ONCE(flag);
- if (r1 == 0) {
- WRITE_ONCE(data, 1);
- smp_store_release(&flag, 1);
- }
- spin_unlock(&lck);
- }
- r2 = READ_ONCE(data);
- }
- /* CPU1() is the exactly the same as CPU0(). */
- The smp_load_acquire() guarantees that its load from "flags" will
- be ordered before the READ_ONCE() from data, thus solving the first
- problem. The smp_store_release() guarantees that its store will be
- ordered after the WRITE_ONCE() to "data", solving the second problem.
- The smp_store_release() pairs with the smp_load_acquire(), thus ensuring
- that the ordering provided by each actually takes effect. Again, a
- quick herd7 run confirms this.
- In short, if you access a lock-protected variable without holding the
- corresponding lock, you will need to provide additional ordering, in
- this case, via the smp_load_acquire() and the smp_store_release().
- Ordering Provided by a Lock to CPUs Not Holding That Lock
- ---------------------------------------------------------
- It is not necessarily the case that accesses ordered by locking will be
- seen as ordered by CPUs not holding that lock. Consider this example:
- /* See Z6.0+pooncelock+pooncelock+pombonce.litmus. */
- void CPU0(void)
- {
- spin_lock(&mylock);
- WRITE_ONCE(x, 1);
- WRITE_ONCE(y, 1);
- spin_unlock(&mylock);
- }
- void CPU1(void)
- {
- spin_lock(&mylock);
- r0 = READ_ONCE(y);
- WRITE_ONCE(z, 1);
- spin_unlock(&mylock);
- }
- void CPU2(void)
- {
- WRITE_ONCE(z, 2);
- smp_mb();
- r1 = READ_ONCE(x);
- }
- Counter-intuitive though it might be, it is quite possible to have
- the final value of r0 be 1, the final value of z be 2, and the final
- value of r1 be 0. The reason for this surprising outcome is that CPU2()
- never acquired the lock, and thus did not fully benefit from the lock's
- ordering properties.
- Ordering can be extended to CPUs not holding the lock by careful use
- of smp_mb__after_spinlock():
- /* See Z6.0+pooncelock+poonceLock+pombonce.litmus. */
- void CPU0(void)
- {
- spin_lock(&mylock);
- WRITE_ONCE(x, 1);
- WRITE_ONCE(y, 1);
- spin_unlock(&mylock);
- }
- void CPU1(void)
- {
- spin_lock(&mylock);
- smp_mb__after_spinlock();
- r0 = READ_ONCE(y);
- WRITE_ONCE(z, 1);
- spin_unlock(&mylock);
- }
- void CPU2(void)
- {
- WRITE_ONCE(z, 2);
- smp_mb();
- r1 = READ_ONCE(x);
- }
- This addition of smp_mb__after_spinlock() strengthens the lock
- acquisition sufficiently to rule out the counter-intuitive outcome.
- In other words, the addition of the smp_mb__after_spinlock() prohibits
- the counter-intuitive result where the final value of r0 is 1, the final
- value of z is 2, and the final value of r1 is 0.
- No Roach-Motel Locking!
- -----------------------
- This example requires familiarity with the herd7 "filter" clause, so
- please read up on that topic in litmus-tests.txt.
- It is tempting to allow memory-reference instructions to be pulled
- into a critical section, but this cannot be allowed in the general case.
- For example, consider a spin loop preceding a lock-based critical section.
- Now, herd7 does not model spin loops, but we can emulate one with two
- loads, with a "filter" clause to constrain the first to return the
- initial value and the second to return the updated value, as shown below:
- /* See Documentation/litmus-tests/locking/RM-fixed.litmus. */
- void CPU0(void)
- {
- spin_lock(&lck);
- r2 = atomic_inc_return(&y);
- WRITE_ONCE(x, 1);
- spin_unlock(&lck);
- }
- void CPU1(void)
- {
- r0 = READ_ONCE(x);
- r1 = READ_ONCE(x);
- spin_lock(&lck);
- r2 = atomic_inc_return(&y);
- spin_unlock(&lck);
- }
- filter (1:r0=0 /\ 1:r1=1)
- exists (1:r2=1)
- The variable "x" is the control variable for the emulated spin loop.
- CPU0() sets it to "1" while holding the lock, and CPU1() emulates the
- spin loop by reading it twice, first into "1:r0" (which should get the
- initial value "0") and then into "1:r1" (which should get the updated
- value "1").
- The "filter" clause takes this into account, constraining "1:r0" to
- equal "0" and "1:r1" to equal 1.
- Then the "exists" clause checks to see if CPU1() acquired its lock first,
- which should not happen given the filter clause because CPU0() updates
- "x" while holding the lock. And herd7 confirms this.
- But suppose that the compiler was permitted to reorder the spin loop
- into CPU1()'s critical section, like this:
- /* See Documentation/litmus-tests/locking/RM-broken.litmus. */
- void CPU0(void)
- {
- int r2;
- spin_lock(&lck);
- r2 = atomic_inc_return(&y);
- WRITE_ONCE(x, 1);
- spin_unlock(&lck);
- }
- void CPU1(void)
- {
- spin_lock(&lck);
- r0 = READ_ONCE(x);
- r1 = READ_ONCE(x);
- r2 = atomic_inc_return(&y);
- spin_unlock(&lck);
- }
- filter (1:r0=0 /\ 1:r1=1)
- exists (1:r2=1)
- If "1:r0" is equal to "0", "1:r1" can never equal "1" because CPU0()
- cannot update "x" while CPU1() holds the lock. And herd7 confirms this,
- showing zero executions matching the "filter" criteria.
- And this is why Linux-kernel lock and unlock primitives must prevent
- code from entering critical sections. It is not sufficient to only
- prevent code from leaving them.
|