=====================
Restartable Sequences
=====================

Restartable Sequences allow registering a per-thread userspace memory area
that serves as an ABI between kernel and userspace for three purposes:

* userspace restartable sequences
* quick access to read the current CPU number and node ID from userspace
* scheduler time slice extensions

Restartable sequences (per-cpu atomics)
---------------------------------------

Restartable sequences allow userspace to perform update operations on
per-cpu data without requiring heavyweight atomic operations. The actual
ABI is unfortunately only available in the code and selftests.

Quick access to CPU number, node ID
-----------------------------------

This allows per-CPU data to be implemented efficiently. Documentation is
only available in the code and selftests. :(

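As an illustration of per-CPU data built on a cheap CPU-number lookup, here
is a minimal sketch of a per-CPU counter. It uses glibc's sched_getcpu(),
which recent glibc versions may serve from the registered rseq area (an
assumption that depends on the glibc version). ``MAX_CPUS``, the 64-byte
padding and the helper names are hypothetical choices for this sketch.
Because the thread can migrate between reading the CPU number and updating
the slot, the sketch falls back to a relaxed atomic add, which is exactly
the cost a restartable sequence avoids.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>		/* sched_getcpu() */
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical upper bound on CPU numbers for this sketch. */
#define MAX_CPUS 1024

/* One counter per CPU, padded to 64 bytes to avoid false sharing. */
struct percpu_counter {
	_Atomic long value;
	char pad[64 - sizeof(_Atomic long)];
};

static struct percpu_counter counters[MAX_CPUS];

/* Add to the slot of the CPU the thread is currently running on. */
static void percpu_add(long delta)
{
	int cpu = sched_getcpu();

	if (cpu < 0)		/* sched_getcpu() failed: use slot 0 */
		cpu = 0;
	/*
	 * The thread may migrate after sched_getcpu() returns, so another
	 * thread may update the same slot concurrently. A relaxed atomic
	 * add keeps the update correct; a restartable sequence would avoid
	 * this atomic instruction altogether.
	 */
	atomic_fetch_add_explicit(&counters[cpu % MAX_CPUS].value, delta,
				  memory_order_relaxed);
}

/* Sum all per-CPU slots; exact only when no updates are in flight. */
static long percpu_sum(void)
{
	long sum = 0;

	for (size_t i = 0; i < MAX_CPUS; i++)
		sum += atomic_load_explicit(&counters[i].value,
					    memory_order_relaxed);
	return sum;
}
```

Summing walks every slot, so reads are slower than updates; that is the
usual trade-off for per-CPU counters.
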
Scheduler time slice extensions
-------------------------------

This allows a thread to request a time slice extension when it enters a
critical section, to avoid contention on a resource when the thread is
scheduled out inside of the critical section.

The prerequisites for this functionality are:

* Enabled in Kconfig
* Enabled at boot time (default is enabled)
* An rseq userspace pointer has been registered for the thread

The thread has to enable the functionality via prctl(2)::

    prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_SET,
          PR_RSEQ_SLICE_EXT_ENABLE, 0, 0);

prctl() returns 0 on success; otherwise it fails with one of the following
error codes:

========= ==============================================================
Errorcode Meaning
========= ==============================================================
EINVAL    Functionality not available or invalid function arguments.
          Note: arg4 and arg5 must be zero
ENOTSUPP  Functionality was disabled on the kernel command line
ENXIO     Available, but no rseq user struct registered
========= ==============================================================

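A sketch of enabling the extension, with the error codes from the table
mapped to messages. It assumes the ``PR_RSEQ_SLICE_EXTENSION`` constants are
provided by the system's ``<sys/prctl.h>``; when the headers predate the
feature, a fallback reports ``EINVAL`` ("not available"). ``ENOTSUPP`` is a
kernel-internal error code that libc headers do not define, so the sketch
defines it with the kernel's value of 524. The helper names are
hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* ENOTSUPP is kernel-internal (include/linux/errno.h), not in libc. */
#ifndef ENOTSUPP
#define ENOTSUPP 524
#endif

/* Map the result of PR_RSEQ_SLICE_EXTENSION_SET to a message. */
static const char *slice_ext_set_strerror(int err)
{
	switch (err) {
	case 0:		return "enabled";
	case EINVAL:	return "not available or invalid arguments";
	case ENOTSUPP:	return "disabled on the kernel command line";
	case ENXIO:	return "no rseq user struct registered";
	default:	return "unexpected error";
	}
}

/* Try to enable slice extensions; returns 0 or a positive errno value. */
static int slice_ext_enable(void)
{
#ifdef PR_RSEQ_SLICE_EXTENSION
	if (prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_SET,
		  PR_RSEQ_SLICE_EXT_ENABLE, 0, 0) == 0)
		return 0;
	return errno;
#else
	/* Headers predate the feature: treat it as not available. */
	return EINVAL;
#endif
}
```

Since a time slice extension is an optimization, a caller can reasonably
log the message and continue without it when the returned value is nonzero.
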
The state can also be queried via prctl(2)::

    prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_GET, 0, 0, 0);

prctl() returns ``PR_RSEQ_SLICE_EXT_ENABLE`` when it is enabled or 0 if
disabled. Otherwise it fails with the following error code:

========= ==============================================================
Errorcode Meaning
========= ==============================================================
EINVAL    Functionality not available or invalid function arguments.
          Note: arg3, arg4 and arg5 must be zero
========= ==============================================================

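A sketch of the query, assuming ``<sys/prctl.h>`` provides the
``PR_RSEQ_SLICE_EXTENSION`` constants; when it does not, the fallback
reports ``-EINVAL``. The name ``slice_ext_enabled()`` is hypothetical. It
folds the documented return values into 1 (enabled), 0 (disabled) or a
negative errno value.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Returns 1 if enabled, 0 if disabled, -errno on failure. */
static int slice_ext_enabled(void)
{
#ifdef PR_RSEQ_SLICE_EXTENSION
	int ret = prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_GET,
			0, 0, 0);

	if (ret < 0)
		return -errno;	/* EINVAL: not available or bad arguments */
	return ret == PR_RSEQ_SLICE_EXT_ENABLE;
#else
	return -EINVAL;		/* headers predate the feature */
#endif
}
```
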
The availability and status are also exposed via the flags field of the
rseq ABI struct, through the ``RSEQ_CS_FLAG_SLICE_EXT_AVAILABLE_BIT`` and
``RSEQ_CS_FLAG_SLICE_EXT_ENABLED_BIT`` bits. These bits are read-only for
userspace and purely informational.

If the mechanism was enabled via prctl(), the thread can request a time
slice extension by setting rseq::slice_ctrl::request to 1. If the thread is
interrupted and the interrupt results in a reschedule request in the
kernel, the kernel can grant a time slice extension and return to
userspace instead of scheduling out. The length of the extension is
determined by debugfs:rseq/slice_ext_nsec. The default value of 5 usec is
also the minimum. It can be increased up to 50 usec, but doing so can, and
will, affect the minimum scheduling latency.

Any proposed change to this default must come with a selftest and
rseq-slice-hist.py output showing that the new value has merit.

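A sketch for inspecting the configured extension length from userspace. It
assumes debugfs is mounted at its usual location, ``/sys/kernel/debug``,
and that the file holds a single decimal nanosecond value; reading debugfs
typically requires root. The bounds encode the documented 5 to 50 usec
range, treating 50 usec as inclusive (an assumption). All names are
hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumes debugfs is mounted at /sys/kernel/debug. */
#define SLICE_EXT_NSEC_PATH "/sys/kernel/debug/rseq/slice_ext_nsec"

/* Documented range: 5 usec (default and minimum) up to 50 usec. */
#define SLICE_EXT_NSEC_MIN  5000ULL
#define SLICE_EXT_NSEC_MAX 50000ULL

static bool slice_ext_nsec_valid(uint64_t nsec)
{
	return nsec >= SLICE_EXT_NSEC_MIN && nsec <= SLICE_EXT_NSEC_MAX;
}

/* Read the current extension length; returns 0 if it cannot be read. */
static uint64_t slice_ext_nsec_read(void)
{
	FILE *f = fopen(SLICE_EXT_NSEC_PATH, "r");
	uint64_t nsec = 0;

	if (!f)
		return 0;	/* no debugfs, no permission or no support */
	if (fscanf(f, "%" SCNu64, &nsec) != 1)
		nsec = 0;	/* unexpected file format */
	fclose(f);
	return nsec;
}
```
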
The kernel indicates the grant by clearing rseq::slice_ctrl::request and
setting rseq::slice_ctrl::granted to 1. If the thread is rescheduled after
the extension has been granted, the kernel clears the granted bit to
indicate this to userspace.

If the request bit is still set when leaving the critical section,
userspace can clear it and continue.

If the granted bit is set, userspace must invoke rseq_slice_yield(2) when
leaving the critical section to relinquish the CPU. The kernel enforces
this by arming a timer, which prevents misbehaving userspace from abusing
the mechanism.

If both the request bit and the granted bit are clear when leaving the
critical section, the grant was revoked and userspace needs to take no
further action.

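The three exit cases described above can be written as a small decision
function. ``struct slice_ctrl_model`` is a hypothetical stand-in for
rseq::slice_ctrl holding only the two documented bits; it does not
reproduce the real ABI layout. The caller is expected to invoke
rseq_slice_yield(2) itself when the result is ``SLICE_EXIT_YIELD``.

```c
#include <assert.h>

/* Hypothetical stand-in for rseq::slice_ctrl; not the real ABI layout. */
struct slice_ctrl_model {
	volatile unsigned char request;
	volatile unsigned char granted;
};

enum slice_exit_action {
	SLICE_EXIT_CONTINUE,	/* no grant happened: request withdrawn */
	SLICE_EXIT_YIELD,	/* grant active: call rseq_slice_yield() */
	SLICE_EXIT_REVOKED,	/* grant was revoked: nothing more to do */
};

/* Decide what userspace has to do when leaving the critical section. */
static enum slice_exit_action slice_exit(struct slice_ctrl_model *c)
{
	/*
	 * The kernel clears request when it grants an extension, so a
	 * request that is still set means no grant was made.
	 */
	if (c->request) {
		c->request = 0;
		return SLICE_EXIT_CONTINUE;
	}
	if (c->granted)
		return SLICE_EXIT_YIELD;
	return SLICE_EXIT_REVOKED;
}
```
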
The required code flow is as follows::

    // rseq points to this thread's registered struct rseq
    rseq->slice_ctrl.request = 1;
    barrier();  // Prevent compiler reordering
    critical_section();
    barrier();  // Prevent compiler reordering
    rseq->slice_ctrl.request = 0;
    if (rseq->slice_ctrl.granted)
        rseq_slice_yield();

As all of this is strictly CPU local, there are no atomicity requirements.
Checking the granted state is racy, because an interrupt can revoke the
grant between the check and the syscall, and that window cannot be closed::

    if (rseq->slice_ctrl.granted)
        -> Interrupt results in schedule and grant revocation
        rseq_slice_yield();

So there is no point in pretending that this could be solved by an atomic
operation.

If the thread issues a syscall other than rseq_slice_yield(2) within the
granted time slice extension, the grant is also revoked and the CPU is
relinquished immediately on kernel entry. This is required because, when
the preemption model is either NONE or VOLUNTARY, syscalls can consume
arbitrary CPU time before they reach a scheduling point and could
therefore exceed the grant by far.

The preferred solution for userspace is to use rseq_slice_yield(2), which
is side-effect free. Support for arbitrary syscalls is required for
applications with an onion-layer architecture, where the code handling the
critical section and requesting the time slice extension has no control
over the code within the critical section.

The kernel enforces flag consistency and terminates the thread with SIGSEGV
if it detects a violation.