.. SPDX-License-Identifier: GPL-2.0

=====================
Theory of operation
=====================

:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Preface
=======

PREEMPT_RT transforms the Linux kernel into a real-time kernel. It achieves
this by replacing locking primitives, such as spinlock_t, with a preemptible
and priority-inheritance aware implementation known as rtmutex, and by enforcing
the use of threaded interrupts. As a result, the kernel becomes fully
preemptible, with the exception of a few critical code paths, including entry
code, the scheduler, and low-level interrupt handling routines.

This transformation places the majority of kernel execution contexts under the
control of the scheduler and significantly increases the number of preemption
points. Consequently, it reduces the latency between a high-priority task
becoming runnable and its actual execution on the CPU.

Scheduling
==========

The core principles of Linux scheduling and the associated user-space API are
documented in the man page
`sched(7) <https://man7.org/linux/man-pages/man7/sched.7.html>`_.

By default, the Linux kernel uses the SCHED_OTHER scheduling policy. Under
this policy, a task is preempted when the scheduler determines that it has
consumed a fair share of CPU time relative to other runnable tasks. However,
the policy does not guarantee immediate preemption when a new SCHED_OTHER task
becomes runnable. The currently running task may continue executing.

This behavior differs from that of real-time scheduling policies such as
SCHED_FIFO. When a task with a real-time policy becomes runnable, the
scheduler immediately selects it for execution if it has a higher priority than
the currently running task. The task continues to run until it voluntarily
yields the CPU, typically by blocking on an event.
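
As a rough illustration of the user-space API described in sched(7), the
following sketch asks the kernel to run the calling thread under SCHED_FIFO.
The helper names policy_name() and become_fifo() are made up for this example;
sched_setscheduler() is the standard call. Switching to a real-time policy
normally requires CAP_SYS_NICE or a suitable RLIMIT_RTPRIO limit, so
unprivileged callers should expect the call to fail with EPERM.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Map a policy constant from <sched.h> to a printable name. */
const char *policy_name(int policy)
{
	switch (policy) {
	case SCHED_OTHER:
		return "SCHED_OTHER";
	case SCHED_FIFO:
		return "SCHED_FIFO";
	case SCHED_RR:
		return "SCHED_RR";
	default:
		return "unknown";
	}
}

/*
 * Try to move the calling thread to SCHED_FIFO at the given priority.
 * Returns 0 on success, or the errno value reported by the kernel
 * (for example EPERM when the caller lacks the required privilege).
 */
int become_fifo(int priority)
{
	struct sched_param param;

	memset(&param, 0, sizeof(param));
	param.sched_priority = priority;

	if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
		return errno;

	return 0;
}
```

A caller might invoke become_fifo(10) early in main() and print
strerror() of the return value on failure. The priority 10 is an arbitrary
value chosen for illustration.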

Sleeping spin locks
===================

The various lock types and their behavior under real-time configurations are
described in detail in Documentation/locking/locktypes.rst.

In a non-PREEMPT_RT configuration, a spinlock_t is acquired by first disabling
preemption and then actively spinning until the lock becomes available. Once
the lock is released, preemption is re-enabled. From a real-time perspective,
this approach is undesirable because disabling preemption prevents the
scheduler from switching to a higher-priority task, potentially increasing
latency.

To address this, PREEMPT_RT replaces spinning locks with sleeping spin locks
that do not disable preemption. On PREEMPT_RT, spinlock_t is implemented using
rtmutex. Instead of spinning, a task attempting to acquire a contended lock
disables CPU migration, donates its priority to the lock owner (priority
inheritance), and voluntarily schedules out while waiting for the lock to
become available.

Disabling CPU migration preserves the guarantee that code relying on disabled
preemption typically needs: the task keeps running on the same CPU while it
holds a sleeping lock. Unlike disabling preemption, it still allows the task to
be preempted.

Priority inheritance
====================

Lock types such as spinlock_t and struct mutex in a PREEMPT_RT enabled kernel
are implemented on top of rtmutex, which provides support for priority
inheritance (PI). When a task blocks on such a lock, the PI mechanism
temporarily propagates the blocked task’s scheduling parameters to the lock
owner.

For example, if a SCHED_FIFO task A blocks on a lock currently held by a
SCHED_OTHER task B, task B temporarily inherits task A’s scheduling policy and
priority. Task A is then put to sleep while waiting for the lock, and task B
competes for the CPU as if it had task A’s priority. This allows B to continue
executing, make progress, and eventually release the lock.

Once B releases the lock, it reverts to its original scheduling parameters, and
task A can resume execution.
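
The example with tasks A and B can be traced with a small user-space toy
model. This is not kernel code and does not reflect how rtmutex is implemented.
All names (toy_task, toy_pi_lock, toy_lock_acquire() and so on) are
hypothetical, and the model assumes at most one waiter per lock. It only shows
the bookkeeping: boost the owner when a higher-ranked task blocks, and restore
the owner's own parameters on release.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these types exist in the kernel. */
enum toy_policy { TOY_SCHED_OTHER, TOY_SCHED_FIFO };

struct toy_params {
	enum toy_policy policy;
	int rt_priority;		/* only meaningful for TOY_SCHED_FIFO */
};

struct toy_task {
	struct toy_params normal;	/* parameters the task was configured with */
	struct toy_params effective;	/* parameters currently used for scheduling */
};

struct toy_pi_lock {
	struct toy_task *owner;
	struct toy_task *waiter;	/* simplification: at most one waiter */
};

/* Return nonzero if parameters a should win over parameters b. */
static int toy_outranks(const struct toy_params *a, const struct toy_params *b)
{
	if (a->policy != b->policy)
		return a->policy == TOY_SCHED_FIFO;
	return a->policy == TOY_SCHED_FIFO && a->rt_priority > b->rt_priority;
}

/*
 * Returns 1 if the lock was free and is now owned by t.
 * Returns 0 if t has to wait; in that case the owner inherits t's
 * parameters when t outranks it.
 */
int toy_lock_acquire(struct toy_pi_lock *lock, struct toy_task *t)
{
	if (lock->owner == NULL) {
		lock->owner = t;
		return 1;
	}

	assert(lock->waiter == NULL);
	if (toy_outranks(&t->effective, &lock->owner->effective))
		lock->owner->effective = t->effective;
	lock->waiter = t;
	return 0;
}

/* Drop the boost and hand the lock to the waiter, if there is one. */
void toy_lock_release(struct toy_pi_lock *lock)
{
	struct toy_task *old = lock->owner;

	assert(old != NULL);
	old->effective = old->normal;
	lock->owner = lock->waiter;
	lock->waiter = NULL;
}
```

With B configured as TOY_SCHED_OTHER and A as TOY_SCHED_FIFO, letting B
acquire the lock first and A second leaves B with A's effective parameters
until toy_lock_release() is called, after which A owns the lock.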

Threaded interrupts
===================

Interrupt handlers are another source of code that executes with preemption
disabled and outside the control of the scheduler. To bring interrupt handling
under scheduler control, PREEMPT_RT enforces threaded interrupt handlers.

With forced threading, interrupt handling is split into two stages. The first
stage, the primary handler, is executed in IRQ context with interrupts disabled.
Its sole responsibility is to wake the associated threaded handler. The second
stage, the threaded handler, is the function passed to request_irq() as the
interrupt handler. It runs in process context, scheduled by the kernel.

From waking the interrupt thread until threaded handling is completed, the
interrupt source is masked in the interrupt controller. This ensures that the
device interrupt remains pending but does not retrigger the CPU, allowing the
system to exit IRQ context and handle the interrupt in a scheduled thread.
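
The two stages and the masking window can be walked through with a user-space
toy state machine. It is only a sketch of the sequence described above, not the
kernel's interrupt code: the toy_irq structure and the toy_*() functions are
invented for this example, and the model assumes that the threaded handler
fully services the device before the line is unmasked.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names do not exist in the kernel. */
struct toy_irq {
	bool masked;		/* line masked in the simulated interrupt controller */
	bool pending;		/* device raised the line while it was masked */
	bool thread_woken;	/* interrupt thread woken but not yet run */
	int handled;		/* how often the driver's handler has run */
};

/* Stage 1: stands in for the primary handler running in IRQ context. */
void toy_primary_handler(struct toy_irq *irq)
{
	irq->masked = true;		/* keep the source from retriggering the CPU */
	irq->thread_woken = true;	/* the only real work: wake the thread */
}

/*
 * The device raises its interrupt line.
 * Returns true if the CPU was interrupted, false if the line was masked
 * and the interrupt merely stays pending.
 */
bool toy_device_raise(struct toy_irq *irq)
{
	if (irq->masked) {
		irq->pending = true;
		return false;
	}
	toy_primary_handler(irq);
	return true;
}

/* Stage 2: stands in for the interrupt thread being scheduled. */
void toy_irq_thread_run(struct toy_irq *irq)
{
	assert(irq->thread_woken);

	irq->handled++;			/* the driver's handler runs here */
	irq->pending = false;		/* assumption: handler services the device */
	irq->thread_woken = false;
	irq->masked = false;		/* unmask once threaded handling is done */
}
```

Calling toy_device_raise() twice before toy_irq_thread_run() shows the masking
window: the first call interrupts the simulated CPU, the second only marks the
line as pending.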

By default, the threaded handler executes with the SCHED_FIFO scheduling policy
and a priority of 50 (MAX_RT_PRIO / 2), which is midway between the minimum and
maximum real-time priorities.
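
The arithmetic behind the default priority can be checked from user space.
In the sketch below, MAX_RT_PRIO is redefined locally with the value implied by
the text (100), because the kernel-internal constant is not exported to user
space; the function names are made up for this example. On Linux,
sched_get_priority_min() and sched_get_priority_max() report 1 and 99 for
SCHED_FIFO, so the midpoint of that range is also 50.

```c
#include <sched.h>

/* Local copy for illustration; the kernel constant is not visible here. */
#define MAX_RT_PRIO 100

/* Default priority of a forced-threaded interrupt handler, per the text. */
int irq_thread_default_prio(void)
{
	return MAX_RT_PRIO / 2;
}

/*
 * Midpoint of the SCHED_FIFO priority range as reported by the kernel.
 * Returns -1 if the range cannot be queried.
 */
int fifo_prio_midpoint(void)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	if (lo == -1 || hi == -1)
		return -1;

	return (lo + hi) / 2;
}
```

An application thread that must run ahead of interrupt threads would pick a
SCHED_FIFO priority above irq_thread_default_prio(), and one that must not
delay them would pick a lower one.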

If the threaded interrupt handler raises any soft interrupts during its
execution, those soft interrupt routines are invoked after the threaded handler
completes, within the same thread. Preemption remains enabled during the
execution of the soft interrupt handler.

Summary
=======

By using sleeping locks and forced-threaded interrupts, PREEMPT_RT
significantly reduces the sections of code in which interrupts or preemption
are disabled, allowing the scheduler to preempt the current execution context
and switch to a higher-priority task.