.. SPDX-License-Identifier: GPL-2.0

=============================================
Porting an architecture to support PREEMPT_RT
=============================================

:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This list outlines the architecture-specific requirements that must be
implemented in order to enable PREEMPT_RT. Once all required features are
implemented, ARCH_SUPPORTS_RT can be selected in the architecture’s Kconfig to
make PREEMPT_RT selectable.

Many prerequisites (genirq support, for example) are enforced by the common
code and are omitted here.

The optional features are not strictly required, but they are worth
considering.

Requirements
------------

Forced threaded interrupts
  CONFIG_IRQ_FORCED_THREADING must be selected. Any interrupts that must
  remain in hard-IRQ context must be marked with IRQF_NO_THREAD. This
  requirement applies for instance to clocksource event interrupts,
  perf interrupts and cascading interrupt-controller handlers.
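
  The effect of IRQF_NO_THREAD under forced threading can be sketched as a
  small user-space model. This is not kernel code: struct irq_action_model,
  MODEL_IRQF_NO_THREAD and runs_in_thread() are hypothetical names, and the
  real interrupt setup code checks further conditions that are not modelled
  here.

  .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>

     /* Hypothetical stand-in for the IRQF_NO_THREAD request flag. */
     #define MODEL_IRQF_NO_THREAD 0x1u

     /* Hypothetical, simplified view of a registered interrupt handler. */
     struct irq_action_model {
         const char *name;
         unsigned int flags;
     };

     /*
      * Returns true if the primary handler would run in an interrupt thread.
      * forced_threading models CONFIG_IRQ_FORCED_THREADING being in effect,
      * which is always the case on PREEMPT_RT.
      */
     bool runs_in_thread(const struct irq_action_model *act,
                         bool forced_threading)
     {
         if (!forced_threading)
             return false;
         /* Handlers that must stay in hard-IRQ context opt out. */
         return !(act->flags & MODEL_IRQF_NO_THREAD);
     }

  In this model, a clockevent handler registered with the no-thread flag keeps
  running in hard-IRQ context, while an ordinary device handler is moved into
  a thread.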

PREEMPTION support
  Kernel preemption must be supported and requires that
  CONFIG_ARCH_NO_PREEMPT remain unselected. Scheduling requests, such as those
  issued from an interrupt or other exception handler, must be processed
  immediately.
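
  The rule that a pending scheduling request is honoured immediately, unless
  preemption is disabled at that moment, can be illustrated with the following
  user-space model. All names (struct cpu_model, model_irq_exit_to_kernel()
  and so on) are hypothetical; in a real kernel this check is part of the
  interrupt-exit path.

  .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>

     /* Hypothetical per-CPU state used only by this model. */
     struct cpu_model {
         int preempt_count;    /* > 0 while preemption is disabled */
         bool need_resched;    /* a scheduling request is pending */
         int context_switches; /* calls into the model scheduler */
     };

     /* Model of the scheduler: honour the request and clear it. */
     void model_schedule(struct cpu_model *cpu)
     {
         cpu->need_resched = false;
         cpu->context_switches++;
     }

     /*
      * Return from an interrupt to kernel code: a pending request is
      * handled right away unless preemption is disabled.
      */
     void model_irq_exit_to_kernel(struct cpu_model *cpu)
     {
         if (cpu->need_resched && cpu->preempt_count == 0)
             model_schedule(cpu);
     }

     /* preempt_enable(): a request deferred above is honoured here. */
     void model_preempt_enable(struct cpu_model *cpu)
     {
         assert(cpu->preempt_count > 0);
         if (--cpu->preempt_count == 0 && cpu->need_resched)
             model_schedule(cpu);
     }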

POSIX CPU timers and KVM
  POSIX CPU timers must expire from thread context rather than directly within
  the timer interrupt. This behavior is enabled by selecting the configuration
  option CONFIG_HAVE_POSIX_CPU_TIMERS_TASK_WORK.

  When virtualization support, such as KVM, is enabled,
  CONFIG_VIRT_XFER_TO_GUEST_WORK must also be selected to ensure that any
  pending work, such as POSIX timer expiration, is handled before
  transitioning into guest mode.
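
  The following user-space model illustrates the split: the timer interrupt
  only marks work as pending, and the expiry runs in task context on the way
  back to user space or into a guest. The structure and function names are
  hypothetical, and the model ignores locking.

  .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>

     /* Hypothetical per-task state for this model. */
     struct task_model {
         bool timer_work_pending; /* set by the modelled timer interrupt */
         int timers_expired;      /* expirations run in thread context */
     };

     /* Timer interrupt: only queue the work, never expire timers here. */
     void model_timer_interrupt(struct task_model *t)
     {
         t->timer_work_pending = true;
     }

     /* Runs in the task's own, preemptible context. */
     static void model_run_task_work(struct task_model *t)
     {
         if (t->timer_work_pending) {
             t->timer_work_pending = false;
             t->timers_expired++;
         }
     }

     /* Returning to user space drains the pending work first. */
     void model_exit_to_user(struct task_model *t)
     {
         model_run_task_work(t);
     }

     /*
      * Entering guest mode must drain it as well; in this model, skipping
      * the call would leave the expiry pending while the guest runs.
      */
     void model_enter_guest(struct task_model *t)
     {
         model_run_task_work(t);
     }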

Hard-IRQ and Soft-IRQ stacks
  Soft interrupts are handled in the thread context in which they are raised. If
  a soft interrupt is triggered from hard-IRQ context, its execution is deferred
  to the ksoftirqd thread. Preemption is never disabled during soft interrupt
  handling, which makes soft interrupts preemptible.

  If an architecture provides a custom __do_softirq() implementation that uses a
  separate stack, it must select CONFIG_HAVE_SOFTIRQ_ON_OWN_STACK. The
  functionality should only be enabled when CONFIG_SOFTIRQ_ON_OWN_STACK is set.
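
  Where a raised softirq ends up being handled on PREEMPT_RT can be sketched
  with a hypothetical user-space model. struct softirq_model and its helpers
  are illustrative names only. The model assumes that raising from thread
  context happens inside a bottom-half-disabled section, so the pending work
  runs in the raising thread when that section ends.

  .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>

     /* Hypothetical per-CPU softirq bookkeeping for this model. */
     struct softirq_model {
         bool in_hardirq;          /* running a hard-IRQ handler */
         int bh_disabled;          /* local_bh_disable() nesting */
         unsigned int pending;     /* bitmask of raised softirqs */
         int ksoftirqd_wakeups;    /* times ksoftirqd was woken */
         int handled_by_thread;    /* run by the raising thread */
         int handled_by_ksoftirqd; /* run by the ksoftirqd thread */
     };

     /* Run every pending softirq; returns how many were handled. */
     static int run_pending(struct softirq_model *s)
     {
         int n = 0;

         while (s->pending) {
             s->pending &= s->pending - 1; /* clear lowest set bit */
             n++;
         }
         return n;
     }

     void model_raise_softirq(struct softirq_model *s, unsigned int nr)
     {
         /* Only the two cases covered by this model are allowed. */
         assert(s->in_hardirq || s->bh_disabled > 0);
         s->pending |= 1u << nr;
         if (s->in_hardirq)
             s->ksoftirqd_wakeups++; /* deferred to ksoftirqd */
     }

     /* End of a BH-disabled section: the raising thread does the work. */
     void model_local_bh_enable(struct softirq_model *s)
     {
         assert(s->bh_disabled > 0);
         if (--s->bh_disabled == 0)
             s->handled_by_thread += run_pending(s);
     }

     void model_ksoftirqd(struct softirq_model *s)
     {
         s->handled_by_ksoftirqd += run_pending(s);
     }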

FPU and SIMD access in kernel mode
  FPU and SIMD registers are typically not used in kernel mode and are therefore
  not saved during kernel preemption. As a result, any kernel code that uses
  these registers must be enclosed within a kernel_fpu_begin() and
  kernel_fpu_end() section.

  The kernel_fpu_begin() function usually invokes local_bh_disable() to prevent
  interruptions from softirqs and to disable regular preemption. This allows the
  protected code to run safely in both thread and softirq contexts.

  On PREEMPT_RT kernels, however, kernel_fpu_begin() must not call
  local_bh_disable(). Instead, it should use preempt_disable(), since softirqs
  are always handled in thread context under PREEMPT_RT. In this case, disabling
  preemption alone is sufficient.
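
  The difference between the two configurations can be sketched as a
  user-space model. MODEL_PREEMPT_RT, struct fpu_ctx_model and the model_
  functions are hypothetical; the counters stand in for the real per-CPU
  state, and saving or restoring the FPU registers is not modelled.

  .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>

     /* 1 models a PREEMPT_RT build, 0 a non-RT build. */
     #ifndef MODEL_PREEMPT_RT
     #define MODEL_PREEMPT_RT 1
     #endif

     /* Hypothetical counters standing in for the real per-CPU state. */
     struct fpu_ctx_model {
         int preempt_disabled; /* preempt_disable() nesting */
         int bh_disabled;      /* local_bh_disable() nesting */
         bool fpu_owned;       /* kernel is using FPU/SIMD registers */
     };

     void model_kernel_fpu_begin(struct fpu_ctx_model *c)
     {
         assert(!c->fpu_owned); /* this model does not allow nesting */
     #if MODEL_PREEMPT_RT
         /* Softirqs run in thread context: disabling preemption suffices. */
         c->preempt_disabled++;
     #else
         /* Keeps softirqs off this CPU and also disables preemption. */
         c->bh_disabled++;
     #endif
         c->fpu_owned = true;
     }

     void model_kernel_fpu_end(struct fpu_ctx_model *c)
     {
         assert(c->fpu_owned);
         c->fpu_owned = false;
     #if MODEL_PREEMPT_RT
         c->preempt_disabled--;
     #else
         c->bh_disabled--;
     #endif
     }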

  The crypto subsystem operates on memory pages and requires users to "walk and
  map" these pages while processing a request. This operation must occur outside
  the kernel_fpu_begin()/kernel_fpu_end() section because it requires preemption
  to be enabled. These preemption points are generally sufficient to avoid
  excessive scheduling latency.
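
  The resulting loop structure is sketched below as a user-space model: each
  FPU section covers only the data mapped by one walk step, and preemption is
  possible again before the next step. struct walk_model, MODEL_PAGE_SIZE and
  the model_ functions are hypothetical; in the kernel, the walk is performed
  with crypto helpers such as skcipher_walk_virt() and skcipher_walk_done().

  .. code-block:: c

     #include <assert.h>
     #include <stddef.h>

     #define MODEL_PAGE_SIZE 4096u /* assumed page size for this model */

     /* Hypothetical stand-in for a crypto request walked page by page. */
     struct walk_model {
         size_t total;       /* bytes in the request */
         size_t done;        /* bytes already processed */
         size_t nbytes;      /* bytes mapped for the current step */
         int in_fpu;         /* 1 inside the modelled FPU section */
         int preempt_points; /* times preemption was re-enabled */
     };

     /* "Walk and map" at most one page; may sleep, so not in FPU section. */
     static void model_walk_next(struct walk_model *w)
     {
         size_t left = w->total - w->done;

         assert(!w->in_fpu);
         w->nbytes = left < MODEL_PAGE_SIZE ? left : MODEL_PAGE_SIZE;
     }

     void model_encrypt(struct walk_model *w)
     {
         for (model_walk_next(w); w->nbytes; model_walk_next(w)) {
             w->in_fpu = 1;       /* kernel_fpu_begin() */
             /* SIMD work on w->nbytes bytes would happen here. */
             w->done += w->nbytes;
             w->in_fpu = 0;       /* kernel_fpu_end() */
             w->preempt_points++; /* the task may be preempted here */
         }
     }

  For a 10000-byte request, the model runs three FPU sections (4096, 4096 and
  1808 bytes), leaving three points at which the task can be preempted.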

Exception handlers
  Exception handlers, such as the page fault handler, typically enable interrupts
  early, before invoking any generic code to process the exception. This is
  necessary because handling a page fault may involve operations that can sleep.

  Enabling interrupts is especially important on PREEMPT_RT, where certain
  locks, such as spinlock_t, become sleepable. For example, handling an
  invalid opcode may result in sending a SIGILL signal to the user task, and a
  debug exception sends a SIGTRAP signal.

  In both cases, if the exception occurred in user space, it is safe to enable
  interrupts early. Sending a signal requires both interrupts and kernel
  preemption to be enabled.
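
  The ordering described above can be sketched as a user-space model. struct
  exc_model and the model_ functions are hypothetical, and exceptions raised in
  kernel mode are simply rejected here because their handling is outside the
  scope of the model.

  .. code-block:: c

     #include <assert.h>
     #include <signal.h> /* SIGILL, SIGTRAP */
     #include <stdbool.h>

     /* Hypothetical snapshot of the state an exception handler sees. */
     struct exc_model {
         bool from_user;    /* exception raised by user-space code */
         bool irqs_enabled; /* interrupts enabled on this CPU */
         int signal_sent;   /* last signal queued, 0 if none */
     };

     /* On PREEMPT_RT, queueing a signal takes sleeping locks. */
     static void model_send_signal(struct exc_model *e, int sig)
     {
         assert(e->irqs_enabled);
         e->signal_sent = sig;
     }

     /* Model of an invalid-opcode or debug exception handler. */
     bool model_handle_exception(struct exc_model *e, int sig)
     {
         if (!e->from_user)
             return false;       /* kernel-mode path: not modelled */
         e->irqs_enabled = true; /* safe: raised from user space */
         model_send_signal(e, sig);
         return true;
     }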

Optional features
-----------------

Timer and clocksource
  A high-resolution clocksource and clockevents device are recommended. The
  clockevents device should support the CLOCK_EVT_FEAT_ONESHOT feature for
  optimal timer behavior. In most cases, microsecond-level accuracy is
  sufficient.

Lazy preemption
  This mechanism allows an in-kernel scheduling request for non-real-time tasks
  to be delayed until the task is about to return to user space. It helps avoid
  preempting a task that holds a sleeping lock at the time of the scheduling
  request.

  With CONFIG_GENERIC_IRQ_ENTRY enabled, supporting this feature requires
  defining a bit for TIF_NEED_RESCHED_LAZY, preferably near TIF_NEED_RESCHED.
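
  The effect of the two flags can be sketched in user space. The bit numbers
  and all MODEL_ names are hypothetical; a real architecture chooses free bits
  in its own thread-info flag definitions.

  .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>

     /* Hypothetical bit numbers, placed next to each other. */
     #define MODEL_TIF_NEED_RESCHED      3
     #define MODEL_TIF_NEED_RESCHED_LAZY 4

     #define MODEL_BIT(nr) (1UL << (nr))

     /* In-kernel preemption point, e.g. return from interrupt. */
     bool model_preempt_in_kernel(unsigned long ti_flags)
     {
         /* A lazy request alone does not preempt in-kernel code. */
         return ti_flags & MODEL_BIT(MODEL_TIF_NEED_RESCHED);
     }

     /* Return to user space: both kinds of request are honoured. */
     bool model_resched_on_exit_to_user(unsigned long ti_flags)
     {
         return ti_flags & (MODEL_BIT(MODEL_TIF_NEED_RESCHED) |
                            MODEL_BIT(MODEL_TIF_NEED_RESCHED_LAZY));
     }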

Serial console with NBCON
  With PREEMPT_RT enabled, all console output is handled by a dedicated thread
  rather than directly from the context in which printk() is invoked. This design
  allows printk() to be safely used in atomic contexts.

  However, this also means that if the kernel crashes and cannot switch to the
  printing thread, no output will be visible, which prevents the system from
  printing its final messages.

  There are exceptions for immediate output, such as during panic() handling. To
  support this, the console driver must implement new-style lock handling. This
  involves setting the CON_NBCON flag in console::flags and providing
  implementations for the write_atomic, write_thread, device_lock, and
  device_unlock callbacks.
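
  The division of labour between the two write callbacks can be sketched with
  a heavily simplified user-space model. struct console_model and the model_
  and demo_ functions are hypothetical and do not match the real struct
  console; the device_lock and device_unlock callbacks and the ownership
  handling of real NBCON consoles are not modelled.

  .. code-block:: c

     #include <assert.h>
     #include <stdbool.h>
     #include <stdio.h>
     #include <string.h>

     /* Hypothetical, heavily simplified console; not struct console. */
     struct console_model {
         bool nbcon; /* models the CON_NBCON flag */
         void (*write_atomic)(struct console_model *con, const char *msg);
         void (*write_thread)(struct console_model *con, const char *msg);
         char last[64];         /* last message written by the driver */
         const char *last_path; /* which callback wrote it */
     };

     static void record(struct console_model *con, const char *msg,
                        const char *path)
     {
         snprintf(con->last, sizeof(con->last), "%s", msg);
         con->last_path = path;
     }

     /* Must not sleep: may be called from panic() or atomic context. */
     void demo_write_atomic(struct console_model *con, const char *msg)
     {
         record(con, msg, "atomic");
     }

     /* Called only from the dedicated printing thread; may sleep. */
     void demo_write_thread(struct console_model *con, const char *msg)
     {
         record(con, msg, "thread");
     }

     /* Immediate output, e.g. during panic(). */
     bool model_emergency_print(struct console_model *con, const char *msg)
     {
         if (!con->nbcon || !con->write_atomic)
             return false; /* no immediate output in this model */
         con->write_atomic(con, msg);
         return true;
     }

     /* Regular output, performed by the printing thread. */
     void model_printer_thread(struct console_model *con, const char *msg)
     {
         con->write_thread(con, msg);
     }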