======================================
Housekeeping
======================================

CPU isolation moves away kernel work that may otherwise run on any CPU.
The purpose of its related features is to reduce the OS jitter that some
extreme workloads can't tolerate, such as in some DPDK use cases.

The kernel work moved away by CPU isolation is commonly described as
"housekeeping" because it includes ground work that performs cleanups,
statistics maintenance and actions relying on them, memory release,
various deferrals, etc.

Sometimes housekeeping is just some unbound work (unbound workqueues,
unbound timers, ...) that is easily assigned to non-isolated CPUs.
But sometimes housekeeping is tied to a specific CPU and requires
elaborate tricks to be offloaded to non-isolated CPUs (RCU_NOCB, remote
scheduler tick, etc.).

Thus, a housekeeping CPU can be considered the opposite of an isolated
CPU. It is simply a CPU that can execute housekeeping work. There must
be at least one online housekeeping CPU at any time. The CPUs that
are not isolated are automatically assigned as housekeeping CPUs.

Housekeeping is currently divided into four features described
by ``enum hk_type``:

1. HK_TYPE_DOMAIN matches the work moved away by scheduler domain
   isolation performed through the ``isolcpus=domain`` boot parameter or
   isolated cpuset partitions in cgroup v2. This includes scheduler
   load balancing, unbound workqueues and timers.

2. HK_TYPE_KERNEL_NOISE matches the work moved away by tick isolation
   performed through the ``nohz_full=`` or ``isolcpus=nohz`` boot
   parameters. This includes the remote scheduler tick, vmstat and the
   lockup watchdog.

3. HK_TYPE_MANAGED_IRQ matches the IRQ handlers moved away by managed
   IRQ isolation performed through ``isolcpus=managed_irq``.

4. HK_TYPE_DOMAIN_BOOT matches the work moved away by scheduler domain
   isolation performed through ``isolcpus=domain`` only. It is similar
   to HK_TYPE_DOMAIN except that it ignores the isolation performed by
   cpusets.

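
The relationship between the four types and their isolation sources can be
sketched with a small userspace model. This is not kernel code: every
``model_`` name, the 8-CPU limit and the one-bit-per-CPU representation are
assumptions made for illustration, and the enum values do not reflect the
real ``enum hk_type``.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, 8 possible CPUs (0..7). */
#define MODEL_POSSIBLE_MASK 0xffu

/* Mirrors the four types named in the text; values are illustrative only. */
enum model_hk_type {
	MODEL_HK_DOMAIN,
	MODEL_HK_KERNEL_NOISE,
	MODEL_HK_MANAGED_IRQ,
	MODEL_HK_DOMAIN_BOOT,
};

/* Isolation requests, each a bitmask of isolated CPUs. */
struct model_isolation {
	uint32_t isolcpus_domain;      /* isolcpus=domain,<list> */
	uint32_t nohz;                 /* nohz_full=<list> or isolcpus=nohz,<list> */
	uint32_t isolcpus_managed_irq; /* isolcpus=managed_irq,<list> */
	uint32_t cpuset_isolated;      /* isolated cpuset partitions */
};

/* Return the housekeeping mask for one type: possible CPUs minus isolated ones. */
static uint32_t model_hk_mask(const struct model_isolation *iso,
			      enum model_hk_type type)
{
	uint32_t isolated = 0;
	uint32_t hk;

	switch (type) {
	case MODEL_HK_DOMAIN:
		/* Boot-time domain isolation plus cpuset isolated partitions. */
		isolated = iso->isolcpus_domain | iso->cpuset_isolated;
		break;
	case MODEL_HK_KERNEL_NOISE:
		isolated = iso->nohz;
		break;
	case MODEL_HK_MANAGED_IRQ:
		isolated = iso->isolcpus_managed_irq;
		break;
	case MODEL_HK_DOMAIN_BOOT:
		/* Same as the domain type, but cpusets are ignored. */
		isolated = iso->isolcpus_domain;
		break;
	}

	hk = MODEL_POSSIBLE_MASK & ~isolated;
	/* At least one housekeeping CPU must remain. */
	assert(hk != 0);
	return hk;
}
```

In this model, with CPU 7 isolated through ``isolcpus=domain`` and CPU 3
isolated through a cpuset partition, the domain type yields ``0x77`` while
the domain boot type yields ``0x7f``, since the latter ignores the cpuset.
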
Housekeeping cpumasks
=================================

Housekeeping cpumasks include the CPUs that can execute the work moved
away by the matching isolation feature. These cpumasks are returned by
the following function::

    const struct cpumask *housekeeping_cpumask(enum hk_type type)

By default, if none of ``nohz_full=``, ``isolcpus=`` or cpuset isolated
partitions is used, which covers most use cases, this function
returns ``cpu_possible_mask``.

Otherwise the function returns the complement of the cpumask isolated by
the matching feature. For example:

With ``isolcpus=domain,7`` the following will return a mask with all
possible CPUs except 7::

    housekeeping_cpumask(HK_TYPE_DOMAIN)

Similarly, with ``nohz_full=5,6`` the following will return a mask with all
possible CPUs except 5 and 6::

    housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)

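
The complement rule above can be worked through in userspace. The sketch
below is only an illustration under stated assumptions: the ``model_``
helpers are hypothetical, the system is assumed to have 8 possible CPUs,
and the parser accepts only the CPU list itself (``7`` or ``5,6``), not a
flag such as ``domain``.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model: 8 possible CPUs (0..7), one bit per CPU. */
#define MODEL_NR_CPUS 8u

/*
 * Parse a comma separated CPU list with optional ranges ("5,6", "2-4,7")
 * into a bitmask. CPUs beyond MODEL_NR_CPUS are ignored and parsing stops
 * at the first malformed entry.
 */
static uint32_t model_parse_cpulist(const char *s)
{
	uint32_t mask = 0;

	while (*s) {
		char *end;
		unsigned long first = strtoul(s, &end, 10);
		unsigned long last = first;

		if (end == s)
			break;	/* no digits found */
		if (*end == '-') {
			const char *range_end = end + 1;

			last = strtoul(range_end, &end, 10);
			if (end == range_end)
				break;	/* dangling '-' */
		}
		for (unsigned long cpu = first;
		     cpu <= last && cpu < MODEL_NR_CPUS; cpu++)
			mask |= 1u << cpu;
		if (*end != ',')
			break;
		s = end + 1;
	}
	return mask;
}

/* Housekeeping mask: every possible CPU that is not in the isolated list. */
static uint32_t model_housekeeping_mask(const char *isolated_list)
{
	uint32_t possible = (1u << MODEL_NR_CPUS) - 1;

	return possible & ~model_parse_cpulist(isolated_list);
}
```

With these assumptions, ``model_housekeeping_mask("7")`` gives ``0x7f`` and
``model_housekeeping_mask("5,6")`` gives ``0x9f``, matching the two examples
in the text. An empty list gives ``0xff``, the model's equivalent of
``cpu_possible_mask``.
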
Synchronization against cpusets
=================================

Cpuset can modify the HK_TYPE_DOMAIN housekeeping cpumask while creating,
modifying or deleting an isolated partition.

The users of the HK_TYPE_DOMAIN cpumask must therefore synchronize
properly against cpuset in order to make sure that:

1. The cpumask snapshot stays coherent.

2. No housekeeping work is queued on a newly isolated CPU.

3. Pending housekeeping work that was queued to a non-isolated
   CPU which just turned isolated through cpuset is flushed
   before the related created or modified isolated partition is made
   available to userspace.

This synchronization is maintained by an RCU-based scheme. The cpuset update
side waits for an RCU grace period after updating the HK_TYPE_DOMAIN
cpumask and before flushing pending work. On the read side, the selection
of the housekeeping target CPU and the enqueueing of the work must happen
within the same RCU read-side critical section.

A typical layout example would look like this on the update side
(``housekeeping_update()``)::

    /* Publish the new cpumask */
    rcu_assign_pointer(housekeeping_cpumasks[type], trial);
    /* Wait for readers that may still use the old cpumask */
    synchronize_rcu();
    /* Flush the work those readers may have queued */
    flush_workqueue(example_workqueue);

And then on the read side::

    rcu_read_lock();
    /* Elect the target and enqueue within the same critical section */
    cpu = housekeeping_any_cpu(HK_TYPE_DOMAIN);
    queue_work_on(cpu, example_workqueue, work);
    rcu_read_unlock();