- ======================================
- Housekeeping
- ======================================
- CPU isolation moves away kernel work that may otherwise run on any CPU.
- The purpose of its related features is to reduce the OS jitter that some
- extreme workloads cannot tolerate, such as some DPDK use cases.
- The kernel work moved away by CPU isolation is commonly described as
- "housekeeping" because it includes ground work that performs cleanups,
- statistics maintenance and actions relying on them, memory release,
- various deferrals, etc.
- Sometimes housekeeping is just unbound work (unbound workqueues,
- unbound timers, ...) that is easily assigned to non-isolated CPUs.
- But sometimes housekeeping is tied to a specific CPU and requires
- elaborate tricks to be offloaded to non-isolated CPUs (RCU_NOCB, remote
- scheduler tick, etc.).
- Thus, a housekeeping CPU can be considered the reverse of an isolated
- CPU. It is simply a CPU that can execute housekeeping work. There must
- be at least one online housekeeping CPU at all times. The CPUs that
- are not isolated are automatically assigned as housekeeping.
- Housekeeping is currently divided into four features described
- by ``enum hk_type``:
- 1. HK_TYPE_DOMAIN matches the work moved away by scheduler domain
-    isolation performed through the ``isolcpus=domain`` boot parameter or
-    isolated cpuset partitions in cgroup v2. This includes scheduler
-    load balancing, unbound workqueues and timers.
- 2. HK_TYPE_KERNEL_NOISE matches the work moved away by tick isolation
-    performed through the ``nohz_full=`` or ``isolcpus=nohz`` boot
-    parameters. This includes the remote scheduler tick, vmstat and the
-    lockup watchdog.
- 3. HK_TYPE_MANAGED_IRQ matches the IRQ handlers moved away by managed
-    IRQ isolation performed through ``isolcpus=managed_irq``.
- 4. HK_TYPE_DOMAIN_BOOT matches the work moved away by scheduler domain
-    isolation performed through ``isolcpus=domain`` only. It is similar
-    to HK_TYPE_DOMAIN except that it ignores the isolation performed by
-    cpusets.
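The difference between HK_TYPE_DOMAIN and HK_TYPE_DOMAIN_BOOT can be
illustrated with a small userspace model. This is only a sketch under stated
assumptions: cpumasks are represented as plain 64-bit words, and
``model_domain_boot_mask()`` and ``model_domain_mask()`` are hypothetical
helpers written for this illustration, not kernel functions.

```c
#include <stdint.h>

/* Toy cpumask: bit N set means CPU N is in the mask (at most 64 CPUs). */
typedef uint64_t model_cpumask_t;

/* Housekeeping CPUs when only isolcpus=domain is taken into account. */
static model_cpumask_t model_domain_boot_mask(model_cpumask_t possible,
					      model_cpumask_t boot_isolated)
{
	return possible & ~boot_isolated;
}

/* Housekeeping CPUs when cpuset isolated partitions are honoured as well. */
static model_cpumask_t model_domain_mask(model_cpumask_t possible,
					 model_cpumask_t boot_isolated,
					 model_cpumask_t cpuset_isolated)
{
	return possible & ~(boot_isolated | cpuset_isolated);
}
```

In this model, the two masks are identical until a cpuset isolated partition
is created. From then on, only the HK_TYPE_DOMAIN equivalent shrinks.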
- Housekeeping cpumasks
- =================================
- Housekeeping cpumasks include the CPUs that can execute the work moved
- away by the matching isolation feature. These cpumasks are returned by
- the following function::
-     const struct cpumask *housekeeping_cpumask(enum hk_type type)
- By default, if neither ``nohz_full=``, nor ``isolcpus``, nor cpuset's
- isolated partitions are used, which covers most use cases, this function
- returns ``cpu_possible_mask``.
- Otherwise, the function returns the complement of the cpumask isolated by
- the matching feature. For example:
- With ``isolcpus=domain,7`` the following will return a mask with all
- possible CPUs except 7::
-     housekeeping_cpumask(HK_TYPE_DOMAIN)
- Similarly, with ``nohz_full=5,6`` the following will return a mask with all
- possible CPUs except 5 and 6::
-     housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)
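The two examples above can be worked through numerically with a userspace
sketch. It assumes a hypothetical system with 8 possible CPUs and models a
cpumask as a 64-bit word. All ``model_*`` names are illustrative only and do
not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy cpumask: bit N set means CPU N is in the mask (at most 64 CPUs). */
typedef uint64_t model_cpumask_t;

/* Mask containing only the given CPU (cpu must be below 64). */
static model_cpumask_t model_cpu_bit(unsigned int cpu)
{
	return (model_cpumask_t)1 << cpu;
}

/* Housekeeping mask: every possible CPU that the feature did not isolate. */
static model_cpumask_t model_housekeeping_mask(model_cpumask_t possible,
					       model_cpumask_t isolated)
{
	return possible & ~isolated;
}

/* True if the given CPU may run housekeeping work according to the mask. */
static bool model_is_housekeeping(model_cpumask_t hk, unsigned int cpu)
{
	return (hk & model_cpu_bit(cpu)) != 0;
}
```

With a possible mask of ``0xff``, isolating CPU 7 gives ``0x7f`` and isolating
CPUs 5 and 6 gives ``0x9f``. An empty isolated mask returns the possible mask
unchanged, which matches the default case.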
- Synchronization against cpusets
- =================================
- Cpuset can modify the HK_TYPE_DOMAIN housekeeping cpumask while creating,
- modifying or deleting an isolated partition.
- The users of the HK_TYPE_DOMAIN cpumask must therefore synchronize
- properly against cpuset in order to guarantee that:
- 1. The cpumask snapshot stays coherent.
- 2. No housekeeping work is queued on a newly made isolated CPU.
- 3. Pending housekeeping work that was queued to a non-isolated
-    CPU which just turned isolated through cpuset is flushed
-    before the newly created or modified isolated partition is made
-    available to userspace.
- This synchronization is maintained by an RCU-based scheme. The cpuset update
- side waits for an RCU grace period after updating the HK_TYPE_DOMAIN
- cpumask and before flushing pending work. On the read side, care must be
- taken to perform both the housekeeping target selection and the work enqueue
- within the same RCU read-side critical section.
- A typical layout example would look like this on the update side
- (``housekeeping_update()``)::
-     rcu_assign_pointer(housekeeping_cpumasks[type], trial);
-     synchronize_rcu();
-     flush_workqueue(example_workqueue);
- And then on the read side::
-     rcu_read_lock();
-     cpu = housekeeping_any_cpu(HK_TYPE_DOMAIN);
-     queue_work_on(cpu, example_workqueue, work);
-     rcu_read_unlock();
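The ordering of the update side (publish the new mask, wait, flush) can be
sketched as a single-threaded userspace model. This is a toy under loud
assumptions: there is no real concurrency or RCU here, the target election
simply picks the lowest-numbered housekeeping CPU, work is reduced to a
per-CPU pending counter, and every ``model_*`` name is hypothetical.

```c
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Toy state: housekeeping mask plus a per-CPU count of pending work items. */
struct model_hk {
	uint64_t mask;
	unsigned int pending[MODEL_NR_CPUS];
};

/* Stand-in for target election: lowest-numbered housekeeping CPU, or -1. */
static int model_any_cpu(const struct model_hk *hk)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (hk->mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Read side: elect a target and enqueue in one step; returns the CPU used. */
static int model_queue_work(struct model_hk *hk)
{
	int cpu = model_any_cpu(hk);

	if (cpu >= 0)
		hk->pending[cpu]++;
	return cpu;
}

/* Update side: publish the new mask first, then flush all pending work. */
static void model_update(struct model_hk *hk, uint64_t new_mask)
{
	hk->mask = new_mask;
	/* In the kernel, synchronize_rcu() would wait for readers here. */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		hk->pending[cpu] = 0;	/* models flush_workqueue() */
}
```

Once ``model_update()`` returns, the newly isolated CPU has no pending work
and later calls to ``model_queue_work()`` no longer select it. The model does
not enforce that at least one housekeeping CPU remains, so callers are
assumed to pass a non-empty mask.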