.. _memory_hotplug:

==============
Memory hotplug
==============

Memory hotplug event notifier
=============================

Hotplugging events are sent to a notification queue.

Memory notifier
---------------

There are six types of notification defined in ``include/linux/memory.h``:

MEM_GOING_ONLINE
  Generated before new memory becomes available in order to be able to
  prepare subsystems to handle memory. The page allocator is still unable
  to allocate from the new memory.

MEM_CANCEL_ONLINE
  Generated if MEM_GOING_ONLINE fails.

MEM_ONLINE
  Generated when memory has been successfully brought online. The callback
  may allocate pages from the new memory.

MEM_GOING_OFFLINE
  Generated to begin the process of offlining memory. Allocations are no
  longer possible from the memory, but some of the memory to be offlined
  is still in use. The callback can be used to free memory known to a
  subsystem from the indicated memory block.

MEM_CANCEL_OFFLINE
  Generated if MEM_GOING_OFFLINE fails. Memory is available again from
  the memory block that we attempted to offline.

MEM_OFFLINE
  Generated after offlining memory is complete.
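
The following small user-space C model summarizes how each "going" event pairs with the event that concludes it. It is only a sketch. The enum is a stand-in, and its numeric values are placeholders; the real values are in ``include/linux/memory.h``. The names ``concluding_event()`` and ``MEM_NONE`` are hypothetical and exist only for illustration::

	#include <stdbool.h>

	/* Placeholder event codes for this model only; the kernel's real
	 * values are defined in include/linux/memory.h. */
	enum mem_event {
		MEM_GOING_ONLINE,
		MEM_CANCEL_ONLINE,
		MEM_ONLINE,
		MEM_GOING_OFFLINE,
		MEM_CANCEL_OFFLINE,
		MEM_OFFLINE,
		MEM_NONE,	/* hypothetical: no follow-up event */
	};

	/* Event that concludes a transition begun with @going, depending
	 * on whether the transition was aborted (hypothetical helper). */
	static enum mem_event concluding_event(enum mem_event going,
					       bool aborted)
	{
		switch (going) {
		case MEM_GOING_ONLINE:
			return aborted ? MEM_CANCEL_ONLINE : MEM_ONLINE;
		case MEM_GOING_OFFLINE:
			return aborted ? MEM_CANCEL_OFFLINE : MEM_OFFLINE;
		default:
			/* Only the two "going" events start a transition. */
			return MEM_NONE;
		}
	}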

A callback routine can be registered by calling::

	hotplug_memory_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.
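
The user-space sketch below illustrates this ordering with a chain kept sorted by descending priority. The ``struct notifier_block`` here is a simplified stand-in for the kernel's, and ``chain_register()`` is a hypothetical helper. It places a new block after any existing blocks of equal priority, so those keep their registration order::

	#include <stddef.h>

	/* Simplified stand-in for the kernel's struct notifier_block. */
	struct notifier_block {
		int (*notifier_call)(struct notifier_block *self,
				     unsigned long action, void *arg);
		struct notifier_block *next;
		int priority;
	};

	/* Insert @nb so that the chain stays sorted by descending
	 * priority; higher priorities are therefore called first. */
	static void chain_register(struct notifier_block **head,
				   struct notifier_block *nb)
	{
		/* Skip every block whose priority is >= the new one. */
		while (*head && nb->priority <= (*head)->priority)
			head = &(*head)->next;
		nb->next = *head;
		*head = nb;
	}

For example, if blocks with priorities 0, 10, 5 and 5 are registered in that order, they are called in the order 10, 5 (first registered), 5 (second registered), 0.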

A callback function must have the following prototype::

	int callback_func(
		struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) passes a pointer to struct memory_notify::

	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
	};

- start_pfn is the first page frame number (PFN) of the memory being
  onlined/offlined.
- nr_pages is the number of pages of the memory being onlined/offlined.
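
To illustrate the two fields: shifting ``start_pfn`` and ``nr_pages`` by the page shift gives the physical address range covered by a notification. The sketch below uses a stand-in copy of the structure and assumes 4 KiB pages, i.e. a page shift of 12; the real ``PAGE_SHIFT`` is architecture-dependent. ``notify_phys_range()`` and ``MODEL_PAGE_SHIFT`` are hypothetical names::

	#include <stdint.h>

	/* Stand-in copy of struct memory_notify as documented above. */
	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
	};

	/* Assumption: 4 KiB pages; the kernel's PAGE_SHIFT varies. */
	#define MODEL_PAGE_SHIFT 12

	/* Physical byte range [*start, *end) covered by @mn. */
	static void notify_phys_range(const struct memory_notify *mn,
				      uint64_t *start, uint64_t *end)
	{
		*start = (uint64_t)mn->start_pfn << MODEL_PAGE_SHIFT;
		*end = *start + ((uint64_t)mn->nr_pages << MODEL_PAGE_SHIFT);
	}

Under this assumption, start_pfn = 0x100000 and nr_pages = 32768 describe 128 MiB of memory starting at 4 GiB.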

It is possible to get notified for MEM_CANCEL_ONLINE without having been
notified for MEM_GOING_ONLINE, and the same applies to MEM_CANCEL_OFFLINE and
MEM_GOING_OFFLINE.
This can happen when a consumer fails: the callchain is broken, and the
remaining consumers of the notifier are not called.
It is therefore important that users of memory_notify make no assumptions and
are prepared to handle such cases.
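
A callback can cope with this by undoing only what it actually prepared. The following user-space sketch uses the documented prototype. The stand-in declarations, the placeholder event values and the hypothetical ``prepared*`` state exist only so that the example compiles outside the kernel::

	#include <stdbool.h>
	#include <stddef.h>

	/* Stand-ins for user-space compilation; in the kernel these come
	 * from <linux/notifier.h> and <linux/memory.h>. The MEM_* values
	 * below are placeholders. */
	struct notifier_block;
	struct memory_notify {
		unsigned long start_pfn;
		unsigned long nr_pages;
	};
	#define NOTIFY_DONE	0x0000
	#define NOTIFY_OK	0x0001
	enum { MEM_GOING_ONLINE = 1, MEM_CANCEL_ONLINE, MEM_ONLINE };

	/* Hypothetical subsystem state: range prepared in MEM_GOING_ONLINE. */
	static bool prepared;
	static unsigned long prepared_pfn, prepared_pages;

	static int example_mem_callback(struct notifier_block *self,
					unsigned long action, void *arg)
	{
		const struct memory_notify *mn = arg;

		(void)self;
		switch (action) {
		case MEM_GOING_ONLINE:
			prepared = true;
			prepared_pfn = mn->start_pfn;
			prepared_pages = mn->nr_pages;
			return NOTIFY_OK;
		case MEM_CANCEL_ONLINE:
			/* May arrive without a matching MEM_GOING_ONLINE if
			 * an earlier callback failed: only undo what this
			 * subsystem actually prepared for this range. */
			if (prepared && prepared_pfn == mn->start_pfn &&
			    prepared_pages == mn->nr_pages)
				prepared = false;
			return NOTIFY_OK;
		default:
			return NOTIFY_DONE;
		}
	}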

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as a response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.
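
The user-space model below shows how these return values affect the notification queue. The ``NOTIFY_*`` values match ``include/linux/notifier.h``. ``model_call_chain()``, ``cb_ok()`` and ``cb_bad()`` are hypothetical, and the loop is a simplification of the kernel's chain walk (no RCU, no locking)::

	#include <stddef.h>

	/* Return codes as defined in include/linux/notifier.h. */
	#define NOTIFY_DONE		0x0000
	#define NOTIFY_OK		0x0001
	#define NOTIFY_STOP_MASK	0x8000
	#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)
	#define NOTIFY_STOP		(NOTIFY_OK | NOTIFY_STOP_MASK)

	/* Simplified stand-in; the chain is already in priority order. */
	struct notifier_block {
		int (*notifier_call)(struct notifier_block *self,
				     unsigned long action, void *arg);
		struct notifier_block *next;
		int priority;
	};

	/* Call each callback in order; stop as soon as one sets
	 * NOTIFY_STOP_MASK (NOTIFY_BAD or NOTIFY_STOP). */
	static int model_call_chain(struct notifier_block *nb,
				    unsigned long action, void *arg,
				    int *nr_calls)
	{
		int ret = NOTIFY_DONE;

		*nr_calls = 0;
		for (; nb; nb = nb->next) {
			ret = nb->notifier_call(nb, action, arg);
			(*nr_calls)++;
			if (ret & NOTIFY_STOP_MASK)
				break;
		}
		return ret;
	}

	/* Two illustrative callbacks. */
	static int cb_ok(struct notifier_block *self, unsigned long action,
			 void *arg)
	{
		(void)self;
		(void)action;
		(void)arg;
		return NOTIFY_OK;
	}

	static int cb_bad(struct notifier_block *self, unsigned long action,
			  void *arg)
	{
		(void)self;
		(void)action;
		(void)arg;
		return NOTIFY_BAD;
	}

Take a chain of three callbacks in which the second returns NOTIFY_BAD. Only two callbacks run, and the third never sees the event. This is how a consumer can receive a cancel notification without the preceding "going" one.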

NUMA node notifier
------------------

There are six types of notification defined in ``include/linux/node.h``:

NODE_ADDING_FIRST_MEMORY
  Generated before memory becomes available to this node for the first time.

NODE_CANCEL_ADDING_FIRST_MEMORY
  Generated if NODE_ADDING_FIRST_MEMORY fails.

NODE_ADDED_FIRST_MEMORY
  Generated when memory has become available for this node for the first time.

NODE_REMOVING_LAST_MEMORY
  Generated when the last memory available to this node is about to be offlined.

NODE_CANCEL_REMOVING_LAST_MEMORY
  Generated if NODE_REMOVING_LAST_MEMORY fails.

NODE_REMOVED_LAST_MEMORY
  Generated when the last memory available to this node has been offlined.

A callback routine can be registered by calling::

	hotplug_node_notifier(callback_func, priority)

Callback functions with higher values of priority are called before callback
functions with lower values.

A callback function must have the following prototype::

	int callback_func(
		struct notifier_block *self, unsigned long action, void *arg);

The first argument of the callback function (self) is a pointer to the block
of the notifier chain that points to the callback function itself.
The second argument (action) is one of the event types described above.
The third argument (arg) passes a pointer to struct node_notify::

	struct node_notify {
		int nid;
	};

- nid is the node to which memory is being added or from which it is being
  removed.

It is possible to get notified for NODE_CANCEL_ADDING_FIRST_MEMORY without
having been notified for NODE_ADDING_FIRST_MEMORY, and the same applies to
NODE_CANCEL_REMOVING_LAST_MEMORY and NODE_REMOVING_LAST_MEMORY.
This can happen when a consumer fails: the callchain is broken, and the
remaining consumers of the notifier are not called.
It is therefore important that users of node_notify make no assumptions and
are prepared to handle such cases.

The callback routine shall return one of the values
NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
defined in ``include/linux/notifier.h``.

NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.

NOTIFY_BAD is used as a response to the NODE_ADDING_FIRST_MEMORY,
NODE_REMOVING_LAST_MEMORY, NODE_ADDED_FIRST_MEMORY or
NODE_REMOVED_LAST_MEMORY action to cancel hotplugging.
It stops further processing of the notification queue.

NOTIFY_STOP stops further processing of the notification queue.

Please note that callbacks should not fail for NODE_ADDED_FIRST_MEMORY /
NODE_REMOVED_LAST_MEMORY, as the memory hotplug code cannot roll back at that
point anymore.
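
The following sketch shows a node callback that follows these rules. It may veto NODE_ADDING_FIRST_MEMORY, it treats the cancel event as possibly unpaired, and it never fails once the change can no longer be rolled back. The stand-in declarations, the placeholder event values, ``MODEL_MAX_NODES`` and the per-node state are hypothetical. In the kernel, the definitions come from ``include/linux/node.h`` and ``include/linux/notifier.h``::

	#include <stdbool.h>
	#include <stddef.h>

	/* Stand-ins for user-space compilation. NOTIFY_* match
	 * include/linux/notifier.h; the NODE_* values are placeholders. */
	struct notifier_block;
	struct node_notify {
		int nid;
	};
	#define NOTIFY_DONE		0x0000
	#define NOTIFY_OK		0x0001
	#define NOTIFY_STOP_MASK	0x8000
	#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)
	enum {
		NODE_ADDING_FIRST_MEMORY = 1,
		NODE_CANCEL_ADDING_FIRST_MEMORY,
		NODE_ADDED_FIRST_MEMORY,
		NODE_REMOVING_LAST_MEMORY,
		NODE_CANCEL_REMOVING_LAST_MEMORY,
		NODE_REMOVED_LAST_MEMORY,
	};

	#define MODEL_MAX_NODES 64	/* hypothetical bound */

	/* Hypothetical per-node state of some subsystem. */
	static bool node_data_ready[MODEL_MAX_NODES];
	static bool model_alloc_fails;	/* simulates a failed allocation */

	static int example_node_callback(struct notifier_block *self,
					 unsigned long action, void *arg)
	{
		const struct node_notify *nn = arg;

		(void)self;
		if (nn->nid < 0 || nn->nid >= MODEL_MAX_NODES)
			return NOTIFY_DONE;

		switch (action) {
		case NODE_ADDING_FIRST_MEMORY:
			/* Rollback is still possible: a failure may veto. */
			if (model_alloc_fails)
				return NOTIFY_BAD;
			node_data_ready[nn->nid] = true;
			return NOTIFY_OK;
		case NODE_CANCEL_ADDING_FIRST_MEMORY:
			/* May arrive without NODE_ADDING_FIRST_MEMORY;
			 * clearing the flag is harmless in that case. */
			node_data_ready[nn->nid] = false;
			return NOTIFY_OK;
		case NODE_REMOVED_LAST_MEMORY:
			/* Too late to roll back: release state, never fail. */
			node_data_ready[nn->nid] = false;
			return NOTIFY_OK;
		default:
			/* Includes NODE_ADDED_FIRST_MEMORY: nothing to do. */
			return NOTIFY_DONE;
		}
	}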

Locking Internals
=================

When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
the device_hotplug_lock should be held to:

- synchronize against online/offline requests (e.g. via sysfs). This way, memory
  block devices can only be accessed (.online/.state attributes) by user
  space once memory has been fully added. And when removing memory, we
  know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC).

In particular, holding device_hotplug_lock avoids a possible lock inversion
when memory is being added and user space tries to online that memory faster
than expected:

- device_online() will first take the device_lock(), followed by
  mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by
  the device_lock() (while creating the devices, during bus_add_device()).

As the device is visible to user space before taking the device_lock(), this
can result in a lock inversion.
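
The ordering rule can be modeled in user space with POSIX mutexes standing in for the kernel locks. This is a sketch, not kernel code. Both paths take device_hotplug_lock first, so the inverted inner order of device_lock and mem_hotplug_lock cannot deadlock. The function names are hypothetical. In the kernel, callers are expected to hold device_hotplug_lock around these operations rather than the operations taking it themselves::

	#include <pthread.h>
	#include <stddef.h>

	/* POSIX mutexes standing in for the kernel locks above. */
	static pthread_mutex_t device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t mem_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

	/* Online request: device_lock, then mem_hotplug_lock. */
	static void *model_online_request(void *unused)
	{
		(void)unused;
		pthread_mutex_lock(&device_hotplug_lock);
		pthread_mutex_lock(&device_lock);
		pthread_mutex_lock(&mem_hotplug_lock);
		/* ... online the memory block ... */
		pthread_mutex_unlock(&mem_hotplug_lock);
		pthread_mutex_unlock(&device_lock);
		pthread_mutex_unlock(&device_hotplug_lock);
		return NULL;
	}

	/* Adding memory: mem_hotplug_lock, then device_lock (inverted). */
	static void *model_add_memory(void *unused)
	{
		(void)unused;
		pthread_mutex_lock(&device_hotplug_lock);
		pthread_mutex_lock(&mem_hotplug_lock);
		pthread_mutex_lock(&device_lock);
		/* ... create and register the memory block device ... */
		pthread_mutex_unlock(&device_lock);
		pthread_mutex_unlock(&mem_hotplug_lock);
		pthread_mutex_unlock(&device_hotplug_lock);
		return NULL;
	}

	/* Run both paths concurrently; returns 0 once both finished. */
	static int model_run_both(void)
	{
		pthread_t online, add;

		if (pthread_create(&online, NULL, model_online_request, NULL))
			return -1;
		if (pthread_create(&add, NULL, model_add_memory, NULL)) {
			pthread_join(online, NULL);
			return -1;
		}
		pthread_join(online, NULL);
		pthread_join(add, NULL);
		return 0;
	}

If the outer lock is removed from both functions, the ABBA pattern returns and the two threads can deadlock.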

Onlining/offlining of memory should be done via device_online()/
device_offline() to make sure it is properly synchronized with actions
via sysfs. Holding device_hotplug_lock is advised (e.g. to protect online_type).

When adding/removing/onlining/offlining memory or adding/removing
heterogeneous/device memory, we should always hold the mem_hotplug_lock in
write mode to serialise memory hotplug (e.g. access to global/zone
variables).

In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
mode allows for a quite efficient get_online_mems/put_online_mems
implementation, so code accessing memory can protect against that memory
vanishing.
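
In the kernel, a reader brackets its access with ``get_online_mems()`` and ``put_online_mems()``. The user-space sketch below mimics that pattern. A POSIX reader/writer lock stands in for mem_hotplug_lock; the kernel uses a per-CPU rw semaphore. All ``model_*`` names and the ``block_online`` flag are hypothetical::

	#define _POSIX_C_SOURCE 200809L
	#include <pthread.h>
	#include <stdbool.h>

	/* Stand-in for mem_hotplug_lock. */
	static pthread_rwlock_t model_mem_hotplug_lock =
		PTHREAD_RWLOCK_INITIALIZER;
	static bool block_online = true;	/* hypothetical block state */

	static void model_get_online_mems(void)
	{
		pthread_rwlock_rdlock(&model_mem_hotplug_lock);
	}

	static void model_put_online_mems(void)
	{
		pthread_rwlock_unlock(&model_mem_hotplug_lock);
	}

	/* Writer side: offlining holds the lock in write mode. */
	static void model_offline_block(void)
	{
		pthread_rwlock_wrlock(&model_mem_hotplug_lock);
		block_online = false;
		pthread_rwlock_unlock(&model_mem_hotplug_lock);
	}

	/* Reader side: the block cannot go offline between the check and
	 * the access, because the writer is excluded meanwhile. */
	static bool model_use_block(void)
	{
		bool ok;

		model_get_online_mems();
		ok = block_online;
		/* ... access the memory only if ok ... */
		model_put_online_mems();
		return ok;
	}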