  1. .. SPDX-License-Identifier: GPL-2.0
  2. Hibernating Guest VMs
  3. =====================
  4. Background
  5. ----------
  6. Linux supports the ability to hibernate itself in order to save power.
  7. Hibernation is sometimes called suspend-to-disk, as it writes a memory
  8. image to disk and puts the hardware into the lowest possible power
  9. state. Upon resume from hibernation, the hardware is restarted and the
  10. memory image is restored from disk so that it can resume execution
  11. where it left off. See the "Hibernation" section of
  12. Documentation/admin-guide/pm/sleep-states.rst.
  13. Hibernation is usually done on devices with a single user, such as a
  14. personal laptop. For example, the laptop goes into hibernation when
  15. the cover is closed, and resumes when the cover is opened again.
  16. Hibernation and resume happen on the same hardware, and the Linux kernel
  17. code orchestrating the hibernation steps assumes that the hardware
  18. configuration is not changed while in the hibernated state.
  19. Hibernation can be initiated within Linux by writing "disk" to
  20. /sys/power/state or by invoking the reboot system call with the
  21. appropriate arguments. This functionality may be wrapped by user space
  22. commands such as "systemctl hibernate" that are run directly from a
  23. command line or in response to events such as the laptop lid closing.
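The sysfs method can be sketched in a few lines of user space code. This
is only an illustration, not how any particular tool implements it: the
function names are hypothetical, the path is a parameter so the logic can
be exercised against an ordinary file, and on a real system the write
requires root and returns only after the system has resumed.

```python
from pathlib import Path


def hibernation_listed(state_text):
    """Return True if "disk" is among the states read from /sys/power/state."""
    return "disk" in state_text.split()


def request_hibernate(state_file="/sys/power/state"):
    """Ask the kernel to hibernate by writing "disk" to the state file."""
    Path(state_file).write_text("disk")
```

A caller would typically read the state file first, check
hibernation_listed() on its contents, and only then call
request_hibernate().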
  24. Considerations for Guest VM Hibernation
  25. ---------------------------------------
  26. Linux guests on Hyper-V can also be hibernated, in which case the
  27. hardware is the virtual hardware provided by Hyper-V to the guest VM.
  28. Only the targeted guest VM is hibernated, while other guest VMs and
  29. the underlying Hyper-V host continue to run normally. While the
  30. underlying Windows Hyper-V and physical hardware on which it is
  31. running might also be hibernated using hibernation functionality in
  32. the Windows host, host hibernation and its impact on guest VMs are not
  33. in scope for this documentation.
  34. Resuming a hibernated guest VM can be more challenging than with
  35. physical hardware because VMs make it very easy to change the hardware
  36. configuration between the hibernation and resume. Even when the resume
  37. is done on the same VM that hibernated, the memory size might be
  38. changed, or virtual NICs or SCSI controllers might be added or
  39. removed. Virtual PCI devices assigned to the VM might be added or
  40. removed. Most such changes cause the resume steps to fail, though
  41. adding a new virtual NIC, SCSI controller, or vPCI device should work.
  42. Additional complexity can ensue because the disks of the hibernated VM
  43. can be moved to another newly created VM that otherwise has the same
  44. virtual hardware configuration. While it is desirable for resume from
  45. hibernation to succeed after such a move, there are challenges. See
  46. details on this scenario and its limitations in the "Resuming on a
  47. Different VM" section below.
  48. Hyper-V also provides ways to move a VM from one Hyper-V host to
  49. another. Hyper-V tries to ensure processor model and Hyper-V version
  50. compatibility using VM Configuration Versions, and prevents moves to
  51. a host that isn't compatible. Linux adapts to host and processor
  52. differences by detecting them at boot time, but such detection is not
  53. done when resuming execution in the hibernation image. If a VM is
  54. hibernated on one host, then resumed on a host with a different processor
  55. model or Hyper-V version, settings recorded in the hibernation image
  56. may not match the new host. Because Linux does not detect such
  57. mismatches when resuming the hibernation image, undefined behavior
  58. and failures could result.
  59. Enabling Guest VM Hibernation
  60. -----------------------------
  61. Hibernation of a Hyper-V guest VM is disabled by default because
  62. hibernation is incompatible with memory hot-add, as provided by the
  63. Hyper-V balloon driver. If hot-add is used and the VM hibernates, it
  64. hibernates with more memory than it started with. But when the VM
  65. resumes from hibernation, Hyper-V gives the VM only the originally
  66. assigned memory, and the memory size mismatch causes resume to fail.
  67. To enable a Hyper-V VM for hibernation, the Hyper-V administrator must
  68. enable the ACPI virtual S4 sleep state in the ACPI configuration that
  69. Hyper-V provides to the guest VM. Such enablement is accomplished by
  70. modifying a WMI property of the VM, the steps for which are outside
  71. the scope of this documentation but are available on the web.
  72. Enablement is treated as the indicator that the administrator
  73. prioritizes Linux hibernation in the VM over hot-add, so the Hyper-V
  74. balloon driver in Linux disables hot-add. Enablement is indicated if
  75. the contents of /sys/power/disk contain "platform" as an option. The
  76. enablement is also visible in /sys/bus/vmbus/hibernation. See function
  77. hv_is_hibernation_supported().
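As a rough illustration of the /sys/power/disk check, the sketch below
parses the file's contents and looks for "platform" among the listed
options. It assumes the usual sysfs format of space-separated options
with the selected one in square brackets. The function names and the
sample strings in the usage note are illustrative only.

```python
def parse_disk_modes(text):
    """Split /sys/power/disk contents into (all_modes, selected_mode)."""
    modes = []
    selected = None
    for token in text.split():
        # The currently selected mode is shown in square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        modes.append(token)
    return modes, selected


def hibernation_enabled(text):
    """Return True if "platform" appears as an option."""
    modes, _ = parse_disk_modes(text)
    return "platform" in modes
```

For example, hibernation_enabled("[platform] shutdown reboot") is true,
while a string that does not list "platform" yields false.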
  78. Linux supports ACPI sleep states on x86, but not on arm64. So Linux
  79. guest VM hibernation is not available on Hyper-V for arm64.
  80. Initiating Guest VM Hibernation
  81. -------------------------------
  82. Guest VMs can self-initiate hibernation using the standard Linux
  83. methods of writing "disk" to /sys/power/state or the reboot system
  84. call. As an additional layer, Linux guests on Hyper-V support the
  85. "Shutdown" integration service, via which a Hyper-V administrator can
  86. tell a Linux VM to hibernate using a command outside the VM. The
  87. command generates a request to the Hyper-V shutdown driver in Linux,
  88. which sends the uevent "EVENT=hibernate". See kernel functions
  89. shutdown_onchannelcallback() and send_hibernate_uevent(). A udev rule
  90. must be provided in the VM that handles this event and initiates
  91. hibernation.
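In practice the matching is done by a udev rule, but the decision it
makes can be sketched as follows. This is a hypothetical handler, not
part of any shipped tool: it assumes the uevent properties have already
been parsed into a dictionary, and it assumes "systemctl hibernate" is
the desired action. The run parameter exists only so the command can be
replaced during testing.

```python
import subprocess


def is_hibernate_request(props):
    """Return True if the uevent properties carry EVENT=hibernate."""
    return props.get("EVENT") == "hibernate"


def handle_uevent(props, run=subprocess.run):
    """Initiate hibernation if the uevent asks for it.

    Returns True if hibernation was initiated, False otherwise.
    """
    if not is_hibernate_request(props):
        return False
    # Assumption: systemd is present and "systemctl hibernate" is wanted.
    run(["systemctl", "hibernate"], check=True)
    return True
```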
  92. Handling VMBus Devices During Hibernation & Resume
  93. --------------------------------------------------
  94. The VMBus bus driver, and the individual VMBus device drivers,
  95. implement suspend and resume functions that are called as part of the
  96. Linux orchestration of hibernation and of resuming from hibernation.
  97. The overall approach is to leave in place the data structures for the
  98. primary VMBus channels and their associated Linux devices, such as
  99. SCSI controllers and others, so that they are captured in the
  100. hibernation image. This approach allows any state associated with the
  101. device to be persisted across the hibernation/resume. When the VM
  102. resumes, the devices are re-offered by Hyper-V and are connected to
  103. the data structures that already exist in the resumed hibernation
  104. image.
  105. VMBus devices are identified by class and instance GUID. (See section
  106. "VMBus device creation/deletion" in
  107. Documentation/virt/hyperv/vmbus.rst.) Upon resume from hibernation,
  108. the resume functions expect that the devices offered by Hyper-V have
  109. the same class/instance GUIDs as the devices present at the time of
  110. hibernation. Having the same class/instance GUIDs allows the offered
  111. devices to be matched to the primary VMBus channel data structures in
  112. the memory of the now resumed hibernation image. If any devices are
  113. offered that don't match primary VMBus channel data structures that
  114. already exist, they are processed normally as newly added devices. If
  115. primary VMBus channels that exist in the resumed hibernation image are
  116. not matched with a device offered in the resumed VM, the resume
  117. sequence waits for 10 seconds, then proceeds. But the unmatched device
  118. is likely to cause errors in the resumed VM.
  119. When resuming existing primary VMBus channels, the newly offered
  120. relids might be different because relids can change on each VM boot,
  121. even if the VM configuration hasn't changed. The VMBus bus driver
  122. resume function matches the class/instance GUIDs, and updates the
  123. relids in case they have changed.
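The matching described above can be modeled in user space as follows.
This is a simplified sketch and not the kernel implementation: the
Channel type, the match_offers() function, and the string GUIDs are all
hypothetical. It shows only the three outcomes: a matched channel gets
its relid updated, an unknown offer is treated as a new device, and a
preserved channel with no offer is reported as unmatched.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    class_guid: str
    instance_guid: str
    relid: int


def match_offers(image_channels, offers):
    """Match re-offered devices to channels preserved in the image.

    Returns (new_devices, unmatched_channels). Matched channels in
    image_channels have their relid updated in place.
    """
    by_guid = {(c.class_guid, c.instance_guid): c for c in image_channels}
    matched = set()
    new_devices = []
    for offer in offers:
        key = (offer.class_guid, offer.instance_guid)
        existing = by_guid.get(key)
        if existing is None:
            # No preserved channel: handle as a newly added device.
            new_devices.append(offer)
        else:
            # Same device as before; only the relid may have changed.
            existing.relid = offer.relid
            matched.add(key)
    unmatched = [c for key, c in by_guid.items() if key not in matched]
    return new_devices, unmatched
```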
  124. VMBus sub-channels are not persisted in the hibernation image. Each
  125. VMBus device driver's suspend function must close any sub-channels
  126. prior to hibernation. Closing a sub-channel causes Hyper-V to send a
  127. RESCIND_CHANNELOFFER message, which Linux processes by freeing the
  128. channel data structures so that all vestiges of the sub-channel are
  129. removed. By contrast, primary channels are marked closed and their
  130. ring buffers are freed, but Hyper-V does not send a rescind message,
  131. so the channel data structure continues to exist. Upon resume, the
  132. device driver's resume function re-allocates the ring buffer and
  133. re-opens the existing channel. It then communicates with Hyper-V to
  134. re-open sub-channels from scratch.
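The different treatment of primary channels and sub-channels can be
pictured with a toy state model. The class below is purely illustrative
and does not correspond to any kernel data structure; the ring buffer
size and the integer sub-channel identifiers are arbitrary placeholders.

```python
class VmbusDeviceModel:
    """Toy model of one VMBus device's channels across hibernation."""

    def __init__(self, wanted_subchannels):
        self.wanted_subchannels = wanted_subchannels
        self.primary_open = True
        self.ring_buffer = bytearray(4096)
        self.subchannels = list(range(wanted_subchannels))

    def suspend(self):
        # Sub-channels are closed, rescinded by the host, and freed entirely.
        self.subchannels = []
        # The primary channel object itself survives, but it is marked
        # closed and its ring buffer is freed.
        self.primary_open = False
        self.ring_buffer = None

    def resume(self):
        # Re-allocate the ring buffer and re-open the existing primary channel.
        self.ring_buffer = bytearray(4096)
        self.primary_open = True
        # Sub-channels are requested again and created from scratch.
        self.subchannels = list(range(self.wanted_subchannels))
```

The object itself plays the role of the primary channel data structure:
it exists before suspend(), throughout hibernation, and after resume(),
while the sub-channel list is emptied and rebuilt.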
  135. The Linux ends of Hyper-V sockets are forced closed at the time of
  136. hibernation. The guest can't force closing the host end of the socket,
  137. but any host-side actions on the host end will produce an error.
  138. VMBus devices use the same suspend function for the "freeze" and the
  139. "poweroff" phases, and the same resume function for the "thaw" and
  140. "restore" phases. See the "Entering Hibernation" section of
  141. Documentation/driver-api/pm/devices.rst for the sequencing of the
  142. phases.
  143. Detailed Hibernation Sequence
  144. -----------------------------
  145. 1. The Linux power management (PM) subsystem prepares for
  146. hibernation by freezing user space processes and allocating
  147. memory to hold the hibernation image.
  148. 2. As part of the "freeze" phase, Linux PM calls the "suspend"
  149. function for each VMBus device in turn. As described above, this
  150. function removes sub-channels, and leaves the primary channel in
  151. a closed state.
  152. 3. Linux PM calls the "suspend" function for the VMBus bus, which
  153. closes any Hyper-V socket channels and unloads the top-level
  154. VMBus connection with the Hyper-V host.
  155. 4. Linux PM disables non-boot CPUs, creates the hibernation image in
  156. the previously allocated memory, then re-enables non-boot CPUs.
  157. The hibernation image contains the memory data structures for the
  158. closed primary channels, but no sub-channels.
  159. 5. As part of the "thaw" phase, Linux PM calls the "resume" function
  160. for the VMBus bus, which re-establishes the top-level VMBus
  161. connection and requests that Hyper-V re-offer the VMBus devices.
  162. As offers are received for the primary channels, the relids are
  163. updated as previously described.
  164. 6. Linux PM calls the "resume" function for each VMBus device. Each
  165. device re-opens its primary channel, and communicates with Hyper-V
  166. to re-establish sub-channels if appropriate. The sub-channels
  167. are re-created as new channels since they were previously removed
  168. entirely in Step 2.
  169. 7. With VMBus devices now working again, Linux PM writes the
  170. hibernation image from memory to disk.
  171. 8. Linux PM repeats Steps 2 and 3 above as part of the "poweroff"
  172. phase. VMBus channels are closed and the top-level VMBus
  173. connection is unloaded.
  174. 9. Linux PM disables non-boot CPUs, and then enters ACPI sleep state
  175. S4. Hibernation is now complete.
  176. Detailed Resume Sequence
  177. ------------------------
  178. 1. The guest VM boots into a fresh Linux OS instance. During boot,
  179. the top-level VMBus connection is established, and synthetic
  180. devices are enabled. This happens via the normal paths that don't
  181. involve hibernation.
  182. 2. Linux PM hibernation code reads swap space to find and read
  183. the hibernation image into memory. If there is no hibernation
  184. image, then this boot becomes a normal boot.
  185. 3. If this is a resume from hibernation, the "freeze" phase is used
  186. to shut down VMBus devices and unload the top-level VMBus
  187. connection in the running fresh OS instance, just like Steps 2
  188. and 3 in the hibernation sequence.
  189. 4. Linux PM disables non-boot CPUs, and transfers control to the
  190. read-in hibernation image. In the now-running hibernation image,
  191. non-boot CPUs are restarted.
  192. 5. As part of the "resume" phase, Linux PM repeats Steps 5 and 6
  193. from the hibernation sequence. The top-level VMBus connection is
  194. re-established, and offers are received and matched to primary
  195. channels in the image. Relids are updated. VMBus device resume
  196. functions re-open primary channels and re-create sub-channels.
  197. 6. Linux PM exits the hibernation resume sequence and the VM is now
  198. running normally from the hibernation image.
  199. Key-Value Pair (KVP) Pseudo-Device Anomalies
  200. --------------------------------------------
  201. The VMBus KVP device behaves differently from other pseudo-devices
  202. offered by Hyper-V. When the KVP primary channel is closed, Hyper-V
  203. sends a rescind message, which causes all vestiges of the device to be
  204. removed. But Hyper-V then re-offers the device, causing it to be newly
  205. re-created. The removal and re-creation occurs during the "freeze"
  206. phase of hibernation, so the hibernation image contains the re-created
  207. KVP device. Similar behavior occurs during the "freeze" phase of the
  208. resume sequence while still in the fresh OS instance. But in both
  209. cases, the top-level VMBus connection is subsequently unloaded, which
  210. causes the device to be discarded on the Hyper-V side. So no harm is
  211. done and everything still works.
  212. Virtual PCI devices
  213. -------------------
  214. Virtual PCI devices are physical PCI devices that are mapped directly
  215. into the VM's physical address space so the VM can interact directly
  216. with the hardware. vPCI devices include those accessed via what Hyper-V
  217. calls "Discrete Device Assignment" (DDA), as well as SR-IOV NIC
  218. Virtual Functions (VF) devices. See Documentation/virt/hyperv/vpci.rst.
  219. Hyper-V DDA devices are offered to guest VMs after the top-level VMBus
  220. connection is established, just like VMBus synthetic devices. They are
  221. statically assigned to the VM, and their instance GUIDs don't change
  222. unless the Hyper-V administrator makes changes to the configuration.
  223. DDA devices are represented in Linux as virtual PCI devices that have
  224. a VMBus identity as well as a PCI identity. Consequently, Linux guest
  225. hibernation first handles DDA devices as VMBus devices in order to
  226. manage the VMBus channel. But then they are also handled as PCI
  227. devices using the hibernation functions implemented by their native
  228. PCI driver.
  229. SR-IOV NIC VFs also have a VMBus identity as well as a PCI
  230. identity, and overall are processed similarly to DDA devices. A
  231. difference is that VFs are not offered to the VM during initial boot
  232. of the VM. Instead, the VMBus synthetic NIC driver first starts
  233. operating and communicates to Hyper-V that it is prepared to accept a
  234. VF, and then the VF offer is made. However, the VMBus connection
  235. might later be unloaded and then re-established without the VM being
  236. rebooted, as happens in Steps 3 and 5 in the Detailed Hibernation
  237. Sequence above and in the Detailed Resume Sequence. In such a case,
  238. the VFs likely became part of the VM during initial boot, so when the
  239. VMBus connection is re-established, the VFs are offered on the
  240. re-established connection without intervention by the synthetic NIC driver.
  241. UIO Devices
  242. -----------
  243. A VMBus device can be exposed to user space using the Hyper-V UIO
  244. driver (uio_hv_generic.c) so that a user space driver can control and
  245. operate the device. However, the VMBus UIO driver does not support the
  246. suspend and resume operations needed for hibernation. If a VMBus
  247. device is configured to use the UIO driver, hibernating the VM fails
  248. and Linux continues to run normally. The most common use of the Hyper-V
  249. UIO driver is for DPDK networking, but there are other uses as well.
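Before attempting hibernation, it may be useful to check whether any
VMBus device is bound to the UIO driver. The sketch below assumes the
standard sysfs layout in which each device directory under
/sys/bus/vmbus/devices has a "driver" symlink naming its bound driver,
and that the driver is registered as "uio_hv_generic". The function name
is hypothetical.

```python
import os


def devices_bound_to_uio(devices_dir="/sys/bus/vmbus/devices"):
    """Return the names of VMBus devices bound to uio_hv_generic."""
    bound = []
    if not os.path.isdir(devices_dir):
        return bound
    for name in sorted(os.listdir(devices_dir)):
        link = os.path.join(devices_dir, name, "driver")
        # A device with no bound driver has no "driver" symlink.
        if os.path.islink(link):
            driver = os.path.basename(os.readlink(link))
            if driver == "uio_hv_generic":
                bound.append(name)
    return bound
```

A non-empty result suggests that hibernation will fail until those
devices are unbound from the UIO driver.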
  250. Resuming on a Different VM
  251. --------------------------
  252. This scenario occurs in the Azure public cloud in that a hibernated
  253. customer VM only exists as saved configuration and disks -- the VM no
  254. longer exists on any Hyper-V host. When the customer VM is resumed, a
  255. new Hyper-V VM with identical configuration is created, likely on a
  256. different Hyper-V host. That new Hyper-V VM becomes the resumed
  257. customer VM, and the steps the Linux kernel takes to resume from the
  258. hibernation image must work in that new VM.
  259. While the disks and their contents are preserved from the original VM,
  260. the Hyper-V-provided VMBus instance GUIDs of the disk controllers and
  261. other synthetic devices would typically be different. The difference
  262. would cause the resume from hibernation to fail, so several things are
  263. done to solve this problem:
  264. * For VMBus synthetic devices that support only a single instance,
  265. Hyper-V always assigns the same instance GUIDs. For example, the
  266. Hyper-V mouse, the shutdown pseudo-device, the time sync
  267. pseudo-device, etc., always have the same instance GUID, both for local
  268. Hyper-V installs as well as in the Azure cloud.
  269. * VMBus synthetic SCSI controllers may have multiple instances in a
  270. VM, and in the general case instance GUIDs vary from VM to VM.
  271. However, Azure VMs always have exactly two synthetic SCSI
  272. controllers, and Azure code overrides the normal Hyper-V behavior
  273. so these controllers are always assigned the same two instance
  274. GUIDs. Consequently, when a customer VM is resumed on a newly
  275. created VM, the instance GUIDs match. But this guarantee does not
  276. hold for local Hyper-V installs.
  277. * Similarly, VMBus synthetic NICs may have multiple instances in a
  278. VM, and the instance GUIDs vary from VM to VM. Again, Azure code
  279. overrides the normal Hyper-V behavior so that the instance GUID
  280. of a synthetic NIC in a customer VM does not change, even if the
  281. customer VM is deallocated or hibernated, and then re-constituted
  282. on a newly created VM. As with SCSI controllers, this behavior
  283. does not hold for local Hyper-V installs.
  284. * vPCI devices do not have the same instance GUIDs when resuming
  285. from hibernation on a newly created VM. Consequently, Azure does
  286. not support hibernation for VMs that have DDA devices such as
  287. NVMe controllers or GPUs. For SR-IOV NIC VFs, Azure removes the
  288. VF from the VM before it hibernates so that the hibernation image
  289. does not contain a VF device. When the VM is resumed it
  290. instantiates a new VF, rather than trying to match against a VF
  291. that is present in the hibernation image. Because Azure must
  292. remove any VFs before initiating hibernation, Azure VM
  293. hibernation must be initiated externally from the Azure Portal or
  294. Azure CLI, which in turn uses the Shutdown integration service to
  295. tell Linux to do the hibernation. If hibernation is self-initiated
  296. within the Azure VM, VFs remain in the hibernation image, and are
  297. not resumed properly.
  298. In summary, Azure takes special actions to remove VFs and to ensure
  299. that VMBus device instance GUIDs match on a new/different VM, allowing
  300. hibernation to work for most general-purpose Azure VM sizes. While
  301. similar special actions could be taken when resuming on a different VM
  302. on a local Hyper-V install, orchestrating such actions is not provided
  303. out-of-the-box by local Hyper-V and so requires custom scripting.