==================
HugeTLB Controller
==================

A HugeTLB controller hierarchy can be created by first mounting the cgroup
filesystem with the hugetlb option::

  # mount -t cgroup -o hugetlb none /sys/fs/cgroup

With the above step, the initial or the parent HugeTLB group becomes
visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
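
Before running the mount command, it can help to check whether the hugetlb
controller is already mounted somewhere. The following is a minimal sketch,
assuming the usual /proc/mounts field order (device, mount point, filesystem
type, options); the function name hugetlb_mountpoints is hypothetical and not
part of any existing tool::

  # Hypothetical helper: print the mount point of every cgroup v1
  # hierarchy that has the hugetlb controller attached.
  #   $1 = file in /proc/mounts format (defaults to /proc/mounts)
  hugetlb_mountpoints() {
      mounts_file=${1:-/proc/mounts}
      # Fields: device, mount point, fs type, options, dump, pass.
      # Keep "cgroup" (v1) mounts whose option list contains "hugetlb".
      awk '$3 == "cgroup" {
          n = split($4, opts, ",")
          for (i = 1; i <= n; i++)
              if (opts[i] == "hugetlb") { print $2; break }
      }' "$mounts_file"
  }

If the function prints nothing, no cgroup v1 hierarchy with the hugetlb
controller is mounted yet.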

New groups can be created under the parent group /sys/fs/cgroup::

  # cd /sys/fs/cgroup
  # mkdir g1
  # echo $$ > g1/tasks

The above steps create a new group g1 and move the current shell
process (bash) into it.
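
The same steps can be wrapped in a small shell function. This is a sketch
under the assumption that the parent group directory already exists;
hugetlb_new_group is a hypothetical name. Note that in cgroup v1, writing an ID
to tasks moves a single thread, while cgroup.procs moves all threads of a
process::

  # Hypothetical helper: create a child HugeTLB group and move a task into it.
  #   $1 = parent group directory (e.g. /sys/fs/cgroup)
  #   $2 = name of the new child group (e.g. g1)
  #   $3 = ID of the task to move (defaults to the current shell, $$)
  hugetlb_new_group() {
      parent=$1
      name=$2
      pid=${3:-$$}
      # mkdir -p does not fail if the group already exists.
      mkdir -p "$parent/$name" || return 1
      # Writing an ID to "tasks" moves that task into the group.
      echo "$pid" > "$parent/$name/tasks"
  }

For example, hugetlb_new_group /sys/fs/cgroup g1 reproduces the manual
commands for the current shell.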

Brief summary of control files::

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes      # set/show limit of "hugepagesize" hugetlb reservations
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes  # show max "hugepagesize" hugetlb reservations and no-reserve faults
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes      # show current reservations and no-reserve faults for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.rsvd.failcnt             # show the number of allocation failures due to HugeTLB reservation limit
  hugetlb.<hugepagesize>.limit_in_bytes           # set/show limit of "hugepagesize" hugetlb faults
  hugetlb.<hugepagesize>.max_usage_in_bytes       # show max "hugepagesize" hugetlb usage recorded
  hugetlb.<hugepagesize>.usage_in_bytes           # show current usage for "hugepagesize" hugetlb
  hugetlb.<hugepagesize>.failcnt                  # show the number of allocation failures due to HugeTLB usage limit
  hugetlb.<hugepagesize>.numa_stat                # show the NUMA information of the hugetlb memory charged to this cgroup

For a system supporting three hugepage sizes (64KB, 32MB and 1GB), the control
files include::

  hugetlb.1GB.limit_in_bytes
  hugetlb.1GB.max_usage_in_bytes
  hugetlb.1GB.numa_stat
  hugetlb.1GB.usage_in_bytes
  hugetlb.1GB.failcnt
  hugetlb.1GB.rsvd.limit_in_bytes
  hugetlb.1GB.rsvd.max_usage_in_bytes
  hugetlb.1GB.rsvd.usage_in_bytes
  hugetlb.1GB.rsvd.failcnt
  hugetlb.64KB.limit_in_bytes
  hugetlb.64KB.max_usage_in_bytes
  hugetlb.64KB.numa_stat
  hugetlb.64KB.usage_in_bytes
  hugetlb.64KB.failcnt
  hugetlb.64KB.rsvd.limit_in_bytes
  hugetlb.64KB.rsvd.max_usage_in_bytes
  hugetlb.64KB.rsvd.usage_in_bytes
  hugetlb.64KB.rsvd.failcnt
  hugetlb.32MB.limit_in_bytes
  hugetlb.32MB.max_usage_in_bytes
  hugetlb.32MB.numa_stat
  hugetlb.32MB.usage_in_bytes
  hugetlb.32MB.failcnt
  hugetlb.32MB.rsvd.limit_in_bytes
  hugetlb.32MB.rsvd.max_usage_in_bytes
  hugetlb.32MB.rsvd.usage_in_bytes
  hugetlb.32MB.rsvd.failcnt
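
Which sizes appear depends on the hugepage sizes the kernel supports, so a
script may prefer to discover them from the control file names rather than
hard-code them. The sketch below relies only on the naming pattern shown above;
hugetlb_sizes is a hypothetical name::

  # Hypothetical helper: list the hugepage size labels (e.g. 64KB, 32MB, 1GB)
  # for which a cgroup directory has hugetlb.<size>.limit_in_bytes files.
  #   $1 = cgroup directory (defaults to /sys/fs/cgroup)
  hugetlb_sizes() {
      dir=${1:-/sys/fs/cgroup}
      for f in "$dir"/hugetlb.*.limit_in_bytes; do
          [ -e "$f" ] || continue          # the glob matched nothing
          base=${f##*/}                    # hugetlb.<size>.limit_in_bytes
          size=${base#hugetlb.}            # <size>.limit_in_bytes
          size=${size%.limit_in_bytes}     # <size>, or <size>.rsvd
          case $size in
              *.rsvd) continue ;;          # skip the reservation variant
          esac
          echo "$size"
      done
  }

On the example system above it would print the three labels 1GB, 32MB and
64KB, one per line.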

1. Page fault accounting

::

  hugetlb.<hugepagesize>.limit_in_bytes
  hugetlb.<hugepagesize>.max_usage_in_bytes
  hugetlb.<hugepagesize>.usage_in_bytes
  hugetlb.<hugepagesize>.failcnt

The HugeTLB controller allows users to limit HugeTLB usage (page faults) per
control group and enforces the limit at page fault time. Since HugeTLB does not
support page reclaim, enforcing the limit at page fault time means that the
application receives a SIGBUS signal if it tries to fault in HugeTLB pages
beyond its limit. Therefore, the application needs to know in advance exactly
how many HugeTLB pages it will use, and the sysadmin needs to make sure that
enough pages are available on the machine for all users so that no process
gets SIGBUS.
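
Because the fault limit has to match the application's expected page count,
it is convenient to compute the byte value from a number of pages. The
following sketch assumes size labels of the form <n>KB, <n>MB or <n>GB, as in
the control file names above; both function names are hypothetical::

  # Hypothetical helper: convert a size label (e.g. 64KB, 2MB, 1GB) to bytes.
  # Only the KB, MB and GB suffixes are handled.
  hugetlb_label_bytes() {
      case $1 in
          *KB) echo $(( ${1%KB} * 1024 )) ;;
          *MB) echo $(( ${1%MB} * 1024 * 1024 )) ;;
          *GB) echo $(( ${1%GB} * 1024 * 1024 * 1024 )) ;;
          *)   echo "unknown size label: $1" >&2; return 1 ;;
      esac
  }

  # Hypothetical helper: print the value for hugetlb.<size>.limit_in_bytes
  # that lets a group fault in at most $1 pages of size $2.
  hugetlb_limit_for_pages() {
      page_bytes=$(hugetlb_label_bytes "$2") || return 1
      echo $(( $1 * page_bytes ))
  }

For example, on a system that supports 2MB pages, running
hugetlb_limit_for_pages 512 2MB > /sys/fs/cgroup/g1/hugetlb.2MB.limit_in_bytes
writes 1073741824, allowing g1 to fault in at most 512 such pages.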

2. Reservation accounting

::

  hugetlb.<hugepagesize>.rsvd.limit_in_bytes
  hugetlb.<hugepagesize>.rsvd.max_usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.usage_in_bytes
  hugetlb.<hugepagesize>.rsvd.failcnt

The HugeTLB controller allows limiting HugeTLB reservations per control group
and enforces the limit at reservation time and when faulting in HugeTLB memory
for which no reservation exists. Since reservation limits are enforced at
reservation time (on mmap or shmget), they never cause the application to get
a SIGBUS signal if the memory was reserved beforehand. For MAP_NORESERVE
allocations, the reservation limit behaves the same as the fault limit: it
enforces memory usage at fault time, and the application receives a SIGBUS if
it crosses its limit.

Reservation limits are superior to the page fault limits described above,
since they are enforced at reservation time (on mmap or shmget): an over-limit
request fails in the mmap or shmget call itself, and memory that was reserved
beforehand never causes a SIGBUS. This makes it easier to fall back to
alternatives such as non-HugeTLB memory. With page fault accounting, it is very
hard to avoid processes getting SIGBUS, since the sysadmin needs to know the
HugeTLB usage of every task in the system precisely and make sure there are
enough pages to satisfy all requests. On overcommitted systems, avoiding SIGBUS
is practically impossible with page fault accounting.
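
A job launcher might also look at the remaining reservation headroom to decide
up front whether to request HugeTLB memory or use a fallback. The sketch below
assumes the group directory contains the rsvd files listed above;
hugetlb_rsvd_headroom is a hypothetical name. The result is only a snapshot,
since other tasks can reserve memory concurrently, so the return value of mmap
or shmget remains the authoritative answer::

  # Hypothetical helper: print how many more bytes of $2-sized HugeTLB memory
  # the group in directory $1 can reserve before reaching its rsvd limit.
  # Only a snapshot: concurrent reservations can change the answer.
  hugetlb_rsvd_headroom() {
      dir=$1
      size=$2
      limit=$(cat "$dir/hugetlb.$size.rsvd.limit_in_bytes") || return 1
      usage=$(cat "$dir/hugetlb.$size.rsvd.usage_in_bytes") || return 1
      if [ "$usage" -ge "$limit" ]; then
          echo 0
      else
          echo $(( limit - usage ))
      fi
  }

For example, hugetlb_rsvd_headroom /sys/fs/cgroup/g1 2MB prints the number of
bytes that g1 can still reserve in 2MB pages.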

3. Caveats with shared memory

For shared HugeTLB memory, both HugeTLB reservations and page faults are charged
to the first task that causes the memory to be reserved or faulted, and all
subsequent uses of this reserved or faulted memory are not charged.

Shared HugeTLB memory is only uncharged when it is unreserved or deallocated.
This usually happens when the HugeTLB file is deleted, not when the task that
caused the reservation or fault exits.

4. Caveats with HugeTLB cgroup offline

When a HugeTLB cgroup goes offline with some reservations or faults still
charged to it, the behavior is as follows:

- The fault charges are charged to the parent HugeTLB cgroup (reparented).
- The reservation charges remain on the offline HugeTLB cgroup.

This means that if a HugeTLB cgroup is offlined while HugeTLB reservations are
still charged to it, that cgroup persists as a zombie until all HugeTLB
reservations are uncharged. HugeTLB reservations behave this way to match the
memory controller, whose cgroups also persist as zombies until all charged
memory is uncharged. In addition, tracking HugeTLB reservations is somewhat more
complex than tracking HugeTLB faults, so reparenting reservations at offline
time is significantly harder.
  107. memory is uncharged. Also, the tracking of HugeTLB reservations is a bit more
  108. complex compared to the tracking of HugeTLB faults, so it is significantly
  109. harder to reparent reservations at offline time.