========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher order page allocation which is very likely to
fail under memory pressure. On the other hand, if we just use single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.

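
As a rough illustration of the fragmentation argument, the sketch below
compares the space lost when objects are packed into isolated 0-order pages
with the space lost when the same objects may span a chain of linked pages.
It is a userspace model in Python, not kernel code; the 4096-byte
``PAGE_SIZE`` and the helper name ``wasted_bytes()`` are assumptions made
for this example only.

.. code-block:: python

   PAGE_SIZE = 4096  # assumed page size for this example


   def wasted_bytes(obj_size, pages):
       """Bytes left unused when objects of obj_size are packed into
       `pages` linked 0-order pages that behave like one contiguous area."""
       return (pages * PAGE_SIZE) % obj_size


   # An object slightly larger than PAGE_SIZE/2 fills a whole page on its own.
   obj_size = PAGE_SIZE // 2 + 16          # 2064 bytes
   single = wasted_bytes(obj_size, 1)      # loss per isolated page
   chained = wasted_bytes(obj_size, 4)     # loss per chain of four pages

   print(single * 4, "bytes lost across four separate pages")
   print(chained, "bytes lost in one chain of four pages")

Four isolated pages hold four such objects, while the same four pages
treated as one chain hold seven, which is where the saving comes from.
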
For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is" i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, the allocated memory should be accessed through the
proper handle-based APIs.

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::

    # cat /sys/kernel/debug/zsmalloc/zram0/classes
    class  size   10%   20%   30%   40%   50%   60%   70%   80%   90%   99%  100% obj_allocated obj_used pages_used pages_per_zspage freeable
      ...
      ...
       30   512     0    12     4     1     0     1     0     0     1     0   414          3464     3346        433                1       14
       31   528     2     7     2     2     1     0     1     0     0     2   117          4154     3793        536                4       44
       32   544     6     3     4     1     2     1     0     0     0     1   260          4170     3965        556                2       26
      ...
      ...

class
    index
size
    object size zspage stores
10%
    the number of zspages with usage ratio less than 10% (see below)
20%
    the number of zspages with usage ratio between 10% and 20%
30%
    the number of zspages with usage ratio between 20% and 30%
40%
    the number of zspages with usage ratio between 30% and 40%
50%
    the number of zspages with usage ratio between 40% and 50%
60%
    the number of zspages with usage ratio between 50% and 60%
70%
    the number of zspages with usage ratio between 60% and 70%
80%
    the number of zspages with usage ratio between 70% and 80%
90%
    the number of zspages with usage ratio between 80% and 90%
99%
    the number of zspages with usage ratio between 90% and 99%
100%
    the number of zspages with usage ratio 100%
obj_allocated
    the number of objects allocated
obj_used
    the number of objects allocated to the user
pages_used
    the number of pages allocated for the class
pages_per_zspage
    the number of 0-order pages to make a zspage
freeable
    the approximate number of pages class compaction can free

Each zspage maintains an inuse counter which keeps track of the number of
objects stored in the zspage. The inuse counter determines the zspage's
"fullness group", which is calculated as the ratio of the "inuse" objects to
the total number of objects the zspage can hold (objs_per_zspage). The
closer the inuse counter is to objs_per_zspage, the better.

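
The mapping from the inuse counter to a column of the ``classes`` output can
be sketched as follows. This is a userspace model written from the column
descriptions above, not the kernel implementation: the helper name
``usage_column()`` is hypothetical, and the use of an integer percentage
(and therefore the treatment of values that fall exactly on a boundary) is
an assumption.

.. code-block:: python

   # Column labels in the order they appear in the "classes" output.
   LABELS = ["10%", "20%", "30%", "40%", "50%",
             "60%", "70%", "80%", "90%", "99%"]


   def usage_column(inuse, objs_per_zspage):
       """Return the stat column a zspage would be counted under."""
       if not 0 < inuse <= objs_per_zspage:
           raise ValueError("inuse must be in 1..objs_per_zspage")
       if inuse == objs_per_zspage:
           return "100%"
       ratio = 100 * inuse // objs_per_zspage   # integer percentage, 0..99
       return LABELS[ratio // 10]


   print(usage_column(1, 8))     # one of eight slots in use
   print(usage_column(7, 8))     # almost full
   print(usage_column(8, 8))     # completely full
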
Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

    class  size   10% ....  100% obj_allocated obj_used pages_used pages_per_zspage freeable
      ...
       94  1536     0 ....     0             0        0          0                3        0
      100  1632     0 ....     0             0        0          0                2        0
      ...

Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

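
The merging described above can be modelled in a few lines of Python. This
is a simplified userspace sketch, not the kernel code. It assumes 4096-byte
pages, a smallest class of 32 bytes and a 16-byte step between classes
(values inferred from the sample output, where class #30 is 512 bytes and
class #31 is 528 bytes), a chain limit of 4 pages, and that a class merges
with the next larger class when both have the same pages-per-zspage and
objects-per-zspage. All helper names are hypothetical.

.. code-block:: python

   PAGE_SIZE = 4096      # assumed
   MIN_SIZE = 32         # assumed size of class #0
   CLASS_DELTA = 16      # assumed step between neighboring classes
   NUM_CLASSES = 255
   MAX_CHAIN = 4         # zspage chain size limit used in this example


   def class_size(idx):
       return min(MIN_SIZE + idx * CLASS_DELTA, PAGE_SIZE)


   def zspage_layout(size, max_chain=MAX_CHAIN):
       """(pages_per_zspage, objs_per_zspage) for the chain length that
       leaves the fewest unused bytes; ties go to the shorter chain."""
       pages = min(range(1, max_chain + 1),
                   key=lambda n: (n * PAGE_SIZE) % size)
       return pages, pages * PAGE_SIZE // size


   def merged_classes(max_chain=MAX_CHAIN):
       """Map every class index to the index of the class that serves it.
       Walking from the largest class down, a class joins the class above
       it when both have the same layout."""
       serving = {}
       prev_idx, prev_layout = None, None
       for idx in range(NUM_CLASSES - 1, -1, -1):
           layout = zspage_layout(class_size(idx), max_chain)
           serving[idx] = serving[prev_idx] if layout == prev_layout else idx
           prev_idx, prev_layout = idx, layout
       return serving


   def pages_needed(obj_size, count, max_chain=MAX_CHAIN):
       """(serving class index, physical pages) for `count` objects."""
       size = max(obj_size, MIN_SIZE)
       idx = -(-(size - MIN_SIZE) // CLASS_DELTA)        # ceiling division
       target = merged_classes(max_chain)[idx]
       pages, objs = zspage_layout(class_size(target), max_chain)
       zspages = -(-count // objs)                       # ceiling division
       return target, zspages * pages


   print(pages_needed(1568, 13))

Under these assumptions the request for thirteen 1568-byte objects is served
by class #100 and needs three 2-page zspages, matching the figures in the
text.
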
However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace ``calculate_zspage_chain_size()``, we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.

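
The table above can be reproduced with a short calculation. The sketch below
is a userspace approximation of the selection it illustrates (choose the
chain length that wastes the fewest bytes); it is not a copy of
``calculate_zspage_chain_size()``, and the 4096-byte page size and the
helper names ``chain_table()`` and ``best_chain()`` are assumptions.

.. code-block:: python

   PAGE_SIZE = 4096  # assumed


   def chain_table(size, max_chain):
       """Rows of (pages, wasted bytes, used%) for chains of 1..max_chain."""
       rows = []
       for pages in range(1, max_chain + 1):
           total = pages * PAGE_SIZE
           waste = total % size
           rows.append((pages, waste, 100 * (total - waste) // total))
       return rows


   def best_chain(size, max_chain):
       """Chain length with the fewest wasted bytes (shortest on a tie)."""
       return min(chain_table(size, max_chain), key=lambda row: row[1])[0]


   for pages, waste, used in chain_table(1568, 5):
       print(f"{pages:>2} {waste:>6} {used:>4}")

   pages = best_chain(1568, 5)
   print(pages, "pages hold", pages * PAGE_SIZE // 1568, "objects")

With a limit of 4 pages the same function settles on a 2-page chain, which
is why class #96 ends up merged with class #100 in the earlier example.
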
As the zspage chain size for class #96 increases, its key characteristics
such as pages per-zspage and objects per-zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of ``/sys/kernel/debug/zsmalloc/zramX/classes``::

    class  size   10% ....  100% obj_allocated obj_used pages_used pages_per_zspage freeable
      ...
      202  3264     0   ..     0             0        0          0                4        0
      254  4096     0   ..     0             0        0          0                1        0
      ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the size of the chain of zspages also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For zspage chain size of 8, huge class watermark becomes 3632 bytes::

    class  size   10% ....  100% obj_allocated obj_used pages_used pages_per_zspage freeable
      ...
      202  3264     0   ..     0             0        0          0                4        0
      211  3408     0   ..     0             0        0          0                5        0
      217  3504     0   ..     0             0        0          0                6        0
      222  3584     0   ..     0             0        0          0                7        0
      225  3632     0   ..     0             0        0          0                8        0
      254  4096     0   ..     0             0        0          0                1        0
      ...

For zspage chain size of 16, huge class watermark becomes 3840 bytes::

    class  size   10% ....  100% obj_allocated obj_used pages_used pages_per_zspage freeable
      ...
      202  3264     0   ..     0             0        0          0                4        0
      206  3328     0   ..     0             0        0          0               13        0
      207  3344     0   ..     0             0        0          0                9        0
      208  3360     0   ..     0             0        0          0               14        0
      211  3408     0   ..     0             0        0          0                5        0
      212  3424     0   ..     0             0        0          0               16        0
      214  3456     0   ..     0             0        0          0               11        0
      217  3504     0   ..     0             0        0          0                6        0
      219  3536     0   ..     0             0        0          0               13        0
      222  3584     0   ..     0             0        0          0                7        0
      223  3600     0   ..     0             0        0          0               15        0
      225  3632     0   ..     0             0        0          0                8        0
      228  3680     0   ..     0             0        0          0                9        0
      230  3712     0   ..     0             0        0          0               10        0
      232  3744     0   ..     0             0        0          0               11        0
      234  3776     0   ..     0             0        0          0               12        0
      235  3792     0   ..     0             0        0          0               13        0
      236  3808     0   ..     0             0        0          0               14        0
      238  3840     0   ..     0             0        0          0               15        0
      254  4096     0   ..     0             0        0          0                1        0
      ...

Overall the combined zspage chain size effect on zsmalloc pool configuration::

    pages per zspage   number of size classes (clusters)   huge size class watermark
           4                          69                              3264
           5                          86                              3408
           6                          93                              3504
           7                         112                              3584
           8                         123                              3632
           9                         140                              3680
          10                         143                              3712
          11                         159                              3744
          12                         164                              3776
          13                         180                              3792
          14                         183                              3808
          15                         188                              3840
          16                         191                              3840

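
The watermark column of the table above can be approximated with the same
kind of userspace model. The sketch assumes 4096-byte pages, class sizes
running from 32 to 4096 bytes in 16-byte steps, a minimum-waste choice of
chain length, and that a class counts as huge when its zspage is a single
page holding a single object. The helper names are hypothetical and the code
is an illustration, not the kernel's logic.

.. code-block:: python

   PAGE_SIZE = 4096     # assumed
   MIN_SIZE = 32        # assumed size of class #0
   CLASS_DELTA = 16     # assumed step between neighboring classes


   def zspage_layout(size, max_chain):
       """(pages_per_zspage, objs_per_zspage) with the fewest wasted bytes."""
       pages = min(range(1, max_chain + 1),
                   key=lambda n: (n * PAGE_SIZE) % size)
       return pages, pages * PAGE_SIZE // size


   def huge_watermark(max_chain):
       """Largest class size that still shares pages between objects;
       anything above it is one object per single-page zspage."""
       sizes = range(MIN_SIZE, PAGE_SIZE + 1, CLASS_DELTA)
       return max(s for s in sizes if zspage_layout(s, max_chain) != (1, 1))


   for chain in (4, 8, 16):
       print(chain, huge_watermark(chain))
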
A synthetic test
----------------

zram as a build artifacts storage (Linux kernel compilation).

* ``CONFIG_ZSMALLOC_CHAIN_SIZE=4``

  zsmalloc classes stats::

      class  size   10% ....  100% obj_allocated obj_used pages_used pages_per_zspage freeable
        ...
      Total          13   ..    51        413836   412973     159955                         3

  zram mm_stat::

      1691783168 628083717 655175680 0 655175680 60 0 34048 34049

* ``CONFIG_ZSMALLOC_CHAIN_SIZE=8``

  zsmalloc classes stats::

      class  size   10% ....  100% obj_allocated obj_used pages_used pages_per_zspage freeable
        ...
      Total          18   ..    87        414852   412978     156666                         0

  zram mm_stat::

      1691803648 627793930 641703936 0 641703936 60 0 33591 33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666, while the maximum zsmalloc pool memory usage went down from
655175680 to 641703936 bytes.

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).

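
The savings quoted for the synthetic test can be recomputed from the two
``mm_stat`` lines. The sketch below is only a convenience for reading those
numbers: the field names follow the ``mm_stat`` description in the zram
documentation, the 4096-byte page size is an assumption, and the helper
``parse_mm_stat()`` is hypothetical.

.. code-block:: python

   PAGE_SIZE = 4096  # assumed

   # Field order as described for mm_stat in the zram documentation.
   FIELDS = ("orig_data_size", "compr_data_size", "mem_used_total",
             "mem_limit", "mem_used_max", "same_pages", "pages_compacted",
             "huge_pages", "huge_pages_since")


   def parse_mm_stat(line):
       """Turn one mm_stat line into a dict keyed by field name."""
       return dict(zip(FIELDS, map(int, line.split())))


   chain4 = parse_mm_stat(
       "1691783168 628083717 655175680 0 655175680 60 0 34048 34049")
   chain8 = parse_mm_stat(
       "1691803648 627793930 641703936 0 641703936 60 0 33591 33591")

   saved_bytes = chain4["mem_used_max"] - chain8["mem_used_max"]
   saved_pages = 159955 - 156666   # pages_used totals from the classes output

   print(saved_bytes, "bytes =", saved_bytes // PAGE_SIZE, "pages")
   print(f"{100 * saved_pages / 159955:.2f}% fewer physical pages")

The drop in peak pool memory corresponds exactly to the drop in
``pages_used``, so both figures describe the same 3289 pages.
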
Functions
=========

.. kernel-doc:: mm/zsmalloc.c