.. SPDX-License-Identifier: GPL-2.0

.. _perf_index:

====
Perf
====

Perf Event Attributes
=====================

:Author: Andrew Murray <andrew.murray@arm.com>
:Date: 2019-03-06

exclude_user
------------

This attribute excludes userspace.

Userspace always runs at EL0 and thus this attribute will exclude EL0.

exclude_kernel
--------------

This attribute excludes the kernel.

The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
at EL1.

For the host this attribute will exclude EL1 and additionally EL2 on a VHE
system.

For the guest this attribute will exclude EL1. Please note that EL2 is
never counted within a guest.

exclude_hv
----------

This attribute excludes the hypervisor.

For a VHE host this attribute is ignored as we consider the host kernel to
be the hypervisor.

For a non-VHE host this attribute will exclude EL2 as we consider the
hypervisor to be any code that runs at EL2 which is predominantly used for
guest/host transitions.

For the guest this attribute has no effect. Please note that EL2 is
never counted within a guest.

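
The rules for exclude_user, exclude_kernel and exclude_hv can be summarised
as a small model. The Python sketch below is illustrative only: the function
``counted_exception_levels`` and its parameters are hypothetical names, not
part of any kernel or perf API, and the code simply encodes the statements
made in the sections above.

.. code-block:: python

  def counted_exception_levels(guest, vhe, exclude_user=False,
                               exclude_kernel=False, exclude_hv=False):
      """Return the exception levels that are not excluded (illustrative).

      guest: True if the event is opened inside a guest, False on the host.
      vhe:   True if the host kernel runs at EL2 (VHE); ignored for guests.
      """
      # EL2 is never counted within a guest.
      levels = {"EL0", "EL1"} if guest else {"EL0", "EL1", "EL2"}

      if exclude_user:
          levels.discard("EL0")
      if exclude_kernel:
          levels.discard("EL1")
          if not guest and vhe:
              # The VHE host kernel runs at EL2.
              levels.discard("EL2")
      if exclude_hv and not guest and not vhe:
          # Only a non-VHE host treats EL2 as a separate hypervisor.
          levels.discard("EL2")
      return levels

For example, under this model exclude_kernel on a VHE host leaves only EL0,
while on a non-VHE host it leaves EL0 and EL2 unless exclude_hv is also set.
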
exclude_host / exclude_guest
----------------------------

These attributes exclude the KVM host and guest, respectively.

The KVM host may run at EL0 (userspace), EL1 (non-VHE kernel) and EL2 (VHE
kernel or non-VHE hypervisor).

The KVM guest may run at EL0 (userspace) and EL1 (kernel).

Due to the overlapping exception levels between host and guests we cannot
exclusively rely on the PMU's hardware exception filtering. Therefore we
must enable/disable counting on entry to and exit from the guest. This is
performed differently on VHE and non-VHE systems.

For non-VHE systems we exclude EL2 for exclude_host. Upon entering and
exiting the guest we disable/enable the event as appropriate based on the
exclude_host and exclude_guest attributes.

For VHE systems we exclude EL1 for exclude_guest and exclude both EL0 and
EL2 for exclude_host. Upon entering and exiting the guest we modify the event
to include/exclude EL0 as appropriate based on the exclude_host and
exclude_guest attributes.

The statements above also apply when these attributes are used within a
non-VHE guest; however, please note that EL2 is never counted within a guest.

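
The difference between the two schemes can be sketched as follows. This is
a simplified Python model of the behaviour described above, not the KVM
implementation; ``non_vhe_event_enabled`` and ``vhe_counted_levels`` are
hypothetical names chosen for illustration.

.. code-block:: python

  def non_vhe_event_enabled(in_guest, exclude_host, exclude_guest):
      """Non-VHE: the event is switched on or off at guest entry/exit."""
      return not (exclude_guest if in_guest else exclude_host)


  def vhe_counted_levels(in_guest, exclude_host, exclude_guest):
      """VHE: return the exception levels left enabled in the event filter."""
      levels = set()
      if not exclude_guest:
          levels.add("EL1")  # guest kernel
      if not exclude_host:
          levels.add("EL2")  # VHE host kernel
      # EL0 is shared by host and guest userspace, so it is toggled at
      # guest entry/exit according to whichever side is about to run.
      if not (exclude_guest if in_guest else exclude_host):
          levels.add("EL0")
      return levels

In this model an exclude_host event on a VHE system counts only EL1 while
the host runs, and EL0 plus EL1 while the guest runs.
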
Accuracy
--------

On non-VHE hosts we enable/disable counters on the entry/exit of host/guest
transition at EL2. However, there is a period of time between
enabling/disabling the counters and entering/exiting the guest. We are
able to eliminate counters counting host events on the boundaries of guest
entry/exit when counting guest events by filtering out EL2 for
exclude_host. However, when using !exclude_hv there is a small blackout
window at the guest entry/exit where host events are not captured.

On VHE systems there are no blackout windows.

Perf Userspace PMU Hardware Counter Access
==========================================

Overview
--------

The perf userspace tool relies on the PMU to monitor events. It offers an
abstraction layer over the hardware counters since the underlying
implementation is CPU-dependent.

Arm64 allows userspace tools to have access to the registers storing the
hardware counters' values directly.

This specifically targets self-monitoring tasks, reducing overhead by
accessing the registers directly rather than going through the kernel.

How-to
------

The focus is on the Armv8 PMUv3 driver, which ensures that access to the PMU
registers is enabled and that userspace has the information it needs in
order to use them.

In order to have access to the hardware counters, the global sysctl
kernel/perf_user_access must first be enabled:

.. code-block:: sh

  echo 1 > /proc/sys/kernel/perf_user_access

The event must be opened using the perf tool interface with the config1:1
attr bit set. The sys_perf_event_open syscall returns an fd which can
subsequently be used with the mmap syscall in order to retrieve a page of
memory containing information about the event. The PMU driver uses this page
to expose to the user the hardware counter's index and other necessary data.
Using this index enables the user to access the PMU registers using the
`mrs` instruction.

Access to the PMU registers is only valid while the sequence lock is unchanged.
In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is
changed.

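
The sequence lock rule implies a retry loop around every read. The Python
sketch below models that loop only; it is not how a real tool reads the
counter. ``FakeUserPage`` is a hypothetical stand-in for the mmap'd page
(the field names ``lock``, ``index`` and ``offset`` follow
``struct perf_event_mmap_page``), and ``read_hw_counter`` is a hypothetical
callable standing in for the `mrs` based register read. A real
implementation also checks the page's capability bits and needs compiler
barriers around the reads, both of which are omitted here.

.. code-block:: python

  from dataclasses import dataclass


  @dataclass
  class FakeUserPage:
      """Stand-in for the mmap'd event page; not the real binary layout."""
      lock: int = 0    # sequence lock, changed whenever the page is updated
      index: int = 0   # hardware counter index plus one, 0 if not readable
      offset: int = 0  # value to add to the hardware counter


  def read_event(page, read_hw_counter):
      """Return the event count, retrying if the page changed mid-read."""
      while True:
          seq = page.lock
          index = page.index
          count = page.offset
          if index:
              count += read_hw_counter(index - 1)
          # Only trust the values if the sequence lock did not move.
          if page.lock == seq:
              return count
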
The userspace access is supported in libperf using the perf_evsel__mmap()
and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for
an example.

About heterogeneous systems
---------------------------

On heterogeneous systems such as big.LITTLE, userspace PMU counter access can
only be enabled when the tasks are pinned to a homogeneous subset of cores and
the corresponding PMU instance is opened by specifying the 'type' attribute.
The use of generic event types is not supported in this case.

Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It
can be run using the perf tool to check that the access to the registers works
correctly from userspace:

.. code-block:: sh

  perf test -v user

About chained events and counter sizes
--------------------------------------

The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1)
counter along with userspace access. The sys_perf_event_open syscall will fail
if a 64-bit counter is requested and the hardware doesn't support 64-bit
counters. Chained events are not supported in conjunction with userspace counter
access. If a 32-bit counter is requested on hardware with 64-bit counters, then
userspace must treat the upper 32-bits read from the counter as UNKNOWN. The
'pmc_width' field in the user page will indicate the valid width of the counter
and should be used to mask the upper bits as needed.

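
As a minimal sketch of the two points above, the following Python helpers
build a config1 value from the two bits described in this document and mask
a raw counter read to its valid width. ``make_config1`` and ``mask_counter``
are hypothetical names used for illustration and are not provided by perf or
libperf.

.. code-block:: python

  def make_config1(long_counter, user_access):
      """Build a config1 value: bit 0 = 64-bit counter, bit 1 = user access."""
      return (1 if long_counter else 0) | ((1 if user_access else 0) << 1)


  def mask_counter(raw, pmc_width):
      """Keep only the low pmc_width bits of a raw 64-bit counter read."""
      return raw & ((1 << pmc_width) - 1)

With a 32-bit counter on 64-bit hardware, ``mask_counter(raw, 32)`` discards
the UNKNOWN upper half of the value that was read.
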
.. Links
.. _tools/perf/arch/arm64/tests/user-events.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
.. _tools/lib/perf/tests/test-evsel.c:
   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c

Event Counting Threshold
========================

Overview
--------

FEAT_PMUv3_TH (Armv8.8) permits a PMU counter to increment only on
events whose count meets a specified threshold condition. For example, if
threshold_compare is set to 2 ('Greater than or equal'), and the
threshold is set to 2, then the PMU counter will only increment when an
event would otherwise have incremented the PMU counter by 2 or more on a
single processor cycle.

To increment by 1 after passing the threshold condition instead of the
number of events on that cycle, add the 'threshold_count' option to the
commandline.

How-to
------

These are the parameters for controlling the feature:

.. list-table::
   :header-rows: 1

   * - Parameter
     - Description
   * - threshold
     - Value to threshold the event by. A value of 0 means that
       thresholding is disabled and the other parameters have no effect.
   * - threshold_compare
     - | Comparison function to use, with the following values supported:
       |
       | 0: Not-equal
       | 1: Equals
       | 2: Greater-than-or-equal
       | 3: Less-than
   * - threshold_count
     - If this is set, count by 1 after passing the threshold condition
       instead of the value of the event on this cycle.

The threshold, threshold_compare and threshold_count values can be
provided per event, for example:

.. code-block:: sh

  perf stat -e stall_slot/threshold=2,threshold_compare=2/ \
            -e dtlb_walk/threshold=10,threshold_compare=3,threshold_count/

In this example the stall_slot event will count by 2 or more on every
cycle where 2 or more stalls happen, and dtlb_walk will count by 1 on
every cycle where the number of dtlb walks was less than 10.

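
The per-cycle behaviour of the three parameters can be modelled as below.
This Python sketch is an illustration of the rules in the table above rather
than a description of the hardware; ``threshold_increment`` is a hypothetical
name and ``events`` is the number of events that occurred on one cycle.

.. code-block:: python

  def threshold_increment(events, threshold, threshold_compare,
                          threshold_count=False):
      """Return how much the counter advances in one cycle (illustrative)."""
      if threshold == 0:
          # Thresholding disabled: count every event as usual.
          return events
      comparisons = {
          0: events != threshold,  # Not-equal
          1: events == threshold,  # Equals
          2: events >= threshold,  # Greater-than-or-equal
          3: events < threshold,   # Less-than
      }
      if not comparisons[threshold_compare]:
          return 0
      return 1 if threshold_count else events

Applied to the command above, a cycle with 3 stalls advances stall_slot by
3, a cycle with 1 stall advances it by 0, and any cycle with fewer than 10
dtlb walks advances dtlb_walk by exactly 1.
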
The maximum supported threshold value can be read from the caps of each
PMU, for example:

.. code-block:: sh

  cat /sys/bus/event_source/devices/armv8_pmuv3/caps/threshold_max

  0x000000ff

If a value higher than this is given, then opening the event will result
in an error. The highest possible maximum is 4095, as the config field
for threshold is limited to 12 bits, and the Perf tool will refuse to
parse higher values.

If the PMU doesn't support FEAT_PMUv3_TH, then threshold_max will read
0, and attempting to set a threshold value will also result in an error.
threshold_max will also read as 0 on aarch32 guests, even if the host
is running on hardware with the feature.

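
A tool could check a requested threshold before opening the event. The
Python sketch below shows one way to do so under the rules above;
``threshold_is_valid`` and ``THRESHOLD_FIELD_MAX`` are hypothetical names,
and the function takes the text read from the threshold_max caps file as
its second argument rather than reading sysfs itself.

.. code-block:: python

  THRESHOLD_FIELD_MAX = 0xFFF  # the threshold config field is 12 bits wide


  def threshold_is_valid(threshold, threshold_max_text):
      """Check a threshold against the text read from caps/threshold_max."""
      threshold_max = int(threshold_max_text.strip(), 16)
      if threshold == 0:
          return True   # 0 disables thresholding
      if threshold > THRESHOLD_FIELD_MAX:
          return False  # cannot be encoded in the config field
      return threshold <= threshold_max

With the ``0x000000ff`` value shown above, a threshold of 255 is accepted
and 256 is rejected. When threshold_max reads 0, every non-zero threshold is
rejected.
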