.. SPDX-License-Identifier: GPL-2.0

========================================
Debugging advice for driver development
========================================

This document serves as a general starting point and lookup for debugging
device drivers.
While this guide focuses on debugging that requires re-compiling the
module/kernel, the :doc:`userspace debugging guide
</process/debugging/userspace_debugging_guide>` will guide
you through tools like dynamic debug, ftrace and other tools useful for
debugging issues and behavior.
For general debugging advice, see the :doc:`general advice document
</process/debugging/index>`.

.. contents::
    :depth: 3

The following sections show you the available tools.

printk() & friends
------------------

These are derivatives of printf() with varying destinations and support for
being dynamically turned on or off, or lack thereof.

Simple printk()
~~~~~~~~~~~~~~~

The classic approach: it can be used to great effect for quick and dirty
development of new modules or to extract arbitrary necessary data for
troubleshooting.

Prerequisite: ``CONFIG_PRINTK`` (usually enabled by default)

**Pros**:

- No need to learn anything, simple to use
- Easy to modify exactly to your needs (formatting of the data (See:
  :doc:`/core-api/printk-formats`), visibility in the log)
- Can cause delays in the execution of the code (beneficial to confirm whether
  timing is a factor)

**Cons**:

- Requires rebuilding the kernel/module
- Can cause delays in the execution of the code (which can cause issues to be
  not reproducible)

For the full documentation see :doc:`/core-api/printk-basics`
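
As a minimal sketch of the approach described above, the following module
prints a value when it is loaded. The module name ``printk_demo`` and the
variable ``budget`` are made up for illustration; only printk(), pr_info() and
the module boilerplate are existing kernel interfaces.

.. code-block:: c

    #include <linux/init.h>
    #include <linux/module.h>
    #include <linux/printk.h>

    static int __init printk_demo_init(void)
    {
            int budget = 64;        /* hypothetical value worth inspecting */

            /* KERN_INFO selects the log level of the message. */
            printk(KERN_INFO "printk_demo: loaded, budget=%d\n", budget);

            /* pr_info() is the usual shorthand for printk(KERN_INFO ...). */
            pr_info("printk_demo: same value via pr_info(): %d\n", budget);

            return 0;
    }

    static void __exit printk_demo_exit(void)
    {
            pr_info("printk_demo: unloaded\n");
    }

    module_init(printk_demo_init);
    module_exit(printk_demo_exit);
    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Minimal printk() sketch");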

Trace_printk
~~~~~~~~~~~~

Prerequisite: ``CONFIG_DYNAMIC_FTRACE`` & ``#include <linux/ftrace.h>``

It is a tiny bit less comfortable to use than printk(), because you will have
to read the messages from the trace file (See: :ref:`read_ftrace_log`)
instead of from the kernel log, but it is very useful when printk() adds
unwanted delays into the code execution, causing issues to be flaky or hidden.

If the processing of this still causes timing issues then you can try
trace_puts().

For the full documentation see trace_printk()
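
A short sketch of how both calls might be placed in a timing-sensitive path.
The function ``demo_handle_irq()`` and its argument are hypothetical; the
messages end up in the trace buffer rather than in the kernel log.

.. code-block:: c

    #include <linux/ftrace.h>
    #include <linux/kernel.h>

    /* Hypothetical hot-path helper; name and argument are illustrative. */
    static void demo_handle_irq(unsigned int status)
    {
            /* Formatted message, written to the ftrace ring buffer. */
            trace_printk("irq status=0x%08x\n", status);

            /* Plain string without format processing, hence cheaper. */
            trace_puts("irq handled\n");
    }

Both calls are meant for temporary debugging only and should be removed again
before the code is submitted.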

dev_dbg
~~~~~~~

A print statement that contains additional information about the device used
within the context and that can be targeted by
:ref:`process/debugging/userspace_debugging_guide:dynamic debug`.
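
A minimal sketch, assuming a hypothetical platform driver whose probe function
is called ``demo_probe()``. The message is prefixed with the driver and device
name and stays silent unless it is enabled, for example through dynamic debug.

.. code-block:: c

    #include <linux/device.h>
    #include <linux/module.h>
    #include <linux/platform_device.h>

    /* Hypothetical probe function of a platform driver. */
    static int demo_probe(struct platform_device *pdev)
    {
            struct device *dev = &pdev->dev;

            /* The device pointer supplies the driver/device prefix. */
            dev_dbg(dev, "probing, id=%d\n", pdev->id);

            return 0;
    }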

**When is it appropriate to leave a debug print in the code?**

Permanent debug statements have to be useful for a developer to troubleshoot
driver misbehavior. Judging that is a bit more of an art than a science, but
some guidelines are in the :ref:`Coding style guidelines
<process/coding-style:13) printing kernel messages>`. In almost all cases the
debug statements shouldn't be upstreamed, as a working driver is supposed to be
silent.

Custom printk
~~~~~~~~~~~~~

Example::

  #define core_dbg(fmt, arg...) do { \
          if (core_debug) \
                  printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
          } while (0)

**When should you do this?**

It is better to just use a pr_debug(), which can later be turned on/off with
dynamic debug. Additionally, a lot of drivers activate these prints via a
variable like ``core_debug`` set by a module parameter. However, module
parameters `are not recommended anymore
<https://lore.kernel.org/all/2024032757-surcharge-grime-d3dd@gregkh>`_.
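
For comparison, a sketch of how the same kind of message could be written with
pr_debug() instead of a custom macro. The function ``core_report_state()`` is
hypothetical; the ``"core: "`` prefix mirrors the macro above.

.. code-block:: c

    /* Must be defined before the first include that pulls in printk.h. */
    #define pr_fmt(fmt) "core: " fmt

    #include <linux/printk.h>

    /* Hypothetical function; prints what core_dbg() would have printed. */
    static void core_report_state(int state)
    {
            /* No core_debug variable or module parameter is needed. */
            pr_debug("state changed to %d\n", state);
    }

With ``CONFIG_DYNAMIC_DEBUG`` enabled, such a call site stays silent until it
is switched on at runtime.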

Ftrace
------

Creating a custom Ftrace tracepoint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A tracepoint adds a hook into your code that will be called and logged when the
tracepoint is enabled. This can be used, for example, to trace hitting a
conditional branch or to dump the internal state at specific points of the code
flow during a debugging session.

Here is a basic description of :ref:`how to implement new tracepoints
<trace/tracepoints:usage>`.

For the full event tracing documentation see :doc:`/trace/events`

For the full Ftrace documentation see :doc:`/trace/ftrace`
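
The following sketch shows roughly what a small trace event for dumping a
state transition could look like. The header location, the ``demo`` trace
system and the event name ``demo_state_change`` are hypothetical; the macros
are the ones provided by ``<linux/tracepoint.h>``.

.. code-block:: c

    /* include/trace/events/demo.h (hypothetical file and event names) */
    #undef TRACE_SYSTEM
    #define TRACE_SYSTEM demo

    #if !defined(_TRACE_DEMO_H) || defined(TRACE_HEADER_MULTI_READ)
    #define _TRACE_DEMO_H

    #include <linux/tracepoint.h>

    TRACE_EVENT(demo_state_change,
            /* Prototype and argument names of trace_demo_state_change(). */
            TP_PROTO(int old_state, int new_state),
            TP_ARGS(old_state, new_state),
            /* Layout of the record stored in the ring buffer. */
            TP_STRUCT__entry(
                    __field(int, old_state)
                    __field(int, new_state)
            ),
            /* Copy the arguments into the record. */
            TP_fast_assign(
                    __entry->old_state = old_state;
                    __entry->new_state = new_state;
            ),
            /* How the record is rendered when the trace is read. */
            TP_printk("old=%d new=%d", __entry->old_state, __entry->new_state)
    );

    #endif /* _TRACE_DEMO_H */

    /* This part must stay outside of the include guard. */
    #include <trace/define_trace.h>

Exactly one C file of the driver would then instantiate the tracepoint and
call it, again with hypothetical names:

.. code-block:: c

    #define CREATE_TRACE_POINTS
    #include <trace/events/demo.h>

    /* Hypothetical helper that records a state transition. */
    static void demo_set_state(int *state, int new_state)
    {
            trace_demo_state_change(*state, new_state);
            *state = new_state;
    }

A header that lives next to the driver sources instead of in
``include/trace/events/`` additionally has to define ``TRACE_INCLUDE_PATH``.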

DebugFS
-------

Prerequisite: ``CONFIG_DEBUG_FS`` & ``#include <linux/debugfs.h>``

DebugFS differs from the other approaches of debugging, as it doesn't write
messages to the kernel log nor add traces to the code. Instead it allows the
developer to handle a set of files.
With these files you can either store values of variables or make
register/memory dumps or you can make these files writable and modify
values/settings in the driver.

Possible use-cases among others:

- Store register values
- Keep track of variables
- Store errors
- Store settings
- Toggle a setting like debug on/off
- Error injection

This is especially useful when the size of a data dump would be hard to digest
as part of the general kernel log (for example when dumping raw bitstream data)
or when you are not interested in all the values all the time, but still want
the possibility to inspect them.

The general idea is:

- Create a directory during probe (``struct dentry *parent =
  debugfs_create_dir("my_driver", NULL);``)
- Create a file (``debugfs_create_u32("my_value", 0444, parent, &my_variable);``)

  - In this example the file is found in
    ``/sys/kernel/debug/my_driver/my_value`` (with read permissions for
    user/group/all)
  - Any read of the file will return the current contents of the variable
    ``my_variable``

- Clean up the directory when removing the device
  (``debugfs_remove(parent);``)
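
The steps above can be sketched as follows. For brevity the sketch uses module
init/exit instead of a probe/remove pair, and the names ``demo_dir``,
``demo_init()`` and ``demo_exit()`` are illustrative. The return values of the
debugfs calls are deliberately not checked, which is the usual convention for
debugfs.

.. code-block:: c

    #include <linux/debugfs.h>
    #include <linux/init.h>
    #include <linux/module.h>

    static struct dentry *demo_dir;
    static u32 my_variable;

    static int __init demo_init(void)
    {
            /* NULL parent: the directory is created in the debugfs root. */
            demo_dir = debugfs_create_dir("my_driver", NULL);

            /* 0444 (octal): read-only for user, group and others. */
            debugfs_create_u32("my_value", 0444, demo_dir, &my_variable);

            return 0;
    }

    static void __exit demo_exit(void)
    {
            /* In current kernels this also removes the files below it. */
            debugfs_remove(demo_dir);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Minimal debugfs sketch");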

For the full documentation see :doc:`/filesystems/debugfs`.

KASAN, UBSAN, lockdep and other error checkers
----------------------------------------------

KASAN (Kernel Address Sanitizer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_KASAN``

KASAN is a dynamic memory error detector that helps to find use-after-free and
out-of-bounds bugs. It uses compile-time instrumentation to check every memory
access.

For the full documentation see :doc:`/dev-tools/kasan`.
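
To illustrate the class of bug this is about, the deliberately broken and
purely hypothetical function below writes one byte past a heap allocation.
Without KASAN such a write may go unnoticed; with KASAN enabled it is expected
to produce an out-of-bounds report in the kernel log.

.. code-block:: c

    #include <linux/slab.h>
    #include <linux/types.h>

    /* Deliberately buggy, hypothetical function: do not copy. */
    static void demo_out_of_bounds(size_t len)
    {
            u8 *buf = kmalloc(len, GFP_KERNEL);

            if (!buf)
                    return;

            /* Valid indices are 0..len-1, so this is one byte too far. */
            buf[len] = 0;

            kfree(buf);
    }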

UBSAN (Undefined Behavior Sanitizer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_UBSAN``

UBSAN relies on compiler instrumentation and runtime checks to detect undefined
behavior. It is designed to find a variety of issues, including signed integer
overflow, array index out of bounds, and more.

For the full documentation see :doc:`/dev-tools/ubsan`

lockdep (Lock Dependency Validator)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_PROVE_LOCKING`` (which selects ``CONFIG_LOCKDEP``);
``CONFIG_DEBUG_LOCKDEP`` additionally enables self-checks of the lockdep engine

lockdep is a runtime lock dependency validator that detects potential deadlocks
and other locking-related issues in the kernel.
It tracks lock acquisitions and releases, building a dependency graph that is
analyzed for potential deadlocks.
lockdep is especially useful for validating the correctness of lock ordering in
the kernel.
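
Besides the automatic validation, a driver can also state its own locking
assumptions so that lockdep checks them. A minimal sketch, assuming a
hypothetical ``struct demo_dev`` whose mutex has been set up with mutex_init():

.. code-block:: c

    #include <linux/lockdep.h>
    #include <linux/mutex.h>

    /* Hypothetical driver state protected by a mutex. */
    struct demo_dev {
            struct mutex lock;
            int counter;
    };

    /* Caller must hold dev->lock; lockdep warns if it does not. */
    static void demo_inc_locked(struct demo_dev *dev)
    {
            lockdep_assert_held(&dev->lock);
            dev->counter++;
    }

    static void demo_inc(struct demo_dev *dev)
    {
            mutex_lock(&dev->lock);
            demo_inc_locked(dev);
            mutex_unlock(&dev->lock);
    }

Without lockdep in the kernel configuration the assertion compiles to nothing,
so it can stay in the code permanently.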

PSI (Pressure stall information tracking)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_PSI``

PSI is a measurement tool to identify excessive overcommits on hardware
resources, which can cause performance disruptions or even OOM kills.

device coredump
---------------

Prerequisite: ``CONFIG_DEV_COREDUMP`` & ``#include <linux/devcoredump.h>``

This provides the infrastructure for a driver to pass arbitrary data to
userland. It is most often used in conjunction with udev or a similar userland
application that listens for kernel uevents, which indicate that the dump is
ready. Udev has rules to copy that file somewhere for long-term storage and
analysis, because by default the data for the dump is automatically cleaned up
after 5 minutes. That data is analyzed with driver-specific tools or GDB.

A device coredump can be created with a vmalloc area, with read/free
methods, or as a scatter/gather list.
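
A minimal sketch of the vmalloc variant, assuming a hypothetical helper
``demo_report_crash()`` that is handed a buffer with the state to be dumped:

.. code-block:: c

    #include <linux/devcoredump.h>
    #include <linux/device.h>
    #include <linux/string.h>
    #include <linux/vmalloc.h>

    /* Hypothetical helper: hand a copy of a state buffer to userspace. */
    static void demo_report_crash(struct device *dev, const void *state,
                                  size_t len)
    {
            void *dump = vmalloc(len);

            if (!dump)
                    return;

            memcpy(dump, state, len);

            /* Ownership of 'dump' passes to devcoredump, which frees it. */
            dev_coredumpv(dev, dump, len, GFP_KERNEL);
    }

Because the buffer is freed by the devcoredump core, the driver must not touch
it after the call.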

You can find an example implementation at:
`drivers/media/platform/qcom/venus/core.c
<https://elixir.bootlin.com/linux/v6.11.6/source/drivers/media/platform/qcom/venus/core.c#L30>`__,
in the Bluetooth HCI layer, in several wireless drivers, and in several
DRM drivers.

devcoredump interfaces
~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/linux/devcoredump.h

.. kernel-doc:: drivers/base/devcoredump.c

**Copyright** ©2024 : Collabora