- ===================================
- Documentation for /proc/sys/kernel/
- ===================================
- .. See scripts/check-sysctl-docs to keep this up to date
- Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
- Copyright (c) 2009, Shen Feng <shen@cn.fujitsu.com>
- For general info and legal blurb, please look in
- Documentation/admin-guide/sysctl/index.rst.
- ------------------------------------------------------------------------------
- This file contains documentation for the sysctl files in
- ``/proc/sys/kernel/``.
- The files in this directory can be used to tune and monitor
- miscellaneous and general things in the operation of the Linux
- kernel. Since some of the files *can* be used to screw up your
- system, it is advisable to read both documentation and source
- before actually making adjustments.
- Currently, these files might (depending on your configuration)
- show up in ``/proc/sys/kernel``:
- .. contents:: :local:
- acct
- ====
- ::
- highwater lowwater frequency
- If BSD-style process accounting is enabled, these values control
- its behaviour. If free space on the filesystem where the log lives
- goes below ``lowwater``\ %, accounting suspends. If free space gets
- above ``highwater``\ %, accounting resumes. ``frequency`` determines
- how often the amount of free space is checked (value is in
- seconds). Default:
- ::
- 4 2 30
- That is, suspend accounting if free space drops below 2%; resume it
- if it increases to at least 4%; treat a free-space reading as valid
- for 30 seconds.
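- The suspend/resume rule is a simple hysteresis. As a rough illustration,
- the POSIX shell sketch below models one check. ``acct_next_state`` is a
- hypothetical helper (not a kernel or util-linux interface), and it follows
- the description above rather than the kernel source.

```shell
# acct_next_state STATE FREE_PCT HIGHWATER LOWWATER  (hypothetical helper)
# STATE is "active" or "suspended"; prints the state after one check.
# Models only the hysteresis described in the documentation.
acct_next_state() {
    state=$1 free=$2 high=$3 low=$4
    if [ "$state" = active ] && [ "$free" -lt "$low" ]; then
        echo suspended          # free space dropped below lowwater
    elif [ "$state" = suspended ] && [ "$free" -ge "$high" ]; then
        echo active             # free space recovered to highwater
    else
        echo "$state"           # between the two marks: no change
    fi
}

# Example (reads the live values):
#   read -r high low freq < /proc/sys/kernel/acct
#   acct_next_state active 1 "$high" "$low"
```

- With the defaults (4 2 30), accounting that is active at 1% free space is
- suspended, and it resumes only once free space is back to 4%.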
- acpi_video_flags
- ================
- See Documentation/power/video.rst. This allows the video resume mode to be set,
- in a similar fashion to the ``acpi_sleep`` kernel parameter, by
- combining the following values:
- = =======
- 1 s3_bios
- 2 s3_mode
- 4 s3_beep
- = =======
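- Because the values combine as a bit mask, a value such as 5 selects
- ``s3_bios`` and ``s3_beep``. The shell sketch below decodes a value using
- the table above; ``decode_acpi_video_flags`` is a hypothetical helper.

```shell
# decode_acpi_video_flags VALUE  (hypothetical helper)
# Prints the names of the bits set in VALUE, using the table above.
decode_acpi_video_flags() {
    v=$1
    names=
    if [ $((v & 1)) -ne 0 ]; then names="$names s3_bios"; fi
    if [ $((v & 2)) -ne 0 ]; then names="$names s3_mode"; fi
    if [ $((v & 4)) -ne 0 ]; then names="$names s3_beep"; fi
    echo "${names# }"    # strip the leading space
}

# Example: decode_acpi_video_flags "$(cat /proc/sys/kernel/acpi_video_flags)"
```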
- arch
- ====
- The machine hardware name, the same output as ``uname -m``
- (e.g. ``x86_64`` or ``aarch64``).
- auto_msgmni
- ===========
- This variable has no effect and may be removed in future kernel
- releases. Reading it always returns 0.
- Up to Linux 3.17, it enabled/disabled automatic recomputing of
- `msgmni`_
- upon memory add/remove or upon IPC namespace creation/removal.
- Echoing "1" into this file enabled msgmni automatic recomputing.
- Echoing "0" turned it off. The default value was 1.
- bootloader_type (x86 only)
- ==========================
- This gives the bootloader type number as indicated by the bootloader,
- shifted left by 4, and OR'd with the low four bits of the bootloader
- version. The reason for this encoding is that this used to match the
- ``type_of_loader`` field in the kernel header; the encoding is kept for
- backwards compatibility. That is, if the full bootloader type number
- is 0x15 and the full version number is 0x234, this file will contain
- the value 340 = 0x154.
- See the ``type_of_loader`` and ``ext_loader_type`` fields in
- Documentation/arch/x86/boot.rst for additional information.
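- The encoding can be checked with shell arithmetic. The two functions
- below are hypothetical helpers that reproduce the worked example above
- (type 0x15 and version 0x234 give 340) and split a value back apart.

```shell
# bootloader_type_value TYPE VERSION  (hypothetical helper)
# TYPE shifted left by 4, OR'd with the low four bits of VERSION;
# prints the result in decimal, as the sysctl file shows it.
bootloader_type_value() {
    echo $(( ($1 << 4) | ($2 & 0xf) ))
}

# split_bootloader_type VALUE  (hypothetical helper)
# Recovers the type number and the low four version bits.
split_bootloader_type() {
    printf 'type=0x%x version_low_bits=0x%x\n' $(( $1 >> 4 )) $(( $1 & 0xf ))
}

# The full version (bootloader_version) for the same example:
#   printf '%d\n' 0x234    # 564
```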
- bootloader_version (x86 only)
- =============================
- The complete bootloader version number. In the example above, this
- file will contain the value 564 = 0x234.
- See the ``type_of_loader`` and ``ext_loader_ver`` fields in
- Documentation/arch/x86/boot.rst for additional information.
- bpf_stats_enabled
- =================
- Controls whether the kernel should collect statistics on BPF programs
- (total time spent running, number of times run...). Enabling
- statistics causes a slight reduction in performance on each program
- run. The statistics can be seen using ``bpftool``.
- = ===================================
- 0 Don't collect statistics (default).
- 1 Collect statistics.
- = ===================================
- cad_pid
- =======
- This is the pid which will be signalled on reboot (notably, by
- Ctrl-Alt-Delete). Writing a value to this file which doesn't
- correspond to a running process will result in ``-ESRCH``.
- See also `ctrl-alt-del`_.
- cap_last_cap
- ============
- Highest valid capability of the running kernel. Exports
- ``CAP_LAST_CAP`` from the kernel.
- .. _core_pattern:
- core_pattern
- ============
- ``core_pattern`` is used to specify a core dumpfile pattern name.
- * max length 127 characters; default value is "core"
- * ``core_pattern`` is used as a pattern template for the output
- filename; certain string patterns (beginning with '%') are
- substituted with their actual values.
- * backward compatibility with ``core_uses_pid``:
- If ``core_pattern`` does not include "%p" (default does not)
- and ``core_uses_pid`` is set, then .PID will be appended to
- the filename.
- * corename format specifiers
- ======== ==========================================
- %<NUL> '%' is dropped
- %% output one '%'
- %p pid
- %P global pid (init PID namespace)
- %i tid
- %I global tid (init PID namespace)
- %u uid (in initial user namespace)
- %g gid (in initial user namespace)
- %d dump mode, matches ``PR_SET_DUMPABLE`` and
- ``/proc/sys/fs/suid_dumpable``
- %s signal number
- %t UNIX time of dump
- %h hostname
- %e executable filename (may be shortened, could be changed by prctl etc.)
- %f executable filename
- %E executable path
- %c maximum size of core file by resource limit RLIMIT_CORE
- %C CPU the task ran on
- %F pidfd number
- %<OTHER> both are dropped
- ======== ==========================================
- * If the first character of the pattern is a '|', the kernel will treat
- the rest of the pattern as a command to run. The core dump will be
- written to the standard input of that program instead of to a file.
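- To make the substitution rules concrete, here is a simplified POSIX shell
- model of the expansion. ``expand_core_pattern`` is hypothetical: it handles
- only ``%%``, ``%p``, ``%u``, ``%s`` and ``%e``, drops any other specifier
- together with its '%', and ignores pipe patterns and ``core_uses_pid``. It
- is not the kernel's implementation.

```shell
# expand_core_pattern PATTERN PID UID SIGNAL EXE  (hypothetical helper)
# Simplified model of core_pattern specifier substitution.
# Handles %%, %p, %u, %s and %e; any other specifier is dropped together
# with its '%', and a trailing lone '%' is dropped.
# Pipe patterns ('|...') and core_uses_pid are not modelled.
expand_core_pattern() {
    pat=$1 out=
    while [ -n "$pat" ]; do
        rest=${pat#?}; c=${pat%"$rest"}; pat=$rest   # pop one character
        if [ "$c" != "%" ]; then
            out=$out$c
            continue
        fi
        rest=${pat#?}; s=${pat%"$rest"}; pat=$rest   # pop the specifier
        case $s in
            %) out=$out% ;;
            p) out=$out$2 ;;
            u) out=$out$3 ;;
            s) out=$out$4 ;;
            e) out=$out$5 ;;
            *) ;;                                    # dropped
        esac
    done
    printf '%s\n' "$out"
}

# Example: expand_core_pattern 'core.%e.%p' 1234 1000 11 myprog
```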
- core_pipe_limit
- ===============
- This sysctl is only applicable when `core_pattern`_ is configured to
- pipe core files to a user space helper (when the first character of
- ``core_pattern`` is a '|', see above).
- When collecting cores via a pipe to an application, it is occasionally
- useful for the collecting application to gather data about the
- crashing process from its ``/proc/pid`` directory.
- In order to do this safely, the kernel must wait for the collecting
- process to exit, so as not to remove the crashing process's proc files
- prematurely.
- This in turn creates the possibility that a misbehaving userspace
- collecting process can block the reaping of a crashed process simply
- by never exiting.
- This sysctl defends against that.
- It defines how many concurrent crashing processes may be piped to user
- space applications in parallel.
- If this value is exceeded, then those crashing processes above that
- value are noted via the kernel log and their cores are skipped.
- 0 is a special value, indicating that unlimited processes may be
- captured in parallel, but that no waiting will take place (i.e. the
- collecting process is not guaranteed access to ``/proc/<crashing
- pid>/``).
- This value defaults to 0.
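- The limit check can be sketched as follows, assuming the count passed in
- includes the process that has just crashed. ``core_pipe_decision`` is a
- hypothetical helper that only restates the rules above.

```shell
# core_pipe_decision LIMIT IN_FLIGHT  (hypothetical helper)
# IN_FLIGHT counts crashing processes currently being piped to user
# space, including the new one. Prints "pipe" or "skip".
core_pipe_decision() {
    if [ "$1" -ne 0 ] && [ "$2" -gt "$1" ]; then
        echo skip    # limit exceeded: noted in the kernel log, core skipped
    else
        echo pipe    # within the limit, or 0 = unlimited (but no waiting)
    fi
}
```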
- core_sort_vma
- =============
- The default coredump writes VMAs in address order. By setting
- ``core_sort_vma`` to 1, VMAs will be written from smallest size
- to largest size. This is known to break at least elfutils, but
- can be handy when dealing with very large (and truncated)
- coredumps where the more useful debugging details are included
- in the smaller VMAs.
- core_uses_pid
- =============
- The default coredump filename is "core". By setting
- ``core_uses_pid`` to 1, the coredump filename becomes core.PID.
- If `core_pattern`_ does not include "%p" (default does not)
- and ``core_uses_pid`` is set, then .PID will be appended to
- the filename.
- ctrl-alt-del
- ============
- When the value in this file is 0, ctrl-alt-del is trapped and
- sent to the ``init(1)`` program to handle a graceful restart.
- When, however, the value is > 0, Linux's reaction to a Vulcan
- Nerve Pinch (tm) will be an immediate reboot, without even
- syncing its dirty buffers.
- Note:
- when a program (like dosemu) has the keyboard in 'raw'
- mode, the ctrl-alt-del is intercepted by the program before it
- ever reaches the kernel tty layer, and it's up to the program
- to decide what to do with it.
- dmesg_restrict
- ==============
- This toggle indicates whether unprivileged users are prevented
- from using ``dmesg(8)`` to view messages from the kernel's log
- buffer.
- When ``dmesg_restrict`` is set to 0 there are no restrictions.
- When ``dmesg_restrict`` is set to 1, users must have
- ``CAP_SYSLOG`` to use ``dmesg(8)``.
- The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
- default value of ``dmesg_restrict``.
- domainname & hostname
- =====================
- These files can be used to set the NIS/YP domainname and the
- hostname of your box in exactly the same way as the commands
- domainname and hostname, i.e.::
- # echo "darkstar" > /proc/sys/kernel/hostname
- # echo "mydomain" > /proc/sys/kernel/domainname
- has the same effect as::
- # hostname "darkstar"
- # domainname "mydomain"
- Note, however, that the classic darkstar.frop.org has the
- hostname "darkstar" and DNS (Domain Name System)
- domainname "frop.org", not to be confused with the NIS (Network
- Information Service) or YP (Yellow Pages) domainname. These two
- domain names are in general different. For a detailed discussion
- see the ``hostname(1)`` man page.
- firmware_config
- ===============
- See Documentation/driver-api/firmware/fallback-mechanisms.rst.
- The entries in this directory allow the firmware loader helper
- fallback to be controlled:
- * ``force_sysfs_fallback``, when set to 1, forces the use of the
- fallback;
- * ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.
- ftrace_dump_on_oops
- ===================
- Determines whether ``ftrace_dump()`` should be called on an oops (or
- kernel panic). This will output the contents of the ftrace buffers to
- the console. This is very useful for capturing traces that lead to
- crashes and outputting them to a serial console.
- ======================= ===========================================
- 0 Disabled (default).
- 1 Dump buffers of all CPUs.
- 2(orig_cpu) Dump the buffer of the CPU that triggered the
- oops.
- <instance> Dump the specific instance buffer on all CPUs.
- <instance>=2(orig_cpu) Dump the specific instance buffer on the CPU
- that triggered the oops.
- ======================= ===========================================
- Multiple instance dumps are also supported; instances are separated
- by commas. If the global buffer also needs to be dumped, specify its
- dump mode (1/2/orig_cpu) first.
- So, for example, to dump the "foo" and "bar" instance buffers on all
- CPUs, the user can run::
- echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops
- To dump the global buffer and the "foo" instance buffer on all
- CPUs, along with the "bar" instance buffer on the CPU that triggered the
- oops, the user can run::
- echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops
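- As an illustration of how such a value breaks down, the hypothetical
- ``describe_ftrace_dump`` helper below splits it on commas and reports what
- each entry requests according to the table above. Unlike the kernel, it
- performs no validation.

```shell
# describe_ftrace_dump VALUE  (hypothetical helper)
# Splits VALUE on commas and prints which buffer each entry dumps,
# following the ftrace_dump_on_oops table. Performs no validation.
describe_ftrace_dump() {
    old_ifs=$IFS
    IFS=,
    set -f                 # no globbing while splitting
    set -- $1
    set +f
    IFS=$old_ifs
    for item in "$@"; do
        case $item in
            0) echo "global: disabled" ;;
            1) echo "global: all CPUs" ;;
            2|orig_cpu) echo "global: oops CPU only" ;;
            *=2|*=orig_cpu) echo "${item%%=*}: oops CPU only" ;;
            *) echo "$item: all CPUs" ;;
        esac
    done
}

# Example: describe_ftrace_dump "$(cat /proc/sys/kernel/ftrace_dump_on_oops)"
```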
- ftrace_enabled, stack_tracer_enabled
- ====================================
- See Documentation/trace/ftrace.rst.
- hardlockup_all_cpu_backtrace
- ============================
- This value controls whether the hard lockup detector gathers further
- debug information when a hard lockup condition is detected. If
- enabled, arch-specific all-CPU stack dumping will be initiated.
- = ============================================
- 0 Do nothing. This is the default behavior.
- 1 On detection capture more debug information.
- = ============================================
- hardlockup_panic
- ================
- This parameter can be used to control whether the kernel panics
- when a hard lockup is detected.
- = ===========================
- 0 Don't panic on hard lockup.
- 1 Panic on hard lockup.
- = ===========================
- See Documentation/admin-guide/lockup-watchdogs.rst for more information.
- This can also be set using the nmi_watchdog kernel parameter.
- hotplug
- =======
- Path for the hotplug policy agent.
- Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults
- to the empty string.
- This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most
- modern systems rely exclusively on the netlink-based uevent source and
- don't need this.
- hung_task_all_cpu_backtrace
- ===========================
- If this option is set, the kernel will send an NMI to all CPUs to dump
- their backtraces when a hung task is detected. This file shows up if
- ``CONFIG_DETECT_HUNG_TASK`` and ``CONFIG_SMP`` are enabled.
- 0: Won't show all CPUs' backtraces when a hung task is detected.
- This is the default behavior.
- 1: Will non-maskably interrupt all CPUs and dump their backtraces when
- a hung task is detected.
- hung_task_panic
- ===============
- When set to a non-zero value, a kernel panic will be triggered if the
- number of hung tasks found during a single scan reaches this value.
- This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
- = =======================================================
- 0 Continue operation. This is the default behavior.
- N Panic when N hung tasks are found during a single scan.
- = =======================================================
- hung_task_check_count
- =====================
- The upper bound on the number of tasks that are checked.
- This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
- hung_task_detect_count
- ======================
- Indicates the total number of tasks that have been detected as hung since
- system boot.
- This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
- hung_task_sys_info
- ==================
- A comma-separated list of extra system information to be dumped when
- a hung task is detected, for example, "tasks,mem,timers,locks,...".
- Refer to the 'panic_sys_info' section below for more details.
- hung_task_timeout_secs
- ======================
- When a task in D state has not been scheduled for more than this
- many seconds, a warning is reported.
- This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
- 0 means infinite timeout, no checking is done.
- Possible values to set are in range {0:``LONG_MAX``/``HZ``}.
- hung_task_check_interval_secs
- =============================
- Hung task check interval. If hung task checking is enabled
- (see `hung_task_timeout_secs`_), the check is done every
- ``hung_task_check_interval_secs`` seconds.
- This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
- 0 (default) means use ``hung_task_timeout_secs`` as checking
- interval.
- Possible values to set are in range {0:``LONG_MAX``/``HZ``}.
- hung_task_warnings
- ==================
- The maximum number of warnings to report. If a hung task is detected
- during a check interval, this value is decreased by 1.
- When this value reaches 0, no more warnings will be reported.
- This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
- -1: report an infinite number of warnings.
- hyperv_record_panic_msg
- =======================
- Controls whether the panic kmsg data should be reported to Hyper-V.
- = =========================================================
- 0 Do not report panic kmsg data.
- 1 Report the panic kmsg data. This is the default behavior.
- = =========================================================
- ignore-unaligned-usertrap
- =========================
- On architectures where unaligned accesses cause traps, and where this
- feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
- currently, ``arc``, ``parisc`` and ``loongarch``), controls whether all
- unaligned traps are logged.
- = =============================================================
- 0 Log all unaligned accesses.
- 1 Only warn the first time a process traps. This is the default
- setting.
- = =============================================================
- See also `unaligned-trap`_.
- io_uring_disabled
- =================
- Prevents all processes from creating new io_uring instances. Enabling this
- shrinks the kernel's attack surface.
- = ======================================================================
- 0 All processes can create io_uring instances as normal. This is the
- default setting.
- 1 io_uring creation is disabled (io_uring_setup() will fail with
- -EPERM) for unprivileged processes not in the io_uring_group group.
- Existing io_uring instances can still be used. See the
- documentation for io_uring_group for more information.
- 2 io_uring creation is disabled for all processes. io_uring_setup()
- always fails with -EPERM. Existing io_uring instances can still be
- used.
- = ======================================================================
- io_uring_group
- ==============
- When io_uring_disabled is set to 1, a process must either be
- privileged (CAP_SYS_ADMIN) or be in the io_uring_group group in order
- to create an io_uring instance. If io_uring_group is set to -1 (the
- default), only processes with the CAP_SYS_ADMIN capability may create
- io_uring instances.
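- Taken together, ``io_uring_disabled`` and ``io_uring_group`` decide
- whether ``io_uring_setup()`` may succeed. The hypothetical helper below
- restates that decision. Group membership is passed in by the caller, so a
- group value of -1 simply means the caller always passes "no" for it.

```shell
# io_uring_may_create DISABLED HAS_CAP_SYS_ADMIN IN_GROUP  (hypothetical)
# DISABLED is the io_uring_disabled value; the other two are "yes"/"no".
# Prints "allowed" or "EPERM". Existing instances are unaffected either way.
io_uring_may_create() {
    case $1 in
        0)
            echo allowed ;;
        1)
            # Privileged (CAP_SYS_ADMIN) or a member of io_uring_group.
            if [ "$2" = yes ] || [ "$3" = yes ]; then
                echo allowed
            else
                echo EPERM
            fi
            ;;
        *)
            # 2: creation disabled for every process.
            echo EPERM ;;
    esac
}
```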
- kernel_sys_info
- ===============
- A comma-separated list of extra system information to be dumped when
- a soft/hard lockup is detected, for example, "tasks,mem,timers,locks,...".
- Refer to the 'panic_sys_info' section below for more details.
- It serves as the default kernel control knob, which takes effect
- when a kernel module calls sys_info() with parameter==0.
- kexec_load_disabled
- ===================
- A toggle indicating if the syscalls ``kexec_load`` and
- ``kexec_file_load`` have been disabled.
- This value defaults to 0 (false: ``kexec_*load`` enabled), but can be
- set to 1 (true: ``kexec_*load`` disabled).
- Once true, kexec can no longer be used, and the toggle cannot be set
- back to false.
- This allows a kexec image to be loaded before disabling the syscall,
- allowing a system to set up (and later use) an image without it being
- altered.
- Generally used together with the `modules_disabled`_ sysctl.
- kexec_load_limit_panic
- ======================
- This parameter specifies a limit to the number of times the syscalls
- ``kexec_load`` and ``kexec_file_load`` can be called with a crash
- image. It can only be set with a more restrictive value than the
- current one.
- == ======================================================
- -1 Unlimited calls to kexec. This is the default setting.
- N Number of calls left.
- == ======================================================
- kexec_load_limit_reboot
- =======================
- Similar functionality to ``kexec_load_limit_panic``, but for a normal
- image.
- kptr_restrict
- =============
- This toggle indicates whether restrictions are placed on
- exposing kernel addresses via ``/proc`` and other interfaces.
- When ``kptr_restrict`` is set to 0 (the default) the address is hashed
- before printing.
- (This is equivalent to %p.)
- When ``kptr_restrict`` is set to 1, kernel pointers printed using the
- %pK format specifier will be replaced with 0s unless the user has
- ``CAP_SYSLOG`` and effective user and group ids are equal to the real
- ids.
- This is because %pK checks are done at read() time rather than open()
- time, so if permissions are elevated between the open() and the read()
- (e.g. via a setuid binary) then %pK will not leak kernel pointers to
- unprivileged users.
- Note, this is a temporary solution only.
- The correct long-term solution is to do the permission checks at
- open() time.
- Consider removing world read permissions from files that use %pK, and
- using `dmesg_restrict`_ to protect against uses of %pK in ``dmesg(8)``
- if leaking kernel pointer values to unprivileged users is a concern.
- When ``kptr_restrict`` is set to 2, kernel pointers printed using
- %pK will be replaced with 0s regardless of privileges.
- To disable these security restrictions early at boot time (and once
- and for all), use the ``hash_pointers`` boot parameter instead.
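- The effect on a pointer printed with %pK can be summarized as a small
- decision table. The shell sketch below is hypothetical; it assumes that at
- level 1 a privileged reader sees the unmodified address (the description
- only says 0s are used otherwise), and it ignores the ``hash_pointers``
- boot parameter.

```shell
# pk_output LEVEL PRIVILEGED  (hypothetical helper)
# LEVEL is kptr_restrict; PRIVILEGED is "yes" when the reader has
# CAP_SYSLOG and its effective uid/gid equal its real uid/gid.
# Prints how a %pK pointer would appear under the rules described above.
pk_output() {
    case $1 in
        0)
            echo hashed ;;               # same as %p
        1)
            if [ "$2" = yes ]; then
                echo unmodified          # assumption: raw address shown
            else
                echo zeros
            fi
            ;;
        *)
            echo zeros ;;                # 2: zeros regardless of privileges
    esac
}
```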
- softlockup_sys_info & hardlockup_sys_info
- =========================================
- A comma-separated list of extra system information to be dumped when
- a soft/hard lockup is detected, for example, "tasks,mem,timers,locks,...".
- Refer to the 'panic_sys_info' section below for more details.
- modprobe
- ========
- The full path to the usermode helper for autoloading kernel modules,
- by default ``CONFIG_MODPROBE_PATH``, which in turn defaults to
- "/sbin/modprobe". This binary is executed when the kernel requests a
- module. For example, if userspace passes an unknown filesystem type
- to mount(), then the kernel will automatically request the
- corresponding filesystem module by executing this usermode helper.
- This usermode helper should insert the needed module into the kernel.
- This sysctl only affects module autoloading. It has no effect on the
- ability to explicitly insert modules.
- This sysctl can be used to debug module loading requests::
- echo '#! /bin/sh' > /tmp/modprobe
- echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
- echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
- chmod a+x /tmp/modprobe
- echo /tmp/modprobe > /proc/sys/kernel/modprobe
- Alternatively, if this sysctl is set to the empty string, then module
- autoloading is completely disabled. The kernel will not try to
- execute a usermode helper at all, nor will it call the
- kernel_module_request LSM hook.
- If ``CONFIG_STATIC_USERMODEHELPER=y`` is set in the kernel configuration,
- then the configured static usermode helper overrides this sysctl,
- except that the empty string is still accepted to completely disable
- module autoloading as described above.
- modules_disabled
- ================
- A toggle value indicating if modules are allowed to be loaded
- in an otherwise modular kernel. This toggle defaults to off
- (0), but can be set true (1). Once true, modules can be
- neither loaded nor unloaded, and the toggle cannot be set back
- to false. Generally used with the `kexec_load_disabled`_ toggle.
- .. _msgmni:
- msgmax, msgmnb, and msgmni
- ==========================
- ``msgmax`` is the maximum size of an IPC message, in bytes. 8192 by
- default (``MSGMAX``).
- ``msgmnb`` is the maximum size of an IPC queue, in bytes. 16384 by
- default (``MSGMNB``).
- ``msgmni`` is the maximum number of IPC queues. 32000 by default
- (``MSGMNI``).
- All of these parameters are set per ipc namespace. The maximum number of bytes
- in POSIX message queues is limited by ``RLIMIT_MSGQUEUE``. This limit is
- respected hierarchically in each user namespace.
- msg_next_id, sem_next_id, and shm_next_id (System V IPC)
- ========================================================
- These three toggles allow specifying the desired ID for the next allocated
- IPC object: message queue, semaphore set or shared memory segment,
- respectively.
- By default they are equal to -1, which means generic allocation logic.
- Possible values to set are in range {0:``INT_MAX``}.
- Notes:
- 1) The kernel doesn't guarantee that the new object will have the desired
- ID, so it's up to userspace how to handle an object with the "wrong" ID.
- 2) A toggle with a non-default value will be set back to -1 by the kernel
- after a successful IPC object allocation. If an IPC object allocation
- syscall fails, it is undefined whether the value remains unmodified or is
- reset to -1.
- ngroups_max
- ===========
- Maximum number of supplementary groups, i.e. the maximum size which
- ``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.
- nmi_watchdog
- ============
- This parameter can be used to control the NMI watchdog
- (i.e. the hard lockup detector) on x86 systems.
- = =================================
- 0 Disable the hard lockup detector.
- 1 Enable the hard lockup detector.
- = =================================
- The hard lockup detector monitors each CPU for its ability to respond to
- timer interrupts. The mechanism utilizes CPU performance counter registers
- that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
- while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
- The NMI watchdog is disabled by default if the kernel is running as a guest
- in a KVM virtual machine. This default can be overridden by adding::
- nmi_watchdog=1
- to the guest kernel command line (see
- Documentation/admin-guide/kernel-parameters.rst).
- nmi_wd_lpm_factor (PPC only)
- ============================
- Factor to apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is
- set to 1). This factor represents the percentage added to
- ``watchdog_thresh`` when calculating the NMI watchdog timeout during an
- LPM. The soft lockup timeout is not impacted.
- A value of 0 means no change. The default value is 200, meaning the NMI
- watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).
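- Reading "percentage added to ``watchdog_thresh``" literally gives the
- default of 30s. The hypothetical helper below reproduces that arithmetic;
- integer truncation is an assumption of this sketch, not a statement about
- how the kernel rounds.

```shell
# lpm_nmi_timeout WATCHDOG_THRESH FACTOR  (hypothetical helper)
# NMI watchdog timeout in seconds during an LPM:
# watchdog_thresh plus FACTOR percent of it (integer arithmetic).
lpm_nmi_timeout() {
    echo $(( $1 * (100 + $2) / 100 ))
}

# Default: lpm_nmi_timeout 10 200   -> 30
```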
- numa_balancing
- ==============
- Enables/disables and configures automatic page fault based NUMA memory
- balancing. Memory is moved automatically to nodes that access it often.
- The value to set can be the result of ORing the following:
- = =================================
- 0 NUMA_BALANCING_DISABLED
- 1 NUMA_BALANCING_NORMAL
- 2 NUMA_BALANCING_MEMORY_TIERING
- = =================================
- OR in NUMA_BALANCING_NORMAL to optimize page placement among different
- NUMA nodes to reduce remote accessing. On NUMA machines, there is a
- performance penalty if remote memory is accessed by a CPU. When this
- feature is enabled the kernel samples what task thread is accessing
- memory by periodically unmapping pages and later trapping a page
- fault. At the time of the page fault, it is determined if the data
- being accessed should be migrated to a local memory node.
- The unmapping of pages and trapping faults incur additional overhead that
- ideally is offset by improved memory locality but there is no universal
- guarantee. If the target workload is already bound to NUMA nodes then this
- feature should be disabled.
- OR in NUMA_BALANCING_MEMORY_TIERING to optimize page placement among
- different types of memory (represented as different NUMA nodes) to
- place the hot pages in the fast memory. This is also implemented using
- page unmapping and page faults.
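- Since the value is a bit mask, enabling both modes means writing 3. The
- hypothetical ``numa_balancing_value`` helper below ORs mode names from the
- table above into the number to write.

```shell
# numa_balancing_value NAME...  (hypothetical helper)
# ORs together the mode bits from the numa_balancing table and prints
# the resulting value. Unknown names are rejected.
numa_balancing_value() {
    v=0
    for mode in "$@"; do
        case $mode in
            NUMA_BALANCING_NORMAL)         v=$((v | 1)) ;;
            NUMA_BALANCING_MEMORY_TIERING) v=$((v | 2)) ;;
            NUMA_BALANCING_DISABLED)       ;;   # contributes no bits
            *) echo "unknown mode: $mode" >&2; return 1 ;;
        esac
    done
    echo "$v"
}

# Example (as root):
#   numa_balancing_value NUMA_BALANCING_NORMAL NUMA_BALANCING_MEMORY_TIERING \
#       > /proc/sys/kernel/numa_balancing
```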
- numa_balancing_promote_rate_limit_MBps
- ======================================
- Excessively high promotion/demotion throughput between different memory
- types may hurt application latency. This sysctl can be used to rate limit
- the promotion throughput: the per-node maximum promotion throughput in
- MB/s is limited to no more than the set value.
- A rule of thumb is to set this to less than 1/10 of the PMEM node
- write bandwidth.
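A minimal sketch of the rule of thumb above. The 5000 MB/s bandwidth is an invented figure for illustration, not a measurement, and the helper name is hypothetical. The result is the 1/10 bound, so pick a value at or below it.

```python
def promote_rate_limit_bound(pmem_write_bw_mbps: float) -> int:
    """Hypothetical helper: 1/10 of the PMEM node write bandwidth, in whole MB/s."""
    return int(pmem_write_bw_mbps // 10)


print(promote_rate_limit_bound(5000))  # 500
```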
- oops_all_cpu_backtrace
- ======================
- If this option is set, the kernel will send an NMI to all CPUs to dump
- their backtraces when an oops event occurs. It should be used as a last
- resort in case a panic cannot be triggered (to protect VMs running, for
- example) or kdump can't be collected. This file shows up if CONFIG_SMP
- is enabled.
- 0: Won't show all CPUs' backtraces when an oops is detected.
- This is the default behavior.
- 1: Will non-maskably interrupt all CPUs and dump their backtraces when
- an oops event is detected.
- oops_limit
- ==========
- Number of kernel oopses after which the kernel should panic when
- ``panic_on_oops`` is not set. Setting this to 0 disables checking
- the count. Setting this to 1 has the same effect as setting
- ``panic_on_oops=1``. The default value is 10000.
- osrelease, ostype & version
- ===========================
- ::
- # cat osrelease
- 2.1.88
- # cat ostype
- Linux
- # cat version
- #5 Wed Feb 25 21:49:24 MET 1998
- The files ``osrelease`` and ``ostype`` should be clear enough.
- ``version`` needs a little more clarification, however. The '#5' means
- that this is the fifth kernel built from this source base, and the
- date after it indicates when the kernel was built.
- The only way to tune these values is to rebuild the kernel :-)
- overflowgid & overflowuid
- =========================
- If your architecture did not always support 32-bit UIDs (i.e. arm,
- i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
- applications that use the old 16-bit UID/GID system calls, if the
- actual UID or GID would exceed 65535.
- These sysctls allow you to change the value of the fixed UID and GID.
- The default is 65534.
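To illustrate the rule, the sketch below models the substitution in user-space terms. It is not kernel code, and the function name is hypothetical.

```python
OVERFLOW_ID_DEFAULT = 65534  # documented default of overflowuid/overflowgid


def uid_for_legacy_syscall(uid: int, overflowuid: int = OVERFLOW_ID_DEFAULT) -> int:
    """Model what an old 16-bit UID system call reports for a given UID."""
    if uid > 65535:  # the real UID does not fit in 16 bits
        return overflowuid
    return uid


print(uid_for_legacy_syscall(1000))    # 1000
print(uid_for_legacy_syscall(100000))  # 65534
```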
- panic
- =====
- The value in this file determines the behaviour of the kernel on a
- panic:
- * if zero, the kernel will loop forever;
- * if negative, the kernel will reboot immediately;
- * if positive, the kernel will reboot after the corresponding number
- of seconds.
- When you use the software watchdog, the recommended setting is 60.
- panic_on_io_nmi
- ===============
- Controls the kernel's behavior when a CPU receives an NMI caused by
- an IO error.
- = ==================================================================
- 0 Try to continue operation (default).
- 1 Panic immediately. The IO error triggered an NMI. This indicates a
- serious system condition which could result in IO data corruption.
- Rather than continuing, panicking might be a better choice. Some
- servers issue this sort of NMI when the dump button is pushed,
- and you can use this option to take a crash dump.
- = ==================================================================
- panic_on_oops
- =============
- Controls the kernel's behaviour when an oops or BUG is encountered.
- = ===================================================================
- 0 Try to continue operation.
- 1 Panic immediately. If the `panic` sysctl is also non-zero then the
- machine will be rebooted.
- = ===================================================================
- panic_on_stackoverflow
- ======================
- Controls the kernel's behavior when it detects an overflow of the
- kernel, IRQ, or exception stacks (user stacks are not covered).
- This file shows up if ``CONFIG_DEBUG_STACKOVERFLOW`` is enabled.
- = ==========================
- 0 Try to continue operation.
- 1 Panic immediately.
- = ==========================
- panic_on_unrecovered_nmi
- ========================
- The default Linux behaviour on an NMI caused by a memory error or of
- unknown origin is to continue operation. For many environments, such as
- scientific computing, it is preferable to take the box out of service and
- deal with the error rather than let an uncorrected parity/ECC error propagate.
- A small number of systems do generate NMIs for bizarre random reasons
- such as power management, so the default is off. This sysctl works like
- the other panic controls in this directory.
- panic_on_warn
- =============
- Calls panic() in the WARN() path when set to 1. This is useful to avoid
- a kernel rebuild when attempting to kdump at the location of a WARN().
- = ================================================
- 0 Only WARN(), default behaviour.
- 1 Call panic() after printing out WARN() location.
- = ================================================
- panic_print
- ===========
- Bitmask for printing system info when a panic happens. Users can choose a
- combination of the following bits:
- ===== ============================================
- bit 0 print all tasks info
- bit 1 print system memory info
- bit 2 print timer info
- bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
- bit 4 print ftrace buffer
- bit 5 replay all kernel messages on consoles at the end of panic
- bit 6 print all CPUs backtrace (if available in the arch)
- bit 7 print only tasks in uninterruptible (blocked) state
- ===== ============================================
- For example, to print task and memory info on panic, use::
- echo 3 > /proc/sys/kernel/panic_print
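The bitmask can also be read back. This minimal Python sketch decodes a ``panic_print`` value into the descriptions from the table above. The table contents come from this section, while the helper and dictionary names are hypothetical.

```python
from __future__ import annotations

PANIC_PRINT_BITS = {
    0: "all tasks info",
    1: "system memory info",
    2: "timer info",
    3: "locks info (CONFIG_LOCKDEP only)",
    4: "ftrace buffer",
    5: "replay kernel messages on consoles",
    6: "all CPUs backtrace",
    7: "blocked tasks only",
}


def decode_panic_print(mask: int) -> list[str]:
    """Return the descriptions of the bits set in a panic_print value."""
    return [desc for bit, desc in sorted(PANIC_PRINT_BITS.items()) if mask & (1 << bit)]


print(decode_panic_print(3))  # ['all tasks info', 'system memory info']
```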
- panic_sys_info
- ==============
- A comma-separated list of extra information to be dumped on panic,
- for example, "tasks,mem,timers,...". It is a human-readable alternative
- to ``panic_print``. Possible values are:
- ============= ===================================================
- tasks print all tasks info
- mem print system memory info
- timers print timers info
- locks print locks info if CONFIG_LOCKDEP is on
- ftrace print ftrace buffer
- all_bt print all CPUs backtrace (if available in the arch)
- blocked_tasks print only tasks in uninterruptible (blocked) state
- ============= ===================================================
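To relate the two interfaces, the sketch below converts a ``panic_sys_info`` string into a ``panic_print``-style bitmask. It assumes that each name corresponds to the bit with the same description in the ``panic_print`` table. That table's bit 5 (console replay) has no name in this list. The helper name is hypothetical.

```python
PANIC_SYS_INFO_BITS = {
    "tasks": 0,
    "mem": 1,
    "timers": 2,
    "locks": 3,
    "ftrace": 4,
    "all_bt": 6,
    "blocked_tasks": 7,
}


def sys_info_to_mask(value: str) -> int:
    """Convert a comma-separated panic_sys_info string to a bitmask."""
    mask = 0
    for name in filter(None, (part.strip() for part in value.split(","))):
        mask |= 1 << PANIC_SYS_INFO_BITS[name]  # raises KeyError for unknown names
    return mask


print(sys_info_to_mask("tasks,mem"))  # 3, same as the panic_print example
```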
- panic_on_rcu_stall
- ==================
- When set to 1, calls panic() after RCU stall detection messages. This
- is useful to determine the root cause of RCU stalls using a vmcore.
- = ============================================================
- 0 Do not panic() when RCU stall takes place, default behavior.
- 1 panic() after printing RCU stall messages.
- = ============================================================
- max_rcu_stall_to_panic
- ======================
- When ``panic_on_rcu_stall`` is set to 1, this value determines the
- number of times that RCU can stall before panic() is called.
- When ``panic_on_rcu_stall`` is set to 0, this value has no effect.
- perf_cpu_time_max_percent
- =========================
- Hints to the kernel how much CPU time it should be allowed to
- use to handle perf sampling events. If the perf subsystem
- is informed that its samples are exceeding this limit, it
- will drop its sampling frequency to attempt to reduce its CPU
- usage.
- Some perf sampling happens in NMIs. If these samples
- unexpectedly take too long to execute, the NMIs can become
- stacked up next to each other so much that nothing else is
- allowed to execute.
- ===== ========================================================
- 0 Disable the mechanism. Do not monitor or correct perf's
- sampling rate no matter how much CPU time it takes.
- 1-100 Attempt to throttle perf's sample rate to this
- percentage of CPU. Note: the kernel calculates an
- "expected" length of each sample event. 100 here means
- 100% of that expected length. Even if this is set to
- 100, you may still see sample throttling if this
- length is exceeded. Set to 0 if you truly do not care
- how much CPU is consumed.
- ===== ========================================================
- perf_event_paranoid
- ===================
- Controls use of the performance events system by unprivileged
- users (without CAP_PERFMON). The default value is 2.
- For backward compatibility reasons access to system performance
- monitoring and observability remains open for CAP_SYS_ADMIN
- privileged processes but CAP_SYS_ADMIN usage for secure system
- performance monitoring and observability operations is discouraged
- with respect to CAP_PERFMON use cases.
- === ==================================================================
- -1 Allow use of (almost) all events by all users.
- Ignore mlock limit after perf_event_mlock_kb without
- ``CAP_IPC_LOCK``.
- >=0 Disallow ftrace function tracepoint by users without
- ``CAP_PERFMON``.
- Disallow raw tracepoint access by users without ``CAP_PERFMON``.
- >=1 Disallow CPU event access by users without ``CAP_PERFMON``.
- >=2 Disallow kernel profiling by users without ``CAP_PERFMON``.
- === ==================================================================
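The restrictions in the table are cumulative: each level keeps everything denied by the lower levels. The hypothetical helper below lists what a given value denies to users without ``CAP_PERFMON``, following only the rows above.

```python
from __future__ import annotations


def paranoid_restrictions(level: int) -> list[str]:
    """List what perf_event_paranoid denies to users without CAP_PERFMON."""
    restrictions: list[str] = []
    if level >= 0:
        restrictions += ["ftrace function tracepoint", "raw tracepoint access"]
    if level >= 1:
        restrictions.append("CPU event access")
    if level >= 2:
        restrictions.append("kernel profiling")
    return restrictions


print(paranoid_restrictions(2))  # restrictions at the default value
```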
- perf_event_max_stack
- ====================
- Controls maximum number of stack frames to copy for (``attr.sample_type &
- PERF_SAMPLE_CALLCHAIN``) configured events, for instance, when using
- '``perf record -g``' or '``perf trace --call-graph fp``'.
- This can only be done when no events are in use that have callchains
- enabled, otherwise writing to this file will return ``-EBUSY``.
- The default value is 127.
- perf_event_mlock_kb
- ===================
- Controls the size of the per-CPU ring buffer not counted against the mlock limit.
- The default value is 512 KB plus one page.
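Because the default depends on the page size, the resulting number differs between systems. This sketch computes it in KB, taking the page size from ``os.sysconf`` (the same value that ``getconf PAGE_SIZE`` reports) unless one is passed in. The helper name is hypothetical.

```python
import os
from typing import Optional


def default_perf_event_mlock_kb(page_size: Optional[int] = None) -> int:
    """512 KB plus one page, expressed in KB."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return 512 + page_size // 1024


print(default_perf_event_mlock_kb(4096))  # 516 with 4 KiB pages
```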
- perf_event_max_contexts_per_stack
- =================================
- Controls maximum number of stack frame context entries for
- (``attr.sample_type & PERF_SAMPLE_CALLCHAIN``) configured events, for
- instance, when using '``perf record -g``' or '``perf trace --call-graph fp``'.
- This can only be done when no events are in use that have callchains
- enabled, otherwise writing to this file will return ``-EBUSY``.
- The default value is 8.
- perf_user_access (arm64 and riscv only)
- =======================================
- Controls user space access for reading perf event counters.
- * for arm64
- The default value is 0 (access disabled).
- When set to 1, user space can read performance monitor counter registers
- directly.
- See Documentation/arch/arm64/perf.rst for more information.
- * for riscv
- When set to 0, user space access is disabled.
- The default value is 1: user space can read performance monitor counter
- registers through perf, and any direct access without perf intervention
- triggers an illegal instruction exception.
- Setting it to 2 enables legacy mode (user space has direct access to the cycle
- and instret CSRs only). Note that this legacy value is deprecated and will be
- removed once all user space applications are fixed.
- Note that the time CSR is always directly accessible to all modes.
- pid_max
- =======
- PID allocation wrap value. When the kernel's next PID value
- reaches this value, it wraps back to a minimum PID value.
- PIDs of value ``pid_max`` or larger are not allocated.
- ns_last_pid
- ===========
- The last PID allocated in the current PID namespace (the one the task
- using this sysctl lives in). When selecting a PID for the next task on
- fork, the kernel tries to allocate a number starting from this one.
- powersave-nap (PPC only)
- ========================
- If set, Linux-PPC will use the 'nap' mode of powersaving,
- otherwise the 'doze' mode will be used.
- ==============================================================
- printk
- ======
- The four values in printk denote: ``console_loglevel``,
- ``default_message_loglevel``, ``minimum_console_loglevel`` and
- ``default_console_loglevel`` respectively.
- These values influence printk() behavior when printing or
- logging error messages. See '``man 2 syslog``' for more info on
- the different loglevels.
- ======================== =====================================
- console_loglevel messages with a higher priority than
- this will be printed to the console
- default_message_loglevel messages without an explicit priority
- will be printed with this priority
- minimum_console_loglevel minimum (highest) value to which
- console_loglevel can be set
- default_console_loglevel default value for console_loglevel
- ======================== =====================================
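A minimal Python sketch for reading the four fields by name. The ``4 4 1 7`` input is only sample file contents, not necessarily what your system reports. The helper names are hypothetical. Following syslog(2), a lower number is a higher priority, so "higher priority than ``console_loglevel``" means numerically smaller.

```python
from typing import NamedTuple


class PrintkLevels(NamedTuple):
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text: str) -> PrintkLevels:
    """Parse the four whitespace-separated integers in /proc/sys/kernel/printk."""
    return PrintkLevels(*(int(field) for field in text.split()))


def reaches_console(message_level: int, levels: PrintkLevels) -> bool:
    # 0 is the highest priority, so "higher priority" means a smaller number.
    return message_level < levels.console_loglevel


levels = parse_printk("4\t4\t1\t7\n")  # sample contents
print(levels.console_loglevel)         # 4
```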
- printk_delay
- ============
- Delay each printk message by ``printk_delay`` milliseconds.
- Values from 0 to 10000 are allowed.
- printk_ratelimit
- ================
- Some warning messages are rate limited. ``printk_ratelimit`` specifies
- the minimum length of time between these messages (in seconds).
- The default value is 5 seconds.
- A value of 0 will disable rate limiting.
- printk_ratelimit_burst
- ======================
- While long term we enforce one message per `printk_ratelimit`_
- seconds, we do allow a burst of messages to pass through.
- ``printk_ratelimit_burst`` specifies the number of messages we can
- send before ratelimiting kicks in. After `printk_ratelimit`_ seconds
- have elapsed, another burst of messages may be sent.
- The default value is 10 messages.
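The interaction between ``printk_ratelimit`` and ``printk_ratelimit_burst`` can be pictured as a fixed-window limiter. The class below is a hypothetical user-space model of the behaviour described above, with timestamps passed in explicitly. It is not the kernel's implementation.

```python
from typing import Optional


class RateLimiter:
    """Allow up to `burst` messages per `interval` seconds (illustrative model)."""

    def __init__(self, interval: float = 5.0, burst: int = 10) -> None:
        self.interval = interval  # printk_ratelimit
        self.burst = burst        # printk_ratelimit_burst
        self.window_start: Optional[float] = None
        self.printed = 0

    def allow(self, now: float) -> bool:
        if self.interval == 0:  # 0 disables rate limiting
            return True
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now  # a new window begins, so the burst resets
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        return False


limiter = RateLimiter()
results = [limiter.allow(0.0) for _ in range(12)]
print(results.count(True))  # 10: the burst passes, the rest are suppressed
```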
- printk_devkmsg
- ==============
- Control the logging to ``/dev/kmsg`` from userspace:
- ========= =============================================
- ratelimit default, ratelimited
- on unlimited logging to /dev/kmsg from userspace
- off logging to /dev/kmsg disabled
- ========= =============================================
- The kernel command line parameter ``printk.devkmsg=`` overrides this and is
- a one-time setting until next reboot: once set, it cannot be changed by
- this sysctl interface anymore.
- ==============================================================
- pty
- ===
- See Documentation/filesystems/devpts.rst.
- random
- ======
- This is a directory, with the following entries:
- * ``boot_id``: a UUID generated the first time this is retrieved, and
- unvarying after that;
- * ``uuid``: a UUID generated every time this is retrieved (this can
- thus be used to generate UUIDs at will);
- * ``entropy_avail``: the pool's entropy count, in bits;
- * ``poolsize``: the entropy pool size, in bits;
- * ``urandom_min_reseed_secs``: obsolete (used to determine the minimum
- number of seconds between urandom pool reseeding). This file is
- writable for compatibility purposes, but writing to it has no effect
- on any RNG behavior;
- * ``write_wakeup_threshold``: when the entropy count drops below this
- (as a number of bits), processes waiting to write to ``/dev/random``
- are woken up. This file is writable for compatibility purposes, but
- writing to it has no effect on any RNG behavior.
- randomize_va_space
- ==================
- This option can be used to select the type of process address
- space randomization that is used in the system, for architectures
- that support this feature.
- == ===========================================================================
- 0 Turn the process address space randomization off. This is the
- default for architectures that do not support this feature anyway,
- and kernels that are booted with the "norandmaps" parameter.
- 1 Make the addresses of mmap base, stack and VDSO page randomized.
- This, among other things, implies that shared libraries will be
- loaded to random addresses. Also for PIE-linked binaries, the
- location of code start is randomized. This is the default if the
- ``CONFIG_COMPAT_BRK`` option is enabled.
- 2 Additionally enable heap randomization. This is the default if
- ``CONFIG_COMPAT_BRK`` is disabled.
- There are a few legacy applications out there (such as some ancient
- versions of libc.so.5 from 1996) that assume that brk area starts
- just after the end of the code+bss. These applications break when
- start of the brk area is randomized. There are however no known
- non-legacy applications that would be broken this way, so for most
- systems it is safe to choose full randomization.
- Systems with ancient and/or broken binaries should be configured
- with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
- address space randomization.
- == ===========================================================================
- reboot-cmd (SPARC only)
- =======================
- ??? This seems to be a way to give an argument to the Sparc
- ROM/Flash boot loader. Maybe to tell it what to do after
- rebooting. ???
- sched_energy_aware
- ==================
- Enables/disables Energy Aware Scheduling (EAS). EAS starts
- automatically on platforms where it can run (that is,
- platforms with asymmetric CPU topologies and having an Energy
- Model available). If your platform happens to meet the
- requirements for EAS but you do not want to use it, change
- this value to 0. On non-EAS platforms, writes fail and
- reads return nothing.
- task_delayacct
- ===============
- Enables/disables task delay accounting (see
- Documentation/accounting/delay-accounting.rst). Enabling this feature incurs
- a small amount of overhead in the scheduler but is useful for debugging
- and performance tuning. It is required by some tools such as iotop.
- sched_schedstats
- ================
- Enables/disables scheduler statistics. Enabling this feature
- incurs a small amount of overhead in the scheduler but is
- useful for debugging and performance tuning.
- sched_util_clamp_min
- ====================
- Max allowed *minimum* utilization.
- Default value is 1024, which is the maximum possible value.
- It means that any requested uclamp.min value cannot be greater than
- sched_util_clamp_min, i.e., it is restricted to the range
- [0:sched_util_clamp_min].
- sched_util_clamp_max
- ====================
- Max allowed *maximum* utilization.
- Default value is 1024, which is the maximum possible value.
- It means that any requested uclamp.max value cannot be greater than
- sched_util_clamp_max, i.e., it is restricted to the range
- [0:sched_util_clamp_max].
- sched_util_clamp_min_rt_default
- ===============================
- By default, Linux is tuned for performance, which means that RT tasks always run
- at the highest frequency and on the most capable (highest capacity) CPU (in
- heterogeneous systems).
- Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
- 1024 by default, which effectively boosts the tasks to run at the highest
- frequency and biases them to run on the biggest CPU.
- This knob allows admins to change the default behavior when uclamp is being
- used. In battery powered devices particularly, running at the maximum
- capacity and frequency will increase energy consumption and shorten the battery
- life.
- This knob is only effective for RT tasks whose requested uclamp.min value the
- user hasn't modified via the sched_setattr() syscall.
- This knob will not escape the range constraint imposed by sched_util_clamp_min
- defined above.
- For example, if ``sched_util_clamp_min_rt_default`` is 800 and
- ``sched_util_clamp_min`` is 600, the boost will be clamped to 600 because 800
- is outside of the permissible range of [0:600]. This could happen, for
- instance, if a powersave mode temporarily restricts all boosts by modifying
- sched_util_clamp_min. As soon as this restriction is lifted, the requested
- sched_util_clamp_min_rt_default will take effect.
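The clamping in this example can be written as a one-line rule. The hypothetical helper below returns the boost an unmodified RT task receives, given the two sysctl values.

```python
def effective_rt_uclamp_min(rt_default: int, clamp_min: int) -> int:
    """Boost applied to an RT task whose uclamp.min was never set via sched_setattr()."""
    # The request cannot escape the [0:sched_util_clamp_min] range.
    return max(0, min(rt_default, clamp_min))


print(effective_rt_uclamp_min(800, 600))    # 600, as in the example above
print(effective_rt_uclamp_min(1024, 1024))  # 1024, with both defaults
```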
- seccomp
- =======
- See Documentation/userspace-api/seccomp_filter.rst.
- sg-big-buff
- ===========
- This file shows the size of the generic SCSI (sg) buffer.
- You can't tune it just yet, but you could change it at
- compile time by editing ``include/scsi/sg.h`` and changing
- the value of ``SG_BIG_BUFF``.
- There shouldn't be any reason to change this value. If
- you can come up with one, you probably know what you
- are doing anyway :)
- shmall
- ======
- This parameter sets the total amount of shared memory pages that can be used
- inside an IPC namespace. Shared memory pages are counted separately for each
- IPC namespace, and the count is not inherited. Hence, ``shmall`` should always be at
- least ``ceil(shmmax/PAGE_SIZE)``.
- If you are not sure what the default ``PAGE_SIZE`` is on your Linux
- system, you can run the following command::
- # getconf PAGE_SIZE
- To reduce or disable the ability to allocate shared memory, you must either
- create a new IPC namespace, set this parameter to the required value, and
- prohibit the creation of a new IPC namespace in the current user namespace,
- or use cgroups.
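The lower bound ``ceil(shmmax/PAGE_SIZE)`` can be computed with integer arithmetic, which avoids floating-point rounding for large values. In this sketch the page size is whatever ``getconf PAGE_SIZE`` reports, and the helper name is hypothetical.

```python
def min_shmall(shmmax: int, page_size: int) -> int:
    """Smallest shmall (in pages) that still allows one shmmax-sized segment."""
    return -(-shmmax // page_size)  # integer ceiling division


print(min_shmall(1 << 30, 4096))  # 262144 pages for a 1 GiB shmmax
```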
- shmmax
- ======
- This value can be used to query and set the run time limit
- on the maximum shared memory segment size that can be created.
- Shared memory segments up to 1 GB are now supported in the
- kernel. This value defaults to ``SHMMAX``.
- shmmni
- ======
- This value determines the maximum number of shared memory segments.
- 4096 by default (``SHMMNI``).
- shm_rmid_forced
- ===============
- Linux lets you set resource limits, including how much memory one
- process can consume, via ``setrlimit(2)``. Unfortunately, shared memory
- segments are allowed to exist without association with any process, and
- thus might not be counted against any resource limits. If enabled,
- shared memory segments are automatically destroyed when their attach
- count becomes zero after a detach or a process termination. It will
- also destroy segments that were created, but never attached to, on exit
- from the process. The only use left for ``IPC_RMID`` is to immediately
- destroy an unattached segment. Of course, this breaks the way things are
- defined, so some applications might stop working. Note that this
- feature will do you no good unless you also configure your resource
- limits (in particular, ``RLIMIT_AS`` and ``RLIMIT_NPROC``). Most systems don't
- need this.
- Note that if you change this from 0 to 1, already created segments
- without users and whose creating process has died will be destroyed.
- sysctl_writes_strict
- ====================
- Control how file position affects the behavior of updating sysctl values
- via the ``/proc/sys`` interface:
- == ======================================================================
- -1 Legacy per-write sysctl value handling, with no printk warnings.
- Each write syscall must fully contain the sysctl value to be
- written, and multiple writes on the same sysctl file descriptor
- will rewrite the sysctl value, regardless of file position.
- 0 Same behavior as above, but warn about processes that perform writes
- to a sysctl file descriptor when the file position is not 0.
- 1 (default) Respect file position when writing sysctl strings. Multiple
- writes will append to the sysctl value buffer. Anything past the max
- length of the sysctl value buffer will be ignored. Writes to numeric
- sysctl entries must always be at file position 0 and the value must
- be fully contained in the buffer sent in the write syscall.
- == ======================================================================
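Under the default mode (1), a numeric sysctl must be written at file position 0 with the whole value in one write syscall. The hypothetical helper below does exactly that. Writing under ``/proc/sys`` normally requires root.

```python
import os


def write_sysctl(path: str, value: str) -> None:
    """Write a sysctl value in a single write() at file position 0."""
    data = (value + "\n").encode()
    fd = os.open(path, os.O_WRONLY)  # a freshly opened descriptor starts at offset 0
    try:
        written = os.write(fd, data)  # one syscall carrying the complete value
        if written != len(data):
            raise OSError(f"short write to {path}: {written} of {len(data)} bytes")
    finally:
        os.close(fd)


# Example (as root): write_sysctl("/proc/sys/kernel/panic_print", "3")
```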
- softlockup_all_cpu_backtrace
- ============================
- This value controls whether the soft lockup detector thread gathers
- further debug information when a soft lockup condition is detected.
- If enabled, each CPU will be issued an NMI and instructed to capture a
- stack trace.
- This feature is only applicable for architectures which support
- NMI.
- = ============================================
- 0 Do nothing. This is the default behavior.
- 1 On detection capture more debug information.
- = ============================================
- softlockup_panic
- =================
- This parameter can be used to control whether the kernel panics
- when a soft lockup is detected.
- = ============================================
- 0 Don't panic on soft lockup.
- 1 Panic on soft lockup.
- = ============================================
- This can also be set using the softlockup_panic kernel parameter.
- soft_watchdog
- =============
- This parameter can be used to control the soft lockup detector.
- = =================================
- 0 Disable the soft lockup detector.
- 1 Enable the soft lockup detector.
- = =================================
- The soft lockup detector monitors CPUs for threads that are hogging the CPUs
- without rescheduling voluntarily, and thus prevent the 'migration/N' threads
- from running, causing the watchdog work to fail to execute. The mechanism depends
- on the CPUs' ability to respond to timer interrupts, which are needed for the
- watchdog work to be queued by the watchdog timer function; otherwise the NMI
- watchdog, if enabled, can detect a hard lockup condition.
- split_lock_mitigate (x86 only)
- ==============================
- On x86, each "split lock" imposes a system-wide performance penalty. On larger
- systems, large numbers of split locks from unprivileged users can result in
- denials of service to well-behaved and potentially more important users.
- The kernel mitigates these bad users by detecting split locks and imposing
- penalties: forcing them to wait and only allowing one core to execute split
- locks at a time.
- These mitigations can make those bad applications unbearably slow. Setting
- split_lock_mitigate=0 may restore some application performance, but will also
- increase system exposure to denial of service attacks from split lock users.
- = ===================================================================
- 0 Disable the mitigation mode: only warn about the split lock in the kernel log,
- and expose the system to denials of service from the split lockers.
- 1 Enable the mitigation mode (this is the default) - penalizes the split
- lockers with intentional performance degradation.
- = ===================================================================
- stack_erasing
- =============
- This parameter can be used to control kernel stack erasing at the end
- of syscalls for kernels built with ``CONFIG_KSTACK_ERASE``.
- That erasing reduces the information which kernel stack leak bugs
- can reveal and blocks some uninitialized stack variable attacks.
- The tradeoff is the performance impact: on a single CPU system kernel
- compilation sees a 1% slowdown, other systems and workloads may vary.
- = ====================================================================
- 0 Kernel stack erasing is disabled, KSTACK_ERASE_METRICS are not updated.
- 1 Kernel stack erasing is enabled (default), it is performed before
- returning to the userspace at the end of syscalls.
- = ====================================================================
- stop-a (SPARC only)
- ===================
- Controls Stop-A:
- = ====================================
- 0 Stop-A has no effect.
- 1 Stop-A breaks to the PROM (default).
- = ====================================
- Stop-A is always enabled on a panic, so that the user can return to
- the boot PROM.
- sysrq
- =====
- See Documentation/admin-guide/sysrq.rst.
- tainted
- =======
- Non-zero if the kernel has been tainted. The value is a bitmask of the
- numeric values below, which can be ORed together. The letters are seen in
- the "Tainted" line of Oops reports.
- ====== ===== ==============================================================
- 1 `(P)` proprietary module was loaded
- 2 `(F)` module was force loaded
- 4 `(S)` kernel running on an out of specification system
- 8 `(R)` module was force unloaded
- 16 `(M)` processor reported a Machine Check Exception (MCE)
- 32 `(B)` bad page referenced or some unexpected page flags
- 64 `(U)` taint requested by userspace application
- 128 `(D)` kernel died recently, i.e. there was an OOPS or BUG
- 256 `(A)` an ACPI table was overridden by user
- 512 `(W)` kernel issued warning
- 1024 `(C)` staging driver was loaded
- 2048 `(I)` workaround for bug in platform firmware applied
- 4096 `(O)` externally-built ("out-of-tree") module was loaded
- 8192 `(E)` unsigned module was loaded
- 16384 `(L)` soft lockup occurred
- 32768 `(K)` kernel has been live patched
- 65536 `(X)` Auxiliary taint, defined for and used by distros
- 131072 `(T)` The kernel was built with the struct randomization plugin
- ====== ===== ==============================================================
- See Documentation/admin-guide/tainted-kernels.rst for more information.
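A minimal sketch that decodes a ``tainted`` value into the letters from the table, one letter per set bit in bit order. It prints only the letters and does not reproduce the exact format of the Oops "Tainted" line. The helper name is hypothetical.

```python
TAINT_FLAGS = "PFSRMBUDAWCIOELKXT"  # letter for bit 0, bit 1, ... as in the table


def taint_letters(tainted: int) -> str:
    """Return the letters for each bit set in /proc/sys/kernel/tainted."""
    return "".join(letter for bit, letter in enumerate(TAINT_FLAGS) if tainted & (1 << bit))


print(taint_letters(4096 | 8192))  # OE: out-of-tree and unsigned modules loaded
```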
- Note:
- Writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
- booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
- and any of the ORed together values being written to ``tainted`` match the
- bitmask declared in panic_on_taint.
- See Documentation/admin-guide/kernel-parameters.rst for more details on
- that particular kernel command line option and its optional
- ``nousertaint`` switch.
- threads-max
- ===========
- This value controls the maximum number of threads that can be created
- using ``fork()``.
- During initialization the kernel sets this value such that even if the
- maximum number of threads is created, the thread structures occupy only
- a part (1/8th) of the available RAM pages.
- The minimum value that can be written to ``threads-max`` is 1.
- The maximum value that can be written to ``threads-max`` is given by the
- constant ``FUTEX_TID_MASK`` (0x3fffffff).
- If a value outside of this range is written to ``threads-max`` an
- ``EINVAL`` error occurs.
- timer_migration
- ===============
- When set to a non-zero value, attempt to migrate timers away from idle CPUs to
- allow them to remain in low power states longer.
- Default is set (1).
- traceoff_on_warning
- ===================
- When set, disables tracing (see Documentation/trace/ftrace.rst) when a
- ``WARN()`` is hit.
- tracepoint_printk
- =================
- When tracepoints are sent to printk() (enabled by the ``tp_printk``
- boot parameter), this entry provides runtime control::
- echo 0 > /proc/sys/kernel/tracepoint_printk
- will stop tracepoints from being sent to printk(), and::
- echo 1 > /proc/sys/kernel/tracepoint_printk
- will send them to printk() again.
- This only works if the kernel was booted with ``tp_printk`` enabled.
- See Documentation/admin-guide/kernel-parameters.rst and
- Documentation/trace/boottime-trace.rst.
- unaligned-trap
- ==============
- On architectures where unaligned accesses cause traps, and where this
- feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
- ``arc``, ``parisc`` and ``loongarch``), controls whether unaligned traps
- are caught and emulated (instead of failing).
- = ========================================================
- 0 Do not emulate unaligned accesses.
- 1 Emulate unaligned accesses. This is the default setting.
- = ========================================================
- See also `ignore-unaligned-usertrap`_.
- unknown_nmi_panic
- =================
- The value in this file affects how NMIs are handled. When the value is
- non-zero, an unknown NMI is trapped and causes a panic. At that time,
- kernel debugging information is displayed on the console.
- For example, the NMI switch that most IA32 servers have fires an unknown
- NMI. If a system hangs, try pressing the NMI switch.
- unprivileged_bpf_disabled
- =========================
- Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
- once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` or ``CAP_BPF``
- will return ``-EPERM``. Once set to 1, this can't be cleared from the
- running kernel anymore.
- Writing 2 to this entry will also disable unprivileged calls to ``bpf()``,
- however, an admin can still change this setting later on, if needed, by
- writing 0 or 1 to this entry.
- If ``BPF_UNPRIV_DEFAULT_OFF`` is enabled in the kernel config, then this
- entry will default to 2 instead of 0.
- = =============================================================
- 0 Unprivileged calls to ``bpf()`` are enabled
- 1 Unprivileged calls to ``bpf()`` are disabled without recovery
- 2 Unprivileged calls to ``bpf()`` are disabled
- = =============================================================
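The three values form a small state machine: 1 is sticky, while 0 and 2 can be changed later. The hypothetical helper below models only the rules stated in this section. Treating a repeated write of 1 as allowed is an assumption, not something this section confirms.

```python
def bpf_disabled_write_allowed(current: int, new: int) -> bool:
    """Whether writing `new` to unprivileged_bpf_disabled is permitted."""
    if new not in (0, 1, 2):
        return False
    if current == 1:
        # 1 cannot be cleared from the running kernel; rewriting 1 is assumed harmless.
        return new == 1
    return True  # from 0 or 2, any valid value may be written


print(bpf_disabled_write_allowed(2, 0))  # True: 2 can be reverted
print(bpf_disabled_write_allowed(1, 0))  # False: 1 is permanent
```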
- warn_limit
- ==========
- Number of kernel warnings after which the kernel should panic when
- ``panic_on_warn`` is not set. Setting this to 0 disables checking
- the warning count. Setting this to 1 has the same effect as setting
- ``panic_on_warn=1``. The default value is 0.
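The panic decision described above can be sketched as follows. ``oops_limit`` follows the same pattern with ``panic_on_oops``. The helper is a hypothetical model, not kernel code.

```python
def should_panic_on_warn(warn_count: int, warn_limit: int, panic_on_warn: bool = False) -> bool:
    """Model of the panic decision after a WARN(), per the description above."""
    if panic_on_warn:
        return True
    # 0 disables the count check; otherwise panic once the count reaches the limit.
    return warn_limit != 0 and warn_count >= warn_limit


print(should_panic_on_warn(1, 1))  # True: a limit of 1 behaves like panic_on_warn=1
print(should_panic_on_warn(5, 0))  # False: the default 0 never panics on the count
```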
- watchdog
- ========
- This parameter can be used to disable or enable the soft lockup detector
- *and* the NMI watchdog (i.e. the hard lockup detector) at the same time.
- = ==============================
- 0 Disable both lockup detectors.
- 1 Enable both lockup detectors.
- = ==============================
- The soft lockup detector and the NMI watchdog can also be disabled or
- enabled individually, using the ``soft_watchdog`` and ``nmi_watchdog``
- parameters.
- If the ``watchdog`` parameter is read, for example by executing::
- cat /proc/sys/kernel/watchdog
- the output of this command (0 or 1) shows the logical OR of
- ``soft_watchdog`` and ``nmi_watchdog``.
- watchdog_cpumask
- ================
- This value can be used to control on which CPUs the watchdog may run.
- The default cpumask is all possible cores, but if ``NO_HZ_FULL`` is
- enabled in the kernel config, and cores are specified with the
- ``nohz_full=`` boot argument, those cores are excluded by default.
- Offline cores can be included in this mask, and if the core is later
- brought online, the watchdog will be started based on the mask value.
- Typically this value would only be touched in the ``nohz_full`` case
- to re-enable cores that by default were not running the watchdog,
- if a kernel lockup was suspected on those cores.
- The argument value is the standard cpulist format for cpumasks,
- so for example to enable the watchdog on cores 0, 2, 3, and 4 you
- might say::
- echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
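To check which CPUs a mask such as ``0,2-4`` selects, the hypothetical helper below expands a cpulist into a set of CPU numbers. It handles only comma-separated numbers and ranges, not every extension of the cpulist syntax.

```python
from __future__ import annotations


def parse_cpulist(cpulist: str) -> set[int]:
    """Expand a cpulist such as "0,2-4" into a set of CPU numbers."""
    cpus: set[int] = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-"))
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


print(sorted(parse_cpulist("0,2-4")))  # [0, 2, 3, 4]
```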
- watchdog_thresh
- ===============
- This value can be used to control the frequency of hrtimer and NMI
- events and the soft and hard lockup thresholds. The default threshold
- is 10 seconds.
- The softlockup threshold is (``2 * watchdog_thresh``). Setting this
- tunable to zero will disable lockup detection altogether.
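The relationship above can be expressed in code. This hypothetical helper returns the soft lockup threshold in seconds, or ``None`` when a zero value disables lockup detection.

```python
from typing import Optional


def softlockup_threshold(watchdog_thresh: int) -> Optional[int]:
    """Soft lockup threshold in seconds, or None when lockup detection is disabled."""
    if watchdog_thresh == 0:
        return None  # zero disables lockup detection altogether
    return 2 * watchdog_thresh


print(softlockup_threshold(10))  # 20 with the default watchdog_thresh
```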
|