hung_task_timeout_secs (a Linux kernel parameter related to hung tasks)

ABOUT hung_task_timeout_secs

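hung_task_timeout_secs sets how many seconds a task (process) may stay hung before the kernel reports it. The parameter on its own only produces a warning; whether the machine reboots after a hung task is decided by separate settings such as kernel.hung_task_panic and kernel.panic.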

LINUX KERNEL RELATED PARAMETER
[bash light="true"]
$cat /proc/sys/kernel/hung_task_timeout_secs
120
$

$echo 0 | sudo tee --append /proc/sys/kernel/hung_task_timeout_secs
0
$sudo cat /proc/sys/kernel/hung_task_timeout_secs
0
$
[/bash]

When a task in D state has not been scheduled for more than this many seconds, the kernel reports a warning.
This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.

0: means infinite timeout - no checking done. Possible values to set are in range {0..LONG_MAX/HZ}.
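The rule above (0 means no checking, any other value is a timeout in seconds) can be sketched in a few lines of Python. This is only an illustration: the function name describe_timeout and the wording of the messages are made up for this post and are not part of any kernel interface.

```python
from pathlib import Path

# Path taken from the shell session above.
SYSCTL_PATH = Path("/proc/sys/kernel/hung_task_timeout_secs")


def describe_timeout(raw: str) -> str:
    """Turn the raw file content into a human-readable description."""
    seconds = int(raw.strip())
    if seconds < 0:
        raise ValueError("timeout cannot be negative")
    if seconds == 0:
        # 0 means infinite timeout, so no checking is done.
        return "hung task checking is disabled (infinite timeout)"
    return f"tasks blocked in D state for more than {seconds} seconds are reported"


if __name__ == "__main__":
    # The file only shows up when CONFIG_DETECT_HUNG_TASK is enabled.
    try:
        print(describe_timeout(SYSCTL_PATH.read_text()))
    except OSError:
        print(f"{SYSCTL_PATH} is not available on this system")
```

On the machine from the session above this would report 120 seconds before the change and the "disabled" message after writing 0.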

PARAMETER RELATED
[bash light="true"]
TEST-MAIL1 ~ #dmesg
[cut]
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rm D ffff88107f472c40 0 16705 22512 0x00000000
ffff881014693810 0000000000000086 ffff881000000000 ffff88102013b040
0000000000012c40 ffff880471855fd8 0000000000012c40 ffff880471854010
ffff880471855fd8 0000000000012c40 ffff881017ff8e40 0000000100000000
Call Trace:
[<ffffffff8148d45d>] ? schedule_timeout+0x1ed/0x2d0
[<ffffffffa0b7d1ea>] ? dlmlock+0x8a/0xda0 [ocfs2_dlm]
[<ffffffff8148ce5c>] ? wait_for_common+0x12c/0x1a0
[<ffffffff81052230>] ? try_to_wake_up+0x280/0x280
[<ffffffffa0a3b9c0>] ? __ocfs2_cluster_lock+0x1f0/0x780 [ocfs2]
[<ffffffff8148ce80>] ? wait_for_common+0x150/0x1a0
[<ffffffffa0a9c6bc>] ? ocfs2_buffer_cached+0x8c/0x180 [ocfs2]
[<ffffffffa0a40bc6>] ? ocfs2_inode_lock_full_nested+0x126/0x540 [ocfs2]
[<ffffffffa0a5922e>] ? ocfs2_lookup_lock_orphan_dir+0x6e/0x1b0 [ocfs2]
[<ffffffffa0a5922e>] ? ocfs2_lookup_lock_orphan_dir+0x6e/0x1b0 [ocfs2]
[<ffffffffa0a5ba1a>] ? ocfs2_prepare_orphan_dir+0x4a/0x290 [ocfs2]
[<ffffffffa0a5e621>] ? ocfs2_unlink+0x6e1/0xbb0 [ocfs2]
[<ffffffff811bcfea>] ? may_link+0xda/0x170
[<ffffffff81141c8e>] ? vfs_unlink+0x9e/0x100
[<ffffffff81145881>] ? do_unlinkat+0x1a1/0x1d0
[<ffffffff81147b00>] ? vfs_readdir+0xa0/0xe0
[<ffffffff8116fedb>] ? fsnotify_find_inode_mark+0x2b/0x40
[<ffffffff81170c24>] ? dnotify_flush+0x54/0x110
[<ffffffff81133eec>] ? filp_close+0x5c/0x90
[<ffffffff81496912>] ? system_call_fastpath+0x16/0x1b
[/bash]
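A report like the one above can be picked apart with a small script. The sketch below is an assumption-heavy illustration: the field names (free_stack, pid, ppid, flags) follow how kernels of that era printed the task line and may not match other versions, the regular expressions were written only against the output shown here, and a command name containing spaces would not match.

```python
import re

# Task line, e.g. "rm D ffff88107f472c40 0 16705 22512 0x00000000".
# Field meanings are assumed from kernels of that era.
TASK_RE = re.compile(
    r"^(?P<comm>\S+)\s+(?P<state>[A-Z])\s+(?P<pc>[0-9a-f]+)\s+"
    r"(?P<free_stack>\d+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+(?P<flags>0x[0-9a-f]+)$"
)

# Stack frame, e.g. "[<ffffffffa0b7d1ea>] ? dlmlock+0x8a/0xda0 [ocfs2_dlm]".
FRAME_RE = re.compile(
    r"^\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<module>\w+)\])?$"
)


def parse_task_line(line):
    """Return a dict describing the blocked task, or None if the line does not match."""
    match = TASK_RE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    for key in ("free_stack", "pid", "ppid"):
        info[key] = int(info[key])
    return info


def parse_frames(lines):
    """Return (symbol, module) pairs; module is None for core kernel symbols."""
    frames = []
    for line in lines:
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def modules_in_trace(frames):
    """List the modules seen in the trace, in order of first appearance."""
    seen = []
    for _, module in frames:
        if module and module not in seen:
            seen.append(module)
    return seen
```

For the trace above, modules_in_trace() would list ocfs2_dlm and ocfs2, which points at the cluster file system lock as the place where rm is stuck.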

CLASSROOM

While waiting for a read() or write() on a file descriptor to return, a process may be put
into a special kind of sleep, known as "D" or "Disk Sleep" (uninterruptible sleep). It is special
because the process cannot be killed or interrupted while in this state. A process waiting for
ioctl() to return can also be put to sleep in this manner.
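To see which processes are currently in the "D" state, the state field of /proc/<pid>/stat can be read. The following is a minimal sketch, assuming a Linux /proc file system; the function names parse_stat and find_uninterruptible are made up for this example.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the content of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    left = stat_text.index("(")
    right = stat_text.rindex(")")
    pid = int(stat_text[:left].strip())
    comm = stat_text[left + 1:right]
    state = stat_text[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            # The process may have exited while we were looking at it.
            continue
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, comm in find_uninterruptible():
            print(pid, comm)
```

A process showing up here once is normal; it is only a process that stays in this list for longer than hung_task_timeout_secs that the kernel reports.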

RELATED SOURCE CODE EXPOSURE
[c]
/*
 * Ok, the task did not get scheduled for more than 2 minutes,
 * complain:
 */
if (sysctl_hung_task_warnings) {
	if (sysctl_hung_task_warnings > 0)
		sysctl_hung_task_warnings--;
	pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
	       t->comm, t->pid, timeout);
	pr_err(" %s %s %.*s%s\n",
	       print_tainted(), init_utsname()->release,
	       (int)strcspn(init_utsname()->version, " "),
	       init_utsname()->version,
	       LINUX_PACKAGE_ID);
	pr_err("\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
	       " disables this message.\n");
	sched_show_task(t);
	hung_task_show_lock = true;
}
[/c]
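The counter logic in the fragment above is easy to miss: a positive sysctl_hung_task_warnings is decremented on every report, zero suppresses reports, and a negative value is never decremented, so reports keep coming. Here is a small Python simulation of just that branch. It is a model for illustration, not kernel code; the function names and the starting value 10 are chosen for the example.

```python
def report_hung_task(warnings_left):
    """Mirror the branch above: return (should_report, new_warnings_left)."""
    if warnings_left == 0:
        # Counter exhausted: nothing is printed.
        return False, 0
    if warnings_left > 0:
        # Only positive values are counted down.
        warnings_left -= 1
    # Negative values fall through unchanged, so reporting never stops.
    return True, warnings_left


def count_reports(initial, hung_events):
    """Count how many of hung_events detections would actually be reported."""
    reported = 0
    left = initial
    for _ in range(hung_events):
        should_report, left = report_hung_task(left)
        if should_report:
            reported += 1
    return reported


if __name__ == "__main__":
    print(count_reports(10, 25))   # limited number of reports
    print(count_reports(-1, 25))   # every detection is reported
    print(count_reports(0, 25))    # nothing is reported
```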

[c light="true"]
/*
 * Process updating of timeout sysctl
 */
int proc_dohung_task_timeout_secs(struct ctl_table *table, int write,
				  void __user *buffer,
				  size_t *lenp, loff_t *ppos)
{
	int ret;

	ret = proc_doulongvec_minmax(table, write, buffer, lenp, ppos);

	if (ret || !write)
		goto out;

	wake_up_process(watchdog_task);

 out:
	return ret;
}
[/c]
SOURCE CODE TAKEN FROM OFFICIAL LINUX KERNEL

RELATED FROM RESEARCH PAPER

Kernel data collection tools. Several monitoring facilities are provided by the Linux kernel,
which have been exploited in this work. In particular, we use KProbes which inserts breakpoints
in arbitrary binary code locations in charge of triggering user-defined handler functions. Handlers
can be used to collect information about internal kernel variables; subsequently, kernel execution
is restored. Kdump is a tool for failure data collection based on the execution of a secondary kernel,
namely capture kernel, which is preliminarily loaded into a reserved memory region. When the
primary kernel fails, the capture kernel is executed; then, it can collect failure data by reading
the main memory state.

Built-in hang detection mechanisms. Several hang detection mechanisms are
available in the Linux OS, which can be enabled by recompiling the kernel. In particular, the following
facilities can be used for hang detection: Soft lockup detection, i.e., the kernel detects
whether a "canary" task is not scheduled within a timeout; Hard lockup detection, i.e., if any CPU in
the system does not handle local timer interrupt for longer than a timeout;
Sleep-inside-spinlock checking, i.e., assertions that verify whether there are spinlocks that have
been acquired before calling a sleeping function (i.e., a function during which the current
thread may block and be preempted by the scheduler); Checks on lock API usage, that is: missing lock
initialization, release of an already freed lock, release of a lock by a thread or CPU different
from the lock holder, lock data structure corruption.

source : http://tinyurl.com/7pt5j9a

Assessment and Improvement of Hang Detection in the Linux Operating System
2009 28th IEEE International Symposium on Reliable Distributed Systems

LINKS
https://access.redhat.com/solutions/60572
https://www.linuxquestions.org/questions/linux-software-2/kernel-panic-echo-0-proc-sys-kernel-4175629199/
https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
https://stackoverflow.com/questions/84882/sudo-echo-something-etc-privilegedfile-doesnt-work-is-there-an-alterna
https://www.tldp.org/LDP/tlk/kernel/processes.html
https://www.nico.schottelius.org/blog/reboot-linux-if-task-blocked-for-more-than-n-seconds/
http://stackoverflow.com/questions/1475683/linux-process-states

skill, snice – send a signal or report process status

A UNIX Command
$skill
Usage:   skill [signal to send] [options] process selection criteria
Example: skill -KILL -v pts/*

The default signal is TERM. Use -l or -L to list available signals.
Particularly useful signals include HUP, INT, KILL, STOP, CONT, and 0.
Alternate signals may be specified in three ways: -SIGKILL -KILL -9

General options:
-f  fast mode            This is not currently useful.
-i  interactive use      You will be asked to approve each action.
-v  verbose output       Display information about selected processes.
-w  warnings enabled     This is not currently useful.
-n  no action            This only displays the process ID.

Selection criteria can be: terminal, user, pid, command.
The options below may be used to ensure correct interpretation.
-t  The next argument is a terminal (tty or pty).
-u  The next argument is a username.
-p  The next argument is a process ID number.
-c  The next argument is a command name.
$skill -l
HUP INT QUIT ILL TRAP ABRT BUS FPE KILL USR1 SEGV USR2 PIPE ALRM TERM STKFLT
CHLD CONT STOP TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH POLL PWR SYS
$ps aux | grep xine
jeffrin   2844  2.9  1.4 661236 29460 pts/3    Sl   20:44   0:00 xine
jeffrin   2866  0.0  0.0 112944   816 pts/1    S+   20:44   0:00 grep xine
$skill 2844
$ps aux | grep xine
jeffrin   2869  0.0  0.0 112944   820 pts/1    S+   20:44   0:00 grep xine
$

UNIX Explanation
The default signal for skill is TERM. Use -l or -L to
list available signals. Particularly useful signals
include HUP, INT, KILL, STOP, CONT, and 0. Alternate
signals may be specified in three ways: -9, -SIGKILL,
or -KILL.
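The three spellings of a signal (-9, -SIGKILL, -KILL) all end up as the same signal number. The Python sketch below shows one way to resolve them and send the signal with os.kill. It only illustrates the idea and is not how skill itself is implemented; resolve_signal and send_signal are names made up for this example.

```python
import os
import signal


def resolve_signal(spec="TERM"):
    """Accept '-9', '-SIGKILL' or '-KILL' (dash optional) and return the number."""
    name = spec.lstrip("-").upper()
    if name.isdigit():
        number = int(name)
        if number == 0:
            # Signal 0 sends nothing; it only checks that the process exists.
            return 0
        return int(signal.Signals(number))   # ValueError for unknown numbers
    if not name.startswith("SIG"):
        name = "SIG" + name
    return int(signal.Signals[name])         # KeyError for unknown names


def send_signal(pid, spec="TERM"):
    """Send the signal described by spec to pid; TERM is the default, as with skill."""
    os.kill(pid, resolve_signal(spec))


if __name__ == "__main__":
    for spec in ("-9", "-SIGKILL", "-KILL", "TERM"):
        print(spec, resolve_signal(spec))
```

With this, send_signal(2844) would do roughly what "skill 2844" did in the session above, since both default to TERM.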

proc asound devices 0.2

$cat /proc/asound/devices
  2:        : timer
  3:        : sequencer
  4: [ 0- 2]: digital audio capture
  5: [ 0- 1]: digital audio playback
  6: [ 0- 1]: digital audio capture
  7: [ 0- 0]: digital audio playback
  8: [ 0- 0]: digital audio capture
  9: [ 0- 0]: hardware dependent
 10: [ 0]   : control
$
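Each line of the listing above can be split into its parts with a regular expression. In the sketch below the two numbers in brackets are taken to be the card number and the device number; that reading, the field names and the function name parse_devices are assumptions made for this example.

```python
import re

# Matches lines such as "  4: [ 0- 2]: digital audio capture",
# " 10: [ 0]   : control" and "  2:        : timer".
DEVICE_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s*"
    r"(?:\[\s*(?P<card>\d+)(?:-\s*(?P<device>\d+))?\])?"
    r"\s*:\s*(?P<kind>.+?)\s*$"
)


def _to_int(value):
    """Convert an optional regex group to int, keeping None as None."""
    return int(value) if value is not None else None


def parse_devices(text):
    """Parse the content of /proc/asound/devices into a list of dicts."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match is None:
            continue
        devices.append({
            "minor": int(match.group("minor")),
            "card": _to_int(match.group("card")),      # assumed: card number
            "device": _to_int(match.group("device")),  # assumed: device number
            "kind": match.group("kind"),
        })
    return devices


if __name__ == "__main__":
    try:
        with open("/proc/asound/devices") as handle:
            for device in parse_devices(handle.read()):
                print(device)
    except OSError:
        print("/proc/asound/devices is not available on this system")
```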


devices
        Lists the ALSA native device mappings.

A music sequencer is a musical application or a device designed to
play back musical notation. The original kind of sequencer is now
known as a step sequencer to distinguish it from the modern kind,
which records a musician playing notes.

GNUsound - A sound editor for Linux/x86. It supports multiple tracks,
multiple outputs, and 8, 16, or 24/32 bit samples. It can read a
number of audio formats through libaudiofile, and saves them as WAV.

Digital audio uses pulse-code modulation and digital signals for sound
reproduction. This includes analog-to-digital conversion (ADC),
digital-to-analog conversion (DAC), storage, and transmission. In
effect, the system commonly referred to as digital is in fact a
discrete-time, discrete-level analog of a previous electrical
analog. While modern systems can be quite subtle in their methods, the
primary usefulness of a digital system is the ability to store,
retrieve and transmit signals without any loss of quality.

Intel's use of the phrase audio codec refers to signals being
encoded/decoded to/from analog audio from/to digital audio, thus
actually a combined audio AD/DA-converter. This should not be confused
with a codec in the sense of converting from one binary format to
another, such as an audio (MP3) or video (Xvid) codec in a media
player.

A timer is a specialized type of clock. A timer can be used to control
the sequence of an event or process. Whereas a stopwatch counts
upwards from zero for measuring elapsed time, a timer counts down from
a specified time interval, like an hourglass. Timers can be
mechanical, electromechanical, electronic (quartz), or even software
as all modern computers include digital timers of one kind or
another. When the set period expires some timers simply indicate so
(e.g., by an audible signal), while others operate electrical
switches, such as a time switch, which cuts electrical power.

proc filesystem with alsa.



$cat /proc/asound/cards
0 [NVidia ]: HDA-Intel - HDA NVidia
HDA NVidia at 0xf5000000 irq 22
$

The HD-audio component consists of two parts: the controller chip and
the codec chips on the HD-audio bus. Linux provides a single driver for all
controllers, snd-hda-intel. Although the driver name contains a word
of a well-known hardware vendor, it's not specific to it but for all
controller chips by other companies. Since the HD-audio controllers
are supposed to be compatible, the single snd-hda-intel driver should work
in most cases.



Reference/Source:
Linux kernel documentation 2.6.32 related.

acpi with proc

AC Adapter in Proc Filesystem

Alternating current (AC) adapters are used to power or charge many
common electronic devices, such as mobile phones, laptop computers, or
external hard drives. An AC adapter changes AC power from an
electrical outlet into the type of power or voltage that an electronic
device needs to work. Typically, each device has a designated adapter
that is pre-set to the proper voltage conversion. For this reason,
among others, AC adapters generally are not interchangeable.


$cat /proc/acpi/ac_adapter/ACAD/state
state: on-line
$
ACAD is the name of the adapter.
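The state file is a simple "key: value" text, so it is easy to read from a script. Below is a minimal sketch in Python. The adapter name ACAD is the one from the machine above and will differ elsewhere, the function names are made up for this example, and newer kernels no longer provide the /proc/acpi/ac_adapter directory at all.

```python
def parse_ac_state(text):
    """Return the value of the 'state' line, e.g. 'on-line', or None if missing."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "state":
            return value.strip()
    return None


def is_on_line(text):
    """True when the adapter reports that it is plugged in."""
    return parse_ac_state(text) == "on-line"


if __name__ == "__main__":
    # ACAD is the adapter name on the machine shown above (an assumption elsewhere).
    path = "/proc/acpi/ac_adapter/ACAD/state"
    try:
        with open(path) as handle:
            print("on AC power" if is_on_line(handle.read()) else "not on AC power")
    except OSError:
        print(f"{path} is not available on this system")
```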

when RTO retransmissions remain unacknowledged…

$cat /proc/sys/net/ipv4/tcp_orphan_retries
0
$


tcp_orphan_retries - INTEGER
        This value influences the timeout of a locally closed TCP connection,
        when RTO retransmissions remain unacknowledged.
        See tcp_retries2 for more details.

        The default value is 7.
        If your machine is a loaded WEB server,
        you should think about lowering this value, such sockets
        may consume significant resources. Cf. tcp_max_orphans.


source :
Linux Kernel Documentation . 2.6.32

net/ipv4/tcp_timer.c (android.git.kernel.org/kernel/msm.git, GPL, C)
static int tcp_orphan_retries(struct sock *sk, int alive)
{
        int retries = sysctl_tcp_orphan_retries; /* May be zero. */
[cut]
                        retry_until = tcp_orphan_retries(sk, alive);

shaper.queues (www.chronox.de/tc+filter/shaper-0.2.tar.bz2, Shell)
  echo "Set number of orphant retries to 5"
  echo 5 > /proc/sys/net/ipv4/tcp_orphan_retries

usr/share/man/man7/tcp.7 (www2.cddc.vt.edu/linux/distributions/7linux/7v6/7base/7v6a11.tar.bz2, Troff)
.TP
.B tcp_orphan_retries
The maximum number of attempts made to probe the other


3.3.15. tcp_orphan_retries
The tcp_orphan_retries variable tells the TCP/IP stack how many times
to retry to kill connections on the other side before killing it on our
own side. If your machine runs as a highly loaded http server it may
be worth thinking about lowering this value. http sockets will consume
large amounts of resources if not checked.

This variable takes an integer value. The default value for this variable
is 7, which would approximately correspond to 50 seconds through 16
minutes depending on the Retransmission Timeout (RTO). For a
complete explanation of the RTO, read the "3.7. Data Communication"
section in RFC 793 - Transmission Control Protocol.
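The "50 seconds through 16 minutes" range can be reproduced with a little arithmetic. The sketch below rests on assumptions that are not stated in the tutorial: the RTO doubles after every unacknowledged retransmission, it is bounded by 200 ms and 120 s (the values used here for TCP_RTO_MIN and TCP_RTO_MAX), and the connection is given up after retries + 1 intervals. The function name orphan_timeout is made up for this example.

```python
# Assumed bounds for the retransmission timeout, in seconds.
TCP_RTO_MIN = 0.2
TCP_RTO_MAX = 120.0


def orphan_timeout(retries, initial_rto, rto_max=TCP_RTO_MAX):
    """Sum the waiting intervals for `retries` retransmissions with RTO doubling."""
    total = 0.0
    rto = min(initial_rto, rto_max)
    # One interval before each retransmission plus the final wait after the last one.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    # Lower bound: the RTO starts at its minimum (fast local network).
    print(orphan_timeout(7, TCP_RTO_MIN))
    # Upper bound: the RTO is already at its maximum (very slow or lossy path).
    print(orphan_timeout(7, TCP_RTO_MAX) / 60)
```

Under these assumptions the lower bound comes out at about 51 seconds and the upper bound at 16 minutes, which matches the range quoted above.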


source :
Ipsysctl tutorial 1.0.4
Oskar Andreasson
blueflux@koffein.net
Copyright © 2002 by Oskar Andreasson