
Wednesday, September 17, 2014

Process Monitoring in Linux With Examples

Understanding Processes

Processes, Light-weight Processes, Threads and Tasks

Let us understand the concepts of processes, threads and tasks in Linux.
A process is an instance of a program in execution. A process is composed of one or more user threads (or simply threads), each of which represents an execution flow of the process. Nowadays, most multithreaded applications are written using a standard set of library functions called the pthread (POSIX thread) library. To the programmer, pthreads are merely an abstraction. If such threads ran purely in user space, they would not provide the true benefits of multithreading: all the threads would share a single execution context, and if one of them blocked on a system call such as read, the whole process would block, because the kernel would be oblivious to the threads.
Linux therefore provides lightweight processes to offer better support for multithreaded applications. Two lightweight processes may share some resources, such as the address space and the open files. Whenever one of them modifies a shared resource, the other immediately sees the change. However, each lightweight process has an independent execution context and is scheduled as an independent process by the kernel.
For a deeper understanding, let's look at how a new process is created in Linux. One can use either the fork() or the clone() system call to create a new process. A fork() always creates an independent process that does not share the address space of the parent (though a forked process starts out referencing the same memory pages, and a copy-on-write model defers the copy until one side writes). A clone(), on the other hand, allows granular control over process creation: one can specify whether the child should share the address space, open files, signal handlers and so on with the parent. A process created using clone() that shares these attributes with its parent is known as a lightweight process. In effect, everything in Linux is a process that either shares the resources of its parent or does not. In fact, fork() is implemented as a wrapper over clone() with none of the sharing flags set.
A straightforward way to implement multithreaded applications is to associate a lightweight process with each thread created by the pthreads library. In this way, the threads can access the same set of application data structures by simply sharing the same memory address space, the same set of open files, and so on; at the same time, each thread can be scheduled independently by the kernel, so that one may sleep while another remains runnable. The advantage of this implementation is that the kernel sees everything as a process, and the scheduler does not distinguish between threads and processes. The terms LWP and thread are used interchangeably, since an application runtime that supports threads typically creates an LWP underlying each thread, though it can be otherwise. In this document, therefore, thread, task and LWP mean the same thing.
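The difference between an independent process and a lightweight process can be observed from user space. The following is a minimal sketch, assuming a Linux machine and CPython, whose threads are backed by kernel-scheduled lightweight processes. The function names are hypothetical and chosen only for illustration. A forked child modifies its own copy-on-write copy of the data, while a thread modifies the memory it shares with its creator.

```python
import os
import threading


def value_after_fork():
    """Let a fork()ed child modify data; return what the parent sees."""
    data = [0]
    pid = os.fork()
    if pid == 0:
        # Child: this write touches the child's private copy only.
        data[0] = 42
        os._exit(0)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return data[0]


def value_after_thread():
    """Let a thread modify data; return what the creating thread sees."""
    data = [0]

    def worker():
        # Same address space: the write is visible to every thread.
        data[0] = 42

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return data[0]


if __name__ == "__main__":
    print("after fork:  ", value_after_fork())    # parent's copy is unchanged
    print("after thread:", value_after_thread())  # shared memory was updated
```

The parent still sees 0 after the fork, because the child's write triggered a private copy of the page, whereas the thread's write is visible to its creator.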
Each process (including an LWP) has its own pid. However, POSIX requires that all threads of a multithreaded application have the same PID. Linux overcomes this by making use of thread groups. Each thread belongs to a group, and the PID of the first thread created in the group (also known as the group leader) is stored in a field called tgid (thread group ID). The getpid() call returns this field rather than the thread's own pid. In the case of a process with a single thread, the tgid and the pid are the same, as in the examples below.
[user@server ~]$ cat /proc/10200/status
Name: postgres
State: S (sleeping)
SleepAVG: 98%
Tgid: 10200
Pid: 10200
PPid: 14860
[user@server ~]$ cat /proc/14860/status
Name: postgres
State: S (sleeping)
SleepAVG: 98%
Tgid: 14860
Pid: 14860
PPid: 1
The gettid() call in Linux returns the actual pid of an LWP; this differs from the value returned by getpid() when the LWP is part of a thread group and is not the group leader. One can find out the actual pids and statuses of the threads within a process using the /proc filesystem as follows:
[user@server ~]$ cat /proc/23638/status
Name: mysqld
State: S (sleeping)
SleepAVG: 98%
Tgid: 23638
Pid: 23638
PPid: 2561
[user@server ~]$ cat /proc/23638/task/14514/status
Name: mysqld
State: S (sleeping)
SleepAVG: 98%
Tgid: 23638
Pid: 14514
PPid: 2561
Note that in the above example the first status file represents the mysqld process, while the second represents a thread (LWP) within that mysqld process. Every thread that is part of the mysqld process (pid: 23638) has its own directory under /proc/23638/task/.
One can use "ps H -Le" to display all threads as if they were processes. Alternatively, one can use top and toggle the display of threads with the interactive "H" command. In fact, when working with processes, the commands you will generally use are ps, top, htop (a better version of top), and the /proc filesystem.
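The relationship between the thread group ID and the per-thread ID can also be checked from inside a program. This is a minimal sketch, assuming Linux and Python 3.8 or later (for threading.get_native_id()); the helper names collect_ids and list_task_ids are hypothetical.

```python
import os
import threading


def collect_ids():
    """Return (pid, tid) pairs seen by the calling thread and by a worker."""
    seen = {}

    def record(name):
        # os.getpid() reports the thread group ID (Tgid in /proc/<pid>/status);
        # threading.get_native_id() reports the kernel thread ID (Pid there).
        seen[name] = (os.getpid(), threading.get_native_id())

    record("caller")
    t = threading.Thread(target=record, args=("worker",))
    t.start()
    t.join()
    return seen


def list_task_ids(pid):
    """List the thread IDs under /proc/<pid>/task, or [] if it is absent."""
    task_dir = f"/proc/{pid}/task"
    if not os.path.isdir(task_dir):
        return []
    return sorted(int(name) for name in os.listdir(task_dir))


if __name__ == "__main__":
    print(collect_ids())
    print(list_task_ids(os.getpid()))
```

Both threads report the same value from os.getpid() but different native thread IDs, which mirrors the Tgid and Pid lines in the status files shown earlier.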
User-space concurrency model:
An excellent example of a concurrency model implemented in user space is the Scala actor model. Actors in Scala are a concurrency abstraction for the programmer. However, an actor does not map to a thread. Instead, actors are executed on a thread pool. Ideally, the size of the thread pool corresponds to the number of processor cores of the machine. The thread pool grows if all the worker threads are blocked but there are still tasks to be processed. Erlang has a similar user-space concurrency model.

Process States

A process (which includes a thread) on a Linux machine can be in any of the following states -
  • TASK_RUNNING - The process is either executing on a CPU or waiting to be executed.
  • TASK_INTERRUPTIBLE - The process is suspended (sleeping) until some condition becomes true. Raising a hardware interrupt, releasing a system resource the process is waiting for, or delivering a signal are examples of conditions that might wake up the process (put its state back to TASK_RUNNING). Typically, blocking I/O calls (disk/network) result in the task being marked as TASK_INTERRUPTIBLE. As soon as the data it is waiting on is ready to be read, the device raises an interrupt and the interrupt handler changes the state of the task back to TASK_RUNNING. Processes that are idle (i.e. not performing any work) should also be in this state.
  • TASK_UNINTERRUPTIBLE - Like TASK_INTERRUPTIBLE, except that delivering a signal to the sleeping process leaves its state unchanged. This process state is seldom used. It is valuable, however, under certain specific conditions in which a process must wait until a given event occurs without being interrupted. Ideally not too many tasks will be in this state.
    • For instance, this state may be used when a process opens a device file and the corresponding device driver starts probing for a corresponding hardware device. The device driver must not be interrupted until the probing is complete, or the hardware device could be left in an unpredictable state.
    • Atomic write operations may require a task to be marked as UNINTERRUPTIBLE
    • NFS access sometimes results in access processes being marked as UNINTERRUPTIBLE
    • reads/writes from/to disk can be marked thus for a fraction of a second
    • I/O following a page fault marks a process UNINTERRUPTIBLE
    • I/O to the same disk that is being accessed for page faults can result in a process marked as UNINTERRUPTIBLE
    • Programmers may mark a task as UNINTERRUPTIBLE instead of using INTERRUPTIBLE
  • TASK_STOPPED - Process execution has been stopped; the process enters this state after receiving a SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU signal
  • TASK_TRACED - Process execution has been stopped by a debugger
  • EXIT_ZOMBIE - Process execution is terminated, but the parent process has not yet issued a wait4( ) or waitpid( ) system call. The OS will not clear zombie processes until the parent issues a wait()-like call
  • EXIT_DEAD - The final state: the process is being removed by the system because the parent process has just issued a wait4( ) or waitpid( ) system call for it. Changing its state from EXIT_ZOMBIE to EXIT_DEAD avoids race conditions due to other threads of execution that execute wait( )-like calls on the same process.
Only processes in the TASK_RUNNING state are candidates for a free CPU. If no task is in the TASK_RUNNING state, the CPU remains idle. All tasks in the TASK_RUNNING state compete for CPU time (along with kernel tasks). Based on task priority, the kernel scheduler determines which task gets a slice of CPU time and for how long.
Additionally processes are organized into sets of sessions. The session's ID is the same as the pid of the process that created the session. That process is known as the session leader for that session group. All of that process's descendants are then members of that session unless they specifically remove themselves from it.

Understanding load average

Load average refers to the average number of processes (including threads) that have been running or waiting over a certain time period. Conventionally this counts processes in the TASK_RUNNING state (executing or waiting for a CPU); in Linux it also counts processes in uninterruptible sleep. The load average is therefore the average number of processes in the TASK_RUNNING or TASK_UNINTERRUPTIBLE state over a period of time. This value is computed using an exponential decay formula. A high number signifies processes starving for CPU (or possibly for the disk, in case processes doing disk I/O are in the UNINTERRUPTIBLE state). If no processes are marked as UNINTERRUPTIBLE, the load average should not be much higher than the number of CPU cores in your machine; a higher load average signifies that there are processes waiting for CPU.
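The exponential decay can be sketched as follows. This assumes the commonly described scheme in which the number of running and uninterruptible tasks is sampled every 5 seconds and folded into the 1, 5 and 15 minute averages. The kernel uses fixed-point arithmetic, whereas this sketch uses floating point, and the function name update_load is hypothetical.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (an assumption of this sketch)


def update_load(load, active, window):
    """Fold one sample of `active` tasks into a load average.

    `window` is the averaging period in seconds (60, 300 or 900).
    """
    decay = math.exp(-SAMPLE_INTERVAL / window)
    return load * decay + active * (1.0 - decay)


if __name__ == "__main__":
    load1 = 0.0
    # Eight tasks stay runnable for one minute (12 samples of 5 seconds).
    for _ in range(12):
        load1 = update_load(load1, 8, 60)
    print(round(load1, 2))
```

Starting from an idle machine, the 1-minute figure reaches only about 63% of the steady value of 8 after one minute, which is why the three load averages lag behind sudden changes in activity.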

Understanding Process Priorities

Each conventional process has a static priority, a number from 100 (highest priority) to 139 (lowest priority). The time quantum each process gets from the scheduler depends on this priority. For example, a priority of 100 gives a process a time quantum of 800 ms, while a priority of 139 results in a time quantum of 5 ms. While a process starts out with a static priority, the kernel also computes a dynamic priority for each process based on its average sleep time. The average sleep time also determines whether a process is treated as interactive or batch, and the scheduling of the process changes based on this determination.
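The two figures quoted above are consistent with the base time quantum rule usually given for the O(1) scheduler of the 2.6 kernels (later kernels replaced it with CFS, which does not use fixed quanta). A sketch of that rule follows; the function names are hypothetical, and the rule itself is an assumption not stated in the text above.

```python
def base_time_quantum_ms(static_priority):
    """Base time quantum in milliseconds for a conventional process.

    Assumed rule (O(1) scheduler): (140 - prio) * 20 for priorities
    below 120, and (140 - prio) * 5 otherwise.
    """
    if not 100 <= static_priority <= 139:
        raise ValueError("static priority must be in 100..139")
    scale = 20 if static_priority < 120 else 5
    return (140 - static_priority) * scale


def static_priority_from_nice(nice):
    """Map a nice value (-20..19) to a static priority (100..139)."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in -20..19")
    return 120 + nice


if __name__ == "__main__":
    for nice in (-20, 0, 19):
        prio = static_priority_from_nice(nice)
        print(nice, prio, base_time_quantum_ms(prio))
```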

Monitoring processes

Using top to check process states

The top command shows tasks currently running -
[user@server ~]$ top
top - 03:37:45 up 5 days, 7:57, 12 users, load average: 7.24, 5.68, 5.09
Tasks: 471 total, 15 running, 456 sleeping, 0 stopped, 0 zombie
Cpu(s): 39.2%us, 8.8%sy, 9.1%ni, 38.8%id, 1.5%wa, 0.0%hi, 2.6%si, 0.0%st
Mem: 132093140k total, 131496368k used, 596772k free, 380832k buffers
Swap: 2096472k total, 492k used, 2095980k free, 126816660k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12733 root 39 19 4724 1316 380 R 72.9 0.0 134:23.01 lzop
6899 postgres 16 0 3602m 1.8g 1.8g R 22.1 1.4 0:15.55 postgres
6857 postgres 15 0 3603m 386m 379m R 17.8 0.3 0:07.49 postgres
6884 postgres 15 0 3602m 139m 134m R 12.9 0.1 0:05.85 postgres
11878 postgres 15 0 3620m 2.0g 2.0g R 12.2 1.6 24:49.97 postgres
6853 postgres 16 0 3602m 1.8g 1.8g R 11.9 1.4 0:15.17 postgres
14862 postgres 15 0 71596 912 544 R 10.9 0.0 552:45.73 postgres
7413 postgres 15 0 3600m 127m 122m R 10.2 0.1 0:04.79 postgres
25177 postgres 16 0 3632m 2.1g 2.0g R 9.9 1.6 57:45.83 postgres
9068 postgres 16 0 3602m 129m 124m R 8.9 0.1 0:05.17 postgres
9073 postgres 16 0 3600m 138m 133m R 7.3 0.1 0:05.35 postgres
6854 postgres 15 0 3600m 123m 118m D 6.3 0.1 0:05.18 postgres
9072 postgres 15 0 3602m 123m 118m R 5.9 0.1 0:05.51 postgres
6855 postgres 15 0 3602m 1.8g 1.8g R 4.9 1.4 0:13.39 postgres
9036 dushyant 15 0 13000 1388 816 R 1.0 0.0 0:00.33 top
24 root 34 19 0 0 0 R 0.0 0.0 0:50.72 ksoftirqd/7
As you can see in the above list, out of 471 tasks, 15 are running and 456 are sleeping. This means that 15 tasks are in the TASK_RUNNING state and 456 are in the TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state. Note that the term "running" is a slight misnomer: the above snapshot was taken on a machine with 8 cores, so at any point in time at most 8 of the 15 are actually executing while the remaining 7 wait in the run queue. Tasks that are sleeping do not consume any CPU cycles. You can restrict top to non-idle tasks using the "i" toggle switch. Each task row shows the state of the task in the "S" column as one of: 'D' = uninterruptible sleep, 'R' = running, 'S' = sleeping (interruptible), 'T' = traced or stopped, 'Z' = zombie.

Using ps to check process states

  • ps -eN r - show all tasks except running tasks
  • ps -e r - show running tasks only
The state of the process in ps is displayed using the following flags
  • D - Uninterruptible sleep
  • R - Running or runnable (on run queue)
  • S - Interruptible sleep (waiting for an event to complete)
  • T - Stopped, either by a job control signal or because it is being traced
  • X - dead (should never be seen)
  • Z - Defunct ("zombie") process, terminated but not reaped by its parent.
  • < - high-priority (not nice to other users)
  • N - low-priority (nice to other users)
  • L - has pages locked into memory (for real-time and custom IO)
  • s - is a session leader
  • l - is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
  • + - is in the foreground process group
Notice in the output below that the "l" flag is set in the process state of mysqld, signifying that it is multi-threaded:
[user@server ~]$ ps -e r -N | grep "mysql"
2561 ? S 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/mysql/sessions.myorderbox.com.pid
13970 pts/6 S+ 0:00 grep mysql
23638 ? Sl 1055:34 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --log-error=/var/lib/mysql/sessions.myorderbox.com.err --pid-file=/var/lib/mysql/sessions.myorderbox.com.pid --socket=/var/lib/mysql/mysql.sock --port=3306
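The state column can be decoded mechanically from the flag list above. The following is a small sketch; the table and function names are hypothetical, and the descriptions are shortened versions of the ones listed.

```python
# First character of the STAT column: the process state.
STATE_CODES = {
    "D": "uninterruptible sleep",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped or traced",
    "X": "dead",
    "Z": "zombie",
}

# Any following characters: additional attributes.
MODIFIERS = {
    "<": "high-priority",
    "N": "low-priority",
    "L": "has pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "foreground process group",
}


def decode_stat(stat):
    """Split a ps STAT value such as 'Sl' into (state, [attributes])."""
    if not stat or stat[0] not in STATE_CODES:
        raise ValueError(f"unrecognized STAT value: {stat!r}")
    state = STATE_CODES[stat[0]]
    # Characters this table does not know about are skipped.
    attributes = [MODIFIERS[c] for c in stat[1:] if c in MODIFIERS]
    return state, attributes


if __name__ == "__main__":
    for value in ("S", "S+", "Sl"):
        print(value, decode_stat(value))
```

Applied to the output above, "Sl" for mysqld decodes to an interruptible sleep with the multi-threaded attribute, and "S+" for grep to an interruptible sleep in the foreground process group.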

/proc/<pid>/wchan

[user@server ~]$ cat /proc/14860/wchan
_stext
The wchan file in the /proc/<pid> directory gives the name of the kernel function in which the process is waiting. However, wchan is broken on x86 systems where the SCHED_NO_NO_OMIT_FRAME_POINTER kernel configuration option is set to "y" (which is the default value). On those systems the wchan value within /proc/<pid>/stat always returns "0", which maps to _stext. Refer to http://lkml.org/lkml/2008/11/6/12 and http://lwn.net/Articles/292178/

/proc/<pid>/status

$ cat /proc/$$/status
Name: bash
State: S (sleeping)
Tgid: 3515
Pid: 3515
PPid: 3452
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 100 100 100 100
FDSize: 256
Groups: 16 33 100
VmPeak: 9136 kB
VmSize: 7896 kB
VmLck: 0 kB
VmHWM: 7572 kB
VmRSS: 6316 kB
VmData: 5224 kB
VmStk: 88 kB
VmExe: 572 kB
VmLib: 1708 kB
VmPTE: 20 kB
Threads: 1
SigQ: 0/3067
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000010000
SigIgn: 0000000000384004
SigCgt: 000000004b813efb
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: 00000001
Cpus_allowed_list: 0
Mems_allowed: 1
Mems_allowed_list: 0
voluntary_ctxt_switches: 150
nonvoluntary_ctxt_switches: 545
The fields are as follows:
  • Name: Command run by this process.
  • State: Current state of the process. One of "R (running)", "S (sleeping)", "D (disk sleep)", "T (stopped)", "T (tracing stop)", "Z (zombie)", or "X (dead)".
  • Tgid: Thread group ID (i.e., Process ID).
  • Pid: Thread ID (see gettid(2)).
  • TracerPid: PID of process tracing this process (0 if not being traced).
  • Uid, Gid: Real, effective, saved set, and file system UIDs (GIDs).
  • FDSize: Number of file descriptor slots currently allocated.
  • Groups: Supplementary group list.
  • VmPeak: Peak virtual memory size.
  • VmSize: Virtual memory size.
  • VmLck: Locked memory size (see mlock(3)).
  • VmHWM: Peak resident set size ("high water mark").
  • VmRSS: Resident set size.
  • VmData, VmStk, VmExe: Size of data, stack, and text segments.
  • VmLib: Shared library code size.
  • VmPTE: Page table entries size (since Linux 2.6.10).
  • Threads: Number of threads in process containing this thread.
  • SigPnd, ShdPnd: Number of signals pending for thread and for process as a whole (see pthreads(7) and signal(7)).
  • SigBlk, SigIgn, SigCgt: Masks indicating signals being blocked, ignored, and caught (see signal(7)).
  • CapInh, CapPrm, CapEff: Masks of capabilities enabled in inheritable, permitted, and effective sets (see capabilities(7)).
  • CapBnd: Capability Bounding set (since kernel 2.6.26, see capabilities(7)).
  • Cpus_allowed: Mask of CPUs on which this process may run (since Linux 2.6.24, see cpuset(7)).
  • Cpus_allowed_list: Same as previous, but in "list format" (since Linux 2.6.26, see cpuset(7)).
  • Mems_allowed: Mask of memory nodes allowed to this process (since Linux 2.6.24, see cpuset(7)).
  • Mems_allowed_list: Same as previous, but in "list format" (since Linux 2.6.26, see cpuset(7)).
  • voluntary_ctxt_switches, nonvoluntary_ctxt_switches: Number of voluntary and involuntary context switches (since Linux 2.6.23).
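Because every line of the status file has the form "Key: value", it is easy to read programmatically. This is a minimal sketch with hypothetical helper names; read_status assumes a mounted /proc filesystem.

```python
def parse_status(text):
    """Turn the contents of a /proc/<pid>/status file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that carry no "Key: value" pair
            fields[key.strip()] = value.strip()
    return fields


def read_status(pid="self"):
    """Read and parse the status file of `pid` (default: the caller)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


def is_group_leader(fields):
    """A task is the thread group leader when its Tgid equals its Pid."""
    return fields["Tgid"] == fields["Pid"]


if __name__ == "__main__":
    status = read_status()
    print(status["Name"], status["State"], status["Threads"])
    print("group leader:", is_group_leader(status))
```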

/proc/<pid>/stat

Status information about the process. This is used by ps(1). It is defined in /usr/src/linux/fs/proc/array.c.
[user@server ~]$ cat /proc/7278/stat
7278 (postgres) S 1 7257 7257 0 -1 4202496 36060376 10845160168 0 749 20435 137212 158536835 39143290 15 0 1 0 50528579 3763298304 20289 18446744073709551615 4194304 7336916 140734091375136 18446744073709551615 225773929891 0 0 19935232 84487 0 0 0 17 2 0 0 12
The fields, in order, are:
  • pid: The process ID.
  • comm: The filename of the executable, in parentheses. This is visible whether or not the executable is swapped out.
  • state: One character from the string "RSDZTW" where R is running, S is sleeping in an interruptible wait, D is waiting in uninterruptible disk sleep, Z is zombie, T is traced or stopped (on a signal), and W is paging.
  • ppid: The PID of the parent.
  • pgrp: The process group ID of the process.
  • session: The session ID of the process.
  • tty_nr: The controlling terminal of the process. (The minor device number is contained in the combination of bits 31 to 20 and 7 to 0; the major device number is in bits 15 to 8.)
  • tpgid: The ID of the foreground process group of the controlling terminal of the process.
  • flags: The kernel flags word of the process. For bit meanings, see the PF_* defines in <linux/sched.h>. Details depend on the kernel version.
  • minflt: The number of minor faults the process has made which have not required loading a memory page from disk.
  • cminflt: The number of minor faults that the process's waited-for children have made.
  • majflt: The number of major faults the process has made which have required loading a memory page from disk.
  • cmajflt: The number of major faults that the process's waited-for children have made.
  • utime: Amount of time that this process has been scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). This includes guest time, guest_time (time spent running a virtual CPU, see below), so that applications that are not aware of the guest time field do not lose that time from their calculations.
  • stime: Amount of time that this process has been scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
  • cutime: Amount of time that this process's waited-for children have been scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)). (See also times(2).) This includes guest time, cguest_time (time spent running a virtual CPU, see below).
  • cstime: Amount of time that this process's waited-for children have been scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
  • priority: (Explanation for Linux 2.6) For processes running a real-time scheduling policy (policy below; see sched_setscheduler(2)), this is the negated scheduling priority, minus one; that is, a number in the range -2 to -100, corresponding to real-time priorities 1 to 99. For processes running under a non-real-time scheduling policy, this is the raw nice value (setpriority(2)) as represented in the kernel. The kernel stores nice values as numbers in the range 0 (high) to 39 (low), corresponding to the user-visible nice range of -20 to 19. Before Linux 2.6, this was a scaled value based on the scheduler weighting given to this process.
  • nice: The nice value (see setpriority(2)), a value in the range 19 (low priority) to -20 (high priority).
  • num_threads: Number of threads in this process (since Linux 2.6). Before kernel 2.6, this field was hard coded to 0 as a placeholder for an earlier removed field.
  • itrealvalue: The time in jiffies before the next SIGALRM is sent to the process due to an interval timer. Since kernel 2.6.17, this field is no longer maintained, and is hard coded as 0.
  • starttime: The time in jiffies the process started after system boot.
  • vsize: Virtual memory size in bytes.
  • rss: Resident Set Size: number of pages the process has in real memory. This is just the pages which count toward text, data, or stack space. This does not include pages which have not been demand-loaded in, or which are swapped out.
  • rsslim: Current soft limit in bytes on the rss of the process; see the description of RLIMIT_RSS in getrlimit(2).
  • startcode: The address above which program text can run.
  • endcode: The address below which program text can run.
  • startstack: The address of the start (i.e., bottom) of the stack.
  • kstkesp: The current value of ESP (stack pointer), as found in the kernel stack page for the process.
  • kstkeip: The current EIP (instruction pointer).
  • signal: The bitmap of pending signals, displayed as a decimal number. Obsolete, because it does not provide information on real-time signals; use /proc/[pid]/status instead.
  • blocked: The bitmap of blocked signals, displayed as a decimal number. Obsolete, because it does not provide information on real-time signals; use /proc/[pid]/status instead.
  • sigignore: The bitmap of ignored signals, displayed as a decimal number. Obsolete, because it does not provide information on real-time signals; use /proc/[pid]/status instead.
  • sigcatch: The bitmap of caught signals, displayed as a decimal number. Obsolete, because it does not provide information on real-time signals; use /proc/[pid]/status instead.
  • wchan: This is the "channel" in which the process is waiting. It is the address of a system call, and can be looked up in a namelist if you need a textual name. (If you have an up-to-date /etc/psdatabase, then try ps -l to see the WCHAN field in action.)
  • nswap: Number of pages swapped (not maintained).
  • cnswap: Cumulative nswap for child processes (not maintained).
  • exit_signal: (since Linux 2.1.22) Signal to be sent to parent when we die.
  • processor: (since Linux 2.2.8) CPU number last executed on.
  • rt_priority: (since Linux 2.5.19) Real-time scheduling priority, a number in the range 1 to 99 for processes scheduled under a real-time policy, or 0, for non-real-time processes (see sched_setscheduler(2)).
  • policy: (since Linux 2.5.19) Scheduling policy (see sched_setscheduler(2)). Decode using the SCHED_* constants in linux/sched.h.
  • delayacct_blkio_ticks: (since Linux 2.6.18) Aggregated block I/O delays, measured in clock ticks (centiseconds).
  • guest_time: (since Linux 2.6.24) Guest time of the process (time spent running a virtual CPU for a guest operating system), measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
  • cguest_time: (since Linux 2.6.24) Guest time of the process's children, measured in clock ticks (divide by sysconf(_SC_CLK_TCK)).
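The field order above is enough to parse a stat line. The sketch below covers only the first 24 fields and uses hypothetical helper names. Since comm may itself contain spaces or parentheses, the line is split around the last closing parenthesis. The tty_nr decoding follows the bit layout given above.

```python
import os

# Names of the fields that follow "pid (comm)", in order (first 22 only).
FIELD_NAMES = [
    "state", "ppid", "pgrp", "session", "tty_nr", "tpgid", "flags",
    "minflt", "cminflt", "majflt", "cmajflt", "utime", "stime",
    "cutime", "cstime", "priority", "nice", "num_threads",
    "itrealvalue", "starttime", "vsize", "rss",
]

# The example line shown earlier in this section.
SAMPLE_LINE = (
    "7278 (postgres) S 1 7257 7257 0 -1 4202496 36060376 10845160168 0 749 "
    "20435 137212 158536835 39143290 15 0 1 0 50528579 3763298304 20289 "
    "18446744073709551615 4194304 7336916 140734091375136 "
    "18446744073709551615 225773929891 0 0 19935232 84487 0 0 0 17 2 0 0 12"
)


def parse_stat(line):
    """Parse the leading fields of a /proc/<pid>/stat line into a dict."""
    open_paren = line.index("(")
    close_paren = line.rindex(")")  # last ")" ends comm, whatever it contains
    fields = {
        "pid": int(line[:open_paren]),
        "comm": line[open_paren + 1:close_paren],
    }
    rest = line[close_paren + 1:].split()
    for name, value in zip(FIELD_NAMES, rest):
        fields[name] = value if name == "state" else int(value)
    return fields


def decode_tty_nr(tty_nr):
    """Return (major, minor): major in bits 15-8, minor in bits 31-20 and 7-0."""
    major = (tty_nr >> 8) & 0xFF
    minor = ((tty_nr >> 12) & 0xFFF00) | (tty_nr & 0xFF)
    return major, minor


def ticks_to_seconds(ticks):
    """Convert clock ticks (utime, stime, ...) to seconds."""
    return ticks / os.sysconf("SC_CLK_TCK")


if __name__ == "__main__":
    stat = parse_stat(SAMPLE_LINE)
    print(stat["pid"], stat["comm"], stat["state"], stat["num_threads"])
    print("user CPU seconds:", ticks_to_seconds(stat["utime"]))
```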

vmstat

procs ------------memory----------- ---swap-- -----io---- -system-- -----cpu------
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 3  1    696 1169444 442588 5495700    0    0    75   103    5    2  4  2 91  3  0
 2  0    696 1125064 442788 5511968    0    0  4392  3036 3365 2532  7  6 79  8  0
 1  2    696 1121304 442900 5515692    0    0  2656   420 2585 2754  3  6 85  6  0
 4  3    696 1081844 443292 5533832    0    0  2036 10042 4874 4655 13  8 69 10  0
  • r: The number of processes waiting for run time
  • b: The number of processes in uninterruptible sleep
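To track the r and b columns over time, the vmstat output can be parsed into one record per sample. This is a minimal sketch with a hypothetical function name; it assumes the column header row starts with "r", as in the output above, and that every data row is purely numeric.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        columns = line.split()
        if columns[0] == "r":  # the column header row
            rows = lines[index + 1:]
            break
    else:
        raise ValueError("no vmstat header row found")
    return [dict(zip(columns, map(int, row.split()))) for row in rows]


if __name__ == "__main__":
    sample = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 3 1 696 1169444 442588 5495700 0 0 75 103 5 2 4 2 91 3 0
 2 0 696 1125064 442788 5511968 0 0 4392 3036 3365 2532 7 6 79 8 0
"""
    for record in parse_vmstat(sample):
        print("runnable:", record["r"], "uninterruptible:", record["b"])
```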

Examples

To see every process on the system using standard syntax:
ps -e
ps -ef
ps -eF
ps -ely
To see every process on the system using BSD syntax:
ps ax
ps axu
To print a process tree:
ps -ejH
ps axjf
To get info about threads:
ps -eLf
ps axms
To get security info:
ps -eo euser,ruser,suser,fuser,f,comm,label
ps axZ
ps -eM
To see every process running as root (real & effective ID) in user format:
ps -U root -u root u
To see every process with a user-defined format:
ps -eo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:14,comm
ps axo stat,euid,ruid,tty,tpgid,sess,pgrp,ppid,pid,pcpu,comm
ps -eopid,tt,user,fname,tmout,f,wchan
Print only the process IDs of syslogd:
ps -C syslogd -o pid=
Print only the name of PID 42:
ps -p 42 -o comm=

Simple Process Selection

-A
Select all processes. Identical to -e.
-N
Select all processes except those that fulfill the specified conditions. (negates the selection) Identical to --deselect.
T
Select all processes associated with this terminal. Identical to the t option without any argument.
-a
Select all processes except both session leaders (see getsid(2)) and processes not associated with a terminal.
a
Lift the BSD-style "only yourself" restriction, which is imposed upon the set of all processes when some BSD-style (without "-") options are used or when the ps personality setting is BSD-like. The set of processes selected in this manner is in addition to the set of processes selected by other means. An alternate description is that this option causes ps to list all processes with a terminal (tty), or to list all processes when used together with the x option.
-d
Select all processes except session leaders.
-e
Select all processes. Identical to -A.
g
Really all, even session leaders. This flag is obsolete and may be discontinued in a future release. It is normally implied by the a flag, and is only useful when operating in the sunos4 personality.
r
Restrict the selection to only running processes.
x
Lift the BSD-style "must have a tty" restriction, which is imposed upon the set of all processes when some BSD-style (without "-") options are used or when the ps personality setting is BSD-like. The set of processes selected in this manner is in addition to the set of processes selected by other means. An alternate description is that this option causes ps to list all processes owned by you (same EUID as ps), or to list all processes when used together with the a option.
--deselect
Select all processes except those that fulfill the specified conditions. (negates the selection) Identical to -N.

Process Selection By List

These options accept a single argument in the form of a blank-separated or comma-separated list. They can be used multiple times. For example: ps -p "1 2" -p 3,4
-C cmdlist
Select by command name.
This selects the processes whose executable name is given in cmdlist.
-G grplist
Select by real group ID (RGID) or name.
This selects the processes whose real group name or ID is in the grplist list. The real group ID identifies the group of the user who created the process, see getgid(2).
U userlist
Select by effective user ID (EUID) or name.
This selects the processes whose effective user name or ID is in userlist. The effective user ID describes the user whose file access permissions are used by the process (see geteuid(2)). Identical to -u and --user.
-U userlist
Select by real user ID (RUID) or name.
It selects the processes whose real user name or ID is in the userlist list. The real user ID identifies the user who created the process, see getuid(2).
-g grplist
Select by session OR by effective group name.
Selection by session is specified by many standards, but selection by effective group is the logical behavior that several other operating systems use. This ps will select by session when the list is completely numeric (as sessions are). Group ID numbers will work only when some group names are also specified. See the -s and --group options.
p pidlist
Select by process ID. Identical to -p and --pid.
-p pidlist
Select by PID.
This selects the processes whose process ID numbers appear in pidlist. Identical to p and --pid.
-s sesslist
Select by session ID.
This selects the processes with a session ID specified in sesslist.
t ttylist
Select by tty. Nearly identical to -t and --tty, but can also be used with an empty ttylist to indicate the terminal associated with ps. Using the T option is considered cleaner than using t with an empty ttylist.
-t ttylist
Select by tty.
This selects the processes associated with the terminals given in ttylist. Terminals (ttys, or screens for text output) can be specified in several forms: /dev/ttyS1, ttyS1, S1. A plain "-" may be used to select processes not attached to any terminal.
-u userlist
Select by effective user ID (EUID) or name.
This selects the processes whose effective user name or ID is in userlist. The effective user ID describes the user whose file access permissions are used by the process (see geteuid(2)). Identical to U and --user.
--Group grplist
Select by real group ID (RGID) or name. Identical to -G.
--User userlist
Select by real user ID (RUID) or name. Identical to -U.
--group grplist
Select by effective group ID (EGID) or name.
This selects the processes whose effective group name or ID is in grouplist. The effective group ID describes the group whose file access permissions are used by the process (see getegid(2)). The -g option is often an alternative to --group.
--pid pidlist
Select by process ID. Identical to -p and p.
--ppid pidlist
Select by parent process ID. This selects the processes with a parent process ID in pidlist. That is, it selects processes that are children of those listed in pidlist.
--sid sesslist
Select by session ID. Identical to -s.
--tty ttylist
Select by terminal. Identical to -t and t.
--user userlist
Select by effective user ID (EUID) or name. Identical to -u and U.
-123
Identical to --sid 123.
123
Identical to --pid 123.

Output Format Control

These options are used to choose the information displayed by ps. The output may differ by personality.
-F
extra full format. See the -f option, which -F implies.
-O format
is like -o, but preloaded with some default columns. Identical to -o pid,format,state,tname,time,command or -o pid,format,tname,time,cmd, see -o below.
O format
is preloaded o (overloaded).
The BSD O option can act like -O (user-defined output format with some common fields predefined) or can be used to specify sort order. Heuristics are used to determine the behavior of this option. To ensure that the desired behavior is obtained (sorting or formatting), specify the option in some other way (e.g. with -O or --sort). When used as a formatting option, it is identical to -O, with the BSD personality.
-M
Add a column of security data. Identical to Z. (for SE Linux)
X
Register format.
Z
Add a column of security data. Identical to -M. (for SE Linux)
-c
Show different scheduler information for the -l option.
-f
does full-format listing. This option can be combined with many other UNIX-style options to add additional columns. It also causes the command arguments to be printed. When used with -L, the NLWP (number of threads) and LWP (thread ID) columns will be added. See the c option, the format keyword args, and the format keyword comm.
j
BSD job control format.
-j
jobs format
l
display BSD long format.
-l
long format. The -y option is often useful with this.
o format
specify user-defined format. Identical to -o and --format.
-o format
user-defined format.
format is a single argument in the form of a blank-separated or comma-separated list, which offers a way to specify individual output columns. The recognized keywords are described in the STANDARD FORMAT SPECIFIERS section below. Headers may be renamed (ps -o pid,ruser=RealUser -o comm=Command) as desired. If all column headers are empty (ps -o pid= -o comm=) then the header line will not be output. Column width will increase as needed for wide headers; this may be used to widen up columns such as WCHAN (ps -o pid,wchan=WIDE-WCHAN-COLUMN -o comm). Explicit width control (ps opid,wchan:42,cmd) is offered too. The behavior of ps -o pid=X,comm=Y varies with personality; output may be one column named "X,comm=Y" or two columns named "X" and "Y". Use multiple -o options when in doubt. Use the PS_FORMAT environment variable to specify a default as desired; DefSysV and DefBSD are macros that may be used to choose the default UNIX or BSD columns.
s
display signal format
u
display user-oriented format
v
display virtual memory format
-y
Do not show flags; show rss in place of addr. This option can only be used with -l.
-Z
display security context format (SELinux, etc.)
--format format
user-defined format. Identical to -o and o.
--context
Display security context format. (for SE Linux)

Output Modifiers

-H
show process hierarchy (forest)
N namelist
Specify namelist file. Identical to -n, see -n above.
O order
Sorting order. (overloaded)
The BSD O option can act like -O (user-defined output format with some common fields predefined) or can be used to specify sort order. Heuristics are used to determine the behavior of this option. To ensure that the desired behavior is obtained (sorting or formatting), specify the option in some other way (e.g. with -O or --sort).
For sorting, obsolete BSD O option syntax is O[+|-]k1[,[+|-]k2[,...]]. It orders the processes listing according to the multilevel sort specified by the sequence of one-letter short keys k1k2, ... described in the OBSOLETE SORT KEYS section below. The "+" is currently optional, merely re-iterating the default direction on a key, but may help to distinguish an O sort from an O format. The "-" reverses direction only on the key it precedes.
S
Sum up some information, such as CPU usage, from dead child processes into their parent. This is useful for examining a system where a parent process repeatedly forks off short-lived children to do work.
c
Show the true command name. This is derived from the name of the executable file, rather than from the argv value. Command arguments and any modifications to them (see setproctitle(3)) are thus not shown. This option effectively turns the args format keyword into the comm format keyword; it is useful with the -f format option and with the various BSD-style format options, which all normally display the command arguments. See the -f option, the format keyword args, and the format keyword comm.
e
Show the environment after the command.
f
ASCII-art process hierarchy (forest)
h
No header. (or, one header per screen in the BSD personality)
The h option is problematic. Standard BSD ps uses this option to print a header on each page of output, but older Linux ps uses this option to totally disable the header. This version of ps follows the Linux usage of not printing the header unless the BSD personality has been selected, in which case it prints a header on each page of output. Regardless of the current personality, you can use the long options --headers and --no-headers to enable printing headers each page or disable headers entirely, respectively.
k spec
specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]] Choose a multi-letter key from the STANDARD FORMAT SPECIFIERS section. The "+" is optional since default direction is increasing numerical or lexicographic order. Identical to --sort. Examples:
ps jaxkuid,-ppid,+pid
ps axk comm o comm,args
ps kstart_time -ef
-n namelist
set namelist file. Identical to N.
The namelist file is needed for a proper WCHAN display, and must match the current Linux kernel exactly for correct output. Without this option, the default search path for the namelist is:
$PS_SYSMAP
$PS_SYSTEM_MAP
/proc/*/wchan
/boot/System.map-`uname -r`
/boot/System.map
/lib/modules/`uname -r`/System.map
/usr/src/linux/System.map
/System.map
n
Numeric output for WCHAN and USER. (including all types of UID and GID)
-w
Wide output. Use this option twice for unlimited width.
w
Wide output. Use this option twice for unlimited width.
--cols n
set screen width
--columns n
set screen width
--cumulative
include some dead child process data (as a sum with the parent)
--forest
ASCII art process tree
--headers
repeat header lines, one per page of output
--no-headers
print no header line at all
--lines n
set screen height
--rows n
set screen height
--sort spec
specify sorting order. Sorting syntax is [+|-]key[,[+|-]key[,...]] Choose a multi-letter key from the STANDARD FORMAT SPECIFIERS section. The "+" is optional since default direction is increasing numerical or lexicographic order. Identical to k. For example: ps jax --sort=uid,-ppid,+pid
--width n
set screen width

Thread Display

H
Show threads as if they were processes
-L
Show threads, possibly with LWP and NLWP columns
-T
Show threads, possibly with SPID column
m
Show threads after processes
-m
Show threads after processes

Other Information

L
List all format specifiers.
-V
Print the procps version.
V
Print the procps version.
--help
Print a help message.
--info
Print debugging info.
--version
Print the procps version.

Notes

This ps works by reading the virtual files in /proc. This ps does not need to be setuid kmem or have any privileges to run. Do not give this ps any special permissions.
This ps needs access to namelist data for proper WCHAN display. For kernels prior to 2.6, the System.map file must be installed.
CPU usage is currently expressed as the percentage of time spent running during the entire lifetime of a process. This is not ideal, and it does not conform to the standards that ps otherwise conforms to. CPU usage is unlikely to add up to exactly 100%.
The SIZE and RSS fields don't count some parts of a process including the page tables, kernel stack, struct thread_info, and struct task_struct. This is usually at least 20 KiB of memory that is always resident. SIZE is the virtual size of the process (code+data+stack).
Processes marked <defunct> are dead processes (so-called "zombies") that remain because their parent has not destroyed them properly. These processes will be destroyed by init(8) if the parent process exits.
If the length of the username is greater than the length of the display column, the numeric user ID is displayed instead.
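The note on <defunct> processes suggests a quick check: filter the ps output on the Z state and report each zombie together with its parent, since the parent is the process that has not reaped it. The following is a minimal sketch, assuming the input columns are pid, ppid, stat and comm in that order; the function name list_zombies is hypothetical and not part of ps.

```shell
# list_zombies: hypothetical helper, not part of ps.
# Reads "ps -e -o pid,ppid,stat,comm" output on stdin and prints
# "PID PPID" for every process whose state starts with Z.
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2 }'
}

# Usage (run manually):
#   ps -e -o pid,ppid,stat,comm | list_zombies
```

The second number on each output line is the parent to investigate.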

Process Flags

The sum of these values is displayed in the "F" column, which is provided by the flags output specifier.
1
forked but didn't exec
4
used super-user privileges
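Because the F column is a sum, a value of 5 means both flags are set. The sketch below splits such a sum back into the two flags listed above; decode_ps_flags is a hypothetical helper name, and only the values 1 and 4 from this section are handled.

```shell
# decode_ps_flags: hypothetical helper that explains the value of the F column.
# Only the two flag values documented above (1 and 4) are recognised.
decode_ps_flags() {
    flags=$1
    out=""
    if [ $(( flags & 1 )) -ne 0 ]; then
        out="forked but didn't exec"
    fi
    if [ $(( flags & 4 )) -ne 0 ]; then
        out="${out:+$out, }used super-user privileges"
    fi
    printf '%s\n' "${out:-none}"
}

# Usage (run manually):
#   decode_ps_flags 5
```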

Process State Codes

Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process.
D
Uninterruptible sleep (usually IO)
R
Running or runnable (on run queue)
S
Interruptible sleep (waiting for an event to complete)
T
Stopped, either by a job control signal or because it is being traced.
W
paging (not valid since the 2.6.xx kernel)
X
dead (should never be seen)
Z
Defunct ("zombie") process, terminated but not reaped by its parent.
For BSD formats and when the stat keyword is used, additional characters may be displayed:
<
high-priority (not nice to other users)
N
low-priority (nice to other users)
L
has pages locked into memory (for real-time and custom IO)
s
is a session leader
l
is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
+
is in the foreground process group
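A STAT value such as Ss+ is read one character at a time: the first character is the state and the rest are the additional BSD characters. As an illustration, the sketch below walks a STAT string and prints the meaning of each character using the two tables above; describe_stat is a hypothetical name and the descriptions are shortened.

```shell
# describe_stat: hypothetical helper that explains a STAT value such as "Ss+".
describe_stat() {
    stat=$1
    while [ -n "$stat" ]; do
        rest=${stat#?}          # everything after the first character
        ch=${stat%"$rest"}      # the first character itself
        case $ch in
            D) desc="uninterruptible sleep (usually IO)" ;;
            R) desc="running or runnable" ;;
            S) desc="interruptible sleep" ;;
            T) desc="stopped or traced" ;;
            W) desc="paging" ;;
            X) desc="dead" ;;
            Z) desc="defunct (zombie)" ;;
            '<') desc="high-priority" ;;
            N) desc="low-priority" ;;
            L) desc="has pages locked into memory" ;;
            s) desc="session leader" ;;
            l) desc="multi-threaded" ;;
            '+') desc="foreground process group" ;;
            *) desc="not in the tables above" ;;
        esac
        printf '%s: %s\n' "$ch" "$desc"
        stat=$rest
    done
}

# Usage (run manually):
#   describe_stat 'Ss+'
```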

More Examples


1. List Currently Running Processes (ps -ef, ps aux)
A common use of the ps command is to list all the processes currently running on a machine. The following example shows the ps options that select every process.
$ ps -ef
root     26551     5  0 Feb10 ?        00:03:41 [pdflush]
root     26570     5  0 Feb10 ?        00:00:20 [pdflush]
root     30344  3382  0 Feb21 ?        00:00:11 sshd: root@pts/14
root     30365 30344  0 Feb21 pts/14   00:00:02 -bash
root     30393  3382  0 Feb21 ?        00:00:10 sshd: root@pts/15
Where:
  • -e to display all the processes.
  • -f to display full format listing.
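On BSD-style systems, use ‘ps aux’ (without a leading dash) to get details about all the processes, as shown above. With the dash, ‘ps -aux’ strictly asks for the processes of a user named x, although this ps may fall back to ‘ps aux’ with a warning when no such user exists.
$ ps aux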

2. List the Process based on the UID and Commands (ps -u, ps -C)

Use the -u option to display the processes that belong to a specific user. To specify multiple usernames, separate them with commas. The example below displays all the processes owned by user wwwrun or postfix.
$ ps -f -u wwwrun,postfix
UID        PID  PPID  C STIME TTY          TIME CMD
postfix   7457  7435  0 Mar09 ?        00:00:00 qmgr -l -t fifo -u
wwwrun    7495  7491  0 Mar09 ?        00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun    7496  7491  0 Mar09 ?        00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun    7497  7491  0 Mar09 ?        00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun    7498  7491  0 Mar09 ?        00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun    7499  7491  0 Mar09 ?        00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun   10078  7491  0 Mar09 ?        00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
wwwrun   10082  7491  0 Mar09 ?        00:00:00 /usr/sbin/httpd2-prefork -f /etc/apache2/httpd.conf
postfix  15677  7435  0 22:23 ?        00:00:00 pickup -l -t fifo -u
ps is often combined with grep, as in “ps aux | grep command”, to list the processes running a given command.
The ps command has an option of its own for this: -C selects processes by command name. The following example shows all the processes whose command name is tatad.pl.
$ ps -f -C tatad.pl
UID        PID  PPID  C STIME TTY          TIME CMD
root      9576     1  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9577  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9579  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9580  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9581  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9582  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root     12133  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
Note: We can create aliases for ps command to list processes based on commands, users or groups.

3. List the processes based on PIDs or PPIDs (ps -p, ps --ppid)

Each process is assigned a unique process ID (PID).
When you launch an application, it may fork a number of processes, and each child process gets its own PID. So each process has its own process ID and a parent process ID.
All the processes forked by a given process have the same PPID (parent process identifier). The following command lists the processes with a particular PPID.
$ ps -f --ppid 9576
UID        PID  PPID  C STIME TTY          TIME CMD
root      9577  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9579  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9580  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9581  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root      9582  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
root     12133  9576  0 Mar09 ?        00:00:00 /opt/tata/perl/bin/perl /opt/tata/bin/tatad.pl
The following example lists the processes with the given PIDs.
$ ps -f  -p 25009,7258,2426
UID        PID  PPID  C STIME TTY          TIME CMD
root      2426     4  0 Mar09 ?        00:00:00 [reiserfs/0]
root      7258     1  0 Mar09 ?        00:00:00 /usr/sbin/nscd
postfix  25009  7435  0 00:02 ?        00:00:00 pickup -l -t fifo -u

4. List Processes in a Hierarchy (ps --forest)

The example below displays process IDs and commands in a hierarchy. --forest is an option to the ps command that draws the process tree in ASCII art. From this tree, we can identify each parent process and, recursively, the child processes it forked.
$ ps -e -o pid,args --forest
  468  \_ sshd: root@pts/7
  514  |   \_ -bash
17484  \_ sshd: root@pts/11
17513  |   \_ -bash
24004  |       \_ vi ./790310__11117/journal
15513  \_ sshd: root@pts/1
15522  |   \_ -bash
 4280  \_ sshd: root@pts/5
 4302  |   \_ -bash
Note: You can also use the pstree command to display processes in a tree structure.

5. List elapsed wall time for processes (ps -o pid,etime=)

To see how long a running process has been alive, use the etime format keyword, which reports the elapsed time since the process was started in the form [[dd-]hh:]mm:ss.
The command below displays the elapsed time for process ID 1 (init) and process ID 29675.
For example, "10-22:13:29" in the output means that init has been running for 10 days, 22 hours, 13 minutes and 29 seconds. Because init is started at boot, its elapsed time is approximately the system uptime reported by the ‘uptime’ command.
# ps -p 1,29675 -o pid,etime=
  PID
    1 10-22:13:29
29675  1-02:58:46
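The [[dd-]hh:]mm:ss form is awkward to compare or sort, so it can help to convert it to seconds. The sketch below does that with awk by splitting on "-" and ":" and weighting the fields; etime_to_seconds is a hypothetical helper name, not a ps feature.

```shell
# etime_to_seconds: hypothetical helper that converts an etime value
# in the form [[dd-]hh:]mm:ss into a number of seconds.
etime_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 2 { print $1 * 60 + $2 }                            # mm:ss
        NF == 3 { print $1 * 3600 + $2 * 60 + $3 }                # hh:mm:ss
        NF == 4 { print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }   # dd-hh:mm:ss
    '
}

# Usage (run manually), with the values from the output above:
#   etime_to_seconds 10-22:13:29
#   etime_to_seconds 1-02:58:46
```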

6. List all threads for a particular process (ps -L)

ps can also list the threads of a process. When a process hangs, it is often useful to see which threads it is running, as shown below.
 $ ps -C java -L -o pid,tid,pcpu,state,nlwp,args
  PID   TID %CPU S NLWP COMMAND
16992 16992  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16993  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16994  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16995  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16996  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16997  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16998  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 16999  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 17000  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 17001  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 17002  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 17003  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 17024  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 15753  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
16992 15754  0.0 S   15 ../jre/bin/java -Djava.ext.dirs=../jre/lib/ext:../lib:../auto_lib -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5006
The -C java option selects processes by command name, and -L shows one line per thread. The nlwp column gives the number of lightweight processes (threads) in the process; in the above example, the java process has 15 threads.

7. Finding a Memory Leak (ps --sort pmem)

A memory leak, technically, is memory usage by an application that keeps growing over time.
With common desktop applications, this may go unnoticed, because the memory a process has used is released when you close the application.
However, in the client/server model, a memory leak is a serious issue, because server applications are expected to be available 24×7. Their memory usage must not grow indefinitely. To look for such leaks, we can use the following commands.
$ ps aux --sort pmem

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1520  508 ?        S     2005   1:27 init
inst  1309  0.0  0.4 344308 33048 ?      S     2005   1:55 agnt (idle)
inst  2919  0.0  0.4 345580 37368 ?      S     2005  20:02 agnt (idle)
inst 24594  0.0  0.4 345068 36960 ?      S     2005  15:45 agnt (idle)
root 27645  0.0 14.4 1231288 1183976 ?   S     2005   3:01 /TaskServer/bin/./wrapper-linux-x86-32
In the above ps command, the --sort option places the highest %MEM at the bottom. Note the PID with the highest %MEM usage, then use ps to view the details of that process and monitor how they change over time. You have to repeat the command manually or run it from cron and append the output to a file.
$ ps ev --pid=27645
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
27645 ? S 3:01 0 25 1231262 1183976 14.4 /TaskServer/bin/./wrapper-linux-x86-32

$ ps ev --pid=27645
PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
27645 ? S 3:01 0 25 1231262 1183976 14.4 /TaskServer/bin/./wrapper-linux-x86-32
Note: In the above output, if RSS (resident set size, in KB) increases over time (so would %MEM), it may indicate a memory leak in the application.
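Repeating the command by hand can be replaced by a small sampling loop. This is a sketch only: sample_rss and rss_delta are hypothetical helper names, and it assumes a ps that accepts -o rss= -p PID (as the procps ps described above does). The first function records timestamped RSS values; the second reports how much RSS changed between the first and last sample.

```shell
# sample_rss PID COUNT INTERVAL: hypothetical helper.
# Prints "epoch_seconds rss_in_KiB" COUNT times, INTERVAL seconds apart.
sample_rss() {
    pid=$1; count=$2; interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        rss=$(ps -o rss= -p "$pid" | tr -d ' ')
        [ -n "$rss" ] || return 1        # the process has gone away
        printf '%s %s\n' "$(date +%s)" "$rss"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# rss_delta: reads "timestamp rss" lines on stdin and prints
# the last RSS value minus the first.
rss_delta() {
    awk 'NR == 1 { first = $2 } { last = $2 } END { if (NR) print last - first }'
}

# Usage (run manually): 10 samples, one minute apart, for PID 27645
#   sample_rss 27645 10 60 | tee rss.log | rss_delta
```

A positive delta over a short window is not proof of a leak; as the note above says, steady growth over a long period is the signal worth investigating.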
