
Saturday, September 20, 2014

Solaris 11 Disk I/O performance - Solaris Troubleshoot

Performance issues are one of the major concerns for every system admin, and identifying real-time performance problems on a server puts every admin under pressure. In this post we are going to discuss how to identify Disk I/O performance issues in Solaris. We have many utilities (like iostat, fsstat, sar, etc.) in UNIX to measure Disk I/O performance.

Most of the time the Application/DB team raises concerns about slowness of a filesystem. As system admins we have to find the disk that is associated with the filesystem, and then monitor the disk I/O statistics on that particular disk during busy times (see the sketch after the list below for mapping a filesystem to its disk). In most cases, though, they will point to slowness across all filesystems. Let us start.....
There are many possibilities or causes for disk slowness. It might be one of the below:
  • Disk usage
  • Hardware (disk errors)
  • High application utilization (more Disk I/O)
  • Mount options (soft, hard, etc.)
  • Resource unavailability (CPU, memory, etc.)
  • Disk layout (striping, mirroring, or the whole space carved from one single disk)
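Before digging into these, it helps to map the slow filesystem back to its underlying device. A minimal sketch (the mount point /oradata and the metadevice d10 are hypothetical placeholders, not from this system):

root@unixrock # df -h /oradata
root@unixrock # grep oradata /etc/vfstab
root@unixrock # metastat d10

df shows which device backs the mount, /etc/vfstab confirms it, and for an SVM metadevice metastat lists the physical disks underneath, which are the ones to watch with iostat.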
We are going to use the iostat utility with the below options to check disk I/O performance.

Options  Descriptions
-c       Report the percentage of time the system has spent in user mode, in system mode, waiting for I/O, and idling.
-C       When the -x option is also selected, report extended disk statistics aggregated by controller id.
-d       For each disk, report the number of kilobytes transferred per second.
-D       For each disk, report the reads per second, writes per second, and percentage disk utilization.
-e       Display device error summary statistics.
-E       Display all device error statistics.
-n       Display names in descriptive format. For example, cXtYdZ, rmt/N, server:/export/path.
-x       Report extended disk statistics.
-z       Do not print lines whose underlying data values are all zeros.
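These options can be combined in a single run. As a quick sketch (the 5-second interval and the count of 12 are arbitrary choices, not requirements), the following samples extended, per-controller statistics with error counters while suppressing all-zero lines:

root@unixrock # iostat -xnCez 5 12

Note that the first sample reports averages since boot; the following samples reflect activity during each interval, which is what matters when chasing a live problem.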

Check whether the disks have any H/W or S/W errors using the iostat utility:
root@unixrock # iostat -en
  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 fd0
    0   0   0   0 md/d0
    0   0   0   0 md/d1
    0   0   0   0 md/d4
    0   0   0   0 md/d5
    0   0   0   0 md/d10
    0   0   0   0 md/d11
    0   0   0   0 md/d14
    0   0   0   0 md/d15
    0   0   0   0 md/d20
    0   0   0   0 md/d21
    0   0   0   0 md/d24
    0   0   0   0 md/d25
    0   0   0   0 c1t0d0
    0   0   0   0 c0t0d0
    0   0   0   0 c1t1d0
    0   0   0   0 c1t2d0
    0   0   0   0 c1t3d0
    0   0   0   0 c2t5d0
    0   0   0   0 c2t6d0
    0   0   0   0 c2t7d0
    0   0   0   0 c2t8d0
    0   0   0   0 unixrock:vold(pid576)
root@unixrock #
s/w    : Software errors
h/w    : Hardware errors
trn    : Transport errors
tot    : Total errors
device : Logical disks
The values which need to be looked at are h/w and s/w. If you find any H/W errors on the suspected disk, keep monitoring whether the errors are still increasing; if yes, there is a good chance the disk needs to be replaced. A monitoring sketch follows.
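As a minimal sketch of that monitoring (the disk name c1t1d0 and the 5-minute sleep are hypothetical placeholders), the error counters can be watched in a simple loop:

root@unixrock # while true
> do
>     date
>     iostat -en | grep c1t1d0
>     sleep 300
> done

If the h/w or trn counts for the disk keep climbing between samples, involve the hardware team for a replacement.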
Next, check a quick overview of disk I/O performance and disk bottlenecks:
root@unixrock # iostat -xnCz
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.3   10.5   0   0 c0
    0.0    0.0    0.0    0.0  0.0  0.0    0.3   10.5   0   0 c0t0d0
    0.1    0.1    1.3    0.6  0.0  0.0    0.0   36.0   0   0 c1
    0.0    0.1    0.6    0.3  0.0  0.0    0.0   33.9   0   0 c1t0d0
    0.0    0.0    0.6    0.3  0.0  0.0    0.0   43.2   0   0 c1t1d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.2    8.5   0   0 c1t2d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    2.7   0   0 c1t3d0
    0.1    0.0    1.1    0.1  0.0  0.0    0.6   39.1   0   0 md/d0
    0.0    0.0    0.0    0.0  0.0  0.0   15.7   47.5   0   0 md/d1
    0.0    0.0    0.1    0.1  0.0  0.0   14.5   56.4   0   0 md/d4
    0.0    0.0    0.5    0.1  0.0  0.0    0.0   32.1   0   0 md/d10
    0.0    0.0    0.0    0.0  0.0  0.0    0.0   40.1   0   0 md/d11
    0.0    0.0    0.1    0.1  0.0  0.0    0.0   49.1   0   0 md/d14
    0.0    0.0    0.5    0.1  0.0  0.0    0.0   34.2   0   0 md/d20
    0.0    0.0    0.0    0.0  0.0  0.0    0.0   54.1   0   0 md/d21
    0.0    0.0    0.1    0.1  0.0  0.0    0.0   50.9   0   0 md/d24
root@unixrock #
Columns  Descriptions
r/s      reads per second
w/s      writes per second
kr/s     kilobytes read per second
kw/s     kilobytes written per second
wait     average number of transactions waiting for service (queue length)
actv     average number of transactions actively being serviced
wsvc_t   average service time in wait queue, in milliseconds
asvc_t   average service time of active transactions, in milliseconds
%w       percent of time there are transactions waiting for service (queue non-empty)
%b       percent of time the disk is busy (transactions in progress)
The values which need to be considered in the above output are r/s, w/s, %b, and asvc_t. If the r/s and w/s values are high, %b is running at 5-7% or more, and asvc_t exceeds 30-50 milliseconds, then we have to concentrate on the below concerns:

If it is an NFS-backed disk, we have to engage the NAS team to check the disk I/O from their end.
If it is a SAN-backed disk, we have to engage the SAN team for further investigation.
If the disk layout sits on one single disk, we can recommend spreading it across multiple LUNs for better performance (example: 10 x 100 GB LUNs will perform better than a single 1000 GB disk), as in the sketch below.
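As a minimal sketch of that recommendation using Solaris Volume Manager (the metadevice name d100, the four c2 slices, the 64k interlace, and the mount point /oradata are all hypothetical, not taken from this system's configuration), a four-way stripe spreads the I/O across LUNs:

root@unixrock # metainit d100 1 4 c2t5d0s0 c2t6d0s0 c2t7d0s0 c2t8d0s0 -i 64k
root@unixrock # newfs /dev/md/rdsk/d100
root@unixrock # mount /dev/md/dsk/d100 /oradata

With the stripe in place, reads and writes are interleaved across all four LUNs instead of queuing on a single disk.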
  
We can also use the fsstat command to check filesystem performance. The "-F" option shows statistics for all filesystem types:
root@unixrock # fsstat -F
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
1.59K   180   352 1.02M   833  6.24M  101K  751K  203M  344K 52.6M ufs
    0     0     0   100     0    118     0     7 17.5K     0     0 nfs
    0     0     0    20     0      0     0     0     0     0     0 zfs
    0     0     0    10     0      0     0     0     0     0     0 hsfs
    0     0     0 6.01K     0      0     0     0     0     0     0 lofs
5.63K 3.84K 1.45K 33.1K   101  15.0K    10 52.7K 53.5M 54.3K 47.5M tmpfs
    0     0     0   263     0      0     0    47 6.19K     0     0 mntfs
    0     0     0     0     0      0     0     0     0     0     0 nfs3
    0     0     0     0     0      0     0     0     0     0     0 nfs4
    0     0     0    38     0      0     0     0     0     0     0 autofs
root@unixrock #
We can also check statistics for specific filesystem types:
root@unixrock # fsstat -i ufs zfs nfs3
 read read  write write rddir rddir rwlock rwulock
  ops bytes   ops bytes   ops bytes    ops     ops
 752K  204M  345K 52.6M  101K 10.5M  1.17M   1.17M ufs
    0     0     0     0     0     0      0       0 zfs
    0     0     0     0     0     0      0       0 nfs3
root@unixrock #
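fsstat also accepts a mount point plus an interval and count, which is handy when the complaint is about one particular filesystem (the mount point /export/home, the 5-second interval, and the count of 6 are placeholders here):

root@unixrock # fsstat /export/home 5 6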
We can also use "sar" utility to check the disk performances.
root@unixrock # sar -d 1
SunOS unixrock 5.10 Generic_142910-17 i86pc    03/20/2014
00:12:40   device        %busy   avque   r+w/s  blks/s  avwait  avserv
00:12:41   fd0               0     0.0       0       0     0.0     0.0
           iscsi_se          0     0.0       0       0     0.0     0.0
           md0               0     0.0       0       0     0.0     0.0
           md1               0     0.0       0       0     0.0     0.0
           md4               0     0.0       0       0     0.0     0.0
           md5               0     0.0       0       0     0.0     0.0
           md10              0     0.0       0       0     0.0     0.0
           md11              0     0.0       0       0     0.0     0.0
           md14              0     0.0       0       0     0.0     0.0
           md15              0     0.0       0       0     0.0     0.0
           md20              0     0.0       0       0     0.0     0.0
           md21              0     0.0       0       0     0.0     0.0
           md24              0     0.0       0       0     0.0     0.0
           md25              0     0.0       0       0     0.0     0.0
           nfs1              0     0.0       0       0     0.0     0.0
---Skipped-------
root@unixrock #
%busy   : portion of time the device was busy servicing a transfer request
avque   : average number of requests outstanding during that time
r+w/s   : number of read and write transfers to or from the device, per second
blks/s  : number of 512-byte blocks transferred to or from the device, per second
avwait  : average wait time in milliseconds
avserv  : average service time in milliseconds
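sar can also sample over a longer window, or read back the daily history collected by the sa1 cron job, which is useful for comparing busy hours against the baseline (the file name /var/adm/sa/sa20 is a placeholder; the suffix is the day of the month):

root@unixrock # sar -d 5 12
root@unixrock # sar -d -f /var/adm/sa/sa20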
