Table of contents
- Overview...
- LICENSE
- Installation
- INSTALL.sh
- Starting Web and Database servers
- Migration from any old dim_STAT version to the new one
- First-Level Security
- STAT-service
- Main Page
- Preferences
- Start On-Line Collecting
- EasySTAT
- BatchLOAD
- Analyzing
- Multi-Host Analyzing
- Select Multi-host
- Choose Collect(s) and Time interval
- Choose STATs
- Go!
- Result with Static Log
- Result with Dynamic Log
- Single-Host Analyzing
- Choose Collect and STAT
- Example IOSTAT: Choose Disks criteria
- Example IOSTAT: Choose STAT Variables
- Example IOSTAT: Result Graph
- Save Graph as Bookmark...
- Bookmarks
- Choose Collect and click on Bookmarks...
- Choose Time interval and Graphics style
- Select all Data you want to see and GO!
- Result Page
- Administration actions
- Multi-Host Extended Analyze
- dim_STAT CLI
- Administration
- Active/Stopped Collect
- Delete/Recycle Collects
- Export/Import collects
- Modify Collect parameters
- LOG Messages operations
- Add-On Statistics
- Example of SINGLE-Line command integration
- MULTI-Line Add-On command integration
- REAL LIFE EXAMPLE...
- Pre-Integrated Add-Ons
- Administration tasks
- Linux Special Notes
- Linux STAT-service
- Lvmstat
- Lmpstat
- LcpuSTAT (deprecated)
- LioSTAT
- psSTAT for Linux
- LpsSTAT (psSTAT)
- LPrcLOAD (ProcLOAD)
- LUsrLOAD (UserLOAD)
- LnetLOAD (netLOAD)
- Report Tool
- Overview
- Datatype: Text, HTML, Image, Binary
- Datatype: SysINFO
- Datatype: HTML.tar.Z
- Datatype: dim_STAT-Snapshot
- Datatype: dim_STAT-Collect
- Preview / Generate / Publish
- Export / Import
- Let's try! New Report
- Click on Report Tool
- New Report
- Edit Report
- Edit Actions
- Edit Note
- Edit Note, continue...
- Edit Note, continue2...
- Edit Note, continue3...
- Edit Report, continue...
- Edit Report, continue2...
- Add Note
- New Note -- SysINFO
- New Note -- SysINFO Form
- New Note -- SysINFO Result
- New Note -- SysINFO Link Contents
- Edit Report, continue3...
- Edit Report, continue4...
- Add New Note -- Image
- Add New Note -- Image Inline
- Add New Note -- Image Linked
- Add New Note -- dim_STAT Collect, Step1
- Add New Note -- dim_STAT Collect, Step2
- Add New Note -- dim_STAT Collect, Step3
- Add New Note -- dim_STAT Collect, Step3 continue
- Add New Note -- dim_STAT Collect, Step4
- Add New Note -- dim_STAT Collect, Step5
- Add New Note -- dim_STAT Collect Result
- Add New Note -- dim_STAT Collect Contents, ordered by:Collect
- Add New Note -- dim_STAT Collect Result per STATs
- Add New Note -- dim_STAT Collect Contents, ordered by:STATS
- Edit Report, next...
- Edit Report -- Cut
- Edit Report -- Paste!
- Edit Report -- Pasted...
- Edit Report -- Preview
- Edit Report -- Preview Output
- Edit Report -- Preview Output2
- Generate Report
- Generated Report documents
- Report Tool Home
- Additional Tools
- FAQ
- Sizing of dim_STAT Instance...
- I've started my collects but it seems that nothing gets collected?
- Syntax of text matching pattern
- When will you upgrade to the newer MySQL version?
- With multiple hosts to monitor, is it possible to graph them together?..
- How easy is it to integrate any new stats to monitor, including DTrace stuff?
- Could I get the raw data via dim_STAT-CLI instead of the graphs?...
- I have a Windows machine to monitor remote UNIX boxes.... Any help?..
- Full Working cycle Example
Overview... |
dim_STAT is a tool for both high-level and detailed monitoring and performance analysis of Solaris, Linux, and other UNIX systems.
The main features of dim_STAT are:
All STAT data is collected from standard UNIX tools like vmstat, iostat, etc. (or some special ones, like psSTAT for monitoring user and process activity) and saved in the MySQL database. Collected data is accessed via a web interface and can be presented in several manners (interactive or static graphs, text, HTML tables). Since v.8.1 there is also a way to collect data from other UNIX systems (HP/UX, AIX, MacOSX, etc.)
- A web based user interface
- All collected data is saved in a database
- Multiple data views
- Interactive (Java) or static graphs (PNG)
- Real Time monitoring
- Multi-Host monitoring
- Post analyzing
- Statistics integration (Add-On)
- Professional reporting with automated features
- One-click STAT-Bookmarks
- etc.
dim_STAT can be used for the on-line monitoring of one or several hosts at the same time. Data can also be post-loaded from the output files of stat commands and analyzed in the same manner. At any time, data collection from new stat commands can be added to the tool (via the Add-On interface) to enlarge your view of application workloads, an RDBMS, your personal STAT program, etc.
By default, dim_STAT interfaces with the following Solaris stats (SPARC and x86):
- vmstat
- mpstat
- iostat
- netstat
- psSTAT, ProcLOAD, UserLOAD (processes and users)
- ZoneLOAD, PoolLOAD, ProjLOAD, TaskLOAD (CPU/memory/etc. load per zone/pool/project/task (Solaris 10))
- netLOAD (extended network stats)
- UDPstats (UDP traffic)
- IOpatt (Solaris 10 I/O pattern via DTrace)
- vxstat (VxVM stats)
as well as the following Add-On extensions for both Solaris SPARC/x86 and/or Linux/x86:
- CoreSTAT (Solaris)
- MEMSTAT (Solaris)
- HAR v2 (Solaris CPU chip counters for SPARC and x64)
- jvmSTAT (Java VM GC Activity and Memory Usage stats)
- oraEXEC, oraIO, oraSLEEP, oraENQ, oraASMIO (Oracle activity stats)
- mysqlSTAT, mysqlLOAD, innodbSTAT, innodbMUTEX, innodbMETRICS (MySQL & InnoDB activity stats)
- pgsqlSTAT, pgsqlLOAD (PostgreSQL activity stats)
- LvmSTAT (Linux vmstat)
- LcpuSTAT (Linux mpstat)
- Lmpstat (Linux mpstat v2)
- LioSTAT (Linux iostat)
- LnetLOAD (Linux netLOAD)
- LpsSTAT (Linux psSTAT)
- LprcLOAD (Linux ProcLOAD)
- LusrLOAD (Linux UserLOAD)
- IObench (tool for I/O stress load)
- dbSTRESS (tool for database stress load)
- OSXiostat, OSXvmstat, OSXnetstat (experimental MacOSX support since v.9.0)
- and nearly any other program you want to add...

The CPU utilization of dim_STAT during collects is very low, even lower than that of standard tools like top or perfbar.
General View |
Just to give you an idea of how dim_STAT works.
Each machine you want to monitor in real time should run a special STAT-service daemon (client). Via the web browser you start collectors that communicate with the clients. All collected information is saved in a database and may be analyzed as soon as the data arrives, or later on. In general, all analysis, reporting and administration is done from the web browser. The web interface is developed on and runs on WebX (my own tool) ...
LICENSE |
Since v.8.3 dim_STAT is moving to the GPLv2 license!
But all the old stuff which I have only in binary form, as well as other binaries shipped without sources, will stay under the freeware license.
Installation |
The dim_STAT installation package is either delivered as a TAR archive (dim_STAT.tar) or, when on CDs, already "untarred".
Before install: verify your available disk space - you will need ~60MB for the initial install, mostly to store Web Server and Database Server data. The database volume will grow according to the number of (future) STAT collections, and the web directory may grow with your reports. So reserve enough space for your data ...
During installation: a new user "dim" and a group "dim" will be created. User "dim" is the owner of the dim_STAT database and the web server. In case your system has special rules or restrictions, you may create these manually beforehand, or you may choose other user and group names that follow your system policies. Please don't forget to set a password for this user after installation! (otherwise cron will not allow the execution of the regular clean-up tasks via 'crontab')...
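For example, if you prefer to prepare the account yourself beforehand, a minimal sketch using standard commands (the group/user names are just the defaults; the home directory and shell are only illustrative):

# groupadd dim
# useradd -g dim -d /apps -s /bin/sh dim
# passwd dim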
- WebX home (default: /opt/WebX)
- Data home (default: /apps)
- Temporary space (default: /tmp)
- -HOST `hostname`
- -IP ip_address
- -USER dim
- -GROUP dim
- -WebX_DIR /opt/WebX
- -TEMP_DIR /tmp
- -HOME_DIR /apps
- -HTTP_PORT 80
- -DB_PORT 3306
- -STAT_PORT 5000
- -USERADD yes (add user/group)
- -AutoLink yes (make auto-start links in /etc/rc*.d)
- First of all, stop all collects on this database (and check via 'Preferences' that there are no more connections to this database)..
- Wait 15 minutes (MySQL will flush data and close files)
- Start "repair" MySQL command:
# cd /apps/mysql/data/Demo
# /apps/mysql/bin/myisamchk -r *.MYI
- Restart all your collects you previously stopped
- The recovery process blocked all users from using the database during the whole recovery time..
- It's extremely difficult to say which table/database will or will not need a data recovery (even if a table was closed properly, it doesn't mean its indexes were not corrupted - during a system crash, filesystem buffers may still stay dirty and not be flushed to disk(s))..
- Finally, only running "myisamchk -r" gives you a true repair in this case, and it may take a lot of time.
- Every 5 minutes the mysql daemon is forced to flush key buffers and close all table files - this protects at least non-active databases; their data will normally stay stable in case of a system crash!
- If a system crash happens, the MySQL server will still start correctly, but with a warning message - probably some of the databases will need a data repair!..
- If you discover your database is broken:
- stop all active collects on it
- wait 5 minutes (within 5 minutes all your tables will be closed)
- start recovery on your database (see above). This solution gives you a way to recover databases in the order preferred by the user, leave the others working (if they don't need a repair), or just create a new database and continue your work!
- Stop all collects in your database
- Wait 15 minutes
- Backup the database (ex. "Demo"):
# cd /apps/mysql/data
# cp -rp Demo /your_backup_path

OR:

# tar cf - Demo | gzip > /your_backup_path/Demo.tgz
- Restart all previously stopped collects...
- NOTE: since v.8.3 there is a web interface to safely back up a whole database.
- Check there are no more connections to your database
- Delete database files (ex. "Demo"):
# rm -rf /apps/mysql/data/Demo
- Stop dim_STAT server
- Start only MySQL instance
- Connect to your database
- Execute CHECK, then REPAIR of your TABLE
- Stop MySQL instance
- Start dim_STAT server
- Create a new Database
- Convert existing Database to another Storage Engine
- Backup a whole Database
- Export STAT Collect(s)
- Import STAT Collect(s)
- Recycle STAT Collect(s)
- Stop all activity on your current dim_STAT installation
- dim_STAT-Server stop
- Backup all your databases from '/apps/mysql/data/' (see below) except: dim_00, mysql and dim
- mysql: system database, don't play with it !!
- dim_00: a reference database, changing with every release
- dim: the "Default" database; if you really need it, rename it before the backup
- Install the new dim_STAT distribution
- Restore your backup-ed data into '/apps/mysql/data'
- Start dim_STAT-Server
- 1) via /apps/httpd/bin/htpasswd create the /apps/httpd/etc/.htpasswd file and add any pairs of user/password you need
- 2) create a ".htaccess" file with the content:
AuthName "Welcome to dim_STAT Host"
AuthType Basic
AuthUserFile /apps/httpd/etc/.htpasswd
require valid-user
- 3) copy ".htaccess" file into /apps/httpd/home/docs and /apps/httpd/home/cgi-bin
- 4) try to connect to your web server now and check the access user/password - that's all! ;-)
INSTALL.sh |
As the root user, unpack the tar archive into some directory and start the installation script:
# cd /tmp
# tar xvf /path_to_tar/dim_STAT.tar
# cd dim_STAT-INSTALL
# ./INSTALL.sh
During installation you will be asked to confirm your host IP address (found automatically) and the host and domain names. The script verifies whether the user "dim" already exists on the system (if not, it will be created), and you will be asked about the WebX and home directories (Web Server, Database Server, Administration and Client scripts, etc.) and about the port numbers to be used.
Mainly you have to choose 3 application directories:
And a user/group name which will be the owner of the dim_STAT data in your system (default: 'dim')
If you are not sure about the meaning of some values, leave them at their defaults.
NOTE: WebX is the main interpreter (or execution engine); it interprets all application script files and absolutely needs a fixed and trusted root (home) directory. Otherwise, anyone might execute whatever they want on your machine (like reading /etc/passwd to crack logins, etc.). So, as a first-step protection for its root directory, you may choose one of 4 available paths (hey, 4 choices anyway, better than one :) ). Also, the WebX engine itself is very small (only a few MB) and not growing.
After install, the dim_STAT software will be distributed on your system in the following way:
NOTE: To simplify things, the next examples assume that your home directory is '/apps' and the owner's user name is 'dim'.

+ /WebX, /apps/WebX, /opt/WebX or /etc/WebX  - WebX main directory (only 4 possibilities)
|
+ /apps            - default dim_STAT home directory
  |
  +-- /ADMIN       - administration scripts (start/stop dim_STAT Server, BatchLOAD, etc.)
  +-- /mysql       - MySQL database server main directory
  +-- /httpd       - Apache Web server directory
  +-- /client      - client collect script(s)
  +-- /Java2GIF    - Java applet graph to GIF convertor
  +-- /htmldoc     - HTML to PDF converting tool
  +-- ...          - there may be other directories depending on the dim_STAT release :))
Silent INSTALL |
Since version 8.1 there is a silent "auto install" feature integrated in the install script. It may be very useful in case you need to automate the installation of dim_STAT on your servers. To activate it, use the '-Auto yes' option.
Then add more options if you need to have any settings different from the default:
Examples :
Default install:
# ./INSTALL.sh -Auto yes

With customized Home:
# ./INSTALL.sh -Auto yes -HOME_DIR /export/home/apps/dim_STAT

With existing User:
# ./INSTALL.sh -Auto yes -USER stat -GROUP staff -ADDUSER no -HOME_DIR /staff/stat

etc...
Starting Web and Database servers |
As you saw before, administration scripts are placed in /apps/ADMIN :
# cd /apps/ADMIN
# dim_STAT-Server start
To stop servers:
# cd /apps/ADMIN
# dim_STAT-Server stop
NOTE: the global dim_STAT-Server script works as the main admin interface and replaces the various separate httpd / mysql scripts. This global script also checks before a stop/start action if there are any active collects running, and restarts them automatically during the next startup. Also, if the shutdown was not done properly, the startup script will print a warning message about a possible need for an index rebuild on some databases...
At any moment you may look in the database for any active connections.
$ su - root
# /apps/mysql/bin/mysql -S /apps/mysql/data/mysql.sock
mysql> show processlist;
+------+------+-----------+------+---------+------+-------+------+
| Id   | User | Host      | db   | Command | Time | State | Info |
+------+------+-----------+------+---------+------+-------+------+
|    3 | dim  | localhost | Mind | Sleep   |   18 | NULL  | NULL |
|    4 | dim  | localhost | Mind | Sleep   |   17 | NULL  | NULL |
|    5 | dim  | localhost | Mind | Sleep   |    2 | NULL  | NULL |
|    6 | dim  | localhost | Mind | Sleep   |    1 | NULL  | NULL |
|    7 | dim  | localhost | Mind | Sleep   |    2 | NULL  | NULL |
|    8 | dim  | localhost | Mind | Sleep   |   16 | NULL  | NULL |
|    9 | dim  | localhost | Mind | Sleep   |  104 | NULL  | NULL |
|   10 | dim  | localhost | Mind | Sleep   |    1 | NULL  | NULL |
|   11 | dim  | localhost | Mind | Sleep   |    0 | NULL  | NULL |
|   53 | dim  | localhost | UPC  | Sleep   |  108 | NULL  | NULL |
|   54 | dim  | localhost | UPC  | Sleep   |  103 | NULL  | NULL |
|   56 | dim  | localhost | UPC  | Sleep   |  115 | NULL  | NULL |
|   57 | dim  | localhost | UPC  | Sleep   |  118 | NULL  | NULL |
|   58 | dim  | localhost | UPC  | Sleep   |  112 | NULL  | NULL |
|   59 | dim  | localhost | UPC  | Sleep   |  105 | NULL  | NULL |
...
and even kill any of them (however, be very careful !!)
mysql> kill 57;
mysql> quit
Bye
#
MySQL Admin Tips |
MySQL administration is very easy. However, depending on a user's past experience, here are some tips which may help...
First of all, be aware that dim_STAT uses the MySQL MyISAM engine to save data. This engine has no transaction support, no transaction log, etc., but it's very easy to manage, it does all the needed stuff quite well, provides a reasonable SQL interface, and keeps all saved data fully platform-independent! (you may simply copy your data files from a Linux/x86 box to a Solaris/SPARC station and continue to work with them without any problem!). Of course, without a transaction log there is still a risk to lose some data due to a system crash or power outage... But if you weigh all the important points, you'll see that losing a few minutes of collected data matters much less than the database software cost or the skills needed to administrate it.. - you don't need any DBA skills to administrate MySQL for dim_STAT! UNIX admin habits will be enough :-)
As much as you can, use separate databases: it's much easier for administration, it avoids possible future activity conflicts, etc. Since v.8.3 there is a possibility to add an Admin password while creating a new database - all administration actions will then require this password (start/ stop/ restart of collects, data drop, etc.)
Limitation in the number of connections: each MySQL connection uses 5 file descriptors (avg). This means that with a maximum of 1024 file descriptors per process (the default on some old systems), we can't create more than ~200 connections on a multi-threaded MySQL server (Note: each STAT command in a collect uses its own single connection). In case you run the dim_STAT server on Solaris and need more connections (several hosts, many stats, etc.), first check the values of your /etc/system parameters: rlim_fd_cur and rlim_fd_max. Next, in the file /apps/mysql/mysql.server replace the default value of 2000 with a new one (the current dim_STAT server is configured with a limit of 2000 connections; however, how many it will actually be able to acquire depends on the system, and you may always increase this value again)...
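As an illustration, raising the descriptor limits on Solaris might look like the following sketch (the values are only an example, and a reboot is required for /etc/system changes to take effect):

* /etc/system: raise the per-process file descriptor limits
set rlim_fd_cur=4096
set rlim_fd_max=8192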
Accidental "power off" on your machine: MySQL server within dim_STAT is configured in way to force data flush every 5 minutes. So, if your database was not used for a long time - your data should be safe.. However for active databases it's very possible some of their index files will be corrupted. The dim_STAT-Server script will print a warning message in this case, but you'll need to run manually the data checks..
NOTE: you do NOT need to stop dim_STAT server! :-)
Suppose you discover some data errors in the database "Demo" (for example):
Since v.8.2 auto-repair was removed from the dim_STAT-Server script, for the reasons listed above: the recovery process blocks all users, it's hard to know which tables really need a repair, and only a full "myisamchk -r" gives a true repair (which may take a lot of time).
Instead, since v.8.2 (as described above) the server flushes key buffers and closes table files every 5 minutes, and after a crash it starts with a warning message, so you can repair only the databases that really need it.
Probably, with time, there will be a possibility to upgrade databases to the InnoDB engine for critical system environments and not worry about system crashes anymore, but for the moment it's just a part of the future plan :-)
No more disk space: just add disks if possible :). The collect part of dim_STAT is done in such a way as to "keep the flow": in case of errors nothing will be stopped. Once you have added space, the collects will continue, but you will probably have some holes for this period.
To get a backup/copy of your collects in the fastest way: one of the great features of MySQL is its support of cross-platform data compatibility. As an example, the same database files may be moved from a Solaris machine and successfully reused on a Linux laptop. In most cases, copying the whole database to another machine will be much faster than exporting and re-importing collects via flat files. The exception is when you want to move only a very small amount of data from a large database.
Fine, but can we do this on-line? - Yes!! Like in "repair" steps:
Delete the database: there is no way to delete a database via the web interface (generally, I don't like deleting :) ). Deleting by error is such a common thing ... so, if you really need to delete your database, the only way is:
Running several MySQL instances on the same host: a long time ago, one of the bigger problems was avoiding conflicts between dim_STAT and databases already installed and running on an existing system. The solution I found is to isolate the dim_STAT database completely from existing instances, but the price for it is a little more complexity for simple things. The tool now uses its own parameters for TCP/IP ports and UNIX sockets. For example, to connect locally to your database server, instead of the usual:

# /apps/mysql/bin/mysql DatabaseName

you should now use:

# /apps/mysql/bin/mysql -S /apps/mysql/data/mysql.sock DatabaseName
MySQL: datafile corruption |
This section covers a particular case when a table is not repaired by "myisamchk" and you usually get a message like: "table TABLE doesn't have a correct index definition", etc.
The solution is:
The following example demonstrates a real case with the "dim_MPSTAT" table:
bash# /apps/ADMIN/dim_STAT-Server stop
bash# /apps/mysql/bin/myisamchk -r -f dim_MPSTAT.MYI

IF IT DID NOT HELP:

bash# /apps/mysql/mysql.server start
bash# mysql -S /apps/mysql/data/mysql.sock Benchmark_TTT
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Didn't find any fields in table 'dim_MPSTAT'
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 3.23.53
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> check table dim_MPSTAT;
+--------------------------+-------+----------+--------------------------------------------------------+
| Table                    | Op    | Msg_type | Msg_text                                               |
+--------------------------+-------+----------+--------------------------------------------------------+
| Benchmark_TTT.dim_MPSTAT | check | warning  | Table is marked as crashed                             |
| Benchmark_TTT.dim_MPSTAT | check | warning  | Size of datafile is: 1251942400  Should be: 1251942360 |
| Benchmark_TTT.dim_MPSTAT | check | error    | Found 16918142 keys of 16918140                        |
| Benchmark_TTT.dim_MPSTAT | check | error    | Corrupt                                                |
+--------------------------+-------+----------+--------------------------------------------------------+
4 rows in set (19.39 sec)

mysql> repair table dim_MPSTAT;
+--------------------------+--------+----------+----------+
| Table                    | Op     | Msg_type | Msg_text |
+--------------------------+--------+----------+----------+
| Benchmark_TTT.dim_MPSTAT | repair | status   | OK       |
+--------------------------+--------+----------+----------+
1 row in set (7 min 34.16 sec)

mysql> quit

bash# /apps/mysql/mysql.server stop
bash# /apps/ADMIN/dim_STAT-Server start
The doc reference is here (see comments) - http://dev.mysql.com/doc/refman/5.0/en/myisamchk-repair-options.html (Thanks Google! :-))
Using InnoDB Engine instead of MyISAM |
Since dim_STAT v.9.0 it is possible to use the InnoDB Storage Engine within MySQL instead of MyISAM. This Engine is a true transactional one and pretty safe against server power-off or system crashes.. You may choose InnoDB instead of MyISAM at Database creation, or convert your Database from one Engine to the other at any moment. The only thing you'll not be able to do with InnoDB is a full "physical" backup of your Database files (in this case you'll need to convert your Database to MyISAM first). However, there is no problem with Import or Export.
NOTE: the bigger your database, the more time it'll take to convert it from one Engine to the other..
Since v.9.0, to simplify DBA-like tasks, an admin tool is included: dim_STAT-Admin.
dim_STAT-Admin Tool |
dim_STAT-Admin is shipped since v.9.0 to avoid using the web interface for sometimes heavy DBA tasks.
With dim_STAT-Admin you're able, from the command line, to create a new Database, convert it to another Storage Engine, backup a whole Database, and export/import/recycle STAT Collects:
Command line:
$ ./dim_STAT-Admin

dim_STAT-Admin CLI (dim) v.1.0 >

Usage: dim_STAT-Admin [options]

Options:
   -CMD Command            Commands: CREATE, BACKUP, CONVERT, EXPORT, IMPORT, RECYCLE
   -Base DBname            Database Name (if empty: prints database name list)
   ...

Additional options: (depending on Command)

 CREATE :
   -Engine Name            MyISAM (default) or InnoDB
   -Passwd PASSWORD        optional password setting for Admin actions

 BACKUP :
   -Passwd PASSWORD        if password was assigned for Admin actions
   -File Filename          full path output file name for tar.Z backup file

 CONVERT :
   -Engine Name            MyISAM or InnoDB
   -Passwd PASSWORD        optional password setting for Admin actions

 EXPORT :
   -ID id1[,id2,..]        Collect ID(s) to export (if empty: prints available Collect list)
   -Begin YYYYMMDDhhmiss   optional begin date+time
   -End YYYYMMDDhhmiss     optional end date+time
   -File filename          full path output file name for tar.Z export file

 IMPORT :
   -ID id1[,id2,..]        optional Collect ID(s) to import (if known)
   -File filename          full path file name for input tar.Z import file

 RECYCLE :
   -Days N                 keep data collected during last N days
   -ID CollectID           optional collect ids (ex: id1,id2,id3 or "ALL" for any ID)
                           (if empty: All active collects only)
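A few illustrative invocations composed from the usage above (the database name 'Bench', the Collect IDs and the file paths are hypothetical):

$ ./dim_STAT-Admin -CMD CREATE -Base Bench -Engine InnoDB
$ ./dim_STAT-Admin -CMD BACKUP -Base Bench -File /backup/Bench.tar.Z
$ ./dim_STAT-Admin -CMD EXPORT -Base Bench -ID 10,11 -File /backup/Bench_exp.tar.Z
$ ./dim_STAT-Admin -CMD RECYCLE -Base Bench -Days 30 -ID ALL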
Migration from any old dim_STAT version to the new one |
The migration procedure is quite easy (the steps are listed in the Installation section above):
Enjoy :))
NOTE: The old database should be seen as before and work correctly, but if you want to take advantage of all the new features coming with the new version, then create a new database and start new collects. A shell sketch of the whole procedure follows.
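Put together as a minimal sketch (assuming the default '/apps' home and a single user database named 'MyBase'; the names and backup paths are hypothetical):

# /apps/ADMIN/dim_STAT-Server stop
# cd /apps/mysql/data
# tar cf - MyBase | gzip > /backup/MyBase.tgz   ## repeat for each database except dim_00, mysql, dim
  ... install the new dim_STAT distribution ...
# cd /apps/mysql/data
# gunzip < /backup/MyBase.tgz | tar xf -
# /apps/ADMIN/dim_STAT-Server start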
First-Level Security |
The main point: ANY SECURE SYSTEM IS NEVER SECURE ENOUGH... The only question is what you will consider secure ENOUGH for you :))
Anyway, during discussions with our engineers and customers, the security issue was so often raised that I cannot leave it without attention.
For paranoia-users: there is a Solaris X86 or Linux version of dim_STAT and if you really need maximum protection, spend some money on a small dedicated PC, run dim_STAT on it and protect any access with firewalls, etc.
From my experience, I suggest protecting access to the web server, to prevent somebody from stopping or suspending active collects just by error. For this kind of first-level access protection, a good candidate is Apache's ".htaccess". For more detailed information, please refer to the Apache documentation. But in short, just to make it work with dim_STAT:
Example:
$ /apps/httpd/bin/htpasswd
Usage: htpasswd [-c] passwordfile username
The -c flag creates a new file.

$ /apps/httpd/bin/htpasswd -c /apps/httpd/etc/.htpasswd login1
Password: ...

$ vi /tmp/.htaccess
$ cat /tmp/.htaccess
AuthName "Welcome to dim_STAT Host"
AuthType Basic
AuthUserFile /apps/httpd/etc/.htpasswd
require valid-user
$
$ cp /tmp/.htaccess /apps/httpd/home/cgi-bin
$ cp /tmp/.htaccess /apps/httpd/home/docs
STAT-service |
STAT-service was introduced in dim_STAT version 3.0 and provides a simple, stable and secure way for on-line collecting of STAT data from Solaris/SPARC, Solaris/x86 and Linux/x86 servers. Since v.8.1 it's distributed under GPL with source code, so you may now compile it yourself to collect data from other UNIX platforms. As a pilot example, a package for HP/UX is provided. And any newly ported kits are of course welcome! Since Jun. 2009 there is also a version of the STAT-service daemon rewritten in Perl by Marc KODERER: http://search.cpan.org/~mkoderer/stat_agent-0.09/stat_agent.pl - feel free to try this version too, and don't forget to send your comments and RFEs to Marc! :-)
- 1) dim_STAT connects to the STAT-service daemon of the monitored server
- 2) if the service is not available, then wait a time-out and go to 1) or exit if the STAT collect is stopped during this period
- 3) dim_STAT will ask about the stat command that it needs
- 4) if there are no permissions for this command or the command is not found, the "command" connection will be closed with an error message
- 5) dim_STAT collects the data, maintaining any time-shift due to previous time-outs
- 6) if the TCP connection is broken: go back to 1)
- 7) if STAT is stopped, then close the connection and exit
- 8) if there was no activity during the "auto-eject" timeout, close the connection and go to 1)
- the access file is checked by the STAT-service daemon all the time, so you never need to restart the service to activate your modifications.
- since v.8.0 only stat commands that are sure to work on a given system are enabled by default. It's up to you to enable other commands which may need some additional configuration (like jvmSTAT, oraEXEC, etc.) or simply the presence of software (like VxVM for vxstat) - "enable" means just uncommenting them within your /etc/STATsrv/access file :-)
- since v.8.5 you may add a port number to a command! - it gives a way to collect several similar stats from the same host but from different sources :-)
For example, suppose you're running 3 Oracle database instances on the same server and still want to monitor each one in detail, but only one oraEXEC is possible per system, because (as is) it may accept only one Oracle SID... You may just make several copies of the same oraEXEC.sh wrapper and assign them to different ports like this:

command oraEXEC      /etc/STATsrv/bin/oraEXEC_sid0.sh
command oraEXEC:5001 /etc/STATsrv/bin/oraEXEC_sid1.sh
command oraEXEC:5002 /etc/STATsrv/bin/oraEXEC_sid2.sh
Install STAT-service |
The STAT-service module is shipped as part of the dim_STAT distribution (dim_STAT-INSTALL/STAT-service directory), in form of Solaris packages or as tar archives for manual integration. STAT-service has to be installed on every machine that needs to be monitored. The install is to be done as "root" user.
Package install (".pkg" file) :
# pkgadd -d STATsrv.pkg
Manual install (".tar" file) :
# cd /etc
# tar xvf /path_to/STATsrv.tar
# ln -s /etc/STATsrv/STAT-service /etc/rc2.d/S99STATsrv
# ln -s /etc/STATsrv/STAT-service /etc/rc1.d/K99STATsrv
# ln -s /etc/STATsrv/STAT-service /etc/rc0.d/K99STATsrv
# ln -s /etc/STATsrv/STAT-service /etc/rcS.d/K99STATsrv
The software needs to be installed into a special /etc/STATsrv directory, which is the home directory of STAT-service. The contents of this directory are:

/etc/STATsrv/
    STAT-service  -- script to start/stop the service daemon, also defines the port number to listen on (def: 5000)
    access        -- access control file
    /bin          -- contains extended STAT programs/scripts
    /log          -- contains all logged information about service demands
Next step, start the service daemon:
# /etc/STATsrv/STAT-service start
The way dim_STAT and STAT-service communicate with each other is very simple (the connection steps are listed earlier in this section).
As you see, this schema is quite robust and will keep working after cluster switches, network corruption, reboots, etc. Collects can be started once and then left running for a long period. In case you need to collect only during specific time intervals, you may just start and stop the STAT-service through a "cron" job or a similar tool, as sketched below.
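For example, a crontab sketch (a hypothetical schedule) that keeps the STAT-service, and therefore the data flow, alive only during working hours:

# root crontab entries (crontab -e): run STAT-service Mon-Fri, 8:00-18:00
0 8  * * 1-5  /etc/STATsrv/STAT-service start
0 18 * * 1-5  /etc/STATsrv/STAT-service stop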
Note: it appears that during a halt of the system (a power-off of a running machine), the TCP/IP connections can stay open and never receive an error code. When this happens, the collect will be broken via an "auto-eject" timeout. However, auto-eject can also happen due to a mini-hang of the system or simply of the stat program. In this case you'll see holes in your collects, so take care when interpreting the results.
STAT-service Access control file |
Here is an example of the STAT-service access control file. As you see, you may limit the set of stat commands accessible for each machine. This task may be done by the host administrator and is completely independent.
IMPORTANT :
#
# STAT-service access file
#
# Format:
#    ...
#    command name[:port] fullpath
#    ...
#    access IP-address
#    ...
#    command name[:port] fullpath
#    ...
#
# By default all machines in the network may access the STAT-services
#
# The keyword "access" makes an access restriction by IP-address for all
# following commands till the next "access" section.
#
# For example:
#
# ====================================================================
#  #
#  # Any host may access vmstat and mpstat collections
#  #
#  command vmstat /usr/bin/vmstat
#  command mpstat /usr/bin/mpstat
#
#  #
#  # Only machines 129.157.1.[1-3] may access netLOAD collections
#  #
#  access 129.157.1.1
#  access 129.157.1.2
#  access 129.157.1.3
#  command netLOAD.sh /etc/STATsrv/bin/netLOAD.sh
#
#  #
#  # Only machine 129.157.1.1 may access psSTAT collections
#  #
#  access 129.157.1.1
#  command psSTAT /etc/STATsrv/bin/psSTAT
#
# ====================================================================
#

# """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
#  //  All following commands should work out of the box...  //
# """""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
command Lvmstat          /etc/STATsrv/bin/vmstat
command Lvmstat:5001     /etc/STATsrv/bin/vmstat2
command Lmpstat          /etc/STATsrv/bin/Lmpstat.sh
command tailX            /etc/STATsrv/bin/tailX
command LioSTAT          /etc/STATsrv/bin/ioSTAT.sh
command LpsSTAT          /etc/STATsrv/bin/psSTAT.sh
command LPrcLOAD         /etc/STATsrv/bin/ProcLOAD.sh
command LUsrLOAD         /etc/STATsrv/bin/UserLOAD.sh
command LnetLOAD         /etc/STATsrv/bin/netLOAD.sh
command LcpuSTAT         /etc/STATsrv/bin/cpuSTAT.sh
command sysinfo          /etc/STATsrv/bin/sysinfo.sh
command SysINFO          /etc/STATsrv/bin/sysinfo.sh
command IObench          /etc/STATsrv/bin/IObench_STAT.sh
command dbSTRESS         /etc/STATsrv/bin/dbSTRESS_STAT.sh
command dbSTRESS1:5000   /etc/STATsrv/bin/dbSTRESS_STAT.sh
command dbSTRESS2:5001   /etc/STATsrv/bin/dbSTRESS_STAT.sh

# """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
#  //  The next commands may need some additional configuration //
#  //  (see each *.sh to get more details before uncommenting)  //
# """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# Java (JVM)
#command jvmSTAT         /etc/STATsrv/bin/jvmSTAT.sh

# Oracle
#command oraEXEC         /etc/STATsrv/bin/oraEXEC.sh
#command oraIO           /etc/STATsrv/bin/oraIO.sh
#command oraENQ          /etc/STATsrv/bin/oraENQ.sh
#command oraLATCH        /etc/STATsrv/bin/oraLATCH.sh
#command oraSLEEP        /etc/STATsrv/bin/oraSLEEP.sh

# MySQL
#command innodbSTAT      /etc/STATsrv/bin/innodbSTAT.sh
#command mysqlSTAT       /etc/STATsrv/bin/mysqlSTAT.sh
#command mysqlLOAD       /etc/STATsrv/bin/mysqlLOAD.sh

# PostgreSQL
#command pgsqlSTAT       /etc/STATsrv/bin/pgsqlSTAT.sh
#command pgsqlLOAD       /etc/STATsrv/bin/pgsqlLOAD.sh
#
Main Page |
Now the installation is finished, and the database and web servers are running. Be sure that the STAT-service is installed and running on all servers you want to monitor. You'll be surprised, but when people are having trouble, in 90% of cases it is just forgetting to start the STAT-service. Once it's done, you are ready to open a web browser (doesn't matter if it is Java enabled or not) and connect to the dim_STAT web server. The first page contains some links to documentation, presentation, tool history, etc., but the link you'll need to click is "Main Page".
As you already supposed, the Main Page will group all main actions ... and you're right!
I will not present this action by action, but rather functionality by functionality, in order of operation. However, the shortest working cycle is probably still:
- Start a STAT collect
- Analyze/Monitor the collected data
- Stop the STAT collect

A few words about the User Interface. Don't be surprised if you do not find any "Back" button once you leave the Main Page. There isn't one! You have to use your browser's navigation "Back" button for it. And it's not because I'm just lazy :)) The reason is simple: dim_STAT uses Java applets to present data in graphical mode, but it seems that for every Java applet instance the web browser instantiates a dedicated JVM. And all JVMs will stay in the browser's memory until it crashes with an "out of memory" error. To prevent that, I unfortunately have to force you to use your browser's "Back" button.
Since version 7.0 you'll see a small toolbar at the top of your page representing:
- Currently used Database Name
- Short links into Home/ Preferences/ Log Admin
- vi /opt/WebX/x.config
- duplicate the line with "harms:88"
- in the new line, replace "harms:88" with "harms.France.Sun.COM:88"
- save the file
- reload the Main Page in your browser
- FireFox - the most stable web browser today; works perfectly with Java applets and may be the best choice. Especially useful as it keeps all checkboxes pre-selected even if you reload an active page ;-)
- Opera - seems to work fine since v.5 (and I'm using it a lot as an excellent alternative ;-))
- Konqueror - generally working out of the box, probably the best choice for KDE-lovers :-))
- Safari - works just fine out of the box, probably the best choice for Mac-lovers ;-))
- Mozilla - you should upgrade to at least version 1.7. Previous versions had a bug where an applet was started before receiving all given parameters. Also, 1.7 and later is much faster compared to previous versions.
- IE - never used it myself, but it seems to work for customers, etc.
ERROR: No X_ROOT configuration for SERVER |
Sometimes, instead of the Main Page, you see this error message.
Don't worry, nothing is wrong!! What is happening is that your DNS translation simply did not match the configuration settings. Go to the WebX home directory (ex: /opt/WebX) and open the "x.config" file in a text editor. Find the line containing your host name in the first column. Duplicate this line and replace the hostname:port pair in it with the string given in the error message after "SERVER:". Save the file and try to connect again. It should work immediately!
Example: Error Message: "No X_ROOT configuration for SERVER: harms.France.Sun.COM:88"
Note: X_ROOT is one of WebX's configuration parameters. As WebX is an interpreter, there should be a way to protect it from "interpreting" something other than application pages (ex: /etc/passwd). X_ROOT gives WebX its main "root" directory, so that only pages in this specific directory tree can be executed, and nothing else.
Note: Since v.9.0 the pattern "*:*" is provided to accept any host name with any port number in case such a level of security is not required..
Web Browsers |
Since version 7.0 you may use any web browser as long as it supports the PNG image format (true for nearly all available browsers).
However, if you prefer the interactive graphs from dim_STAT's Java Applet, you must have a Java plug-in configured. Here are a few notes about specific browser programs:
There are some other browsers out there, but as a general rule, if you see an error message "Browser BUG" instead of the graphics, then you should either upgrade your browser or move on to another one. Also, if you use only PNG graphs you will usually never meet any problems...
Preferences |
The preferences page contains a set of key options used by different parts of the application. The most critical of them are grouped here. All other options (if supported) are "auto-keeping" their last value. If you used dim_STAT before, you will notice that there are no graph settings anymore; all graph values are auto-saved each time you use the graph view.
Note: your browser must accept cookies to make some of the following features work!!
There isn't a global "settings" button, and I didn't want to create too many links. So, each option has its own validation button; don't forget to click it to apply your modifications.
Database - Without any special settings, all collected data is stored in the "Default" database (the real MySQL name is "dim"). However, to avoid possible contention and simplify further administration, it's highly recommended to use different databases for different projects/ users/ centers/ etc. Within the Database section you can choose the name of an existing database you want to use, or you can create a new one and use it instead. Since v.8.3 there is a possibility to add an Admin password while creating a new database - all administration actions will then require this password (start/ stop/ restart of collects, data drop, etc.). As a reminder, the current database name is shown in the browser's title and the toolbar of every dim_STAT window.
Free and Used disk space - shown for the current database. (Note: MySQL has quite a small storage footprint, so disk space usage will be most reasonable, but it's a good habit to check from time to time that you still have disk space! Since v.8.2 datafiles are configured to be able to reach 2TB in size (seems enough, no? ;-))...
Host Name List - Here you can specify a pre-defined list of the servers you usually monitor. This list is saved within the database, so every person using the same database may reuse it; also, if you switch databases from time to time in your browser, your host list will change automatically! Since v.8.2 host "aliasing" is added; the complete syntax for a host name is: [alias/]hostname[:port]
Example:
- you want to collect data from a host known in the LAN as abz45060, IP address 10.1.1.15, and running STAT-service on port 5050 (because 5000 was already used by another application)...
- if you like the name "abz45060" - you may just enter abz45060:5050 into the host list
- but if you prefer another name (ex. reflecting a server role, etc.) - for example "oradb" - you may just enter oradb/abz45060:5050 and in every graph this host will be named as oradb
- NOTE: you may also replace abz45060 by IP address: oradb/10.1.1.15:5050 (according to your taste :-))
Bookmark Term - If you have never used dim_STAT before, just leave it as it is. For others, this option was created to satisfy everyone who prefers a different name for "Bookmark" functionality. Bookmarks were introduced in version 4.0, but after long discussions we still have no agreement on the right name. So, now you're free to name it as you like! :))
LOG Messages option - Gives you a way to set:
- enable/disable auto-generated time slice messages for easier time interval selection
- message list size setting (in lines)
- max message visible length (in characters)
Page Colors - You're free to play with page colors if you're not happy with the default settings or simply prefer to change it from time to time.
Check Java support - A simple way to check if the dim_STAT applet is working correctly with your browser.
Start On-Line Collecting |
Before starting any STAT collect, first check that the STAT-service is running on every server you want to monitor. This is the most common error!! Another point: if you want to monitor a Linux server, be sure you've installed the Linux STAT Add-Ons before starting any collect (see the special Linux section in this document).
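A quick, generic way to check that the daemon is reachable (assuming the default port 5000; any TCP probe will do):

$ telnet myhost 5000     ## a "Connected to myhost" reply confirms the STAT-service is listening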
Now, from the dim_STAT Main Page you may just follow the Start New Collect link. (Note: since version 8.0 there is no distinction anymore between single and multi host collect).
IMPORTANT:
- A STAT collect for a host is independent of any other, so it can be stopped and/or restarted at any time, independently of other collects.
- Your collect options are saved into special script files with a name based on the "Collect Base Name". Using customized names you may pre-load a different set of options, according to your needs.
- You may start a collect on-line from your browser, or you can make a start script, to be run by hand, via cron, as a batch job, etc.
- choose a host name(s)
- set collect attributes (title, id, etc.)
- choose collected statistics
- start now, or prepare a script for manual/delayed execution
- Red: the host is not running STAT-service on the default port, or the host is inaccessible from the network, or the host is down.
- Orange: the host is running STAT-service but an older version.
- Green: ok! STAT-service is running and has all required features.
- VMSTAT
- MPSTAT
- IOSTAT
- netLOAD (avoid using 'netstat')
- ProcLOAD
- neel, fourrier - Solaris hosts running upgraded STAT-service
- localhost - Linux box, upgraded STAT-service
- sting - Solaris host, old STAT-service
- fudji - Solaris host, powered off
- Linux stats are not proposed for 'green' Solaris hosts
- Solaris stats are absent for 'green' Linux hosts
- for any 'green' host, not configured or disabled stats are absent
- the 'orange' host sting has all its stats present, but as its STAT-service is from before v.8.0, it's up to you to remember which commands will or will not run on it
- install dim_STAT on a host
- start STAT-service on the same host
- collect data from that host into dim_STAT on the same host
- and be aware: on a 4 CPU machine (which is a relatively small server), a collect with a 20 second interval (vmstat + mpstat + iostat + psSTAT + netLOAD) will generate only 0.2% CPU usage (Yes!!)
Main Steps |
There are 4 main points in starting a STAT Collect:
1. Host name(s)
Since version 8.0 you choose your host(s) first. You may set up a list of frequently used host names on the 'Preferences' page. This list, as well as all other used host names, is kept via browser cookies. Before you start any STAT collect, for each given host name dim_STAT will indicate the status of the host's STAT-service by a LED color. I hope it avoids potential misconfiguration issues for both new and experienced users. For now there are 3 LED colors (see the list above).

2. Set Collect Attributes
NOTE: since v.8.0, STAT-service has a new 'stat publish' feature. Using this, the application knows exactly what kind of STATs you can or can't collect from any given host. It protects you from choosing the wrong or unavailable data.
Collect BaseName - all selected options are saved in a special start script. The name of this script is composed of the BaseName + some context extensions. The next time you start a new collect, you may pre-load previously selected options by giving the previous BaseName and clicking on "Preload" (by default the last given BaseName is stored using a cookie).
ID - all data in the database is referenced by this ID. The ID is not assigned automatically, to give you a choice to use personalized number ranges (your project id, etc.).
Title - the title description you give for starting the collect.
Time Interval - how frequently (in secs.) you want dim_STAT to collect data (the default is 30 seconds, which is right in most cases)
Client Log File - the name of a file on the "host" that you want to watch. All text lines appended to this file will automatically be copied into the STAT database and timestamped. While analyzing the collected STATs you can visualize the log messages that correspond to the analyzed interval. This may be very useful to trace auto-starting jobs, night batches, etc. They also give you a simple and fast way to find the correct time position during data analysis, like "show N minutes before/after/around a selected message". A small sketch follows below.
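For example, a batch job can simply append its own markers to the watched file (a minimal sketch; the file name and the job command are hypothetical and just have to match your setup):

#!/bin/sh
# nightly_batch.sh -- append start/end markers to the file watched by the collect
LOG=/var/log/app_events.log             ## must match the "Client Log File" given above
echo "`date` : nightly batch started" >> $LOG
run_nightly_batch                       ## your real job goes here (hypothetical command)
echo "`date` : nightly batch finished" >> $LOG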
STAT-service Port - the port number on which STAT-service is running. By default the tool will use the port number given during installation and it's a good practice to use the same port on every host.
3. Choose Statistics

Simply select all statistics you want to collect. Help bullets show a full description of each STAT (if you have JavaScript enabled in your web browser). Better be selective; you probably don't need everything.
A good set of STATs to start with is listed above: VMSTAT, MPSTAT, IOSTAT, netLOAD and ProcLOAD.
These STATs will give you a good overview of the resource utilization on your hosts. Once you have analyzed them, you may go more in-depth and fine-tune the selected STATs.

NOTE: all "official" Add-Ons are installed by default in each dim_STAT database, BUT! they are not enabled by default in the STAT-service! On the host side only stats that work for sure are enabled! Be sure to check the /etc/STATsrv/access file on each server before you start any collect! :-)
For example: if you need to collect "vxstat" data, and you know VxVM is installed and running on the host, just uncomment the vxstat line in your /etc/STATsrv/access file (as sketched below) and things will work!...
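The edit is just removing the leading '#' (a sketch; the exact line and the vxstat path depend on your access file):

# before:
#command vxstat  /usr/sbin/vxstat
# after:
command vxstat   /usr/sbin/vxstat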
4. Start Mode

[ Make Start script only ] - don't start the collect, just create a script
[ Start Now! ] - start the new STAT collect right now
[*] Show Debug output - in case you want to see debug messages about the collect startup
Few screenshots... |
Select Host(s) |
You may see here several servers:
I select neel, fourrier, localhost and sting and click on [Continue] button...
Choose STATs |
The hosts are chosen, let's select the STATs to collect.
Some remarks about these hosts:
Choose STATs, next |
Load collect from output files |
If you cannot collect data directly from your hosts and all you have is a set of statistics output files, then you may still load them via the Web interface as a STAT collect and analyze them later. Just fill in the required parameters and off you go.
However, if your output files represent a big volume, they may take much more time to load, and your browser may simply time out and lose the connection. Then you'll never see the final result.
In such cases, a better solution is to use EasySTAT (simplified) or BatchLOAD (for more experienced users). See the following sections for more details.
Standalone configuration |
Before you think about collecting your stats via some kind of scripts, don't forget about the possibility of a "standalone" dim_STAT. There is absolutely no restriction to:
The CPU usage of dim_STAT for collecting data is very low. However, during data analysis or when doing export/import/etc. actions, CPU utilization is very high.
So, don't forget about this simple solution: install dim_STAT on the same host you want to collect from, collect all the data you need locally, and then backup the whole database, copy it onto another machine and analyze it there. Alternatively, in the case of a benchmark, keep the data on the same server, but take care that you're not doing any analysis at the same time as you're running your test runs.
EasySTAT |
Since dim_STAT version 7.0, the EasySTAT script is part of the STAT-service for Solaris. EasySTAT is designed to simplify collecting STATs on "very remote" or "highly secured" hosts, in combination with BatchLOAD.
In a few words all you need to do is:
- install STAT-service on the host
- run EasySTAT
- backup the output directory
- restore the directory on your dim_STAT server
- execute the "LoadDATA.sh" script (from the directory)
EasySTAT Usage (v.1.9)
$ /etc/STATsrv/bin/EasySTAT.sh OutDIR IntervalSec NbHours [Title [Hostname [DBname [Batch [Log]]]]]

options:
  OutDIR    - Output directory for stat collects (def: /var/tmp)
  Interval  - measurement interval for stat commands in sec. (default: 15)
  NbHours   - execution duration in hours (default: 8 hours)
  Title     - title to use during BatchLOAD processing
  Hostname  - hostname to use during BatchLOAD processing
  DBname    - database name to use during BatchLOAD processing
  Batch     - full path to the BatchLOAD binary on your server (default: /apps/ADMIN/BatchLOAD)
  Log       - log file name (if given, all processing output is forwarded into this file)
              NOTE: may also be enabled via the LOG environment variable (see EasySTAT.sh for details)
EasySTAT Config
By default the script collects 5 main stats:
- VMSTAT (runqueue, memory, CPU)
- MPSTAT (per CPU usage, interrupts, mutex, etc.)
- IOSTAT-xn (per disk I/O stats)
- netLOAD (network per interface stats +nocanput)
- ProcLOAD (processes stats summarized by process name)
- you may add any other Add-On commands by editing /etc/STATsrv/bin/EasySTAT.sh
Additional Options
- To reduce disk space usage, since v.8.3, if the COMPRESS environment variable is set, EasySTAT will automatically call it to compress every finished output file:

# COMPRESS=gzip /etc/STATsrv/bin/EasySTAT.sh ...

Don't forget to "uncompress" the output files before starting any load process! :-)

- Since v.8.3, if the TIMER environment variable is set to "yes", EasySTAT will automatically timestamp all collected data within its output files:

# TIMER=yes /etc/STATsrv/bin/EasySTAT.sh ...

All timestamp tags are transparent for BatchLOAD and serve only to simplify human reading. Also, if during collecting there were some output freezes due to high system load or other reasons, the Timer will automatically take care of it and add a special time sync tag to synchronize the data when loading into the database..
NOTE : since v.8.3-1 both the COMPRESS and TIMER options are enabled within the EasySTAT.sh script by default !!! - it's preferable to have compression and timestamps out of the box, to avoid any space overflow as well as to get faster text file analyzing. However, be aware that you have to edit the EasySTAT.sh file to disable them (but at least you know what you're doing :-))
- Per hour files -- to avoid having collected data out of sync with real time, EasySTAT restarts each stat program every hour, so every hour you get a new file for all stats. This is the default behavior, and it was designed so from the beginning.
- Run forever -- to run EasySTAT for an undetermined period, just give "0" as the number of hours. Also, in this case EasySTAT will not create a new working directory for incoming stats, but will (re)use the given directory name.
- Inittab -- you may use /etc/inittab to make your EasySTAT collects permanent: if for any reason the collects are stopped (or killed), the init process will restart them automatically! All you need is to add a line like this to your /etc/inittab:
dim:3:respawn:/etc/STATsrv/bin/EasySTAT.sh /var/tmp/stats 15 0 2>&1 >>/var/tmp/stats.log
then make init re-read its configuration:

# init q
The advantage of such a solution is that it'll work on any UNIX platform ;-)
- PID file -- EasySTAT always creates a pid file within its working directory: .EasySTAT.pid
- Stopping -- any time you need to stop EasySTAT gracefully, just send a TERM or INT signal to its PID:
# kill `cat .EasySTAT.pid`
- LoadDATA.sh file(s) -- on a USR1 signal, EasySTAT backs up its current LoadDATA.sh file into a LoadDATA.sh-saved-... file and then creates a new LoadDATA.sh for all following incoming collects (until the next SIGUSR1 ;-)). It may be helpful if you're collecting your stats permanently but want to be able to upload them into your dim_STAT database by time periods, etc... (see the example below)
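For example, to rotate the LoadDATA.sh file at a period boundary (run from the EasySTAT working directory; the path is just the one used in the inittab example above):

# cd /var/tmp/stats                  ## EasySTAT working directory
# kill -USR1 `cat .EasySTAT.pid`     ## the current LoadDATA.sh is saved, a new one is started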
Example |
Collect STATs :
On the 'Very Remote' Host:
==> copy STATsrv.pkg somewhere (ex: /tmp) and install:
# pkgadd -d /tmp/STATsrv.pkg

==> create data dir
# mkdir /var/tmp/Easy
# cd /var/tmp/Easy

==> collect data every 30sec. for 24 hours
# nohup /etc/STATsrv/bin/EasySTAT.sh /var/tmp/Easy 30 24 &
...

==> archive+compress collected data
# cd /var/tmp
# tar cf - Easy | compress > /tmp/Easy.tar.Z

==> copy /tmp/Easy.tar.Z onto your laptop/flash/CD/etc.
# ...up to you :)

==> remove all stuff if no more needed
# rm /tmp/Easy.tar.Z; rm -rf /var/tmp/Easy; pkgrm STATsrv
Load Collect then Analyze :
On your local dim_STAT server:
==> restore Easy.tar.Z somewhere (ex: /home/tmp):
# cd /home/tmp
# uncompress < Easy.tar.Z | tar xvf -
# cd Easy/*
# gunzip *.gz     ## (if compression was used)

==> edit if you need to modify default settings (db name for ex.)
# vi LoadDATA.sh

==> load all data into your database (don't forget to create this database before!!!)
# sh LoadDATA.sh

==> Analyze data via web interface & enjoy :))
EasySTAT Hints |
A few notes about EasySTAT hints (some were introduced with the 8.3-1 version):
BatchLOAD |
The idea for BatchLOAD came (as many things) from day to day needs. Sometimes you are facing customers/users who want to know what happens on their machines, but then they don't allow the installation of any additional software (a very constructive approach :-)).
All you can do now is to ask them to run some stat commands on their systems and send you the output files. While loading their files every day via the Web interface, you start to think harder and harder if there isn't a way to do this automatically. Are you ready for BatchLOAD??
I decided to add a new component to dim_STAT, keeping in mind that other tools already exist that collect output from stat commands. All these tools keep data in their own format, so I've tried to design the input format for BatchLOAD to be easily adaptable. Of course, I didn't try to create something universal :)), but I hope it shouldn't be too hard to write a script that converts an existing format into the BatchLOAD one.
Some words about the internals of BatchLOAD. There is no dependency on the names of the loaded files. All needed information is given by command options and in the contents of the loaded file. The loaded file must contain special TAGs - at least two: one to give the STAT name and one to confirm the END.
USAGE:
Usage: /apps/ADMIN/BatchLOAD -cmd NEW/ADD options

Options [NEW]:        -- force new collect creation
  -base DBname        -- database name
  -ID id              -- Collect ID, if 0 use max+1 id automatically
  -title Title        -- Collect Title
  -host Hostname      -- Collect Host Name
  -isec sec           -- Collect STATs Interval (sec)
  -start datetime     -- Collect Start DateTime in format YYYYMMDDHHMISS
  -skip1 yes/no       -- Yes/No skip first STAT measurement (often wrong values)
  -file Filename      -- Full path to file with STATs outputs
  -verbose on/off     -- verbose output on/off

Options [ADD]:        -- add to existing collect whenever possible
  -base DBname        -- database name
  -host Hostname      -- Collect Host Name (optional)
  -ID id              -- Collect ID, if 0 :
                         -- if host is given - use max id used by host
                         -- otherwise, use max (last) id automatically
  -skip1 yes/no       -- Yes/No skip first STAT measurement (often wrong values)
  -file Filename      -- File with STATs outputs
  -verbose on/off     -- verbose output on/off
Example :
$ /apps/ADMIN/BatchLOAD -cmd NEW -ID 0 -base ANT -file `pwd`/vmstat.out -skip1 no \
     -title "Test BatchLOAD" -host V880 -isec 20 -start 20031024100000
$ /apps/ADMIN/BatchLOAD -cmd ADD -ID 0 -base ANT -file `pwd`/iostat.out -skip1 no
$ /apps/ADMIN/BatchLOAD -cmd ADD -ID 0 -base ANT -file `pwd`/mpstat.out -skip1 no -verbose on
In this example the first line will create a new STAT Collect using an automatic new ID (max+1), with the title "Test BatchLOAD", and it will load the first file, "vmstat.out". The second and third lines load the next data, "iostat.out" and "mpstat.out", into the new Collect. Once this is finished, we can connect to the dim_STAT web server and start to analyze.
Note : multiple "-file" options can be used at the same time. For example:
$ /apps/ADMIN/BatchLOAD -cmd NEW -ID 0 -base ANT -skip1 no -title "Test BatchLOAD" -host V880 -isec 20 -start 20031024100000 -file `pwd`/vmstat.out -file `pwd`/mpstat.out -file `pwd`/iostat.out
File Format of STAT output
The file format is designed in such a way as to give maximum flexibility on data grouping and processing.
The main TAGs are STAT and END:
==> STAT StatName   -- after this point all following data corresponds
                       to the given STAT command (StatName)

    Supported STAT names:
       VMSTAT
       MPSTAT
       IOSTAT     (iostat -x)
       IOSTAT-xn  (iostat -xn)
       VXSTAT     (vxstat -v)
       psSTAT
    and any other Add-On STAT you are able to create, like some already
    shipped: netLOAD, T3stat, oraEXEC, oraIO, ...

==> END             -- end of STAT data
At any time the following TAGs may also be inserted:
==> DTSET yyyy-mm-dd hh:mi:ss  -- set date+time point for next STAT data
==> LOGMSG message             -- add a log message into the database,
                                  corresponding to the currently loading data
Outside of the "STAT" - "END" blocks, any other lines are ignored.
Note : TAGs are exactly as shown: "==> STAT", "==> END", "==> DTSET", "==> LOGMSG". Don't miss any characters!
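If your raw outputs come from another collection tool, a small shell wrapper is usually enough to produce this format. A minimal sketch (the file names and the DTSET timestamp are only examples, not part of BatchLOAD itself):

#!/bin/sh
# wrap_vmstat.sh -- hypothetical helper: wrap a raw vmstat output
# file with the mandatory BatchLOAD TAGs
IN=${1:-vmstat.out}

echo "==> DTSET 2004-01-19 10:30:00"      # optional: set the time point
echo "==> STAT VMSTAT"                    # mandatory: declare the STAT name
cat "$IN"                                 # the raw vmstat output itself
echo "==> END"                            # mandatory: close the STAT block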
BatchLOAD Example |
A small example: let's say you have three vmstat and three iostat files corresponding to "morning", "day" and "night" activity for some special tasks. You can make six load files, each one containing its own "STAT", "DTSET" and "END" TAGs, or put everything in one.
...
==> DTSET 2004-01-19 10:30:00        -- set "morning" point
==> LOGMSG Morning workload
==> STAT VMSTAT                      -- load vmstat
... output of vmstat.out1
==> LOGMSG Strange CPU activity      -- marking time period to analyze (example)
... continue ...
==> END                              -- end of first vmstat
==> STAT IOSTAT-xn
... output of iostat.out1
==> END
==> DTSET 2004-01-19 14:30:00        -- set "day" point
==> LOGMSG Day workload
==> STAT VMSTAT
... output of vmstat.out2
==> END
==> STAT IOSTAT-xn
... output of iostat.out2
==> END
==> DTSET 2004-01-19 23:30:00        -- set "night" point
==> LOGMSG Night workload
==> STAT VMSTAT
... output of vmstat.out3
==> END
==> STAT IOSTAT-xn
... output of iostat.out3
==> END
All information is placed in one single file ready to load:
$ /apps/ADMIN/BatchLOAD -cmd NEW -ID 0 -base ANT -skip1 no -title "Customer Workload" -host V880 -isec 20 -start 20040119100000 -file `pwd`/all_stat.out
In the same way, you can group all data of the same STAT command in a single file. Or all outputs corresponding to the same collecting time period.
NOTE : don't forget to create your database before starting any load!! In this example the database name is 'ANT'.
Special NOTE |
Please take care - there is no option to give the name of the loaded stat command! That's why the "STAT" and "END" tags are mandatory! Even if you want to load just one vmstat file, the tool has no idea about your file contents until it meets a "STAT" tag inside!
GUDs integration |
If you have already worked with Sun support or you're a Sun employee, you may know or have already used GUDs (a shell script collecting various system information + stats and saving them into a special case archive). GUDs was created by a Sun France engineer, and another French engineer made an integration script, 'guds2dim.sh', to load GUDs data into dim_STAT via BatchLOAD. This script now ships with dim_STAT and may be found in the /apps/ADMIN directory. To obtain the GUDs script itself, please contact Sun support directly.
Analyzing |
Analyzing your STAT data is quite intuitive, but let's just give some screen shots and a few words of comment.
Once you click on the "Analyze" link you have 3 options:
- Single-Host Analyze
- Multi-Host Analyze
- Multi-Host Extended Analyze
Let's take for now the Multi-Host option, as it's the easiest one :-)
There are some other additional options:
- Active ONLY - show only currently running collects
- STATs Status - in Single Host mode this option shows the number of already collected stat measurements (very important to see if something is really collecting)
- Title matching - to filter collects on title pattern
- LOG matching - to filter LOG messages with a text pattern
Welcome Analyze! |
LOG Messages |
A few words about LOG Messages. As we saw already during the start of a new collect, you can use an optional parameter, Client Log File, to catch, during collect time, any new text messages appearing in this logfile. All messages are saved with a time-stamp in the same database where the collect data is stored. Alternatively, at any moment you may add this kind of message manually using the web interface. There is a special link "LOG Messages Admin", and under every graph view there is a link to add a new message.
But, when can this be helpful?
Firstly, it'll help you to choose the correct time intervals for analyzing data, without having to remember the exact time slices when something particular happened on this machine.
Secondly, when analyzing the activity on your machine, you'll be able to get a list of every registered event, corresponding to the same time interval.
Example 1
Let's say your DBA is on vacation and you're covering for a few days. A user claims that from time to time something happens on the machine and slows down his work. You start monitoring the system, and yes, sometimes you observe strange activity on the Oracle side. So, instead of writing down the times corresponding to the problem, you simply add two messages, "Something strange" and "Ok now", while you're analyzing the activity graphs. Once your DBA comes back, you may just point him to your messages. Also, if somebody else analyzes the same time slices, entering the same perimeter, he or she will also be warned by your messages!
Example 2
Every night you're starting some batch jobs while nobody else is working on the system. There are several important parts and you're trying to optimize them, or simply check that nothing goes wrong.

Let's assume your main batch script looks like:

#!/bin/sh
start_batch01
start_batch02
start_batch03
start_batch04
...
start_batch20
exit
Now, simply add log messages:
#!/bin/sh
echo "Start Night Batch" >> /var/tmp/log
echo "start batch01" >> /var/tmp/log
start_batch01
echo "start batch02" >> /var/tmp/log
start_batch02
echo "start batch03" >> /var/tmp/log
start_batch03
echo "start batch04" >> /var/tmp/log
start_batch04
...
echo "start batch20" >> /var/tmp/log
start_batch20
echo "End Night Batch" >> /var/tmp/log
exit
After that, every time you start a new STAT collect to monitor this machine, you give "/var/tmp/log" as the Client Log File name. This way, every time you start your main batch script, every message written into /var/tmp/log will be saved and timestamped in the dim_STAT database. To select the correct time interval for analyzing the workload during, for example, batch04, you only need to click between the messages "start batch04" and "start batch05".
Tasks |
There are two special "Task" tags that may be used with log messages:
===> TASK_BEGIN: Unique_Task_Name   -- marks the beginning of task execution
===> TASK_END: Unique_Task_Name     -- marks the end
The Unique_Task_Name should be one word of up to 40 characters and unique within the current collect. For example, for 4 batches started in parallel we can add to the script:
( echo "===> TASK_BEGIN: batch1" >> /tmp/log; batch1.sh; echo "===> TASK_END: batch1" >> /tmp/log ) &
( echo "===> TASK_BEGIN: batch2" >> /tmp/log; batch2.sh; echo "===> TASK_END: batch2" >> /tmp/log ) &
( echo "===> TASK_BEGIN: batch3" >> /tmp/log; batch3.sh; echo "===> TASK_END: batch3" >> /tmp/log ) &
( echo "===> TASK_BEGIN: batch4" >> /tmp/log; batch4.sh; echo "===> TASK_END: batch4" >> /tmp/log ) &
When you analyze the activity graphs later, you can use the "Show Tasks" button to get a short summary of all the tasks executed during the observed period, with their total execution time (if they have finished). This can be useful when you're starting big, long jobs in parallel which are all executed by the same process, so there is otherwise no way to know which one is running which job.
Multi-Host Analyzing |
Multi-Host analyzing is simpler than Single-Host analyzing and a good point to start.
NOTE: some screenshots may not be 100% up to date and may not match exactly the latest dim_STAT version.
Main point: as we want to see several hosts at the same time and on the same graph, we cannot show more than one single stat-value per graph; however, several graphs can be viewed on the same page.
In general:
- Choose STAT collects
- Choose the time interval you are interested in
- Choose Graph size/mode attributes
- Choose STAT data you want to analyze
- Go!! :-)
- Java Applet/ PNG Image - graph output format
- Histogram - one comment: histograms are only supported with Java output.
- Real Graph - in case there was no data during some time period for some stat components, the graph line will be stopped for this period and will continue once the component comes back. Example: while collecting, one user was disconnected for a while and re-connected again. So, the graph will represent "real" activity, and "inactive" periods will be represented by holes. The only problem occurs when the observed component switches too often between "live" and "dead" states. In this case, instead of a graph, you may see a set of dots, which isn't much fun.
- Continuous Graph - as opposed to Real Graph, ContGraph will replace the "holes" by zeros. So there will never be "dots" on your graphics and each graph line will stay perfectly continuous. However, there is no longer a visual difference between an "inactive" and a "dead" component.
- Force Graph alignment - this is useful only for Java graphs and done automatically for static PNG images.
- Force Data Gap completion - this may help you to see continuity in time-scale graphs when you have short periods of missing data (host reboot, etc.). If you don't use this option, a data hole is made visible by a red vertical bar in the graph. Be careful with this option, because if your time gap is large (days, weeks, etc.), you may wait a few hours to get your graph, while the tool tries to refill all the missing data with zeros, and you will just see a big hole in the middle of the graph.
- Auto-Sync: with version 8.1 a new auto-time-sync feature was implemented to avoid the problem of time shift with some Solaris and Linux commands. This is done by automatically re-syncing the collected data with the current time every hour. Note that the red bar may still be present on your graphs even when the service was never stopped, etc.
Select Multi-host |
Choose Collect(s) and Time interval |
Collects - Let's assume there are three hosts I want to see together. OK, these collects are only used as examples, not to give demo data.

Time Interval - I described before the advantages of using LOG messages. Here is one of the better examples: I've simply selected the beginning and the end of the time slice I'm interested in for my production workload.
NOTE: you may select several intervals and compare them all together on the same graph. For example, to compare today's and last week's activity during a similar workload.
Choose STATs |
Graphics - This is a quite intuitive section, isn't it? You simply choose the style of your graphical presentation:
Finally, to accommodate your preference, there is an option to choose between Normal and Bold lines for drawing your graphs.
Note: all Graphics parameters are saved and kept with cookies. They will be used again the next time you use this function.
Next, you just choose the STAT values you want to see on your graph (example: CPU and Net packets/sec)...
Go! |
Once you set "content" and "presentation", you can also set some other parameters:
Show LOG: In case you want to see LOG messages at the same time as graphs, so that you can better analyze the events that happened. There are also two modes to view logs: Static and Dynamic. In Static Mode all messages are presented inside a simple HTML table. In Dynamic Mode they are all inserted into a small scrollable window, and if you click on any message in that window you will set/unset a red bar crossing all graphs at the place corresponding to the message timestamp...
Show Tasks: print a table of all running/finished tasks corresponding to the current time period
Refresh: this will refresh the result page every given number of seconds - a function very useful for on-line monitoring. (You can do the same through browser options in Opera or Firefox.)
Let's START!!
Result with Static Log |
(Sorry, there was no more place on screen for the LOG :)))
Result with Dynamic Log |
If you use dynamic logs and applet output, a single click on a message line will set on/off a vertical red bar on the graphs. This bar shows you exactly the place that corresponds to the message timestamp.
As you see, at any moment you may add another Log message.
Single-Host Analyzing |
Single-Host Analyzing is very similar to Multi-Host, but gives a wider variety of parameters as it is working only with one particular STAT collect. Let's use as an example the Demo collect, which is provided with the dim_STAT database and let's analyze IOSTAT data.
Open your browser and follow step by step how we're connecting to the dim_STAT server.
- nothing selected means using all data without refining your select
- you may refine your criteria by selecting only certain disk(s)
- you may exclude your selected disk(s) by clicking the 'Inversed Selection' checkbox
- you may use value-oriented selection (ex. Top-10 Busy% disks)
- you may exclude disks with unwanted data values
- or finally, give a select pattern (very useful if you want to avoid SDS metadevices, etc.)
- Graphics - graphical representation (as we saw already before)
- Table of Results - the raw data is presented as HTML or Text output (table format) and printed on screen or into a temporary file
- Top-N values - in a few clicks, check the MAX/MIN values of any STAT variables during the given time period. For example: if there were no disks busier than 30%, you don't even need to look at the graphs; or if there are any, you know at once the time slices you need to analyze for a possible jump in activity.
- Graph
- with disk Busy%
- and Bookmark Links
- you can follow a fixed list of disk controllers on all servers rather than seeing a sum of all disks..
- you can follow CPU usage by selected users/processes on all servers rather than the whole CPU usage..
- and many others ;-)
Choose Collect and STAT |
Example IOSTAT: Choose Disks criteria |
Your choice of options is much broader in Single-Host mode. You can analyze your collected data in fine detail, adapting them to your needs...
Disks - several possible combinations, but quite similar to other multi-line STATs
Interval is similar to Multi-Host analysis. To simplify, let's look at the last 100 measurements per disk (there are only a few).
Values Special Operations - You can analyze on a per disk basis, or SUM/AVG all of them, or group values by the first N characters of the disk name (very useful if you want to analyze I/O activity per controller), or when N is a negative number by the last N characters.
Example IOSTAT: Choose STAT Variables |
The data can be presented in three different forms:
Fine, here I want to see:
Bookmark Links may be inserted at the bottom of every viewed graph. Clicking on one of these links will show you another statistics view for exactly the same time period.
Click "Start" !!
Example IOSTAT: Result Graph |
Some new things here.
Under the graph you'll see a list of Bookmark links. If you click on "CPU" (for ex.), a new graph will appear with the CPU activity during the same time period you're observing now. This is useful because, even 3 days later, it will still point to the same time slice.
You'll also find an "Add LOG-Message" field, the same as with Multi-Host.
And a new one: Save Graph as Bookmark.
Save Graph as Bookmark... |
This is a really cool feature that will save you time. Right now, you can simply give short and long names for your graph view and save it as a new "Bookmark". Once this is done, all the options you selected will be saved (booked) under the name given. And instead of having to click again on all those checkboxes, to get similar data but for another time period or another STAT collect, all you will need to do is just click on the one button with your "Bookmark Name"!
NOTE : Since v.9.0 there is a possibility to create Bookmarks for Multi-Host Analyzing too! And all Multi-Host stats are Bookmarks since then ;-) To be able to create a "Multi-Host Bookmark", just keep in mind that when you're comparing several hosts, you cannot bring more than one statistic value onto the same graph at the same time. (For example, you cannot see both Sys% and Usr% CPU usage at the same time without creating a mess in the graph legend; whereas if you use only Sys% or Usr%, the legend only needs to show the host names.) So, as long as you're generating a graph with a single statistic value and using only generic data filter conditions, the choice in the Bookmark form under your graph will automatically be extended by a "Multi-Host" option within the select box!
There is a huge benefit in using Bookmarks when you're analyzing many hosts at the same time and on the same graph (for example, following a fixed list of disk controllers, or CPU usage by selected users, as mentioned above).
As well, don't forget to share your new Bookmarks with others if you create some ;-))
Bookmarks |
Most of the bookmarks are pre-defined to save you time. Their number may vary from release to release, but never forget: you can always create your own and keep them as your specific kit. And you can easily move them from one base to another.
People very quickly start to use only bookmarks, and then sometimes they are lost: "Oh, there is no way to see per-network-interface activity!" or "no way to see a single process, only top-10!" But don't forget: all the data is there; just go directly to the STAT interface and you'll find it. Then create new bookmarks covering the other needs and you're all set.
- Rename
- Export
- Import
- Delete
Choose Collect and click on Bookmarks... |
Choose Time interval and Graphics style |
Select all Data you want to see and GO! |
Result Page |
Note: There were a lot of discussions about "Bookmark" as the name for this feature. And I'm quite agreeing that the term is not the best fit to describe the functionality, but the problem is I never received a new name that seemed to please everybody.
So, I've simply decided to put this term on the preferences page. This way, everybody is free to rename "Bookmark" to something else, even to "X-Files". :))
Administration actions |
From the "Main Page" you may go directly to the "Bookmarks" management page and
any Bookmark, as well as Restore the "Standard Kit". This is if you lost your bookmarks for any reason. The standard kit contains some of the more popular data views.
Multi-Host Extended Analyze |
Since v.8.5 the Extended Multi-Host Analyze was introduced - it combines the traditional Multi-Host options with per-host Bookmarks. It's probably the most sophisticated way now to analyze server performance :-) and it gives you all the needed information grouped on one single page :-) The Bookmark links are also present on demand, so at any time you may get more detailed graphs while analyzing in Multi-Host mode :-)
dim_STAT CLI |
I was really surprised by the strong demand from users for a dim_STAT CLI solution! It seems a Web interface is not making everybody happy :))
And here we are, with version 8.1 there is a CLI module in dim_STAT :)
# /apps/ADMIN/dim_STAT-CLI

dim_STAT CLI v.1.7

Usage: dim_STAT-CLI [options]
Options:
   -Base DBname
   -ID CollectID          (if empty: prints available Collect list)
   -Stat Name             (if empty: prints available Stat list)
   -Begin YYYYMMDDhhmiss
   -End YYYYMMDDhhmiss
   -Out fname
optional:
   -Title graphtitle      (if empty: uses Collect title)
   -Width size            (if empty: uses default graph width)
   -Height size           (if empty: uses default graph height)
   -AVG number            (use average for too wide graphs)
   -Data filename         (save also raw stat data into file)
For the moment it gives you a way to generate a single graph in PNG format for a given Database, CollectID and Time interval. Stat names correspond directly to the Bookmarks in your Database, so the more Bookmarks you have, the more graphs you may generate.
Since v.9.0, if you're using several Collect IDs at the same time (ID1,ID2,ID3,..), dim_STAT-CLI will propose Multi-Host stats and draw Multi-Host graphs! ;-))
Example |
Check the STAT-collects in database 'EasyLux':
$ /apps/ADMIN/dim_STAT-CLI -Base EasyLux

== Available Collect(s):
ID  Host      Started              Title
--------------------------------------------------------------------------
 1  goldgate  1998-12-18 16:28:27  Demo collect, just to see it's ok!
 2  x4100     2007-03-28 17:01:37  EasySTAT_TMG
 4  galaxy3   2007-04-05 13:28:41  EasySTAT_CacheON
--------------------------------------------------------------------------

dim_STAT CLI v.1.4

Usage: dim_STAT-CLI [options]
Options:
   -Base DBname
   -ID CollectID          (if empty: prints available Collect list)
   -Stat Name             (if empty: prints available Stat list)
   -Begin YYYYMMDDhhmiss
   -End YYYYMMDDhhmiss
   -Out fname
optional:
   -Title graphtitle      (if empty: uses Collect title)
   -Width size            (if empty: uses default graph width)
   -Height size           (if empty: uses default graph height)

## ERROR:
## Not filled dim_STAT ID!
Get the available Stats for Collect #4:
$ /apps/ADMIN/dim_STAT-CLI -Base EasyLux -ID 4

== Available Stat(s):
CPU              -- CPU %Busy
CPU_CrossCalls   -- CPU Cross-Calls
CPU_CtxSwitch    -- CPU Context Switch
CPU_ThMigration  -- CPU Thread Migration
FreeMEM          -- Memory Free List(KB)
I/O-KB/s         -- I/O Activity KB/sec
I/O-Op/s         -- I/O Activity Operations/sec
Net_Byte/s       -- Network Bytes/sec
Net_ByteALL/s    -- Network SUM ALL Bytes/sec
Net_Collis/s     -- Network Collisions/sec
Net_Error/s      -- Network Errors/sec
Net_Nocanput     -- Network Nocanput
Net_Pack/s       -- Network Packets/sec
Net_PackALL/s    -- Network SUM ALL Packets/sec
Paging           -- Page In/Out (KB)
PgScan           -- Page Scanner Rate (Pg/sec)
RunQueue         -- Queued, Blocked, Swapped runnable processes
SpinMtx          -- Mutex Lock Spin/sec
SpinRW           -- Read/Write Lock Spin/sec
SysCalls         -- System Calls/sec
Top10-BusyDisks  -- Top-10 Busy% Disks
Top10Busy_Actv   -- Active Queue @Top-10 Busy% Disks
Top10Busy_SrvTM  -- Service Time @Top-10 Busy% Disks
Top10Busy_Wait   -- Wait Queue @Top-10 Busy% Disks
Top10_ProcCPU    -- Top-10 CPU% Usage @Process
Top10_ProcNUMB   -- Top-10 Active Processes
Top10_ProcSysTM  -- Top-10 CPU SysTime @Process
Top10_ProcUsrTM  -- Top-10 CPU UsrTime @Process
Top10_SrvTime    -- Top-10 High Service Time Disks

dim_STAT CLI v.1.4

Usage: dim_STAT-CLI [options]
Options:
   -Base DBname
   -ID CollectID          (if empty: prints available Collect list)
   -Stat Name             (if empty: prints available Stat list)
   -Begin YYYYMMDDhhmiss
   -End YYYYMMDDhhmiss
   -Out fname
optional:
   -Title graphtitle      (if empty: uses Collect title)
   -Width size            (if empty: uses default graph width)
   -Height size           (if empty: uses default graph height)

## ERROR:
## Empty Stat!
Get a CPU Usage graph from Collect #4 between 13:30 and 14:00.
$ /apps/ADMIN/dim_STAT-CLI -Base EasyLux -ID 4 -Stat CPU \
     -Begin 20070405133000 -End 20070405140000 -Out CPU.png
[]==> CPU %Busy: EasySTAT_CacheON (galaxy3)
$
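Since the CLI is a plain command, it's easy to script. For instance, a minimal sketch generating one PNG per stat for the same Collect and time interval (the stat names are just examples taken from the Bookmark list above):

#!/bin/sh
# generate one graph per Bookmark-based stat for Collect #4
for S in CPU RunQueue SysCalls
do
   /apps/ADMIN/dim_STAT-CLI -Base EasyLux -ID 4 -Stat $S \
      -Begin 20070405133000 -End 20070405140000 -Out $S.png
done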
Administration |
Several administration points were already covered in previous sections. Let's speak about some others, more oriented towards day-to-day management...
- Days delay is purely by calendar! Recycle will delete all your data from the last collected day to N calendar days back, independent of possible inactivity holes in the collected data
- if no ID is given, only currently active collects will be recycled
- if a list of IDs is given, all these collects will be recycled, regardless of whether they are active or not
- if ID is equal to ALL - all collects will be recycled, regardless of whether they are active or not
Active/Stopped Collect |
Each STAT-collect may be only in 2 states: Active or Stopped.
The state a collector is in is stored in the database. When the state of the collect is changed from the Web interface, the only action is an update of the corresponding database record, that's all. From time to time each collector checks its own record for changes, and if so, it takes corresponding action.
Since v.7.0 at any time any stopped collect may be restarted again.
Active : a collector gets data from the server via the STAT-service, and while the service is up, it continues to insert data into your database. If the STAT-service is down, it will try to reconnect every 20 secs.
Stopped : the collect is stopped, as well as all the corresponding stat commands on the monitored server. No more data is inserted into the database.
Delete/Recycle Collects |
Finished collects can be completely removed from the database, or recycled: you may remove, for example, all data except that collected during the last N days. From this page, only manual recycling is possible (see Auto-Recycle below for automation).
Note: a delete operation frees space in the database index/data files, but it doesn't reduce the actual file size! Freed-up space will simply be reused for next collects.
Deleting a database was covered previously in "MySQL Admin Tips"...
Auto-Recycle |
Since v.8.1 an Auto-Recycle module is integrated into dim_STAT. Well, it still needs to be run from a cron job or another execution planner, but at least, once it's configured, it gives you a simple way to recycle your collected data automatically.
In your '/apps/ADMIN' directory you'll find the 'dim_STAT-Recycle' command:
# /apps/ADMIN/dim_STAT-Recycle

Usage: dim_STAT-Recycle -Days N [-Base DBname] [-ID CollectID]
   -Days N        -- keep data collected during last N days
   -Base DBname   -- database name(s) (def: Default)
   -ID CollectID  -- collect ids (ex: id1,id2,id3 or "ALL" for any ID)
                     (def: All active collects only)
So, to recycle every 24 hours and to maintain in your database 'Prod' only data collected during the last 3 weeks, all you need to do is to add the following to the crontab on your dim_STAT server:
0 0 * * * /apps/ADMIN/dim_STAT-Recycle -Days 21 -Base Prod
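Or, as a hypothetical variation: to recycle every Sunday all the collects (active or not) in the database 'Test', keeping only one week of data:

0 0 * * 0 /apps/ADMIN/dim_STAT-Recycle -Days 7 -Base Test -ID ALL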
NOTE :
Export/Import collects |
Collect Export and Import is an easy way to save/copy/restore small amounts of data in a compressed form. In case you need to copy a large amount of data, it is much faster to copy the whole database! (This was extensively covered in "MySQL Admin Tips".)
Modify Collect parameters |
You should be VERY CAREFUL with these actions!
Changing the Title and Hostname is just for decoration. :))
Changing the Collect-ID is a global operation and will lock all corresponding tables while making the modifications.
Changing the Time Interval only makes sense for wrongly loaded data from output files. Be aware that you're changing your time scale and will lose synchronization with real-world events.
Changing the Start Time can be used when you want to compare similar workloads that were collected during different periods. You can bring them onto the same time scale and then analyze them via Multi-Host mode. However, if you have any LOG messages corresponding to the same collect, don't forget to move them in time as well, to keep the timestamps synchronized.
LOG Messages operations |
This can be used in case there are too many messages, or you want to share them with other collects, or move them slightly in time, etc. You can do all of that and much more via "LOG Messages Admin".
Add-On Statistics |
One of the most powerful features of dim_STAT is the ability to integrate your own statistic programs with the tool. Once added, they will be considered by dim_STAT as being the same as the standard set of STAT(s) and give you the same kind of service: Online Monitoring, Up-Loading, Analyzing, Reporting, etc.
However, the choice of external stat programs is so wide that it's quite impossible to design a wrapper for each and every format. Therefore, I've decided to limit the input recognizer to just 2 formats (which covers maybe 95% of needs) and leave it to you to write, if necessary, your own wrapper and modify the output to one of the supported formats.
Formats supported by dim_STAT:
- SINGLE-Line: with one output line per measurement (ex: vmstat)
- MULTI-Line: with several output lines per measurement (ex: iostat)
To be correctly interpreted, your stat program should produce a stable output: the same format for all data lines, at least one data line per measurement in the MULTI case, a constant output interval, etc. Lines not containing data have to be declared, so that they can be ignored by dim_STAT.
NOTE: lines shorter than 4 characters are considered as "spam" and will be ignored!
Let's look at some examples...
- During execution, the %i in the command line (here: sar) will be replaced with the time interval in seconds.
- The command name doesn't matter here, because it is only used as an alias for the STAT-service. Have a look at the "access" file section: it's possible to name the shell command "toto" and make it an alias for /usr/bin/sar.
- ColumnName - leave it as it is, if you don't need to access the database directly. Note: there are 2 reserved columns for Collect-ID and measurement No.
- Data Type - if you're not sure, set it to "Float", otherwise it will be "Int"
- Column# on input - in our case we need columns 4 and 7
- Short Name - single word descriptions, here %rcache and %wcache
- Full Name - description to be used where detailed information is needed
- Use in Multi-Host - if you choose "Yes" the corresponding value will be automatically enabled in Multi-Host mode for analyzing of several hosts at once.
- Line Separator pattern: this is by default "new-line", but in some cases it can be a header (like iostat)
- Attribute Column: very important! As you have several lines per measure, you need to distinguish them by something (like the "diskname" column in iostat).
- Use In Multi-Host: is more than a simple Yes/No; you should choose SUM and/or AVG for the collected values.
- Single-Line
- name: AppStats
- 1 column
- shell command: "AppStats %i"
- value: integer, 1st position, name: TPS
- ProcLOAD: all output information on-the-fly summarized by process name
- UserLOAD: all output information on-the-fly summarized by user name
- ZoneLOAD : all output information on-the-fly grouped by zone id
- ProjLOAD : the same, but grouped by project id
- TaskLOAD : the same, but grouped by task id
- PoolLOAD : the same, but grouped by pool id
- N_total -- current number of all processes running within a zone
- N_activ -- current number of processes being *active* within a zone during a given time period
- UsrCPU -- total User CPU *time* consumed within a zone during a given time period
- SysCPU -- total System CPU *time* consumed within a zone during a given time period
- CPU% -- percent of CPU Busy% within a zone - this value will depend on whether or not some CPUs are assigned to the zone, so it's still better to monitor CPU% usage within a zone via the "vmstat" command!
- VSize -- total "virtual memory size" in KB of all processes running within a zone (be aware that each process, within its VSZ value, may already include several shared libraries or shared memory segments (SHM), and these *same* shared objects may be accounted several times within the total VSize...)
Currently there is no "simple" way to tell you how much memory is used by a group of processes (for ex. Oracle processes, etc.) - even if it's still possible to write a script which accounts each shared object only once, such a script would use a significant amount of CPU time.. So, nobody is perfect, but there is room for improvement! :-))
- SysCalls -- total number of all system calls/sec within a zone
- N_lwp -- current number of LWP (kernel threads) running within a zone
- Vol_CTX -- total number of all voluntary context switches/sec within a zone
- InVol_CTX -- total number of all involuntary context switches/sec within a zone
- Sigs -- total number of all signals/sec within a zone
- I_Blks -- total number of all input I/O blocks/sec within a zone
- O_Blks -- total number of all output I/O blocks/sec within a zone
- IO_Chrs -- total number of all I/O character operations/sec within a zone
- NOTE : by default the HAR add-on is disabled within the Solaris STAT-service. Why? To get CPU counter data, the Solaris library functions require exclusive access to the chip - for a very short time, but exclusive anyway - so any other process running on the requesting CPU will be moved to another CPU, with some unwanted side effects.. That's why I don't suggest running HAR for a long period on your production system until you fully understand how it works..
- oraIO : Oracle I/O stats for data/temp files
- oraEXEC : Oracle SQL QueryExecutions/sec, Commits/sec, Number of Sessions
- oraLATCH : Oracle latch stats
- oraSLEEP : Oracle latch sleeps stats
- oraENQ : Oracle enqueue stats
- current value of a variable
- delta between current and previous value
- value of delta/sec
- On -- MySQL Server On-Line flag (0 or 1)
- Sessions -- number of currently connected user sessions (threads)
- InnDirty -- amount of dirty pages in InnoDB
- InnoFree -- amount of free pages in InnoDB
- KeyDirty -- amount of dirty pages in MyISAM Key buffer
- OpFiles -- number of currently open files
- OpTables -- number of currently open tables
- ByteRx/s -- received bytes/sec via network
- ByteTx/s -- sent bytes/sec via network
- Commit/s -- number of COMMIT requests/sec
- Delete/s -- number of DELETE requests/sec
- Insert/s -- number of INSERT requests/sec
- Select/s -- number of SELECT requests/sec
- Update/s -- number of UPDATE requests/sec
- InnDsy/s -- InnoDB Data Sync/sec
- InnDrd/s -- InnoDB Data Read/sec
- InnDwr/s -- InnoDB Data Write/sec
- InnLwr/s -- InnoDB Log Write/sec
- InnLsy/s -- InnoDB Log Sync/sec
- Key_Rd/s -- MyISAM Key Read/sec
- Key_Wr/s -- MyISAM Key Write/sec
- Query/s -- Query/sec execution
- AbrtClnt -- aborted clients (delta)
- AbrtConn -- aborted connections (delta)
- Connects -- number of recent connects (delta)
- SlowReqs -- number of slow requests (delta)
- TabLckWt -- table lock waits (delta)
- Rollback -- called rollbacks (delta)
- current value of a variable
- delta between current and previous value
- value of delta/sec
- some values are also presented per database name
- On -- Server On-Line flag (1/0)
- Sessions -- number of currently connected user sessions (backends)
- Commit/s -- number of executed COMMITs/sec
- Rollback -- number of executed rollbacks (delta)
- B_Read/s -- Block reads/sec
- B_hit/s -- Block read hit/sec
- RowSnd/s -- Rows sent/sec
- RowFch/s -- Rows fetched/sec
- RowIns/s -- Rows inserted/sec
- RowUpd/s -- Rows updated/sec
- RowDel/s -- Rows deleted/sec
- ChpTimed -- Checkpoints invoked by timeout (delta)
- ChptReqs -- Checkpoints invoked by request (delta) - probably running out of checkpoint segments
- BuffChpt -- Buffers written by checkpoint (delta)
- BufClean -- Buffers cleaned by background writer (delta)
- MxWClean -- number of times Max Written level was reached by background writer (delta)
- BufBkend -- Buffers written by backends (delta)
- BufAlloc -- Allocated buffers (delta)
- edit the /etc/STATsrv/bin/jvmSTAT.sh file (from STAT-service) on each client machine to set the right environment, with JAVA_HOME pointing to the jdk 1.5 home (ex: JAVA_HOME=/usr/jdk15)
- enable jvmSTAT in STAT-service on each client (uncomment jvmSTAT in /etc/STATsrv/access file)
- before starting any new collect, including jvmSTAT, be sure that the jvmSTAT Add-On is already installed (Add-On interface from Main Page)
- LvmSTAT (Linux vmstat)
- LcpuSTAT (Linux mpstat)
- LioSTAT (Linux iostat)
- LnetLOAD (Linux netLOAD)
- LpsSTAT (Linux psSTAT)
- LprcLOAD (Linux ProcLOAD)
- LusrLOAD (Linux UserLOAD)
Example of SINGLE-Line command integration |
Let's assume we want to monitor the read/write cache hit rates on the system. This information can be retrieved using "sar":
$ sar -b 1 1000000000000000

SunOS sting 5.9 Generic_112233-05 sun4u    07/09/2004

18:10:13 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
18:10:14       0       1     100       0       0     100       0       0
18:10:15       0      14     100       0       0     100       0       0
18:10:16       0       7     100       0       0     100       0       0
18:10:17       0       0     100       0       0     100       0       0
18:10:18       0       0     100       0       0     100       0       0
18:10:19       0     135     100       0       0     100       0       0
18:10:20       0       0     100       0       0     100       0       0
18:10:21       0      69     100       0       2     100       0       0
18:10:22       0      86     100       0       2     100       0       0
18:10:23       0       0     100       0       0     100       0       0
18:10:24       0       0     100       0       0     100       0       0
18:10:25       0       0     100       0       0     100       0       0
...
We are interested in the 4th and 7th columns of the sar output, while ignoring any lines containing "*SunOS*" or "*read*".
Following the "Integrate New Add-On-STAT" link:
Step 1: FIRST INFO |
Let's give the new Add-On the name CacheHIT.
We need only 2 columns from the output line (4th and 7th value). This is a "Single-Line" output...
Click on "New"...
Step 2: INTEGRATION |
During this step we need to explain what we want to run and which information we'll need:
Description: CacheHIT via SAR
Shell Command: sar -b %i 1000000000000000
Ignore Lines: we should ignore any lines containing "*SunOS*" or "*read*"
Data Descriptions:
Create!!
Created! |
What's Next? Will it work now?
Yes! IF YOU DID NOT FORGET to give your STAT-service access to this new command! This is a very common error.
If you want to collect "CacheHIT" data from server "S" be sure that the STAT-service on "S" is given execution permissions for the "sar" command. Add the following lines to your /etc/STATsrv/access file:
# CacheHIT Add-On
command   sar   /usr/sbin/sar
#
And now it'll work! :-))
NOTE: for security reasons, and for a cleaner "stat to command" relationship, it is preferable to create a specific script 'CacheHIT.sh' for our new add-on, and then use that instead of direct access to the 'sar' command.
Example:
$ cat /etc/STATsrv/bin/CacheHIT.sh
#!/bin/ksh
exec /usr/sbin/sar -b $1 1000000000000000

$ CacheHIT.sh 5
...

$ tail -3 /etc/STATsrv/access
# CacheHIT Add-On
command   CacheHIT   /etc/STATsrv/bin/CacheHIT.sh
#
And the Add-On shell command needs to be changed to: "CacheHIT %i"
Anti-Spam Filter |
IMPORTANT: There is an anti-spam filter feature that is always active during data collecting. It rejects any input line shorter than 4 characters. If your newly made stat command prints only one small column of numbers, you need to add leading spaces to make sure the data is accepted by dim_STAT.
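For instance, if a (purely hypothetical) stat command 'mystat' prints only one short column of numbers, a small wrapper is enough to pad its lines (keep in mind the awk flushing issue discussed in the real-life example below):

#!/bin/sh
# hypothetical wrapper: pad short 'mystat' lines with leading spaces
# so they pass the 4-character anti-spam filter
mystat $1 | awk '{ printf( "    %s\n", $1 ); fflush() }'   # fflush(): gawk/nawk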
MULTI-Line Add-On command integration |
Multi-Line integration is quite similar to Single-Line, except for a few additional things:
REAL LIFE EXAMPLE... |
To get an even better feel for the Add-On integration process in dim_STAT, let me tell you one real-life story that happened this year with one of our customers..
So, once they understood with dim_STAT what was going on with the system and storage, the customer also decided (finally) to bring more light on what was going wrong (or well) in their application too..
Initially they wrote a lot of debug messages into their log files, but nothing really useful to understand what was going wrong.. Also, the more data they wrote to the log files, the slower the application worked :-) normal, no? So, as a first step, they simplified logging down to a single file: /var/tmp/appstats.log. Every N seconds a new line is added to this file, containing just 3 numbers; the last one (the one we're interested in) is the avg TPS during the last time period (M seconds, bigger than N):
# tail -5 /var/tmp/appstats.log
10:17 5 20
10:20 7 30
10:23 2 50
10:26 8 30
10:30 1 10
#
The customer then created a simple monitoring script, AppStats.sh:
# AppStats.sh 5
10
50
40
20
30
^C
#
In a few minutes the customer integrated this new stat command as a dim_STAT Add-On, but... 15 minutes later it still had not collected any data...
WHY?...
Common Error #1 |
The first problem: the output line is very short! And lines shorter than 4 characters are ignored by the anti-spam filter (as mentioned before)! All we need is just to add 3 blank characters at the beginning of the line.
Let's have a look at the script source:
#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   tail -1 /var/tmp/appstats.log
   sleep $1
done | awk '{ printf( "%d\n", $3 ) }'
#================================================
Just add 4 spaces before the %d in { printf( "%d\n", $3 ) } and it'll be ok!
#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   tail -1 /var/tmp/appstats.log
   sleep $1
done | awk '{ printf( "    %d\n", $3 ) }'
#================================================
The script output now is:
# AppStats.sh 5
    10
    50
    40
    20
    30
^C
#
Common Error #2 |
But that's not all! It still won't work!...
Why?.. Because the output of this script is still not regular!...
To check this (as with any other script), just execute it in the same way, but piped into 'more':
# AppStats.sh 5 | more
... 10 minutes later there will still be no output!... - and this is exactly what happens when the STAT-service tries to send data to the dim_STAT server via a process pipe...
What is wrong here?.. The problem is that inside the script the output is piped into the 'awk' program, and 'awk' itself does not flush its output - data stays buffered until the whole 'awk' buffer is filled.. and only then is the data flushed into the pipe...
How to fix it?
- add an fflush() instruction into the script (depending on the 'awk' version)
- or change the script so that the 'awk' call is inside the loop
Updated script :
#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   tail -1 /var/tmp/appstats.log | awk '{ printf( "    %d\n", $3 ) }'
   sleep $1
done
#================================================
As 'awk' finishes on each loop pass, the data will always be flushed and will enter the pipe with each iteration.
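For reference, the first fix mentioned above (keeping the single pipe, but forcing 'awk' to flush) could look like the following sketch, assuming an 'awk' that supports fflush() (gawk and nawk do):

#!/bin/bash
#================================================
# AppStats (alternative fix: keep the pipe, force a flush)
#================================================
while true
do
   tail -1 /var/tmp/appstats.log
   sleep $1
done | awk '{ printf( "    %d\n", $3 ); fflush() }'
#================================================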
Continue improvement... |
So, the customer copied the new script into /etc/STATsrv/bin on all the needed servers and added to the end of their /etc/STATsrv/access files:
# AppStats add-on
command   AppStats   /etc/STATsrv/bin/AppStats.sh
On the dim_STAT server the Add-On was integrated as:
And we started to collect the first data...
Within the first 40 minutes, once the customer had fully enjoyed graphing their application TPS levels, one of the developers said it would be nice to see the avg response time at the same time!.. And within one hour they extended their log file lines with an additional value showing the avg RespTM.
The new script, showing one more value:
#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   tail -1 /var/tmp/appstats.log | awk '{ printf( "    %d %d\n", $3, $4 ) }'
   sleep $1
done
#================================================
And we re-integrated the same script, but now describing 2 columns from the output. And it worked just fine!..
Should I mention that during the next few hours they already wanted to add 3 more new columns! :-))
And finally... |
Finally, it was hard for the developers to decide how many stat values they would need on each server, because it depends on the application deployment as well as on the server role.. So, they understood how to extend their script with any other values, but preferred to avoid the Add-On integration step every time they added a new value to their log file..
Well.. nothing is impossible :-)
The only way to have a "dynamic" stat list is to improve the AppStats script so that it works like a Multi-Line stat command (just as 'iostat' may show more or fewer disks according to your server configuration)..
The idea is simple - transform this output:
# AppStats.sh 5
  TPS AvgTM Users Active
   30    20   200     40
   40    20   200     50
^C
#
into multi-line:
# AppStats.sh 5
  Name     Value
  TPS         30
  AvgTM       20
  Users      200
  Active      40

  Name     Value
  TPS         40
  AvgTM       20
  Users      200
  Active      50
^C
#
And, according to needs, the log file may contain at the same time the value names as well as the values themselves:
# tail -2 /var/tmp/appstats.log
11:12 33 TPS 30 AvgTM 20 Users 200 Active 40
11:22 33 TPS 40 AvgTM 20 Users 200 Active 50
The new script version:
#!/bin/bash
#================================================
# AppStats
#================================================
while true
do
   echo "  Name     Value"
   tail -1 /var/tmp/appstats.log | \
      awk '{ printf( "  %-8s %3d\n  %-8s %3d\n  %-8s %3d\n  %-8s %3d\n\n", $3, $4, $5, $6, $7, $8, $9, $10 ) }'
   sleep $1
done
#================================================
This script may now be integrated as a Multi-Line Add-On, having 2 columns in the output... And even if the script is extended again with other values, they will just extend the list of lines with names and values.
Pre-Integrated Add-Ons |
To make your life easier, there are several additional already pre-integrated stat programs (Oracle, Java, Linux, etc).
They are all installed by default on your dim_STAT server, BUT! not all of them are enabled in your STAT-service by default - only the commands not needing any additional checking are enabled!...
As a rule, check first if the add-on works correctly, by starting it directly from the STAT-service bin-directory on the client side (/etc/STATsrv/bin), and only then enable it via access file (usually a simple uncomment in /etc/STATsrv/access)...
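A typical check sequence might look like this (the add-on name here is only an illustration):

# /etc/STATsrv/bin/UDPstat.sh 5      <== run it directly, watch a few outputs
^C
# vi /etc/STATsrv/access             <== then uncomment its 'command' line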
ProcLOAD / UserLOAD |
There are 2 additional psSTAT wrappers:
These stats are very useful when you have hundreds or thousands of running processes and you want to study groups of processes or users, instead of the activity of a single process.
Example of output :
# /etc/STATsrv/bin/ProcLOAD.sh 5

PNAME           NTOT NACT UsrTM SysTM %CPU     VSZ  SYSC NLWP VCTX ICTX SIGS InputBLK OutputBLK I/O_CHR
STATcmd          312   58  0.00  0.00  0.0  594112  1472  312  180    2    0        0         0  198874
WebX.mySQL       312   58  0.70  0.04  3.4 1142968  8307  312 1066   82    0        0         0  398649
fsflush            1    1  0.00  0.03  0.4       0     0    1    7    2    0        0       155       0
httpd              7    1  0.00  0.00  0.0   18008    10    7   14    0    0        0         0       0
in.rlogind         1    0  0.00  0.00  0.0    2240     0    1    0    0    0        0         0       0
inetd              1    1  0.00  0.00  0.0    5304     1    4    4    0    0        0         0       0
init               1    0  0.00  0.00  0.0    2400     0    1    0    0    0        0         0       0
java               2    2  0.00  0.00  0.1  455448   255   50  413    1    0        0         0      12
mysqld             1    1  0.24  0.12  2.0   62216 21258  315 1058   30    0        0       342 4448475
nfs4cbd            1    0  0.00  0.00  0.0    2360     0    2    0    0    0        0         0       0
picld              1    1  0.00  0.00  0.0    4632    33    6    3    0    0        0         0       0
psSTAT64           1    1  0.02  0.08  0.3    5856  5006    1    3    2    0        0         0    3146
rpcbind            1    0  0.00  0.00  0.0    2880     0    1    0    0    0        0         0       0
sendmail           2    1  0.00  0.00  0.0   15456    10    2    3    0    0        0         0       0
svc.startd         1    1  0.00  0.00  0.0   10200     9   13    4    0    0        0         0     672
syseventd          1    0  0.00  0.00  0.0    2552     0   14    0    0    0        0         0       0
ttymon             2    0  0.00  0.00  0.0    4648     0    2    0    0    0        0         0       0
utmpd              1    1  0.00  0.00  0.0    1280     0    1    1    0    0        0         0       0
vold               1    0  0.00  0.00  0.0    2912     0    6    0    0    0        0         0       0
wrapper-solari     1    1  0.00  0.00  0.1    3040   237    2  168    2    0        0         0       0
xntpd              1    1  0.00  0.00  0.0    2320    25    1    5    0    5        0         0       0
ypbind             1    0  0.00  0.00  0.0    2360     0    1    0    0    0        0         0       0
^C
Special Solaris 10: ZoneLOAD / PoolLOAD/ TaskLOAD/ ProjLOAD |
Four psSTAT_10 wrappers were added that are specific to Solaris 10 and later:
These stats give you more extended information compared to the standard 'prstat'.
Following are some more details about the output columns (given for ZoneLOAD, but valid for the others too :-))
ZoneLOAD.sh - a shell script wrapper for the psSTAT command to collect all data pre-grouped per Solaris Zone (psSTAT option: -M zone). Description of the values printed per zone (each value is printed for a given time period):
The last 3 values are very curious :-) because at the time I needed them I did not find any document describing what they mean, so I based my naming on the descriptions given within the /proc structure header files - these values help in some cases to understand, without involving any DTrace script, which process (or Zone in the current case) is doing more I/O operations than the others...
netLOAD |
The netLOAD wrapper monitors Solaris network activity. This tool has been included in dim_STAT's STAT-service for a long time already. Since v.8.0, netLOAD monitors all network interfaces present in the system (including virtual and loopback). If some indicators are not populated by the device drivers, a '-1' value is presented instead. Also, a new '-I' option was added: you may give a fixed list of network interfaces you want to monitor (run '/etc/STATsrv/bin/netLOAD' for more details). In the STAT-service, netLOAD is integrated via a 'netLOAD.sh' script, to provide an easy way to change options.
Example of output :
# /etc/STATsrv/bin/netLOAD.sh 5

Name   IBytes/s  OBytes/s  Ipack/s  Opack/s  Ierr/s  Oerr/s  Col/s   Bytes/s  Pack/s  Nocanput
lo0        -1.0      -1.0      0.4      0.4     0.0     0.0    0.0       0.0     0.8         0
ce0     26300.6    3840.0    105.2     64.0     0.0     0.0    0.0   30140.6   169.2         0
ce1         0.0       0.0      0.0      0.0     0.0     0.0    0.0       0.0     0.0         0

Name   IBytes/s  OBytes/s  Ipack/s  Opack/s  Ierr/s  Oerr/s  Col/s   Bytes/s  Pack/s  Nocanput
lo0        -1.0      -1.0      0.8      0.8     0.0     0.0    0.0       0.0     1.6         0
ce0     27624.4    2688.0     77.2     44.8     0.0     0.0    0.0   30312.4   122.0         0
ce1         0.0       0.0      0.0      0.0     0.0     0.0    0.0       0.0     0.0         0
UDPstat |
UDPstat is a wrapper around the "netstat -s" command on Solaris, made to monitor UDP traffic on the system. While it prints all the main counters (In/Out traffic, In/Out errors), it's particularly interesting for analyzing Input Overflows (and Input Checksums as well).
Example of output :
# /etc/STATsrv/bin/UDPstat.sh 5

UDP-stat           Tot#    Delta   Val/s
udpInDatagrams    65700        0    0.00
udpInErrors           0        0    0.00
udpOutDatagrams   68321        0    0.00
udpOutErrors          0        0    0.00
udpNoPorts      3514281        0    0.00
udpInCksumErrs        0        0    0.00
udpInOverflows        0        0    0.00
none                  0        0       0

UDP-stat           Tot#    Delta   Val/s
udpInDatagrams    65900      200   40.00
udpInErrors           0        0    0.00
udpOutDatagrams   68321        0    0.00
udpOutErrors          0        0    0.00
udpNoPorts      3514281        0    0.00
udpInCksumErrs        0        0    0.00
udpInOverflows        0        0    0.00
none                  0        0       0
HAR |
HAR is the Hardware Activity Reporter tool for Solaris 8 and up. Starting with Solaris 8, Sun began to deliver public interfaces for the SPARC and x86 hardware performance counters: libcpc, to access CPU counters, and libpctx, to track a process. HAR differs from other tools in that it combines the low-level counts into higher-level metrics more useful to application programmers. Application programmers are typically interested in the following metrics: CPI, FLOPS, MIPS, address bus percentage utilization, cache miss rates, branch and branch miss rates, and stall rates. These metrics help in assessing the fair usage of available processing units, locating bottlenecks and guiding tuning efforts, when needed...
Check this valuable article to discover everything about this powerful tool!..
Oracle Add-Ons |
NOTE : Originally all these scripts were made as examples, to show how easily we may collect data even from Oracle. But with time, people started to use them more and more (while I was still expecting that, inspired by the examples, they would add something more optimal :-)). For example, the current scripts connect to and disconnect from the database all the time, while a collector keeping its connection open would be more optimal, etc... But well - it's still better than nothing! :-))
Anyway, all the following wrappers need a correctly set Oracle environment for the "Oracle" user. By default the user's name is oracle, but it may be changed inside the scripts.
It means that:
# su - oracle -c "sqlplus /nolog"

should work correctly and give you a SQL> prompt for the right database instance.
Then you may check that:
# /etc/STATsrv/oraEXEC.sh 5

prints you the current number of Oracle sessions and current exec/commit activity.
If it doesn't work - fix it before going further :-)) (BTW, there is a dim_STAT user group where you may always ask questions - http://groups.google.com/group/dimstat )
Oracle Add-Ons:
By default all these Add-Ons are already enabled within the dim_STAT database, and all you need is to uncomment them within the STAT-service access file (/etc/STATsrv/access) and start a new collect including Oracle stats :-))
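For illustration, once uncommented, the corresponding access file entries might look like the following (exact lines and paths may differ between STAT-service versions):

# Oracle Add-On commands
command   oraEXEC    /etc/STATsrv/bin/oraEXEC.sh
command   oraIO      /etc/STATsrv/bin/oraIO.sh
command   oraLATCH   /etc/STATsrv/bin/oraLATCH.sh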
And of course you may add any other one. Some people even collect statspack reports directly into dim_STAT!
MySQL Add-Ons |
mysqlSTAT - monitors the "show status" output. Each output variable is presented with 3 values:
And it's up to you to choose from the list of variables what kind of information you're interested in :-) To work properly this add-on needs to be configured - edit your /etc/STATsrv/bin/mysqlSTAT.sh file to set up the user/password and host/port information.
mysqlLOAD - is oriented towards multi-host monitoring and presents a compact list of data from the "show status" output:
This add-on also needs to be configured to work properly - edit your /etc/STATsrv/bin/mysqlLOAD.sh file to set up the user/password and host/port information.
innodbSTAT - monitors the "show innodb status" output (or "show engine innodb status" since MySQL 5.5). It works similarly to "mysqlSTAT", but the list of variables is based on InnoDB status only. To work properly this add-on needs to be configured - edit your /etc/STATsrv/bin/innodbSTAT.sh file to set up the user/password and host/port information.
innodbMUTEX - monitors the "show mutex status" output (or "show engine innodb mutex" since MySQL 5.5). It prints the InnoDB MUTEX related stats, and is ready to print not only "waits" (as standard), but also more detailed data like counters, spins, real waited time on each mutex, etc. (available by compiling InnoDB with debug options, or just hacking). To work properly this add-on needs to be configured - edit your /etc/STATsrv/bin/innodbMUTEX.sh file to set up the user/password and host/port information.
Example of output :
# /etc/STATsrv/bin/innodbMUTEX.sh 5

MUTEX                        count count/s spin_waits spin_waits/s spin_rounds spin_rounds/s os_waits os_waits/s os_yields os_yields/s os_wait_times os_wait_times/s
db-server-online                 1       1          1            1           1             1        1          1         1           1             1               1
buf/buf0buf.c:1122              -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
fil/fil0fil.c:1535              -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
srv/srv0srv.c:973               -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
combined_buf/buf0buf.c:818      -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
log/log0log.c:830               -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
btr/btr0sea.c:181               -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
combined_buf/buf0buf.c:820      -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1

MUTEX                        count count/s spin_waits spin_waits/s spin_rounds spin_rounds/s os_waits os_waits/s os_yields os_yields/s os_wait_times os_wait_times/s
db-server-online                 1       1          1            1           1             1        1          1         1           1             1               1
buf/buf0buf.c:1122              -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
fil/fil0fil.c:1535              -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
srv/srv0srv.c:973               -1      -1         -1           -1          -1            -1     2411 482.200012        -1          -1            -1              -1
combined_buf/buf0buf.c:818      -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
log/log0log.c:830               -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
btr/btr0sea.c:181               -1      -1         -1           -1          -1            -1      411  82.199997        -1          -1            -1              -1
combined_buf/buf0buf.c:820      -1      -1         -1           -1          -1            -1        0   0.000000        -1          -1            -1              -1
^C
NOTE: the -1 is printed if information is not available.
innodbIOSTAT (deprecated, works only with old InnoDB) - is an adaptation of a DTrace script published by Neel, but with one additional feature: it automatically detects if mysqld is no longer running or has been started/restarted again. And of course you may run it only on a system supporting DTrace :-)
PostgreSQL Add-Ons |
pgsqlSTAT monitors the "pg_stat_bgwriter" and "pg_stat_database" output. Each output variable is presented with 3 values:
And it's up to you to choose from the list of variables what kind of information you're interested in. To work properly this add-on needs to be configured - edit the /etc/STATsrv/bin/pgsqlSTAT.sh file to set up the user/password and host/port information.
pgsqlLOAD is oriented towards multi-host monitoring and presents a compact summary (single line) from the "pg_stat_bgwriter" and "pg_stat_database" output:
Please read the excellent howto written by Greg Smith to see how to analyze this data - http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
To work properly this add-on also needs to be configured - edit the /etc/STATsrv/bin/pgsqlLOAD.sh file to set up the user/password and host/port information.
jvmSTAT |
This is a wrapper to bring in information from the "jvmstat" package. jvmstat is now officially integrated into the JVM 1.5 distribution and later (and is called "jstat" now). The jvmSTAT wrapper gives you a way to monitor ALL running JVMs on your server at the same time!
To run jvmSTAT properly you first of all need jdk 1.5 (or later) installed on your host; check that it works correctly on your server:
# cd /usr/jdk15/bin
# jps
...
#
If you don't see your running JVM(s) within the "jps" output, try to fix this first before continuing with the next steps :-) - normally it should work with any JVM since Java version 1.4.2.
To get the 'jvmSTAT.sh' wrapper working:
Then start to collect JvmSTAT data :-)
jvmGC |
This one still exists, but I don't see any reason why anyone would still use it; jvmSTAT is the better solution for any kind of "GC" collection.
This wrapper collects on-the-fly information about the GC (garbage collector) activity of any JVM running with the "-verbose:gc" option. Before JVM 1.4.2 the only possible way to get information on the GC activity of the standard JVM was to dump the log output, so this wrapper is simply based on log file scanning.
Usage: suppose you want to see the GC activity of one of your JVMs running on server "J".
0) Install "jvmGC" via the Add-Ons page.
1) jvmGC uses the $LOG file for data input (default filename: /var/tmp/jvm.log); you may change the name and permissions according to your needs on the server "J" STAT-service side (/etc/STATsrv/bin).
2) Use the web interface to start the collect including "jvmGC".
3) On server "J" add the "-verbose:gc" option to java in your application start script and redirect the output into the application log file (for ex. app.log) - see the combined example after step 5.
4) Once you want to monitor your JVM:
$ tail -f app.log | /etc/STATsrv/bin/grepX GC >> /var/tmp/jvm.log
5) observe jvmGC output data and have fun!
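Putting steps 3) and 4) together on server "J", the whole chain may look like this ('app.jar' is just a placeholder for your own application):

$ java -verbose:gc -jar app.jar > app.log 2>&1 &                    # step 3: GC lines go into app.log
$ tail -f app.log | /etc/STATsrv/bin/grepX GC >> /var/tmp/jvm.log   # step 4: feed the jvmGC collect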
LINUX specific STATs |
Linux Add-Ons: for details, see the special Linux notes following...
Administration tasks |
At any moment you can:
Edit Add-On Description - in case you made a mistake in any value name, or in the shell command corresponding to your Add-On, you may quickly repair it via the Edit interface (however, you can no longer change MySQL table column names or datatypes - if the error is there, you'd better recreate this Add-On again ;-))
Save Add-On Description - this will give you an ASCII text file which may be reused for another database. This way you may share with others any new findings and any new tools you found useful!
Restore Add-On Description - from the information in a given Description file, re-creates all required Add-On database structures and fills in all information required for it to function correctly. WARNING: if you're already using the same Add-On in the current database, all previous data will be destroyed!
Delete Add-On - removes the Add-On and all corresponding data from the current database...
Linux Special Notes |
I don't know if it will surprise you, but all dim_STAT binaries for Solaris SPARC have until now been compiled on the same old and legendary SPARCstation-5, which runs Solaris 2.6, and they still work on every next generation of Sun SPARC machines, including the latest generation and Solaris 10. Some unchanged binaries are even 10 years old! This is what I call TRUE binary compatibility! :))
Now, can I say the same thing about Linux??? Sometimes even the same vendor breaks binary compatibility between the previous and the next distribution!
Because the main problem lies with the different implementations of shared libraries, I've recompiled all main dim_STAT programs as static binaries to be sure they will run on every distribution. Over time, things got worse: static binaries are core dumping on some distros. Therefore, the current dim_STAT Linux version ships with both dynamic and several static versions of the same binary generated on the different distros.
dim_STAT is reported to work out-of-the-box on MEPIS 3.3.1-1, MEPIS 6.0/7.0, Debian 3/4, RHEL 4.x/5.x, CentOS 4.x/5.x, OEL 5.x/6.x, SuSE 9/10/11/12, and Fedora Core. Anyway, if you encounter any problems during installation or execution of dim_STAT, please contact me directly and we'll try to fix the issue together. In recent years many Linux vendors have stopped shipping even the system libraries needed to run 32bit programs on their 64bit distributions.. - keep this in mind if you're planning to install dim_STAT on a 64bit Linux: you may need to add 32bit packages like glibc.i686 / libc6-i386, libzip.i686 / lib32z1, libX11, libssl, libcrypto, libpng12, libjpeg, .. (check for some discussions on the dim_STAT Users Group @Google: http://groups.google.com/group/dimstat )
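For example, using the package names mentioned above, that typically means something like the following (exact package names vary with the distro release):

# yum install glibc.i686 libzip.i686       (RHEL / CentOS / OEL / Fedora)
# apt-get install libc6-i386 lib32z1       (Debian / Ubuntu / MEPIS)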
NOTE: PC boxes are quite cheap nowadays. So rather than trying to fix issue after issue, ask yourself if buying a $300 PC, installing MEPIS-6.0 or openSUSE-11.2 32bit on it (10 minutes), installing dim_STAT (5 minutes) and starting the collection of stats from all your servers would not be a cheaper, easier and simpler solution.
And again: why don't you simply use Solaris/OpenSolaris and avoid all these kinds of problems?... :-) There is even a Pocket Solaris available (http://milax.org) - a 300MB full install + 60MB dim_STAT = all the other disk space to use securely with ZFS and collect data from your servers!... Seriously...
Linux STAT-service |
While there are in general no problems with the stat programs for Solaris, there are always a lot of questions about Linux stats integration.
Keep in mind: The most important part of collecting stats from a Linux box is a working STAT-service! If it starts on your box, you may integrate _any_ existing or new stat commands (there are many, many available on the internet).
Pre-integrated stats already come with the STATsrv-Lux.tgz package. That doesn't mean they will work on your system at once (Linux distribution compatibility is always an issue). Some of them I took from the 'sysstat' kit and recompiled on MEPIS 6.0; if required, you may recompile them yourself, as these stat programs come from sysstat (http://perso.wanadoo.fr/sebastien.godard/). And some I developed myself, as I was tired of seeing different outputs on different distros, even with standard commands like 'vmstat'! Therefore, the STAT-service ships with its own vmstat, netLOAD and psSTAT!
Wrappers may be needed for some stat commands to skip unused information or just transform input data into the form expected. The following commands already have wrappers and are pre-integrated into the packaged STAT-service.
NOTE: sometimes the same command gives different output on different Linux distributions! Be ready in this case to create new Add-Ons, or to create common wrappers to adapt the command output.
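Such a wrapper is rarely more than a couple of shell lines. A hypothetical sketch (not one of the shipped wrappers) that normalizes a sysstat command by dropping its "Linux ..." banner and empty lines:

#!/bin/sh
# print only data lines: skip the leading banner and blank lines
# $1 is the sampling interval in seconds
exec mpstat -P ALL "$1" | awk 'NF > 0 && $1 != "Linux"'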
Lvmstat |
Source: the Linux "vmstat", as shipped with STAT-service since v.8.0
Output example :
dim$ /etc/STATsrv/bin/vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 434384 691948 9708 220592 3 4 32 28 36 47 3 1 95 1
 0 0 434384 691948 9708 220592 0 0 0 0 347 913 2 0 98 0
 0 0 434384 691948 9708 220592 0 0 0 0 396 1083 2 1 97 0
dim$
A wrapper is not needed anymore. On all systems, the same output is guaranteed (if it runs ;-)).
Lmpstat |
Per CPU detailed usage statistics.
Source: the Linux "mpstat" v2 (improved) from Sysstat, and shipped with STAT-service since v.8.3
Output example :
# /etc/STATsrv/bin/Lmpstat.sh 5
09:44:12 CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
09:44:17 all 4.57 0.00 1.12 1.52 0.10 0.00 0.00 92.69 182.60
09:44:17 0 3.81 0.00 1.20 2.00 0.00 0.00 0.00 92.99 109.40
09:44:17 1 5.59 0.00 0.62 0.83 0.00 0.00 0.00 92.96 1.40
09:44:17 CPU %user %nice %sys %iowait %irq %soft %steal %idle intr/s
09:44:22 all 1.65 0.00 0.68 0.00 0.00 0.00 0.00 97.68 145.40
09:44:22 0 1.80 0.00 1.00 0.00 0.00 0.00 0.00 97.19 95.60
09:44:22 1 1.32 0.00 0.38 0.00 0.00 0.00 0.00 98.31 2.20
^C
LcpuSTAT (deprecated) |
The source: "mpstat" from Sysstat
Output example :
A wrapper is not really needed, but simplifies usage: just ignore the "*Linux*||*CPU*||" lines and use "*all*" as a separator.
# /etc/STATsrv/bin/cpuSTAT.sh 1
Linux 2.6.15-26-386 (dimitri) 11/16/06
16:45:15 CPU %user %nice %system %idle intr/s
16:45:16 all 0.00 0.00 0.00 100.00 115.00
16:45:16 0 0.00 0.00 0.00 100.00 115.00
16:45:17 all 1.00 0.00 0.00 99.00 147.00
16:45:17 0 1.00 0.00 0.00 99.00 147.00
16:45:18 all 0.00 0.00 0.00 100.00 162.00
16:45:18 0 0.00 0.00 0.00 100.00 162.00
^C
#
Deprecated (on some systems may show over 100% values :-) - better to use Lmpstat now).
LioSTAT |
Source: "iostat" from Sysstat
Output example :
# /etc/STATsrv/bin/ioSTAT.sh 5
Device: rrqm/s wrqm/s r/s w/s op/s rsec/s wsec/s rkB/s wkB/s kB/s avgrq-sz avgqu-sz await svctm %busy
sdb 0.00 515.90 17.81 88.83 106.65 1286.49 9897.66 643.24 4948.83 5592.07 104.87 0.09 0.86 0.27 2.87
sdb1 0.00 515.90 17.81 88.60 106.42 1286.49 9897.66 643.24 4948.83 5592.07 105.10 0.09 0.86 0.27 2.87
sda 0.02 10.39 0.15 0.66 0.81 29.14 87.50 14.57 43.75 58.32 144.72 0.04 54.28 1.65 0.13
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 24.85 0.00 7.92 6.89 0.00
sda2 0.02 10.39 0.15 0.53 0.68 29.14 87.50 14.57 43.75 58.32 172.04 0.04 64.52 1.96 0.13
dm-0 0.00 0.00 0.03 8.02 8.05 1.09 64.15 0.54 32.07 32.62 8.11 0.68 84.82 0.14 0.11
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 7.73 0.00 7.21 2.22 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.01 0.00 0.01 7.99 0.00 1.84 0.29 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.03 0.00 0.01 0.00 0.01 7.99 0.00 1.77 0.23 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.02 0.00 0.01 0.00 0.01 7.99 0.00 1.65 0.26 0.00
dm-5 0.00 0.00 0.12 2.92 3.04 27.97 23.35 13.98 11.68 25.66 16.88 1.04 341.67 0.06 0.02
dm-6 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.01 7.99 0.00 2.17 0.25 0.00
Device: rrqm/s wrqm/s r/s w/s op/s rsec/s wsec/s rkB/s wkB/s kB/s avgrq-sz avgqu-sz await svctm %busy
sdb 0.00 1.79 0.00 5.78 5.78 0.00 70.12 0.00 35.06 35.06 12.14 0.02 2.72 2.62 1.51
sdb1 0.00 1.79 0.00 5.78 5.78 0.00 70.12 0.00 35.06 35.06 12.14 0.02 2.72 2.62 1.51
sda 0.00 0.20 0.00 1.39 1.39 0.00 12.75 0.00 6.37 6.37 9.14 0.00 1.29 0.43 0.06
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.20 0.00 1.39 1.39 0.00 12.75 0.00 6.37 6.37 9.14 0.00 1.29 0.43 0.06
dm-0 0.00 0.00 0.00 1.59 1.59 0.00 12.75 0.00 6.37 6.37 8.00 0.00 1.25 0.38 0.06
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
^C
#
Wrapper: ioSTAT.sh - ignores the CPU-related part; the device and partition list may vary from system to system.
psSTAT for Linux |
I was tired of strange/wrong 'top' output which in many cases simply does not show or ignores lightly loaded processes, finally giving you a wrong vision of your system. So I adapted my Solaris psSTAT idea to the Linux /proc structures...
There are a few options:
psSTAT (dim) v.2.0 Nov.2006
Usage: psSTAT [options]
   -l                     Long output
   -O                     active Only processes/users
   -T sec                 Timeout sec seconds between outputs
   -N name[,name2[,...]]  only proc Name containing name, or name2, or ...
   -M mode                Use Special Mode output:
                            proc - output is grouped by process name
                            user - output is grouped by user name
                            ref  - reference: process name combined with pid
dim$
Output example :
dim$ /etc/STATsrv/bin/psSTAT -O -T 1
PID PNAME UsrTM SysTM CPU% MinF MajF PRI NI Thr VmSIZE
1 init 0.00 0.00 0.0 0 0 16 0 1 1568
3153 dbus-daemon 0.02 0.00 2.0 0 0 17 0 1 2324
3166 hald 0.01 0.00 1.0 0 0 16 0 1 6916
3761 Xorg 0.01 0.00 1.0 0 0 5 -10 1 100680
3879 konsole 0.02 0.00 2.0 2 0 16 0 1 29416
24904 kpowersave 0.01 0.00 1.0 0 0 16 0 1 32720
28035 psSTAT 0.02 0.00 2.0 336 0 16 0 1 1812
PID PNAME UsrTM SysTM CPU% MinF MajF PRI NI Thr VmSIZE
1 init 0.00 0.00 0.0 0 0 16 0 1 1568
28035 psSTAT 0.03 0.00 3.0 336 0 17 0 1 1812
PID PNAME UsrTM SysTM CPU% MinF MajF PRI NI Thr VmSIZE
1 init 0.00 0.00 0.0 0 0 16 0 1 1568
3761 Xorg 0.03 0.00 3.0 0 0 5 -10 1 100680
22726 java_vm 0.01 0.00 1.0 0 0 16 0 21 231760
28035 psSTAT 0.03 0.00 3.0 336 0 17 0 1 1812
PID PNAME UsrTM SysTM CPU% MinF MajF PRI NI Thr VmSIZE
1 init 0.00 0.00 0.0 0 0 16 0 1 1568
3761 Xorg 0.02 0.00 2.0 0 0 5 -10 1 100680
3879 konsole 0.01 0.00 1.0 0 0 15 0 1 29416
28035 psSTAT 0.03 0.00 3.0 336 0 16 0 1 1812
^C
dim$
There are 3 Linux add-ons based on psSTAT:
- LpsSTAT - process stat using the 'ProcName-PID' pair as a unique process reference (mode: ref)
- LPrcLOAD - activity stats grouped by process name (mode: proc)
- LUsrLOAD - activity stats grouped by user name (mode: user)
NOTE: data are collected live from '/proc', but at the given time interval - so be aware: if during this interval some processes fork and die very quickly, they are simply not seen by the tool, as there will be no trace of them in any '/proc' data...
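Per the psSTAT usage text above, the three add-ons differ only in the -M mode they pass, so their wrapper scripts likely boil down to calls such as:

$ /etc/STATsrv/bin/psSTAT -O -T 5 -M ref     # LpsSTAT
$ /etc/STATsrv/bin/psSTAT -O -T 5 -M proc    # LPrcLOAD
$ /etc/STATsrv/bin/psSTAT -O -T 5 -M user    # LUsrLOAD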
LpsSTAT (psSTAT) |
Source: psSTAT for Linux, mode: ref
Output example :
dim$ /etc/STATsrv/bin/psSTAT.sh 1
PNAME-PID UsrTM SysTM CPU% MinF MajF PRI NI Thr VmSIZE VmLCK VmRSS VmData VmSTK VmEXE VmLIB VmPTE
init-00001 0.00 0.00 0.0 0 0 16 0 1 1568 0 84 160 88 28 1256 12
dbus-daemon-03153 0.03 0.00 3.0 0 0 17 0 1 2324 0 820 308 84 328 1540 12
hald-03166 0.01 0.00 1.0 0 0 16 0 1 6916 0 2016 3312 580 204 2732 12
Xorg-03761 0.02 0.00 2.0 0 0 5 -10 1 100680 0 29688 88740 276 1472 6200 248
konsole-03879 0.01 0.00 1.0 0 0 16 0 1 29416 0 6684 2980 88 40 24820 44
opera-13455 0.01 0.00 1.0 0 0 15 0 1 84380 0 52596 49804 84 9788 21844 92
java_vm-22726 0.01 0.00 1.0 0 0 16 0 21 231760 0 23960 182852 116 12 48192 108
psSTAT-27995 0.01 0.00 1.0 336 0 16 0 1 1816 0 836 420 88 16 1256 12
^C
dim$
This STAT should be used if you're looking at a single process's activity and want to go into detail per PID, etc.
LPrcLOAD (ProcLOAD) |
Source: psSTAT for Linux, mode: proc
Output example :
dim$ /etc/STATsrv/bin/ProcLOAD.sh 1
PNAME UsrTM SysTM CPU% MinF MajF Nmb Act Thr VmSIZE VmLCK VmRSS VmData VmSTK VmEXE VmLIB VmPTE
NetworkManager 0.00 0.00 0.0 0 0 1 0 1 3928 0 1048 324 88 264 3140 16
Xorg 0.01 0.00 1.0 0 0 1 1 1 100680 0 29688 88740 276 1472 6200 248
konsole 0.01 0.00 1.0 0 0 5 1 5 148032 0 30780 15852 440 200 124100 220
psSTAT 0.03 0.00 3.0 338 0 1 1 1 1816 0 836 420 88 16 1256 12
PNAME UsrTM SysTM CPU% MinF MajF Nmb Act Thr VmSIZE VmLCK VmRSS VmData VmSTK VmEXE VmLIB VmPTE
NetworkManager 0.00 0.00 0.0 0 0 1 0 1 3928 0 1048 324 88 264 3140 16
Xorg 0.01 0.00 1.0 0 0 1 1 1 100680 0 29688 88740 276 1472 6200 248
konsole 0.01 0.00 1.0 0 0 5 1 5 148032 0 30780 15852 440 200 124100 220
psSTAT 0.01 0.00 1.0 338 0 1 1 1 1816 0 836 420 88 16 1256 12
^C
dim$
This STAT should be used if you're looking at global per-'process name' activity and don't really need to go into detail - especially when you have a lot of processes running (!)
LUsrLOAD (UserLOAD) |
Source: psSTAT for Linux, mode: user
Output example :
dim$ /etc/STATsrv/bin/UserLOAD.sh 1
UNAME UsrTM SysTM CPU% MinF MajF Nmb Act Thr VmSIZE VmLCK VmRSS VmData VmSTK VmEXE VmLIB VmPTE
root 0.01 0.00 1.0 420 0 62 1 62 256312 3576 44224 33216 3208 5700 201456 616
dim 0.03 0.00 3.0 46 0 92 2 124 1774180 0 393556 795244 8176 60672 838516 2632
UNAME UsrTM SysTM CPU% MinF MajF Nmb Act Thr VmSIZE VmLCK VmRSS VmData VmSTK VmEXE VmLIB VmPTE
root 0.02 0.00 2.0 338 0 62 1 62 256312 3576 44224 33216 3208 5700 201456 616
dim 0.02 0.00 2.0 46 0 92 2 124 1774180 0 393556 795244 8176 60672 838516 2632
^C
dim$
This STAT should be used if you're looking at global per-'user' activity and don't really need to go into detail - especially when your tasks are grouped per user or you have a lot of users using the system (!)
LnetLOAD (netLOAD) |
Source: my netLOAD script for Linux
Output example :
/etc/STATsrv/bin/netLOAD.sh 1
Name IBytes/s OBytes/s IPack/s OPack/s IErr OErr IDrp ODrp Bytes/s Pack/s
none 0 0 0 0 0 0 0 0 0 0
lo 66070356 66070356 130181 130181 0 0 0 0 132140712 260362
eth0 32074500 19059001 236433 218784 0 0 0 0 51133501 455217
eth1 3766140 1544506 93950 56325 60 0 60 0 5310646 150275
Name IBytes/s OBytes/s IPack/s OPack/s IErr OErr IDrp ODrp Bytes/s Pack/s
none 0 0 0 0 0 0 0 0 0 0
lo 0 0 0 0 0 0 0 0 0 0
eth0 0 0 0 0 0 0 0 0 0 0
eth1 0 0 2 3 0 0 0 0 0 5
Name IBytes/s OBytes/s IPack/s OPack/s IErr OErr IDrp ODrp Bytes/s Pack/s
none 0 0 0 0 0 0 0 0 0 0
lo 0 0 0 0 0 0 0 0 0 0
eth0 0 0 0 0 0 0 0 0 0 0
eth1 0 0 2 3 0 0 0 0 0 5
^C
No wrapper is needed for the STAT-service; it should work as-is on any Linux system.
Report Tool |
This User's Guide is completely written using the Report Tool!! And as so often, this tool was mainly created to cover my own day-to-day needs.
Quite often I have to write reports to show performance findings, to present the observed system / application activity, etc., etc. Yes, etc., because sometimes we have to write too much to make things work, or simply to protect people from doing stupid things. :))
OK, you've started to write your document for a French customer, so you write it in French, and then it appears that the majority of the development team only speaks English. You start to keep two copies of the same document in parallel: FR/EN. Then you discover something very important that you cannot tell your customer yet, but absolutely need to communicate internally. So you split the document once again: FR/EN and Customer/Internal, which means four different documents. The next split will give you eight versions of the document. But it is all still based on the same source of information. The result is a lot of hours spent copy-pasting activity graphs from the browser, teamquest, best1, patrol, etc. into your word processor. It makes me cry... :))
I was really tired of this situation and tried to imagine something different.
Overview |
The first issue was the choice of format: At least everybody on any platform is able to read HTML. So that's an easy one. If needed you can easily convert HTML into other formats, like PDF, etc.
The next problem is harder to solve. It was my idea to find a solution for generating different kinds of documents from the same main data source. When you take a look at any document, how is its content organized? You'll see:
- Document = N x Chapters
- Chapter = M x Sections
- Section = P x Paragraphs
- and so on ...
- Smallest part = Smallest part :-)
It all depends on what your smallest part is. So, I've named my smallest part a Note, and a Document or Report is presented simply as an ordered tree of Notes.
The main points :
- the position of each Note in a Report is decided by its parent-ID (level + 1) and order number (same level)
- Note : each Note has/contains:
   - a Data Type
   - a Title
   - text comments
   - possibly an attachment (depends on Data Type)
   - a list of attributes
- Attributes : any Note may have zero, one or several attributes on:
   - Language (French, English, ...)
   - Confidentiality (Personal, Customer, ...)
   - ... (any other can be easily added into the system)
- Data Type : the list of Data Types is fixed (but may be extended):
   - Text
   - HTML
   - Image
   - Binary
   - dim_STAT collect
   - SysINFO
   - HTML.tar.Z archive
Any Note can be created/edited/deleted at any time. During Report generation you only need to choose the right criteria for your requirements to create a valid document with all parts corresponding to the criteria.
Datatype: Text, HTML, Image, Binary |
These data types are quite similar: you can create a note with any text, HTML, image or binary file as an attachment, with or without your comments. Except for binary, any file may be presented "In-Line" or "Linked".
In-Line means your file will be part of the main document page and its visible contents, e.g. text directly included, image shown, etc.
Linked means linked :)), meaning that the main document page will only include a link to your attachment. However, this attachment will always be included with the document.
Note: the same idea is applied to other types of Notes as well.
Datatype: SysINFO |
This is a special type, whose purpose is to get on-line system information from any host on the network that runs the STAT-service. Of course, only if you have permission to access this service and SysINFO.
Datatype: HTML.tar.Z |
A special type in case you want to integrate into your Report other, already written documents that are converted to HTML and archived into a single tar.Z file. As you may have several files in your archive, the tool will ask you for the name of the 'main' file, which maintains references to all other files.
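For example, packing an already written HTML document with standard tar + compress ('index.html' is just an assumed name for the 'main' file):

$ cd /path/to/my/document
$ tar cf - index.html chapter*.html *.png | compress > MyDoc.tar.Z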
Datatype: dim_STAT-Snapshot |
A type for when you've saved graph pages based on Java applets during dim_STAT analysis. You may integrate them 'as is'; the tool will extract the applet data and insert it as Note contents.
Probably this should be deprecated, as any graph can be saved in PNG format, or you could simply convert it to PNG or GIF.
Datatype: dim_STAT-Collect |
This is a very special type, it helps you to generate all STAT graphs automatically and it will save you a lot of time. Follow the example below.
Preview / Generate / Publish |
At any moment you can 'Preview' your Report or 'Generate' a current/final version to be accessed on-line, or saved and shared as tar.Z archive, or as a single PDF file. Also, your document may be published on another site (actually, this part is limited to the same physical host).
Export / Import |
These features explain why the Report Tool is called 'Mobile'. At any time you can export your Report and import it into any other dim_STAT server. This means: you edit/prepare everything on your laptop, and from time to time you synchronize your work with a central repository. Also, it gives you a simple way to prepare your own templates! Instead of starting a new report every time, just import your template (an old report) and continue.
Let's try! New Report |
Now relax, take your coffee, be sure you have 20 minutes of free time (while nobody is stressing you) and your GSM is off, you're ready to listen ... go to the dim_STAT Main page and click on 'Report Tool'.
Click on Report Tool |
As you could have expected, there is nothing here yet.
Let's click on the "New Report" button.
New Report |
All you need to do here is just to fill in the new report form:
- ID: unique digital number
- Title: the main title
- Owner: owner information
- Chart: any additional comments to be present on the cover page
- Use: choose a pre-configured Report template
and click on "Create"
Edit Report |
Wow! It works! :))
With the 'big' buttons, you may now:
- Hide/Show Note comments
- Preview your report
- Generate the report
- go Home (back to the main Report page)
But if you hover your mouse over the pre-generated notes, you'll see pop-ups explaining each action.
Edit Actions |
And now:
- click on the 'down' icon to create a new note 'after' the current one (same parent level)
- click on the 'right' icon to create a new 'child' note 'under' the current one (parent level+1)
- click on the 'cut' icon to cut and then paste it elsewhere (it may go to the 'trash' if it needs to be deleted (end of screen))
- click on the 'data' icon to edit/view the Note
Let's edit 'General Information' (click on 'data' icon).
Edit Note |
From here you may see the current Note preview and edit the Note comments or attributes. If you change only attributes, then click on the corresponding button to apply the changes. If you want to modify the Note comments, click on 'Edit Note'. BTW, you can also do that with any external editor.
Edit Note, continue... |
Add what you want in the text fields (you may use any HTML tags, etc.)
Edit Note, continue2... |
Note: if you choose the Text-format option your text is auto-formatted:
- an empty line is seen as a 'new paragraph'
- three spaces at the start of a line are replaced by a "blanked-tabulation"
- some kind of limited wiki-like syntax is supported (see below an example of input text containing wiki-like tags and its output result)..
Save the Note.
Wiki-Like syntax: INPUT |
Here is a =!Big BOLD Header!=
Here is just a text +!with INCREASED font size!+
*!Here!* or **Here** will be a bold text
/!Here!/ is text in italic
_!Here!_ or __Here__ will be underlined text

__Simple TEXT List__ :
 - one
 - two
 - three

__Simple HTML List__ :
 * one
 * two
 * three

__Simple code or text formatted__ :
[code]
$ ls -l /usr/sfw/bin/*
...
$ ps -ef
...
$ pkill -9 oracle
[/code]

__Simple Table__ :
| **System/Performance** | **TPS** | **Resp.Time(ms)** |
| M5000 | 4.500 | 10.0 |
| M8000 | 8.000 | 10.0 |
| M9000 | 15.000 | 9.2 |
Wiki-Like syntax: OUTPUT |
Here is a
Big BOLD Header
Here is just a text with INCREASED font size
Here or Here will be a bold text
Here is text in italic
Here or Here will be underlined text
Simple TEXT List :
- one
- two
- three
Simple HTML List :
- one
- two
- three
Simple code or text formatted :
$ ls -l /usr/sfw/bin/*
...
$ ps -ef
...
$ pkill -9 oracle
Simple Table :
System/Performance   TPS      Resp.Time(ms)
M5000                4.500    10.0
M8000                8.000    10.0
M9000                15.000   9.2
Edit Note, continue3... |
You may re-edit again or open the door :))
Edit Report, continue... |
Let's fill other notes in the same way...
Edit Report, continue2... |
So far so good :))
Now, I want to add a SysINFO Note for both hosts 'tahiti' and 'java'. SysINFO data is collected on-line, at the moment you ask for it, so it's an easy way to keep your document up to date while you're writing. BTW, look into the STAT-service package to see how it is configured on the host side. You may extend it with any other information you need.
So, a new SysINFO note under 'Software Configuration'... (right icon)
Add Note |
New Note -- SysINFO |
As the tool has no idea what kind of Note you want to add, it will ask you to choose one before it can continue. Also, I did not want to add too much complexity to the interface.
So, just click on 'SysINFO' here...
New Note -- SysINFO Form |
Here you will need to fill in the SysINFO form: the usual data (title/comments/attributes) and the SysINFO specific ones:
- the host name
- the host's STAT-service port
As SysINFO output is usually quite wide, it's preferred to keep it as an 'External Link'.
Save the Note. If you gave the right hostname and port, and the STAT-service is up and running on this host, you'll receive your data in a few seconds - in our example from the 'tahiti' domain :))
New Note -- SysINFO Result |
Because I asked for 'Linked' contents, there is only a link to SysINFO data from 'tahiti'. Let's click on it to see if it works correctly.
New Note -- SysINFO Link Contents |
Edit Report, continue3... |
As you see, I have my new SysINFO note under 'Software Configuration'. Let's get SysINFO from the 'java' host now and place it 'under' the current tahiti SysINFO...
Edit Report, continue4... |
Now, under 'Hardware Configuration' I want to add an image representing my platform diagram (a very simple image, just for those who are not able to imagine two hosts with one storage device :)), but "a picture says more than a thousand words". :))
So: 'Hardware Configuration' -> Image -> ...
Add New Note -- Image |
Once again, similar info to fill in, except you may give the name of your image file to upload [Browse]. Let's fill it in and save as an 'In-Line' attachment.
Add New Note -- Image Inline |
Oops, it's TOO BIG! And bigger doesn't mean you see it any better!! I prefer to keep all big images 'linked'.
So, [Edit Note] -> 'As External Link' (no more need to give image file again) -> [Save Note]
Add New Note -- Image Linked |
That's better!!
Now, let's add a 'dim_STAT Collect' note!
Leave this page [Door], go to the end of Report and click on the [Right] icon on 'Report' note, then choose 'dim_STAT Collect'.
Add New Note -- dim_STAT Collect, Step1 |
The dim_STAT Collect Note needs several steps to be created:
- 1. setup dim_STAT server database parameters, [Next]
- 2. select the STAT collect you want to use, [Next]
- 3. select the STATs you want to see and the time interval, [Next]
- 4. [Finish] or select the STATs you want to see and the time interval, [Next] (goto 4)
- 5. graph titles, choose graph parameters, [Save]
We are on Step-1 here, and if you don't have any data collected, you may get them from the 'Default' demo collect:
- Server : localhost
- Port : Default
- Database : Default
[Next]...
NOTE: the interface becomes more optimized and more extended with each new release, so the screen shots are probably not up to date everywhere.
Add New Note -- dim_STAT Collect, Step2 |
Choose the STAT collect here and the Search mode. We already have the log messages from the 'java' host; each message was added before any of the tests started, so it's quite easy to find the ones corresponding to the time interval of each test. Otherwise we can always do a 'Date and Time' search, but you'll quickly understand that that is much more painful compared to LOG messages.
NOTE: with version 8.0, more options were added to simplify reporting:
- replay the same time slices for N days (in Date and Time)
- auto include time/date into generated graph titles
- replace on-the-fly some parts (max 5) of the LOG messages
Add New Note -- dim_STAT Collect, Step3 |
Now we need to choose the type of graphs we want to see and the time interval for them.
NOTE:
- All the Per-Host STATs are Bookmarks. The more Bookmarks you created during analyzing, the more data you can generate for the report.
- When I selected the two hosts, the tool also gave me Multi-Host STATs, depending on the stat commands being present or not. Each STAT (as in Multi-Host Analyze) will put all requested hosts onto a single graph.
Add New Note -- dim_STAT Collect, Step3 continue |
Here we're choosing:
- per host : CPU busy%, Run queue, Mutex spin, System calls/s
- multi-host : CPU busy%, Network load bytes/s and packets/s
Time interval: as we know each test ran for ~15 min, we can choose a time interval of '15 min. after each LOG message'.
[Next]...
Add New Note -- dim_STAT Collect, Step4 |
So, this looks OK. I've got my STATs selected with a pre-populated graph title (from the LOG message). BTW, you may see that all your previously selected STATs are pre-selected here (the selection is saved via cookies and specific to each database name).
[Finish] ...
Add New Note -- dim_STAT Collect, Step5 |
Here you have to specify the graph parameters:
- Main title
- per graph title
- order of generation
- graph mode, style, size, etc.
- Auto-AVG: good to select if you have too large time intervals and your graphs become too dense
- Show LOG/TASK (as during analyze)
- Show processing - get the generation output in the browser. Not all browsers work correctly with this feature; some wait for an EOF before they show anything. If you don't choose this option, the processing output is always printed into a /tmp/.report.log file on the Report Tool server side.
[Save]...
Now you're free to start doing something else, because your machine is working for you and all you have to do is sit back and relax. Once you get used to the Report Tool, you'll ask it to generate A LOT OF graphs at the same time, and you'll have time on your hands to do something else.
Add New Note -- dim_STAT Collect Result |
Here is the final result after all the graphs are generated!
Click on a link to see the graph results.
NOTE: If you remember, I selected generation ordered by Collect, and what I see now is a list of collects first, and each collect link will show me all selected STAT graphs for that given STAT collect.
Now, if I select the by-STATs generation order - I'll see a STAT list here, and each link will show me the same STAT metric for different collects on a single page...
Add New Note -- dim_STAT Collect Contents, ordered by:Collect |
Add New Note -- dim_STAT Collect Result per STATs |
As you see here, the single STAT link contains all given collects, so if you want to compare the network usage in different cases, just click on either the bytes/sec or the packets/sec link.
Add New Note -- dim_STAT Collect Contents, ordered by:STATS |
Edit Report, next... |
Edit Report -- Cut |
One last thing: I don't want to see my 'per STAT' section first in the Report; let's just move it to the end...
Click on the [Cut] icon, then [Paste] where you want (the [Trash] icon performs the delete operation!)
Edit Report -- Paste! |
Edit Report -- Pasted... |
Edit Report -- Preview |
Edit Report -- Preview Output |
Edit Report -- Preview Output2 |
Generate Report |
Generated Report documents |
Report Tool Home |
THAT'S ALL, folks! :))
The export file of this demonstration report may be found within the dim_STAT distribution as 'ExpReport_15.tar.Z'. You may import it and play with it as long as you want! :))
Also, as a good first exercise, you may try to generate your first graphs from the 'Demo' collect given by default in your dim_STAT database!...
Additional Tools |
Since version 5, additional tools have been shipping with the package, but it seems I forgot to mention them explicitly, so a lot of users didn't know about them.
Java2GIF Tool |
This tool converts HTML pages containing dim_STAT graphs as Java applets to HTML pages with GIF images. This is very useful for reporting, printing, etc. (of course you don't need it if you used PNG :-))
Installed in : /apps/Java2GIF
Requirements :
- JRE or JDK installed on the system
- X11 DISPLAY positioned for image output
Configuration : edit the "j2gif.sh" script to point to the right PATH for your "java" binary
Usage :
$ j2gif.sh /full/path/to/dir/with/your/html/files
Example :
- While analyzing dim_STAT Java applet graphs, from time to time you "Save As" your pages into /Report/J
- Once finished, make a backup of your files first
- Execute:
  $ /apps/Java2GIF/j2gif.sh /Report/J
- That's all :-)
Java2PNG Tool |
Similar to Java2GIF, but with a few differences:
- doesn't need the X11 server for output
- processing execution is much faster compared to Java2GIF
- uses the PNG image format
- doesn't support histogram mode
Installed in : /apps/ADMIN
Requirement : -
Configuration : -
Usage :
$ cd /apps/ADMIN
$ Java2PNG /full/path/to/dir/with/your/html/files
HTMLDOC Tool |
Installed in : /apps/htmldoc
Usage : (RTFM first! :-))
$ cat /apps/htmldoc/README
$ /apps/htmldoc/bin/htmldoc --webpage --header t.D -f Report.pdf *.html Report/*.html
README
This is a short README about the "htmldoc" program. This program is free and I've found it very useful for making printable and well presented HTML ==> PDF documents. Of course, HTML is great for screen viewing, but when you have to bring a printed version - it's not so simple to obtain something presentable in an easy way... Also, I like to send PDF documents, they are small and very portable :))

The home page of the "htmldoc" tool is: http://www.easysw.com/htmldoc

You may download and compile the latest version from this site. But as people are lazy by definition :)), I've pre-installed a not-latest but well working binary of this great tool...

For a detailed description you may start to read the htmldoc manual, but if you are as lazy as me :)), you may just run:

  /apps/htmldoc/bin/htmldoc --webpage --header t.D -f Report.pdf *.html Report/*.html

to get a PDF document (Report.pdf) from a collection of HTML files...

That's all! :))

-Dimitri
FAQ |
Sizing of dim_STAT Instance... |
This problem is simple: there are no sizing rules. :))
Disk space: it depends only on the amount of information collected. On the Preferences page you can see the space used by the current database and the size of your biggest file. You cannot reduce file sizes by data recycling; however, it's now possible with a Convert Engine operation (as the table will be fully recreated) - keep in mind anyway that InnoDB uses much more disk space than MyISAM.
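Conceptually, such a conversion is just a storage engine ALTER, which rebuilds the table from scratch and thus also reclaims the recycled space. A sketch via the mysql client (the database and table names here are hypothetical):

$ mysql -u dim -p dim_db -e "ALTER TABLE vmstat_values ENGINE=MyISAM"   # full table rebuild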
CPU: for a collect your CPU is hardly used at all. However, once you start a query via the Web interface you will access a big amount of data! Your query may use all of the CPU. Normally query execution time is relatively short, but it depends directly on the amount of data requested.
Separate databases are fine when you need different administrative tasks for the collected data. For example, it may be annoying when somebody is loading a large amount of data at the same time you're trying to analyze something. This creates additional locks and slows down performance for others. MySQL (in the version used by dim_STAT) uses "table locking", so there can be only a single writer at a time, and write operations are exclusive (no reads at the same time). If you use your own database you have fewer reasons to blame others.
A desktop running dim_STAT server could be very heavily used, or not used at all. It all depends only on what you're doing with it.
I've started my collects but it seems that nothing gets collected? |
First of all be sure that:
- you've installed the STAT-service package on this host and started it
- your server is seen with a "Green LED" by the dim_STAT Server
If everything seems to be correct in that sense, check the output of the '/etc/STATsrv/log/access.log' file.
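E.g., directly on the monitored host (the process name to grep for is an assumption - check how your STAT-service is started):

$ ps -ef | grep STATsrv                   # is the STAT-service running at all?
$ tail -20 /etc/STATsrv/log/access.log    # any errors for the recent collect requests?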
Syntax of text matching pattern |
Quite often in the dim_STAT interface you may see an input text field that filters values or attributes matching a specified pattern. By default these are filled with '*' (meaning: all), but what kind of syntax do they accept? Pattern by example:
- * - any character or none
- ? - any single character
- [amp] - a single character, one of 'a', 'p', or 'm'
- [a-z] - any single character between 'a' and 'z' (both included)
- [^a-z] - any single character NOT between 'a' and 'z' (both included)
- !Pattern - apply a NOT condition on the whole pattern
- Pattern || Pattern - apply an OR condition between two patterns (or more)
- Pattern && Pattern - apply an AND condition between two patterns (or more); has higher priority vs OR
Examples matching LOG messages:
- *Test??* - match all messages having TestNN in the title
- *Test??* && *End* - match all TestNN messages containing End
- *Test??* && *End* || *Begin* - match all TestNN messages containing End or Begin
- !*Test??* && *End* || *Begin* - match any messages except TestNN and containing End or Begin
When will you upgrade to the newer MySQL version? |
But why?... :-))
Should we change a good old working horse just because it's old?? It has worked fine for over 10 years now, and does exactly what it needs to do. And MyISAM does not work any better in MySQL4 or MySQL5.
MyISAM is really great for its binary compatibility between all platforms - it's simplifying so many things! :-)
In some cases it makes sense to move some critical tables from the MyISAM to the InnoDB engine and gain the advantage of data protection against crashes...
It would also be interesting to ship dim_STAT in parallel with a version of PostgreSQL!! But that's another story...
UPDATE : since version 9.0, dim_STAT is based on MySQL 5.5 (GA) and includes both the MyISAM and InnoDB engines, and you're free at any time to convert your database to the engine best suited to your activity! :-)
With multiple hosts to monitor, is it possible to graph them together?.. |
That's exactly what you get with the Multi-Host Analyze feature. And when you have hundreds of hosts you may even group stats by the first/last N letters of the hostname, etc.. The data are there, you just play with them.. :-)
How easy is it to integrate any new stats to monitor, including DTrace stuff? |
Usually it's quite straightforward to add new stat commands to dim_STAT. But at any time feel free to ask for help from the dim_STAT Users Group - several debugging hints have already been discussed there as well.
Regarding DTrace: once you have a working script with regular and well formatted output, it usually takes 5 minutes to integrate it as a new dim_STAT Add-On. The Solaris STAT-service already contains some DTrace scripts (for example, see the IOpatt Add-On)...
Could I get the raw data via dim_STAT-CLI instead of the graphs?... |
Yes, of course!
See "-Data" option within dim_STAT-CLI.