Saturday, February 21, 2015

Introducing collectd-iostat-python


Collectd-iostat-python is an iostat plugin for collectd that allows you to graph Linux iostat metrics in Graphite or other output formats that are supported by collectd.
This plugin (and most of this README) is a rewrite of kieran's collectd-iostat, using Python and collectd-python instead of Ruby and collectd-exec.

Why?

Disk performance is crucial for most modern server applications, especially databases such as MySQL (see, for example, these slides from the Percona Live conference).
Although collectd provides disk statistics out of the box, graphing the metrics the way iostat reports them turned out to be more useful and easier to read, because iostat covers block devices, partitions, multipath devices and LVM volumes.
The plugin was rewritten in Python because it is the preferred language for siteops tools at my current job, and collectd-python was chosen over collectd-exec for performance and stability reasons.

How?

Collectd-iostat-python works by calling iostat with a predefined interval and count, and pushing the resulting data to collectd through the collectd-python plugin.
Collectd can then be configured to write the collected data to any output format supported by its write plugins, such as Graphite, which was the primary use case for this plugin.
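As a rough illustration of that flow, here is a minimal sketch, not the plugin's actual code. The helper names (`iostat_command`, `dispatch_stats`, `collectd_dispatch`) are hypothetical, and so is the exact flag set: `-k` (kilobytes) and `-N` (device-mapper names) match what this post describes, while `-x` (extended statistics) and `-d` (device report only) are assumptions. `collectd.Values`, its `plugin_instance`, `type_instance` and `values` attributes, and `dispatch()` are part of collectd's Python API, which is importable only inside collectd's embedded interpreter.

```python
try:
    import collectd  # only importable inside collectd's embedded Python interpreter
except ImportError:
    collectd = None


def iostat_command(path="/usr/bin/iostat", interval=2, count=2):
    """Build a hypothetical iostat invocation.

    -x: extended stats (assumption), -k: kilobytes instead of blocks,
    -N: device-mapper (LVM) names, -d: device report only (assumption).
    """
    return [path, "-x", "-k", "-N", "-d", str(interval), str(count)]


def dispatch_stats(stats, plugin_name, dispatch):
    """Hand every (device, column, value) triple to a dispatch callable.

    `stats` is a dict of the form {device: {column: value}}.
    """
    for device, columns in sorted(stats.items()):
        for column, value in sorted(columns.items()):
            dispatch(plugin_name, device, column, value)


def collectd_dispatch(plugin, plugin_instance, type_instance, value):
    """Push one gauge value into collectd (requires the collectd module)."""
    vl = collectd.Values(plugin=plugin, type="gauge")
    vl.plugin_instance = plugin_instance  # e.g. the device name
    vl.type_instance = type_instance      # e.g. the iostat column
    vl.values = [value]
    vl.dispatch()
```

Inside collectd, a function registered with `collectd.register_read()` would run iostat, parse its output and call `dispatch_stats(stats, "collectd_iostat_python", collectd_dispatch)` once per collectd interval; passing the dispatcher in as an argument keeps the loop testable outside collectd.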

Setup

Deploy the collectd python plugin into a suitable plugin directory for your collectd instance.
Configure collectd's python plugin to execute the iostat plugin using a stanza similar to the following:

Once functioning, the iostat data should then be visible via your various output plugins.
In the case of Graphite, collectd should write data into namespaces of the form hostname_domain_tld.collectd_iostat_python.DEVICE.column-name. Symbols such as '/', '-' and '%' in metric names (but not in device names) are automatically replaced by underscores ('_').
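The renaming rule can be sketched in a few lines. This is an illustration under stated assumptions, not the plugin's code: `sanitize_metric` and `graphite_path` are hypothetical helpers, and `graphite_path` only reproduces the namespace shape described above (dots in the hostname turned into underscores); the real path is built by collectd's Graphite writer and its settings.

```python
import re

# Characters that are replaced in metric (column) names: '/', '%' and '-'.
_UNSAFE = re.compile(r"[/%-]")


def sanitize_metric(name):
    """Replace '/', '-' and '%' with '_'; meant for metric names only."""
    return _UNSAFE.sub("_", name)


def graphite_path(fqdn, plugin, device, column):
    """Hypothetical helper: build a hostname_domain_tld.plugin.DEVICE.column path.

    The device name is left untouched; only the column is sanitized.
    """
    host = fqdn.replace(".", "_")
    return ".".join([host, plugin, device, sanitize_metric(column)])
```

For example, `graphite_path("db1.example.com", "collectd_iostat_python", "vg0-root", "%util")` gives `db1_example_com.collectd_iostat_python.vg0-root._util`: the '-' in the LVM device name survives, while the '%' in the column becomes '_'.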
Please note that the plugin uses only the last report of the iostat output, so large Count values make no sense. Count must still be greater than 1, though: iostat's first report shows statistics accumulated since boot, so only the later reports reflect current activity. Also make sure that Interval * Count is much smaller than collectd's Interval setting (10 seconds by default). For me, Count=2 and Interval=2 works quite well.
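A minimal sketch of "keep only the last report", plus a sanity check for the Interval and Count rules above. Both functions are hypothetical. `last_report` assumes each report starts with a header line whose first word is `Device` (older sysstat prints `Device:`), and the `margin` used to express "much smaller" is an arbitrary assumption; the collectd interval is passed in explicitly so it matches your own configuration.

```python
def last_report(output):
    """Return the header and device lines of the final report in iostat output."""
    lines = output.splitlines()
    starts = [
        i for i, line in enumerate(lines)
        if line.split() and line.split()[0].rstrip(":") == "Device"
    ]
    if not starts:
        return []
    report = []
    for line in lines[starts[-1]:]:
        if not line.strip():  # a blank line ends the report
            break
        report.append(line)
    return report


def check_settings(interval, count, collectd_interval, margin=0.5):
    """Return warnings for iostat Interval/Count values.

    Assumption: "much smaller" means at most `margin` * collectd_interval.
    """
    problems = []
    if count < 2:
        problems.append("Count must be at least 2; the first report is since boot")
    if interval * count > margin * collectd_interval:
        problems.append("Interval * Count should be well below collectd's Interval")
    return problems
```

With `collectd_interval=10`, `check_settings(2, 2, 10)` returns an empty list, while `check_settings(2, 1, 10)` warns about Count.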

Technical notes

To parse the iostat output I use jakamkon's python-iostat module, embedded in the script rather than kept as a separate module because of a couple of fixes: using kilobytes instead of blocks, adding -N to iostat to resolve LVM device names, replacing the deprecated popen3 with the subprocess module, refactoring into classes, etc.
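For readers who have not seen python-iostat, here is a minimal sketch of the two fixes that are easiest to show: running iostat through `subprocess` instead of `popen3`, and turning one device report into a dictionary. It is an illustration, not the module's code. The function names are hypothetical, the `-x` and `-d` flags are assumptions, and forcing `LC_ALL=C` is an assumed precaution so that decimals are printed with '.' regardless of the system locale. `subprocess.run(..., capture_output=True, text=True)` needs Python 3.7 or newer.

```python
import os
import subprocess


def run_iostat(path="/usr/bin/iostat", interval=2, count=2):
    """Run iostat via subprocess and return its stdout as text."""
    # -k: kilobytes, -N: device-mapper names; -x and -d are assumptions.
    cmd = [path, "-x", "-k", "-N", "-d", str(interval), str(count)]
    env = dict(os.environ, LC_ALL="C")  # assumption: force '.' as decimal point
    result = subprocess.run(cmd, capture_output=True, text=True, env=env, check=True)
    return result.stdout


def parse_report(report_lines):
    """Turn one report (header line + device rows) into {device: {column: float}}."""
    header = report_lines[0].split()[1:]  # drop the 'Device' / 'Device:' label
    stats = {}
    for row in report_lines[1:]:
        fields = row.split()
        device, values = fields[0], fields[1:]
        stats[device] = {col: float(val) for col, val in zip(header, values)}
    return stats
```

Column names are kept exactly as iostat prints them (for example `r/s` or `%util`), so any renaming for Graphite can happen as a separate step.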

TODO

Some data aggregation may be needed: for example, we could take the max or average of the data across intervals instead of picking the last report of the iostat output.
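One possible shape for that aggregation, sketched under assumptions: each report has already been parsed into a `{device: {column: value}}` dictionary, the first (since-boot) report has been dropped, and `aggregate` is a hypothetical helper, not something the plugin currently provides.

```python
from statistics import mean

# Supported ways of combining the samples of one metric.
_COMBINERS = {"max": max, "avg": mean}


def aggregate(reports, how="avg"):
    """Combine several parsed reports into one, per device and column.

    how="max" keeps the peak value, how="avg" the arithmetic mean;
    any other value raises KeyError.
    """
    combine = _COMBINERS[how]
    samples = {}
    for report in reports:
        for device, columns in report.items():
            for column, value in columns.items():
                samples.setdefault(device, {}).setdefault(column, []).append(value)
    return {
        device: {column: combine(values) for column, values in columns.items()}
        for device, columns in samples.items()
    }
```

Max is useful for spotting short latency or utilization spikes that a single last sample can miss; the average smooths the data over the whole sampling window.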