Jul 31 2012
 

While setting up Nagios to play with its monitoring facilities, it turned out that way too many things do NOT work out of the box. I’m writing down as much as I can remember, so that I don’t have to Google it all again the next time I take on this setup. Others may benefit from this as well.

A brief intro to the environment – my monitoring node is in EC2 on the east coast, and three more servers to be monitored are in EC2 on the west coast; all four run Ubuntu 12.04. There is also a physical box sitting in an IDC in Beijing, China, running Fedora 14 (the owner does not want to upgrade for some reason). Almost all servers run classic Web applications such as Nginx, MySQL, etc. Besides those public services, I also need to monitor system status like disk space, memory utilization, ssh liveness, etc.

The installation was pretty straightforward; for anything mentioned here you can run apt-cache search or yum search to find the exact package to install. Just note that Fedora tends to split plugins into LOTS of individual packages, while Ubuntu groups them into several jumbo packages. Good or bad, that’s up to you.

Something new to me (the last time I touched Nagios was 6 years ago) is nrpe; with its help I can avoid opening too many ACL holes to make monitoring work. I encourage you to take a look at it, unless all your servers sit in the same colocation behind a firewall facing the outside world.

Here are several things I spent a little more time on than the rest:

  • dont_blame_nrpe=1 for nrpe, so nrpe can take parameters from the monitoring node
  • some commands defined for nrpe:
    command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$
    command[check_users]=/usr/lib/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
    command[check_load]=/usr/lib/nagios/plugins/check_load -w $ARG1$ -c $ARG2$
    command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -u $ARG1$ -p $ARG2$
    command[check_ntp]=/usr/lib/nagios/plugins/check_ntp -H localhost
    command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$
  • To get graphs, install nagiosgrapher on the monitoring node, and change the following settings in nagios.cfg:
    process_performance_data=1
    service_perfdata_command=ngraph-process-service-perfdata-pipe
  • Don’t forget to change the contact definition. So far I’m using an email address; I will dig in to see if there are any good, free services for alerting
  • Here is a tricky part: using white space in a check_command definition:
    # check mysql service
    define service {
    hostgroup_name              mysql-servers
    service_description         MySQL over NRPE
    check_command               check_nrpe!check_mysql!username\ password
    use                         generic-service
    notification_interval       0 ; set > 0 if you want to be renotified
    }

    Note that the white space needs to be escaped with a backslash (\). It took me quite some time to figure this out …

  • Sometimes you may want to issue a check manually (through the Web), or delete service comments. For that you have to enable external commands in nagios.cfg:
    check_external_commands=1
    Also, due to Ubuntu’s packaging issue, you need to run two more commands to fix a permission problem (with sudo, of course):
    dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
    dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
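
To make the $ARG1$ and $ARG2$ placeholders in the nrpe command definitions above more concrete, here is a minimal Python sketch of the substitution that happens once dont_blame_nrpe=1 is set. It is an illustration only, not NRPE's actual code: the helper name expand_nrpe_command and the sample thresholds are made up for this example.

```python
import re

def expand_nrpe_command(template, args):
    """Replace $ARG1$, $ARG2$, ... in an nrpe command template.

    Illustration only: expand_nrpe_command is a hypothetical helper,
    not part of NRPE. Missing arguments expand to an empty string here.
    """
    def substitute(match):
        index = int(match.group(1)) - 1  # $ARG1$ maps to args[0]
        return args[index] if 0 <= index < len(args) else ""
    return re.sub(r"\$ARG(\d+)\$", substitute, template)

# Command template copied from the nrpe definitions above
template = "/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$"

# Sample warning/critical thresholds (assumed values)
print(expand_nrpe_command(template, ["20%", "10%"]))
```

With the sample thresholds, the printed command line is the check_disk call with -w 20% and -c 10% filled in.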

  5 Responses to “Notes on Nagios setup”

  1. Here comes more about nagiosgrapher setup.

    After installation on Ubuntu, it seems several commonly used services are not included in the default graph generation, including:

    • load
    • user
    • mysql

    For load and user, it’s pretty simple – copy the corresponding check_*.ncfg files from /usr/share/nagiosgrapher/debian/cfg/ngraph.d/standard to /etc/nagiosgrapher/ngraph.d/standard, restart the nagiosgrapher service, wait until all machines have completed at least one check for both services (so that the server’s configuration under /etc/nagios3/conf.d/ngraph/serviceext gets updated), then restart the nagios3 service to make them show up in the web interface.

    mysql is trickier: check_mysql.ncfg seems to have some problems, so you need to tweak it (after copying it to /etc/nagiosgrapher/ngraph.d/standard):

    • remove the first two sections, as nothing in the stats mentions “diff”
    • rename all ‘mysql-info’ to mysql
    • change Flushtables: to Flush tables: so it matches exactly what is in the mysql status
    • change Opentables: to Open tables:, same reason as above
    • (optional, I guess) change queries: to Slow queries:, and avg: to Queries per second avg:; the existing regular expressions already work, but my changes make them easier to read

    Then just do what I described above for check_load and check_users, and the graphs should be there.

    Again, there is something new in recent nagios – it can extract data from the log (or status message) with the graph_log_regex directive, while in the old days one could only read data from the performance data part. Of course you can still read data from the value part with graph_perf_regex.
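
The renames in the mysql tweak matter because a regex only finds a value if its label matches the status text literally. Here is a small Python sketch of that idea. The STATUS string is an assumed sample of what check_mysql reports, and extract is a hypothetical helper; the real ncfg files use their own regex syntax.

```python
import re

# Assumed sample of check_mysql status text; real values will differ.
STATUS = ("Uptime: 86400  Threads: 2  Questions: 1500  Slow queries: 3  "
          "Opens: 40  Flush tables: 1  Open tables: 33  "
          "Queries per second avg: 0.017")

def extract(label, text):
    """Return the number following 'label:' in text, or None if absent."""
    match = re.search(re.escape(label) + r":\s*([\d.]+)", text)
    return float(match.group(1)) if match else None

print(extract("Flush tables", STATUS))  # label with the space is found
print(extract("Flushtables", STATUS))   # label without the space is not
print(extract("queries", STATUS))       # short label still hits "Slow queries"
```

The last call shows why the queries: and avg: renames are optional: the shorter labels still match, the longer ones are just easier to read.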

  2. Putting some keywords here so that people searching online can find this:

    No matching perfdata values found
    check_mysql
    check_mysql.ncfg
    nagios
    nagiosgrapher

  3. It also seems that service_name in nagiosgrapher’s ncfg file has to match (a partial match is OK) the service name in nagios3’s service configuration; otherwise the grapher simply ignores it.

  4. Keep posting – I just enabled a nagios check for the php5-fpm service.

    First you need to enable the status page for php5-fpm by uncommenting this line in its configuration:
    pm.status_path = /status

    Then grab a FastCGI client library for PHP; after that it takes just a few lines of code:

    <?php
    // Load the FastCGI client library here; the file name depends on the library you grabbed.

    // Defaults, overridable with -H <host> and -p <port>
    $host = 'localhost';
    $port = 9000;
    $options = getopt('H:p:');
    if (!empty($options['H'])) {
        $host = $options['H'];
    }
    if (!empty($options['p'])) {
        $port = $options['p'];
    }

    // Ask php5-fpm for its status page over FastCGI
    $client = new FCGIClient($host, $port);
    $response = $client->request(
        array(
            'REQUEST_METHOD'  => 'GET',
            'SCRIPT_FILENAME' => '/status',
            'SCRIPT_NAME'     => '/status',
            'QUERY_STRING'    => '',
        ),
        ''
    );

    // Flatten the response into a single line of plugin output
    echo str_replace(array("\r", "\n"), " ", $response);

    and that’s it. Of course you still need to make the corresponding configuration changes for nagios, nrpe, and (if you want graphs) the grapher.
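
The PHP check above only echoes the raw status text. If you also want performance data for graphing, the output has to follow the usual "text|perfdata" plugin convention. Below is a rough Python sketch of that conversion, not a drop-in plugin. The SAMPLE text is an assumed excerpt of a php-fpm status page, and parse_fpm_status and to_nagios_line are hypothetical helper names.

```python
def parse_fpm_status(text):
    """Parse 'key: value' lines from a php-fpm status page into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            status[key.strip()] = value.strip()
    return status

def to_nagios_line(status):
    """Build a one-line 'text|perfdata' result from selected fields."""
    fields = ["active processes", "idle processes", "listen queue"]
    perfdata = " ".join(
        "'%s'=%s" % (name, status[name]) for name in fields if name in status
    )
    return "FPM OK: pool %s|%s" % (status.get("pool", "unknown"), perfdata)

# Assumed excerpt of a php-fpm status page; real pages have more fields.
SAMPLE = """pool:                 www
process manager:      dynamic
accepted conn:        120
listen queue:         0
idle processes:       2
active processes:     1
total processes:      3
"""

print(to_nagios_line(parse_fpm_status(SAMPLE)))
```

A real plugin would also set its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) based on thresholds, which this sketch leaves out.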

  5. It seems check_dns.ncfg has a regex problem – \d should be [\.\d], otherwise it will read 0.011 seconds as 11 seconds.
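
The effect of that regex fix is easy to reproduce outside nagiosgrapher. In this Python sketch both the OUTPUT string and the two patterns are assumptions that mimic the situation; the actual expression in check_dns.ncfg may look different.

```python
import re

# Assumed check_dns-style status text; the exact wording may differ.
OUTPUT = "DNS OK: 0.011 seconds response time."

# Digits only: the match cannot cross the dot, so it starts after it.
digits_only = re.search(r"(\d+) seconds", OUTPUT).group(1)

# Digits and dots: the whole decimal number is captured.
with_dot = re.search(r"([\.\d]+) seconds", OUTPUT).group(1)

print(digits_only, "->", int(digits_only))
print(with_dot, "->", float(with_dot))
```

The digits-only pattern captures just the part after the decimal point, which is how 0.011 seconds ends up graphed as 11.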
