While setting up Nagios to play with its monitoring facilities, I found that far too many things do NOT work out of the box. I’m writing down as much as I can remember so that I don’t have to Google it all again the next time I take on this setup. Others may benefit from this as well.
A brief intro to the environment: my monitoring node is in EC2 on the east coast, and three servers to be monitored are in EC2 on the west coast; all four run Ubuntu 12.04. There is also a physical box sitting in an IDC in Beijing, China, running Fedora 14 (the owner does not want to upgrade for some reason). Almost all servers run the classic Web applications, such as Nginx, MySQL, etc. Besides those public services, I also need to monitor system status such as disk space, memory utilization, ssh liveness, etc.
The installation was pretty straightforward; for anything mentioned here you can run apt-cache search or yum search to find the exact package to install. Note that Fedora tends to split plugins into LOTS of individual packages, while Ubuntu groups them into several jumbo packages. Good or bad, it’s up to you.
Something new to me (the last time I touched Nagios was 6 years ago) is nrpe. With its help I can avoid opening too many ACL holes just to make monitoring work. I encourage you to take a look at it unless all your servers sit in the same colocation, behind a firewall that faces the outside world.
Here are several things I spent a little more time on than the rest:
- dont_blame_nrpe=1 for nrpe, so that nrpe accepts parameters from the monitoring node
- some commands defined for nrpe:
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$
command[check_users]=/usr/lib/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
command[check_load]=/usr/lib/nagios/plugins/check_load -w $ARG1$ -c $ARG2$
command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -u $ARG1$ -p $ARG2$
command[check_ntp]=/usr/lib/nagios/plugins/check_ntp -H localhost
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$
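To make the $ARGn$ placeholders concrete, here is a minimal Python sketch of the substitution nrpe performs once dont_blame_nrpe=1 is set. The function name expand_nrpe_command and the 20%/10% thresholds are made up for illustration; real nrpe does this internally and also filters the arguments it receives, so treat this as a simplified model only.

```python
import re


def expand_nrpe_command(template, args):
    """Replace $ARG1$, $ARG2$, ... in an nrpe command line with positional args.

    Simplified model: a placeholder with no matching argument becomes empty.
    """
    def substitute(match):
        index = int(match.group(1)) - 1
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, template)


# Command line taken from the nrpe definitions above.
template = "/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$"

# Hypothetical thresholds sent by the monitoring node.
print(expand_nrpe_command(template, ["20%", "10%"]))
```

Without dont_blame_nrpe=1 the placeholders are never filled in, which is why checks that rely on arguments fail until the setting is changed.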
- To get graphs, install nagiosgrapher on the monitoring node and change the following settings in nagios.cfg:
process_performance_data=1
service_perfdata_command=ngraph-process-service-perfdata-pipe
- Don’t forget to change the contact definition. So far I’m using an email address; I will dig in to see whether there are any good, free services for alerting.
- Here is a tricky part: using white space in a check_command definition:
# check mysql service
define service {
        hostgroup_name          mysql-servers
        service_description     MySQL over NRPE
        check_command           check_nrpe!check_mysql!username\ password
        use                     generic-service
        notification_interval   0 ; set > 0 if you want to be renotified
}
Note that the white space needs to be escaped with a backslash (\). It took me quite some time to figure this out …
- Sometimes you may want to issue a check manually (through the Web interface) or delete service comments. For that you have to enable external commands in nagios.cfg:
check_external_commands=1
Also, due to a packaging issue on Ubuntu, you need to run two more commands (with sudo, of course) to fix the permission problem:
dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
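For reference, what the Web interface does behind the scenes is write a one-line external command into a named pipe under /var/lib/nagios3/rw. The Python sketch below formats a forced service check in that form. The pipe name nagios.cmd is the usual default but is an assumption here, as are the host name web1 and the helper names.

```python
import time

# Assumed location of the external command pipe; check command_file in nagios.cfg.
COMMAND_FILE = "/var/lib/nagios3/rw/nagios.cmd"


def forced_service_check(host, service, now=None):
    """Build a SCHEDULE_FORCED_SVC_CHECK line that asks for an immediate check."""
    timestamp = int(time.time()) if now is None else int(now)
    return "[%d] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%d\n" % (
        timestamp, host, service, timestamp)


def submit(line, command_file=COMMAND_FILE):
    """Write one command line to the pipe (needs write permission on it)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Only print the line here; call submit(...) on the monitoring node itself.
print(forced_service_check("web1", "MySQL over NRPE", now=1000), end="")
```

If submit() fails with a permission error, that is the same permission problem the two dpkg-statoverride commands address.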
5 Responses to “Notes on Nagios setup”
Here comes more about nagiosgrapher setup.
After installation on Ubuntu, it seems several commonly used services are not part of the default graph generation, among them the load, users, and mysql checks covered here.
For load and users it’s pretty simple: copy the corresponding check_*.ncfg files from /usr/share/nagiosgrapher/debian/cfg/ngraph.d/standard to /etc/nagiosgrapher/ngraph.d/standard, restart the nagiosgrapher service, and wait until every machine has run at least one check for both services (so that the server’s configuration under /etc/nagios3/conf.d/ngraph/serviceext gets updated). Then restart the nagios3 service to make the graphs show up in the web interface.
mysql is trickier: check_mysql.ncfg seems to have some problems, so you need to tweak it after copying it to /etc/nagiosgrapher/ngraph.d/standard.
Then do what I described above for check_load and check_users, and the graph should be there.
Again, there is something new in recent nagios: it can extract data from the log (or status message) with the graph_log_regex directive, whereas in the old days one could only read data from the performance data part. Of course you can still read data from the value part with graph_perf_regex.
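The difference between the two directives is easier to see on actual plugin output, where everything before the "|" is the status message and everything after it is performance data. The Python sketch below only mimics that split. The sample lines are made up in the shape these plugins typically print, and the patterns are illustrations, not the ones shipped in the ncfg files.

```python
import re


def split_plugin_output(line):
    """Split a plugin output line into (status text, performance data)."""
    text, _, perfdata = line.partition("|")
    return text.strip(), perfdata.strip()


def extract(pattern, source):
    """Return the first capture group of pattern in source, or None."""
    match = re.search(pattern, source)
    return match.group(1) if match else None


# Assumed sample output: check_load prints performance data after the "|".
load_line = "OK - load average: 0.12, 0.08, 0.05|load1=0.120;5.000;10.000;0;"
# Assumed sample output: a status line with no performance data part at all.
mysql_line = "Uptime: 86400  Threads: 2  Questions: 1500  Slow queries: 0"

text, perf = split_plugin_output(load_line)
print(extract(r"load1=([\d.]+)", perf))   # what a graph_perf_regex would see

text, perf = split_plugin_output(mysql_line)
print(repr(perf))                         # empty: nothing for a perf regex
print(extract(r"Threads: (\d+)", text))   # what a graph_log_regex would see
```

When a check prints nothing after the "|", only a log-style regex has anything to match against.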
Put some keywords here so that people can get help if they are searching online:
No matching perfdata values found
check_mysql
check_mysql.ncfg
nagios
nagiosgrapher
It also seems that service_name in nagiosgrapher’s ncfg file has to match (a partial match is OK) the service name in nagios3’s service configuration; otherwise the grapher simply ignores it.
Still posting: I just enabled a nagios check for the php5-fpm service.
First you need to enable the status page for php5-fpm by uncommenting this line in its configuration:
pm.status_path = /status
Then grab a FastCGI client library for PHP and write a few lines of code against the status page,
and things are done. Of course you still need to go through all the configuration changes for nagios, nrpe, and the grapher if you want them.
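As an illustration of the idea only (in Python rather than PHP, and not the snippet used on my servers), a check can parse the plain-text status page and turn it into a Nagios result. The sketch assumes the page has already been fetched, by whatever FastCGI client you use, and that it contains "listen queue" and "active processes" lines. The thresholds and function names are hypothetical.

```python
def parse_fpm_status(body):
    """Turn 'key:   value' lines from the php-fpm status page into a dict."""
    status = {}
    for line in body.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def check_fpm(body, warn_queue=5, crit_queue=20):
    """Return (exit code, output line) following Nagios plugin conventions."""
    status = parse_fpm_status(body)
    if "listen queue" not in status:
        return 3, "UNKNOWN - no php-fpm status fields found"
    queue = int(status["listen queue"])
    active = int(status.get("active processes", 0))
    if queue >= crit_queue:
        code, label = 2, "CRITICAL"
    elif queue >= warn_queue:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"
    perfdata = "listen_queue=%d active=%d" % (queue, active)
    return code, "%s - listen queue %d, %d active processes|%s" % (
        label, queue, active, perfdata)


# Assumed sample of the status page body, shortened.
SAMPLE = """pool:                 www
process manager:      dynamic
listen queue:         0
idle processes:       4
active processes:     1
"""

code, output = check_fpm(SAMPLE)
print(code, output)
```

A real plugin would finish with sys.exit(code) so that nrpe can pass the state back to nagios.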
It seems check_dns.ncfg has a regex problem: \d should be [\.\d], otherwise it will read 0.011 seconds as 11 seconds.
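The effect is easy to reproduce outside nagiosgrapher. The Python sketch below uses a made-up check_dns output line and two simplified patterns; the exact regex in check_dns.ncfg may look different, so this only demonstrates the \d versus [\.\d] behaviour.

```python
import re

# Assumed sample of check_dns output; the address is a documentation placeholder.
output = "DNS OK: 0.011 seconds response time. example.com returns 192.0.2.10"

# Digits only: the match cannot include the dot, so it starts after it.
digits_only = re.search(r"(\d+) seconds", output).group(1)

# Digits and dots: the whole decimal number is captured.
with_dot = re.search(r"([\.\d]+) seconds", output).group(1)

print(digits_only, float(digits_only))
print(with_dot, float(with_dot))
```

The first pattern captures only the digits after the decimal point, which is how a response time of 0.011 seconds ends up graphed as 11.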