How to Set up Nagios

The Nagios monitoring system is distributed: a central server monitors a number of client nodes. It supports two modes of monitoring, active checks and passive checks. With active checks, the server executes the checks itself, which can put a heavy processing load on the server as the number of client nodes grows. Passive checks, on the other hand, are executed on the client nodes, and the results are sent to the server.

This post covers how to set up Nagios with passive checks. Each client is customized to monitor the tasks running on it and send the results to the server. The central server receives the results, renders them through a web interface and sends email notifications for critical issues.

1. Nagios Server Set Up

The Ubuntu repository includes Nagios packages. Though the packaged version is not the latest, it is stable and easy to install. Simply issue the command below,

sudo apt-get install nagios3

This will likely install the postfix mail transfer agent (MTA) too, which is required to send out notification emails. Simply follow the installation screens to set up postfix.

1.1 Web Interface Configuration

The Nagios server provides its web interface with the help of the Apache web server. Normally everything is taken care of by the installation process, and one can enter the URL in a browser to view the web interface. The URL is normally http://<ip or hostname>/nagios3/. Below is a screenshot of the Nagios display on my setup.

Figure 1. Nagios Web Interface

1.1.1 Apache Virtual Host

If your machine already hosts other sites and you want to serve the Nagios web interface as a virtual host, this can be done easily.

The Nagios installation comes with a sample Apache configuration file, which can be used with slight modifications.

a. Copy the configuration files

sudo cp -rpf /usr/share/doc/nagios3-common/examples/apache2.conf /etc/apache2/sites-available/nagios

sudo ln -sf /etc/apache2/sites-available/nagios /etc/apache2/sites-enabled/nagios

b. Edit the file

Then we’ll need to add the following two lines.

To the beginning of the file,

<VirtualHost *:80>

To the end of the file,

</VirtualHost>

c. Restart apache2 and nagios3

sudo service apache2 restart

sudo service nagios3 restart
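
Steps a and b can also be scripted. Below is a minimal sketch, assuming the sample file does not already contain a VirtualHost block of its own; wrap_in_vhost is a hypothetical helper name, not something shipped with Nagios or Apache.

```shell
# wrap_in_vhost SRC DEST: write SRC to DEST, wrapped in a <VirtualHost *:80> block.
# (wrap_in_vhost is a hypothetical helper used only for illustration.)
wrap_in_vhost() {
    local src="$1" dest="$2"
    {
        echo '<VirtualHost *:80>'
        cat "$src"
        echo '</VirtualHost>'
    } > "$dest"
}

# Example (run as root, then create the symlink and restart as in steps a and c):
# wrap_in_vhost /usr/share/doc/nagios3-common/examples/apache2.conf \
#               /etc/apache2/sites-available/nagios
```

The helper writes a new file instead of editing in place, so the packaged sample stays untouched.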

 

1.2 Configuration

Nagios3 configuration files are under the directory /etc/nagios3/.

nagios.cfg is the main configuration. You can read information about other configuration files and directories from the file.

By default, there’s also a commands.cfg file, which defines the commands and cgi.cfg file, which contains the configurations for the cgi.

More configuration files can be found under /etc/nagios3/conf.d directory.

1.2.1 Host and Host Group Configuration

In order to monitor a client node, we'll need to define it as a host. We can create a new file named host.cfg under the conf.d directory and add the host definition to it. A sample host definition is shown below,

define host {
   use        generic-host
   host_name  testMachine
   alias      some remote host
   address    127.0.0.1
}

This definition inherits the generic-host definition, which is available in the conf.d/generic-host_nagios2.cfg file.

We can define as many hosts as we want in the same file or using a different file.
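
When there are many client nodes, typing each definition by hand gets tedious. The sketch below prints a host definition in the same shape as the sample above; make_host_def is a hypothetical helper name, and it assumes every host inherits from generic-host.

```shell
# make_host_def NAME ALIAS ADDRESS: print a Nagios host definition to stdout.
# (make_host_def is a hypothetical helper; it always inherits from generic-host.)
make_host_def() {
    local name="$1" alias_text="$2" address="$3"
    printf 'define host {\n'
    printf '   use        generic-host\n'
    printf '   host_name  %s\n' "$name"
    printf '   alias      %s\n' "$alias_text"
    printf '   address    %s\n' "$address"
    printf '}\n'
}

# Example: append a definition to host.cfg (run as root).
# make_host_def testMachine "some remote host" 127.0.0.1 >> /etc/nagios3/conf.d/host.cfg
```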

When the number of hosts increases, we may want to group them into host groups for easier management and configuration. By default, a number of host groups are defined in the hostgroup_nagios2.cfg file. We can add our own host group by appending the following to the end of that file,

define hostgroup {
       hostgroup_name  test-servers
       alias           SSH servers
       members         testMachine
}

When you have multiple hosts for a group, you can use “,” to separate them.
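
As an illustration of the comma-separated members list, here is a small sketch that builds a host group definition from any number of host names. make_hostgroup_def is a hypothetical helper, and web01 and db01 in the example are made-up host names.

```shell
# make_hostgroup_def GROUP ALIAS HOST...: print a hostgroup definition whose
# members directive lists all the given hosts, joined with commas.
# (make_hostgroup_def is a hypothetical helper used only for illustration.)
make_hostgroup_def() {
    local group="$1" alias_text="$2"
    shift 2
    local members
    # Inside the subshell, "$*" joins the remaining arguments with the first IFS character.
    members=$(IFS=','; echo "$*")
    printf 'define hostgroup {\n'
    printf '       hostgroup_name  %s\n' "$group"
    printf '       alias           %s\n' "$alias_text"
    printf '       members         %s\n' "$members"
    printf '}\n'
}

# Example:
# make_hostgroup_def test-servers "SSH servers" testMachine web01 db01
```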

1.2.2 Service Configuration

The default services are defined in the conf.d/services_nagios2.cfg file. Here we describe how to define a passive service.

First, create a file named passive-service_nagios2.cfg under the conf.d directory and copy the content below into it,

define service {
       use     generic-service
       name    passive_service
       active_checks_enabled   0
       passive_checks_enabled  1
       flap_detection_enabled  0
       register                0
       is_volatile             0
       check_period            24x7
       max_check_attempts      1
       normal_check_interval   5
       retry_check_interval    1
       check_freshness         0
       contact_groups  admins
       check_command   check_dummy!0
       notification_interval   120
       notification_period     24x7
       notification_options    w,u,c,r
       stalking_options        w,c,u
}

This defines a generic passive service template (register 0 marks it as a template only), in which active checks are disabled and passive checks are enabled.

Now create a file, say test.cfg, and put the content below into it,

define service {
       use     passive_service
       service_description     TestMessage
       host_name       testMachine
}

Alternatively, you can use the host group we defined previously instead of hostname. This can be handy when you have lots of client nodes.

define service {
       use     passive_service
       service_description     TestMessage
       hostgroup_name  test-servers
}

1.2.3 Define the Passive Check Command

In the passive_service defined above, we referenced a command “check_dummy!0”, which has not been defined yet.

Normally the external commands exist as plug-ins for Nagios. The executables can be found under /usr/lib/nagios/plugins/ directory. The configuration files for the plug-ins are found at /etc/nagios-plugins/config/ directory.

To continue our example, we create a configuration file passive_check.cfg under /etc/nagios-plugins/config/ with the following content,

define command {
       command_name    check_dummy
       command_line    $USER1$/check_dummy $ARG1$
}

Note that the check_dummy executable already exists, so that’s all we need to do.

1.2.4 Enable Passive Checks

By default, external commands, which carry the passive check results into Nagios, are disabled. We’ll need to open /etc/nagios3/nagios.cfg, the main configuration file, and set “check_external_commands=1”. You may also want to adjust command_check_interval to control how often Nagios checks for updates from the client nodes.
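
The edit can also be made from the command line. This is a minimal sketch, assuming the file already contains a check_external_commands line (as the packaged nagios.cfg does); enable_external_commands is a hypothetical helper name.

```shell
# enable_external_commands CFG: set check_external_commands=1 in the given file
# and print the resulting line. A backup is kept as CFG.bak.
# (enable_external_commands is a hypothetical helper used only for illustration.)
enable_external_commands() {
    local cfg="$1"
    sed -i.bak 's/^check_external_commands=.*/check_external_commands=1/' "$cfg"
    grep '^check_external_commands=' "$cfg"
}

# Example (run as root, then restart nagios3):
# enable_external_commands /etc/nagios3/nagios.cfg
```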

1.2.5 Install NSCA at Server

nsca is an add-on that allows the clients to send passive check results to the server. One can download nsca from reference 1.

To compile nsca,

./configure

make all

If everything goes right, the executables nsca and send_nsca will be under the src/ directory.

We can copy nsca and sample-config/nsca.cfg to the /etc/nagios3 directory and start the nsca server from there with ./nsca -c /etc/nagios3/nsca.cfg.

nsca is then ready to accept check results from the client nodes and forward them to the Nagios server.
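
To test the server side before any client exists, a passive result can be written straight to the Nagios external command file using the PROCESS_SERVICE_CHECK_RESULT external command. The sketch below only formats that command line; passive_result_cmd is a hypothetical helper name, and the command file path in the example is an assumption (check the command_file setting in nagios.cfg).

```shell
# passive_result_cmd TIMESTAMP HOST SERVICE CODE OUTPUT: print a Nagios external
# command that submits a passive service check result.
# (passive_result_cmd is a hypothetical helper used only for illustration.)
passive_result_cmd() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example (on the Nagios server; the path below is an assumption):
# CMD_FILE=/var/lib/nagios3/rw/nagios.cmd
# passive_result_cmd "$(date +%s)" testMachine TestMessage 0 "manual test" > "$CMD_FILE"
```

If the TestMessage service changes state in the web interface after this, the server side is accepting passive results.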

1.2.6 Email Notification

Notifications in Nagios can be configured with great flexibility. Here we simply show how to enable straightforward email notification.

In the conf.d directory, open contacts_nagios2.cfg and change the email address for contact_name root to our own email address, say test@gmail.com.

We can use the following command to test whether the mail server works,

echo "hello" | /usr/bin/mail -s "hello" test@gmail.com

2. Client Node Configuration

The client node configuration is simple. Create a folder, say /opt/nagios/, for all the client-side files, and copy the send_nsca and sample-config/send_nsca.cfg files to it. Now we’re ready to write checking scripts.

The script below checks whether certain processes are running and then sends the result to the server.

 

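#!/bin/bash
# Check that the required processes are running and report the result to Nagios via NSCA.

HOST="testMachine"           # must match host_name in the server's host definition
SERVICE="TestMessage"        # must match service_description on the server
NPATH="/opt/nagios/"
NSCA="send_nsca"
OUTFN="out.txt"
CONFIG="send_nsca.cfg"
NSCA_PATH="$NPATH$NSCA"
OUT_PATH="$NPATH$OUTFN"
CONFIG_PATH="$NPATH$CONFIG"

SUCCESS=0
WARNING=1
FAIL=2
SUCCESS_DESP="All required processes are running"
FAIL_DESP="At least one required process is not running"

PSs=('sshd')                 # processes that must be running
FPSs=()                      # processes found not running

for pc in "${PSs[@]}"
do
   if ps ax | grep -v grep | grep "$pc" > /dev/null
   then
       echo "$pc is running"
   else
       FPSs=("${FPSs[@]}" "$pc")
   fi
done

echo ${#FPSs[@]}

# Write one tab-separated result line; echo appends the terminating newline.
if [ "${#FPSs[@]}" -eq "0" ]
then
   echo -e "$HOST\t$SERVICE\t$SUCCESS\t$SUCCESS_DESP" > "$OUT_PATH"
else
   FAIL_DESP="${#FPSs[@]} processes are not running:"
   for fps in "${FPSs[@]}"
   do
       FAIL_DESP="$FAIL_DESP $fps "
   done
   echo -e "$HOST\t$SERVICE\t$FAIL\t$FAIL_DESP" > "$OUT_PATH"
fi

# Send the result to the nsca daemon. Replace localhost with the Nagios server's
# address when the client is a separate machine. Some NSCA releases may expect the
# address as "-H <address>"; check the usage text of your build.
"$NSCA_PATH" localhost -p 5667 -c "$CONFIG_PATH" < "$OUT_PATH"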

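If we check out.txt, we can see what is sent to the nsca server. It is a single line of the following form,

testMachine TestMessage 0 All required processes are running

The fields are separated by tabs (\t). They are the host name, the service name, the status (0 for success, 1 for warning and 2 for critical) and a description message. The host and service names must match the definitions on the server exactly, including case (testMachine and TestMessage in this example). The line ends with a newline character.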
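
The line format can be produced with printf, which avoids depending on how echo handles backslash escapes. This is a minimal sketch; nsca_line is a hypothetical helper name.

```shell
# nsca_line HOST SERVICE STATUS MESSAGE: print one tab-separated, newline-terminated
# result line in the format send_nsca reads from standard input.
# (nsca_line is a hypothetical helper used only for illustration.)
nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Example: build the line and write it to the file the client script sends.
# nsca_line testMachine TestMessage 0 "All required processes are running" > /opt/nagios/out.txt
```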

3. Others

For debugging,

tail -f /var/log/syslog

For starting, stopping and restarting nagios3:

sudo service nagios3 start

sudo service nagios3 stop

sudo service nagios3 restart

For reloading config files,

sudo service nagios3 reload
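
A typo in a configuration file can prevent Nagios from starting again, so it helps to verify the configuration before reloading. The sketch below uses the -v (verify) option of the Nagios binary. verify_then_reload is a hypothetical helper, the binary path is an assumption about the Ubuntu package, and the NAGIOS_BIN, NAGIOS_CFG and RELOAD_CMD variables are made-up overrides for illustration.

```shell
# verify_then_reload: reload Nagios only if the configuration passes verification.
# (Hypothetical helper; the defaults below are assumptions about the Ubuntu package.)
verify_then_reload() {
    local bin="${NAGIOS_BIN:-/usr/sbin/nagios3}"
    local cfg="${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}"
    local reload="${RELOAD_CMD:-sudo service nagios3 reload}"
    if "$bin" -v "$cfg" > /dev/null; then
        echo "config OK, reloading"
        $reload
    else
        echo "config check failed, not reloading" >&2
        return 1
    fi
}

# Example:
# verify_then_reload
```

When the check fails, running the verify command by hand without the redirect shows which file and line caused the error.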

References:

1. NSCA: http://sourceforge.net/projects/nagios/files/

2. Nagios Installation guide.

3. Nagios NSCA Installation guide.