Frequently Asked Questions (FAQs)


Index

Items in red seem to be the biggest problems for people - read these first.

Problems compiling NetSaint
Problems compiling the statusmap CGI
"NetSaint process may not be running" warnings in the CGIs
Hosts are incorrectly listed as being DOWN and/or services have a status of "HOST DOWN"
When hosts go down, I get notification about services instead of hosts and the service notifications contain incorrect data

Debugging "unknown variable" errors during configuration verification or runtime
Running multiple instances of NetSaint on the same machine
Changing the contents of the default web page
Missing data in the CGIs or errors about improper authorization
Problems finding the traceroute CGI
Requiring users to authenticate before accessing web interface
Displaying pretty host icons
Errors committing commands via the command CGI
Monitoring virtual web servers that use host headers
Monitoring remote host information
Monitoring printers
Monitoring Windows NT servers
Sending SNMP traps to management hosts
Logging events to an external database
Troubleshooting problems with NetSaint

I'm having trouble compiling NetSaint - What can I do?

If you are running Linux, this is probably because you don't have the gcc compiler installed on your system. Either install the compiler yourself or ask your sysadmin to do it for you. If you are running SunOS, IRIX, HP-UX, *BSD, etc. you may have to tweak the Makefile a bit. This may involve changing the compiler name, compiler options, and/or linker options.

If you're getting errors about the strncat(), strncpy(), or snprintf() functions, you probably don't have the glibc libraries installed on your system. This tends to happen most often on HP-UX and Solaris boxes. I've tried to prevent potential buffer overflows in NetSaint and the CGIs by using these functions, so they are all over the code. If you don't want to install the glibc libraries for some reason, you'll have to find some other way to get everything compiled.

If you have to make changes to the Makefile, configure script, or any code in order to compile NetSaint, let me know what OS you are running and what changes you had to make. I would like to include this information in future releases.


I can't find or am having trouble compiling the statusmap CGI...

If you compile all the CGIs, but don't find the statusmap CGI, you probably don't have Thomas Boutell's gd library installed correctly on your system. The gd library (and thus the statusmap CGI) also requires that you have the zlib and png libraries installed. Version 1.6.3 or higher of the gd library is required, as the CGI generates a PNG image of your network layout.

If you find that the statusmap CGI has not been compiled, make sure you have the gd library installed on your system and rerun the configure script with the following options:

./configure --with-gd-lib=LIBDIR --with-gd-inc=INCDIR

Replace LIBDIR with the directory in which the gd library is installed (usually /usr/lib or /usr/local/lib) and replace INCDIR with the directory in which the header files for the gd library are installed (usually /usr/include or /usr/local/include).

After you rerun the configure script, make sure to recompile the CGIs and install them in their proper location.


"NetSaint process may not be running" warnings in the CGIs

If you are getting erroneous messages about the NetSaint process not running while viewing the CGIs, it's probably due to one of the following items:

  1. You haven't defined a command to check the status of the NetSaint process. This is done by supplying a value for the process_check_command directive in the CGI configuration file.

  2. If you have defined a command, perhaps it is not returning the proper exit code. The command must follow the same rules as the plugin: a return code of 0 indicates that NetSaint is running, values of 1, 2, or -1 indicate that NetSaint is either not running or in some degraded state.

  3. If you have defined a process check command that uses the check_netsaint plugin, make sure that the plugin is functioning as it should. Execute the check_netsaint plugin from the command line and check the result. If the plugin is reporting that the NetSaint process cannot be found or if it returns a "Could not open pipe" message, you may need to edit the PS_RAW_COMMAND definition in the common/config.h file of the plugin distribution to match the syntax for the ps command on your system. For example, under FreeBSD you should use either "/bin/ps -ao 'state user ppid args'" or "/bin/ps -axo 'state user ppid command'" (it seems to vary). Once you've changed the PS_RAW_COMMAND definition, recompile the plugins and test the newly compiled check_netsaint plugin to see if it works.

The CGIs will not allow you to submit any commands while they think the NetSaint process is not running. This is done primarily to prevent people from accidentally submitting multiple shutdown/restart commands that don't get processed until NetSaint is started at some future time.
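
As a rough illustration of items 1 and 2 above, a process check command could be a small shell script along these lines. This is only a sketch: the function name netsaint_running is made up for this example, and the ps invocation in the usage comment assumes your ps accepts the POSIX-style "-e -o comm=" options and prints the bare command name, which is not true on every system (see item 3).

```shell
#!/bin/sh
# Sketch of a process check (hypothetical, not part of NetSaint).
# Follows the return code rule described above:
#   0 = process found, 2 = process not found.

# Reads a list of command names on stdin (one per line) and returns 0
# if any line matches $1 exactly, 2 otherwise.
netsaint_running() {
    if grep -qx -- "$1"; then
        return 0
    fi
    return 2
}

# Example use (assumption: ps supports "-e -o comm="; adjust as needed):
#   ps -e -o comm= | netsaint_running netsaint
#   exit $?
```

Run it by hand first and look at the exit code before pointing process_check_command at it.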


Hosts are incorrectly listed as being DOWN and/or services have a status of "HOST DOWN"

This seems to be one of the biggest issues for new users. 99.9% of the time this problem is due to an incorrect command definition for the host check command you specified in the host definition.

A major cause of this problem was a syntax change to the command line arguments of the check_ping plugin. You need to make sure that the host check command is using the proper syntax for the version of the check_ping plugin that you have. You can check to see if the command works properly by executing it manually from the command line. Recent versions of the check_ping plugin require that a -p flag be used to specify the number of packets to send. Previous versions of the plugin did not require this flag - that's where the problem lies. Check your host check command definition(s) to make sure they are using the proper syntax. Example:

command[check-host-alive]=/usr/local/netsaint/libexec/check_ping $HOSTADDRESS$ 100 100 1000.0 1000.0 -p 1
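
When you run the host check command by hand, the exit code matters as much as the text it prints. The helper below is a hypothetical sketch (run_check is not something NetSaint provides) that runs any command and reports its exit code, assuming host checks follow the same plugin rule described earlier, where 0 means OK.

```shell
# Hypothetical helper: run a check command by hand and show its exit code.
run_check() {
    rc=0
    "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "exit code 0 (OK)"
    else
        echo "exit code $rc (non-OK)"
    fi
    return "$rc"
}

# Example use, with $HOSTADDRESS$ replaced by a real address:
#   run_check /usr/local/netsaint/libexec/check_ping 192.168.0.1 100 100 1000.0 1000.0 -p 1
```

If this reports a non-OK exit code for a host you know is up, the command definition (or the plugin syntax) is the first thing to look at.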

Important! Just because you have a service that is monitoring ping statistics for a host does not mean that the actual host status is being checked. The status of a host is only checked when a service check results in a non-OK state or if the host was previously down and a service check results in an OK state.

Some symptoms of incorrect host check commands include:

  1. Hosts incorrectly being listed as DOWN
  2. Services that have a status of "HOST DOWN", even though the host they reside on is actually UP
  3. Alternating alerts/notifications about host problems and recoveries


When hosts go down, I get notification about services instead of hosts and the service notifications contain incorrect data

Several people have reported this problem and I spent hours trying to find the problem until I realized it wasn't a bug in the code. If you get service notifications when you should be getting host notifications (and the service notifications you get seem to contain bogus data), check your contact definitions in the host config file. They are most likely incorrect.

Make sure that you are not using the same notification command for service and host notification commands. Service and host notifications are very different and make use of macros which are not transferable between each type. Look at the sample host config file provided with NetSaint to see what the contact definitions look like and how the service and host notification commands differ. If you're wondering what macros can be used in either type of notification, look at this table.


Debugging "unknown variable" errors during configuration file verification or runtime

When trying to run NetSaint or verify your configuration file data using the -v argument, NetSaint may print out a message like "Error in configuration file 'xxxxxxx.cfg' - Line 34 (Unknown variable)". A few simple checks will usually resolve this problem...

  1. Make sure you are passing the path to the main configuration file and not the host configuration file on the command line. Many people have made this mistake. The correct syntax would be as follows (modified for your system, of course):
    ./netsaint -v /usr/local/netsaint/etc/netsaint.cfg

  2. Make sure that you don't have any invalid variables defined in your configuration file. Notice that the error message will contain a reference to the name of the configuration file and the line number on which the error was encountered. Make sure that all comment lines contain a pound sign (#) in the first character of the line. If you're not sure about what variables are valid, check the documentation for the main and host configuration files.

  3. Make sure all variable identifiers are in lower case. Example:
    "admin_email=someaddress@somedomain.com" instead of "ADMIN_EMAIL=someaddress@somedomain.com"

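The lower case check in item 3 is easy to automate. The function below is a hypothetical sketch (find_uppercase_vars is a name invented here) that lists simple name=value lines whose name contains an upper case letter. It is meant for the main configuration file; lines with bracketed names in the host configuration file may show up as false positives.

```shell
# Hypothetical helper: list "name=value" lines whose name (the part
# before the first "=") contains an upper case letter.
# Lines starting with "#" are never reported.
# Prints matches as "line_number:line"; returns grep's exit status
# (0 if something was found, 1 if nothing was found).
find_uppercase_vars() {
    grep -n '^[^#=]*[[:upper:]][^=]*=' "$1"
}

# Example use:
#   find_uppercase_vars /usr/local/netsaint/etc/netsaint.cfg
```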

How do I run multiple instances of NetSaint on the same machine?

You can run multiple instances of NetSaint on the same machine, if you ensure that the following variables are unique for each instance of NetSaint...

If you are using the web interface, you will have to set up separate directories to hold the CGIs for each instance of NetSaint and create appropriate script aliases in your web server configuration file. This is necessary because the CGI configuration file must be unique for each set of CGIs, as it contains a reference to which main configuration file the CGIs should read.

Also, if you plan on running both copies of NetSaint in daemon mode, you'll need to change the #LOCK_FILE definition in the common/locations.h file before compiling the second copy. If you don't, both copies of NetSaint will try to use the same lock file. The second one that is started will complain and then exit. Version 0.0.6 will include the ability to specify the location of the lock file in the main configuration file.

One last thing you should check is your init script (if you're using one). The init script should start, stop, restart, and reload all copies of NetSaint (if that's what you want).


How do I change the contents of the default web page?

Several people have asked how to modify the default web page so that service detail or service overview information is displayed in the right hand frame (instead of the intro page). You can do this rather easily by modifying the frameset information in the index.html page (located in the root web directory for NetSaint) as follows:

Default Frame Configuration

<FRAMESET BORDER="0" FRAMEBORDER="0" FRAMESPACING="0" COLS="180,*">
<FRAME SRC="side.html" NAME="side" TARGET="main">
<FRAME SRC="main.html" NAME="main">
</FRAMESET>

Modified Configuration

<FRAMESET BORDER="0" FRAMEBORDER="0" FRAMESPACING="0" COLS="180,*">
<FRAME SRC="side.html" NAME="side" TARGET="main">
<FRAME SRC="xxxxx" NAME="main">
</FRAMESET>

Replace xxxxx with one of the following values, or anything else you may want...

Option | Description
/cgi-bin/netsaint/status.cgi?host=all | This will display service status details for all hosts in the right hand side of the frame
/cgi-bin/netsaint/status.cgi?hostgroup=all | This will display a service status overview for all hostgroups in the right hand side of the frame
/cgi-bin/netsaint/showlog.cgi | This will display the contents of the log file in the right hand side of the frame
/cgi-bin/netsaint/history.cgi?host=all | This will display the service history for all hosts in the right hand side of the frame

Read the documentation on the CGIs for more information on what options each supports.


When I access the CGIs I don't see everything I should or I get authorization errors...

If you believe you are unable to see all the information in the CGIs or if you are getting authorization errors, you probably haven't configured the web server to require authentication or haven't set up authorization correctly. See the documentation on authentication and authorization in the CGIs here.


Where can I find the traceroute CGI?

Newer versions of the check_ping plugin are capable of producing HTML that provides a link to a traceroute CGI written by Ian Cass. The traceroute CGI is not included in the core distribution of NetSaint. However, you can find it in the contrib area of the downloads section at http://www.netsaint.org/download/contrib.


How do I require users to authenticate before accessing the web interface?

See the documentation on authentication and authorization in the CGIs here.


How do I get those pretty host icons to display in my CGIs?

If you want to associate images with particular hosts for use in the status, status map, status world, and extended information CGIs, you must define extended host information entries in your CGI configuration file.


I'm getting errors when attempting to commit commands to NetSaint via the command CGI

If you are getting 'Could not open command file somefile for update' errors when attempting to commit commands to NetSaint via the command CGI, the most likely problem is with directory and/or file permissions. Here is what you can do to fix it. Note: You must be root in order to do some of these steps...

First, find the user that your web server process is running as. On many systems this is the user nobody, although it will vary depending on what OS/distribution you are running.

Next, create a new group that will be granted permissions to update the NetSaint command file. Let's say you want to call the group 'nscmd'. On RedHat Linux you can use the following command to add a new group (other systems may differ):

/usr/sbin/groupadd nscmd

Next, add all users who should have access to the command file to the group you just created. In this example we'll just add the user nobody...

/usr/sbin/usermod -G nscmd nobody

Next, create the directory where the command file should be stored. By default, this is /usr/local/netsaint/var/rw, although it can be changed by modifying the command_file variable.

mkdir /usr/local/netsaint/var/rw

Next, change the group ownership of the directory used to hold the command file...

chown -R .nscmd /usr/local/netsaint/var/rw

Also check the group permissions on the directory. The group you created needs to have write access there. The last thing you'll have to do is restart your web server with a command similar to the following:

/etc/rc.d/init.d/httpd restart

Apparently Apache needs to be restarted in order to inherit the new group permissions you assigned. That's it. You should be able to commit commands to NetSaint via the CGI now (assuming you have the proper authorization).

If you supplied the --with-command-grp=somegroup option when running the configure script, you can create the directory to hold the command file and set the proper permissions by running 'make install-commandmode'.
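
If commands still fail after these steps, it helps to confirm that the group change actually took effect. The function below is a hypothetical sketch (in_group is a name invented for this example) that uses the standard id command to test group membership; the ls line in the comment shows where to look for the directory's group and permissions.

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2.
in_group() {
    id -Gn "$1" | tr ' ' '\n' | grep -qx -- "$2"
}

# Example use with the names from this section:
#   in_group nobody nscmd && echo "nobody is in nscmd"
#   ls -ld /usr/local/netsaint/var/rw   # group should be nscmd, with "w" in the group bits
```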


How do I monitor virtual web servers that use host headers?

If you are running a web server with multiple virtual servers and only one IP address, this applies to you. Let's say that your web server has an IP address of 192.168.0.1 and two virtual servers running on it - "www.myfirstdomain.com" and "www.myseconddomain.com". Both of these domain names resolve to the same IP address (192.168.0.1) during a DNS lookup. The check_http plugin can handle this type of situation without a problem. You will need to specify the virtual web site name as an additional command line argument to the plugin (using the -hn option). Example:

command[check_http2]=/usr/local/netsaint/check_http $HOSTADDRESS$ -u / -p 80 -hn $ARG1$

service[myhost]=First Virtual Web Server;3;2;120;1;1;1;check_http2!www.myfirstdomain.com
service[myhost]=Second Virtual Web Server;3;2;120;1;1;1;check_http2!www.myseconddomain.com

The check_http2 command defined here will use the check_http plugin to open a connection to port 80 of the host at IP address 192.168.0.1. It will then send an HTTP/1.1 request for the root document, along with either a "Host: www.myfirstdomain.com" or "Host: www.myseconddomain.com" in the request header.
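
To see why the Host header is what separates the two virtual servers, it can help to look at the request itself. The function below is a hypothetical sketch (build_request is not part of the plugin) that prints a request of the kind described above. The "Connection: close" line is my own addition so that a manual test terminates cleanly, and the exact request the plugin sends may differ.

```shell
# Hypothetical helper: print an HTTP/1.1 request for the root document
# of virtual host $1, using CRLF line endings.
build_request() {
    printf 'GET / HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$1"
}

# Example use (assumption: a netcat-style tool named nc is installed):
#   build_request www.myfirstdomain.com | nc 192.168.0.1 80
#   build_request www.myseconddomain.com | nc 192.168.0.1 80
```

Both requests go to the same IP address and port; only the Host line changes.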


How do I monitor remote host information?

Several people have asked how to use various plugins that check information on the local host to report information from remote hosts. Various methods for doing this are described below.

If you need to actually execute a plugin on a remote host and get the results back, you can use one of the following methods...

  • Use the check_by_ssh "plugin" to execute a plugin on a remote host. The check_by_ssh plugin is basically a wrapper for executing a plugin on a remote host using SSH. You must have SSH installed and configured properly in order to use this.
  • Use the nrpe addon to accomplish this. The plugins and the nrpe daemon reside on the remote host. The check_nrpe plugin (included with the nrpe package) sends a request to the nrpe daemon to execute the plugin on the remote host and then grabs the results for NetSaint.
  • Use the nrpep addon. This addon works in a similar manner to the nrpe package, but it encrypts the transmitted data, runs as a service from inetd, and makes use of the TCP Wrappers package for access control.
  • Use rsh to execute the plugin remotely, although I wouldn't recommend this.
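
The remote shell approaches above all rely on the same idea: run the plugin on the other machine and hand its output and exit code back to NetSaint. Here is a minimal sketch of that idea using plain ssh, which returns the exit status of the remote command. The function name remote_check and the REMOTE_SHELL variable are invented for this example, and this is not how check_by_ssh itself is implemented.

```shell
# Hypothetical wrapper: run a plugin on a remote host and pass its
# output and exit code back to the caller.
# $1 is the remote host, the remaining arguments are the remote command.
# REMOTE_SHELL can be set to use something other than ssh.
remote_check() {
    host=$1
    shift
    ${REMOTE_SHELL:-ssh} "$host" "$@"
}

# Example use (the plugin path and its arguments are placeholders):
#   remote_check somehost /path/to/plugin args
```

Note that ssh hands the command to a shell on the remote side, so arguments containing spaces or quotes need extra care.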

If all you need is to check disk space, etc. on a remote host, you can use one of the methods below...

  • Use one of the plugins included with the netsaint_statd addon for NetSaint. The addon, written by Charlie Cook, includes a Perl daemon which runs on the remote host and four plugins which are used to gather the remote host information from the daemon. The daemon is designed to run on Linux, IRIX, HP-UX, SunOS, and OSF/1 systems. Modifying the code for other systems should be fairly easy. More on the netsaint_statd plugin can be found here.
  • Use the check_overcr plugin to query information from a remote host. The remote host must be running Eric Molitor's Over-CR collector in order for this to work.
  • Use the check_snmp plugin to check the value of various OIDs on the remote host. You must have SNMP services installed and running on the remote host in order to do this.


How can I monitor NT servers?

The good news is that NT has a lot of performance data that you can monitor. The bad news is that it's difficult to do. Your best bet is probably going to be to install SNMP services on all your NT boxes. Ian Cass has written a FAQ on how to do this at http://elton.dev.knowledge.com/snmpfaq.html

In order to expose NT performance counters for monitoring, you'll have to run the SNMP service on all servers you want to monitor. You'll also have to install any necessary performance MIBs for the services you want to monitor. I believe these can be found in the NT Resource Kit or in various server admin packages. If you're feeling extra lucky you can try to search the Microsoft site for the terms SNMP and MIB and maybe you'll find something...

You can search the MRTG mailing list archives for more information on configuring NT servers to expose various performance counters via SNMP. I know this has been discussed in the past, as many people are graphing various NT performance statistics using MRTG. In fact, somebody from Microsoft is actually doing it - you can find their web page at http://snmpboy.rte.microsoft.com/.

Once you've actually got the SNMP stuff working, you can use the check_snmp plugin to query your NT servers and generate alarms.

A few people are looking into the possibilities of creating a service that runs under NT to facilitate easier remote monitoring. Once these efforts solidify, an announcement will be made on the NetSaint mailing lists.


How do I monitor printers?

Assuming you have HP printers with JetDirect© cards installed, you can use the HP printer plugin to monitor them. Before you begin monitoring printers you should carefully plan your configuration to match the level of monitoring and response time you need. You need to balance this against the annoyance of getting alerted every time someone takes the printer offline to manually feed a transparency, etc. A lot of admins probably don't care if the printer is jammed or is out of paper, but some tech support people in large corporations might find this to be a useful feature. Anyway, if you decide to do this you will need to do the following things:

  • Enable the TCP/IP protocol stack on the JetDirect© card and assign it an IP address. External JetDirect© devices with multiple parallel ports will need this to be done on each port that has a printer connected that you want to monitor.
  • Create a host definition entry for the printer in your config file. Set the notify_recovery, notify_down, and notify_unreachable options to 0 if you don't want NetSaint to send you alerts when the printer gets turned off and on.
  • Create a host group for the printer(s) you defined. Call it printers or something similar.
  • Create a contact group containing all contacts that should be notified about printer problems. This group should be the notification group you specified in the printers host group you just defined.
  • Create a service to be monitored for the printer. Set the notify_critical option to 0 if you don't want to get notified when someone turns the printer off. The check_hpjd plugin returns a warning status whenever a problem is detected with a printer, so make sure the notify_warning option is set to 1 (assuming you want the contact to be notified). Also, fill in the contactgroups option with the name of the contact group you created for printers.


Can NetSaint send SNMP traps to management hosts?

Yes, but not directly. NetSaint relies on plugins to handle the gathering of service and host information and event handler scripts to handle events that occur with services and hosts. If you want to have NetSaint send an SNMP trap to a management host in the event that a particular service has a problem, you will have to write a service event handler script and add it to the event_handler option of the service definition. If you have the UCD-SNMP package installed on your host, you could have the script call the snmptrap command to actually send a trap message, depending on what type of service event occurred. Look at the example event handler script to get a better idea of how to write a script.
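
A service event handler for this purpose could be organized roughly as follows. Everything here is a sketch under assumptions: the function names are invented, how the service state reaches the script (which macro, in which argument position) depends on your command definition, and send_trap only echoes a message. You would replace its body with a real snmptrap call, whose arguments I have left out because they depend on your UCD-SNMP version and your management host.

```shell
# Placeholder: replace the echo with a call to snmptrap.
send_trap() {
    echo "would send trap: $1"
}

# $1: service state as text (assumed here to be one of
#     OK, WARNING, CRITICAL or UNKNOWN).
handle_service_event() {
    case "$1" in
        CRITICAL|WARNING) send_trap "$1" ;;
        *) : ;;   # nothing is sent for other states in this sketch
    esac
}

# Example use inside an event handler script:
#   handle_service_event "$1"
```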


Can NetSaint log host and service events to an external database?

Not directly, but this can be done fairly easily. You'll probably want to define global host and service event handlers to do this. The global event handlers could call a script which inserts the appropriate event information into a database of your choosing. This would allow you to run queries and generate more detailed reports than what are available in the CGIs.
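
One simple way to structure such a script is to have the event handler write one record per event and load those records into the database separately. The function below is a hypothetical sketch (format_event, its argument order, and the output file name are all assumptions) that formats an event as a tab-separated line. You could just as well replace the printf with a call to your database's command line client.

```shell
# Hypothetical helper: format one event as a tab-separated record.
# Arguments: timestamp, host name, service description, state.
format_event() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Example use inside a global event handler script (file name is a placeholder):
#   format_event "$(date '+%Y-%m-%d %H:%M:%S')" myhost "First Virtual Web Server" CRITICAL >> /tmp/netsaint-events.tsv
```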


Something isn't working properly - How can I track down the problem?

I've worked in tech support for a few years and have spent my share of time on a helpdesk. Most people are vague when they report a problem and have no desire whatsoever to try and track down the problem - they just want you to fix it now. I hope you are not that type of person. NetSaint is relatively new and is probably chock full of bugs, so things will not always work properly. If you suspect that either the service check or notification routines are not working, here are a few things you can do to try and track down the problem...

The first thing you should do is verify your configuration data by running NetSaint with the -v option. Example:

./netsaint -v /usr/local/netsaint/etc/netsaint.cfg

If no errors are found, proceed to the next steps. If NetSaint reports some error, go back and fix your configuration files.

The next step will take more time, but will give you more information on what is going on inside of NetSaint. When I first developed NetSaint I added a lot of debugging code to help me track down problems. I still use that code when I add new features or track down bugs myself. Here is how to use the debugging code...

Reconfigure NetSaint and enable one or more debug options as follows, replacing "--enable-DEBUGx" with one or more of the options from the table below:

./configure --prefix=/your/netsaint/directory --enable-DEBUGx

Debugging Options

Debug Option | Description
--enable-DEBUG0 | Used to trace function calls. A lot of messages will be printed out if you enable this option, but it is very useful to trace what functions are being called. Note that not all functions will print an exit message if code within the function causes an early exit (before reaching the end of the function).
--enable-DEBUG1 | Used to print out informational messages about variable settings. Most useful when trying to debug the configuration data as it is being read or verified.
--enable-DEBUG2 | Used to print out warning messages, usually when configuration data is being read or verified.
--enable-DEBUG3 | Used to print out informational messages during host and service checks. Good to use if you suspect problems are occurring during service checks.
--enable-DEBUG4 | Used to print out informational messages during host and service notifications. Good to use if you suspect problems are occurring during the notification events.

Recompile NetSaint.

Verify your configuration data again - you'll see a lot more information this time if you have enabled the DEBUG1 option. Try redirecting output to a file so that you can view or print it at a later time.
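
Debug output can be long, so capture both standard output and standard error. The helper below is a small hypothetical sketch (capture_output and the log file name are invented here) that runs any command and saves everything it prints to a file.

```shell
# Hypothetical helper: run a command and save its stdout and stderr
# to the file named in $1. Returns the command's exit code.
capture_output() {
    log=$1
    shift
    "$@" > "$log" 2>&1
}

# Example use:
#   capture_output verify.log ./netsaint -v /usr/local/netsaint/etc/netsaint.cfg
#   more verify.log
```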

If you have defined either the DEBUG3 or DEBUG4 options, run NetSaint as a foreground process and start monitoring your services. Example:

./netsaint /usr/local/netsaint/etc/netsaint.cfg

Kill NetSaint at an appropriate point (i.e. after a service check fails) and look through the output. It should help you track down where the problem is occurring. Some code tweaking may be necessary on your part in order to fix things. Let me know if you have to make any such alterations so I can include the fix in future releases.

If you are unable to determine or fix the problem on your own, email me the following items:

  1. The version of NetSaint you are running
  2. A description of what is going wrong and what you suspect is the problem
  3. The OS you're running NetSaint on
  4. Your configuration files (netsaint.cfg and hosts.cfg)
  5. Output from the program run (with debugging options on)