Setting up Automatic Alerting in Your Unix Environment

by Marion Bates <mbates at whoopis.com>
with much help from William Stearns <wstearns at pobox.com>
Latest revision: January 26, 2001

Introduction

System administrators are often faced with the burden of watching their networks for potential security breaches and other critical events. Usually this is achieved simply by analyzing log data on a regular basis, but what happens after hours can be a real problem. No one can watch the logs 24 hours a day.

Administrators need a way to know if an important event has occurred, and they need to know immediately. An automated monitoring system would fulfill this need.

Automatic alerting options can include:
- Send e-mail
- Send a message to the admin's cellular phone (if service and phone include text messaging capability)
- Send a message or code to the admin's numeric pager

What you need to get started:

- Syslogd -- handles logging of traffic
- Swatch (Simple Watcher) -- parses logs and takes actions based on log content
- Chat -- a program included with most *nix distributions that talks to a modem by following a script
- Modem -- for calling a pager (this is a good use for that old 14.4K)
- NTP (Network Time Protocol) -- not required but recommended for time synchronization across the network

Syslogd

Syslogd, included in all distributions, allows an administrator to not only log traffic on the local host, but also to have a centralized logging server for the whole network, which is ideal for using Swatch. Such a server can be set up by adding "*.* @ip_of_logging_host" in the syslog.conf file on each host to be logged, and specifying the "-r" option (for "receive") when calling syslogd on the logging server. The end result is that each host logs its traffic to its own log directory, as well as to the logging server. You can test the configuration by entering "logger x" at the command line of each host, which should write x to the log. A more detailed explanation of the syslog setup, along with examples, is included at the end of this document. Check the man pages for further details on syslog and syslogd.

NTP

A helpful accessory to this setup is NTP (Network Time Protocol), which synchronizes time between the logging host and the network clients so that the timestamps in your log data are consistent across machines. A typical NTP setup configures a master server to sync with two off-site NTP servers; the other local hosts then sync with that master. You'll need to open the NTP port (UDP/123). See <http://www.tcp-udp.net/NtpUsage/> for details.

Swatch

Swatch is a program which can actively monitor log messages as they are written to file via the UNIX syslog utility. Swatch consists of a configuration file, a library of actions, and the controlling program. It can be configured to watch for user-specified patterns and take action based on the occurrence of those patterns (e.g., send email to an address whenever the system is rebooted). This capability makes Swatch an attractive tool for 24-hour security monitoring.

Installation from source requires several Perl libraries you may not have. CPAN should automatically retrieve these for you, and usually you can use all the default or "I don't know" settings CPAN prompts for. However, if you run into problems, you can manually retrieve them using ftp -- just read the output from CPAN's failed attempt and you will see the URL you need. Or go to <http://www.cpan.org>.

Swatch can be configured to execute and run in the background at startup. To achieve this, add a line to rc.local:

/usr/bin/swatch -c /etc/swatch.conf -t /var/log/messages &

If you wish to have Swatch monitor multiple files (besides /var/log/messages), simply run multiple instances of Swatch, one for each file.

All of its configurations are stored in a single configuration file which you specify with the -c flag (in this example, the file is "swatch.conf" and it lives in /etc). That config file is where the administrator defines patterns to flag and what to do in each case. The -t flag tells Swatch which file it should tail (monitor).

Swatch's syntax is easy to use, yet provides a great deal of flexibility with regard to the kinds of actions that can be taken. For example, Swatch can execute shell commands preceded by the keyword exec in swatch.conf, in addition to the built-in commands it knows. Full documentation on Swatch, as well as a link to a download location, is available from <http://www.stanford.edu/~atkins/swatch/>.

Some important Swatch configuration keywords, listed alphabetically. See the sample config files for specific usage examples:

bell
Make the console beep. You can change the number of beeps; default is 1.

continue
If more than one watchfor section contains a pattern that could match the same log entry, "continue" tells Swatch to keep comparing the entry against the remaining watchfor sections even after it finds the first match. Take for example two watchfor sections, one for /root/ and one for /login/, and the log line "root: login successful". Swatch would normally execute only the actions for the first match -- "root" -- and then stop processing that line of the log, effectively ignoring the occurrence of "login". Adding "continue" to the first section prevents this.

echo
Dump flagged log entries to the console.

exec
Execute whatever shell command follows. See section on chat for examples.

ignore
We can tell Swatch to ignore the things we don't care about. The syntax is ignore /whatever/, where "whatever" is the pattern in question. The command and the pattern should be separated by spaces or tabs. To ignore several patterns with one line, separate them with the pipe character |.

mail
Send email message to any number of addresses, separated by colons. For our convenience, we can make the subject line reflect the items flagged (see sample config file for syntax). The email message body will, by default, contain the full line of the log entry that triggered Swatch.

throttle
The throttle command keeps the action (whatever follows on the next line) from being executed too many times at once, which might happen if the event in question generates multiple lines in the log, which each contain the string we're watching for. So, we use throttle to suppress subsequent reportings of the same event for a specified period of time after the first instance of the event. The format is throttle HH:MM:SS (hours, minutes, and seconds) and the use=regex option tells Swatch to use the pattern specified in the watchfor line, as opposed to using the message body itself, which is the default. For example, the log messages: "sshd2[PID]: Local disconnected: Connection closed." and "sshd2[PID]: connection lost: 'Connection closed.'" use slightly different wording, so the default throttle settings would not work here. But if we include use=regex, then sshd2 is the determining string, and thus throttle will work correctly.

watchfor
This is what we use to specify which log entries to watch for, and subsequently what to do when such an entry appears. To watch for several patterns in one section, separate them with the pipe character |.
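The "continue" and "throttle" behaviour described above can be easier to follow as a small model. The sketch below is a simplified Python simulation, not Swatch's actual Perl implementation; the section names, the PID in the sample log lines, and the function names are made up for illustration, and the throttle key follows the description in this document (the whole message by default, the watchfor pattern with use=regex).

```python
import re

def fired_sections(line, sections):
    """Return the names of the watchfor sections whose actions would run.

    sections is a list of (pattern, name, has_continue) tuples, tried in order.
    """
    fired = []
    for pattern, name, has_continue in sections:
        if re.search(pattern, line):
            fired.append(name)
            if not has_continue:
                break  # default: stop at the first matching section
    return fired

def alerts_sent(events, pattern, use_regex, window_seconds=120):
    """Count alerts for (timestamp, line) events under a throttle window."""
    last_alert = {}  # throttle key -> time of the last alert that went out
    sent = 0
    for now, line in events:
        if not re.search(pattern, line):
            continue
        key = pattern if use_regex else line
        if key in last_alert and now - last_alert[key] < window_seconds:
            continue  # same event seen recently: suppress this alert
        last_alert[key] = now
        sent += 1
    return sent

LINE = "root: login successful"
plain = [("root", "root-section", False), ("login", "login-section", False)]
chained = [("root", "root-section", True), ("login", "login-section", False)]
print(fired_sections(LINE, plain))    # ['root-section']
print(fired_sections(LINE, chained))  # ['root-section', 'login-section']

# Two differently worded lines from one ssh disconnect, one second apart.
SSH_EVENTS = [
    (0, "sshd2[412]: Local disconnected: Connection closed."),
    (1, "sshd2[412]: connection lost: 'Connection closed.'"),
]
print(alerts_sent(SSH_EVENTS, "SSH|sshd2", use_regex=False))  # 2
print(alerts_sent(SSH_EVENTS, "SSH|sshd2", use_regex=True))   # 1
```

With the message as the key, the two ssh lines count as different events and both raise an alert; with the pattern as the key, the second line falls inside the two-minute window and is suppressed.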

A sample Swatch configuration file:

Expect to go back and tweak the config file a few times when you first begin using Swatch. You'll find that some patterns you specify in the watchfor sections will appear elsewhere in subtle ways, resulting in unintended flagging by Swatch. For example, if you specify "watchfor /su/" intending to flag the su command (someone trying to switch to another user), then Swatch will flag that, but it will also trigger on any log line that contains the word "succeeded" (which appears in the event of a successful, legitimate login!). Try using "watchfor /root/" instead. Remember that you can test whether Swatch is listening and behaving properly by using the "logger x" command.
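The /su/ pitfall is easy to reproduce before a pattern goes into swatch.conf. The following is a rough sketch using Python's re module as a stand-in for Swatch's Perl patterns; the two log lines are invented samples, and the word-boundary variant is only one possible tightening, not something the Swatch documentation prescribes.

```python
import re

# Invented sample entries; real log lines will differ by system.
LOG_LINES = [
    "su: session opened for user root",
    "login: authentication succeeded",
]

def flagged(pattern, lines):
    """Return the lines that a watchfor pattern would flag."""
    return [line for line in lines if re.search(pattern, line)]

loose = flagged(r"su", LOG_LINES)       # also hits "succeeded"
strict = flagged(r"\bsu\b", LOG_LINES)  # only "su" as a whole word

print(len(loose))   # 2
print(len(strict))  # 1
```

The \b word-boundary anchor is also valid in Perl regular expressions, so the same idea should carry over to a watchfor line, but test it with "logger" on your own system before relying on it.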

# Sample Swatch configuration file for constant monitoring.
ignore /news/

watchfor /restart|panic|halt/ 
	bell 
	mail=admin@foo.bar:other_admin@bar.foo,subject=Log_Data_Crash

watchfor /SSH|sshd2/ 			# watch for both the session and the daemon
	echo 
	bell 
	throttle 0:2:0,use=regex 	# in case of multiple attempts in rapid succession; 
				 	# ssh also generates multiple log lines even for a
					# single successful login
	mail=admin@foo.bar,subject=Log_Data_SSH

# watchfor /blah|blah/ etc...

# End of script (a more detailed sample config file is included at the end of this document)

You can separate different events and have Swatch perform different alerting routines, based on severity, type, etc. For example, server problems like reboots and kernel panics could be set up to result in an email with subject line "LogData-Server" while suspicious login attempts could have subject line "LogData-AuthFail," etc. See the complete configuration example at the end of this document for more examples. Breaking things down like this makes it easier for you to sort the emails generated by Swatch, and you can tell from one glance at your inbox what kinds of things have been happening on your network.

A good idea is to have redundant alert methods available to Swatch, especially for the more critical events. For example, specify multiple email addresses (preferably belonging to multiple people who can find you) in the more important watchfor sections.

Cellular Phone Text Messaging

Another useful method, if it's available, is to use cellular phone text messaging. With this setup, Swatch can send a short message (some 100 characters or so) to an email address, and this message will be relayed to your cell phone. Assuming the phone and your provider have this feature enabled, the text of the message will appear on the phone's display. The end result is that Swatch can make your cell phone ring, then show you the flagged log data right on the phone. (We have found that log messages fit well within the character limit of this service.) One tip -- edit /etc/hosts such that there is a short entry for each host on your network to shorten the total length of the log entry, e.g., alias "mail.yourdomain.net" to just "mail." The text that appears will be the log entry (unless you set "use=regex," see Swatch entry on "throttle" for details).
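To get a feel for whether a typical log entry fits on the phone, the effect of the hostname shortening can be estimated offline. This is a minimal sketch, assuming a limit of 100 characters (the text only says "some 100 characters or so"); the function name, the alias table, and the sample log line are hypothetical, and the real shortening happens through /etc/hosts as described above, not through this code.

```python
def shorten_for_phone(log_line, aliases, limit=100):
    """Replace long hostnames with short aliases, then cut to the limit."""
    for long_name, short_name in aliases.items():
        log_line = log_line.replace(long_name, short_name)
    return log_line[:limit]

ALIASES = {"mail.yourdomain.net": "mail"}  # mirrors the /etc/hosts tip
SAMPLE = "Jan 26 03:12:45 mail.yourdomain.net sshd2[412]: connection lost"

short = shorten_for_phone(SAMPLE, ALIASES)
print(short)
print(len(SAMPLE), "->", len(short))
```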

Ideally, you want some method(s) besides email for Swatch to contact you, in case the critical event happens to affect the machine's ability to send email. For example, you could have Swatch page you. Which brings us to...

Chat

It is easy to have Swatch trigger a modem to dial a pager with incident-specific pager codes for emergency situations. The authors, in their first paper on Swatch, make reference to an included call_pager script for this very purpose, but we found that a) this script was not included in the most recent Swatch distribution, and b) it makes use of an application which was not part of the Linux distribution we were using. So we needed to develop an alternate way.

We used the chat program, which exists mainly for negotiating PPP connections, but it works just fine for this purpose. It is recommended that you have minicom (or the equivalent) installed for modem testing and troubleshooting during the initial setup.

The chat script should consist of one or more "expect-send" pairs of strings. For more info, read the man page. Here's the script we used:

# Note: chat only treats a line as a comment when the # is in
# column 1, so each comment sits on its own line above the
# command it describes.

# These are special cases; if we get a busy signal or no carrier,
# abort the script.
ABORT BUSY
ABORT 'NO CARRIER'

# '' is two single quotes with nothing in between -- an empty
# string. This line tells chat to expect nothing, and tells it to
# send the string ATH (hangup) to the modem.
'' ATH

# Tells chat that if it gets an OK from the modem (from our hangup
# signal), it should then reset the modem.
OK ATZ

# This tells the modem not to answer if someone calls the modem.
# We don't want a nasty back door into the network.
OK ATS0=0

# If chat gets another OK, it then sends the string ATDT (pick up
# and start dialing) and the pager number (555-1212 in this
# example) to the modem.
#
# Each comma after the number represents a two-second pause after
# dialing. The number of commas you need will vary depending on how
# quickly your pager service answers the call, the duration of the
# greeting message, etc., but the idea here is to give it time to
# get ready to receive your pager code. The \T is a special
# variable in chat, whose value is received from the command you
# use to run the script -- see below.
OK ATDT5551212,,,,,,,\T

# Tells chat to quit after 30 seconds. This gives it plenty of time
# to finish the call to the pager.
TIMEOUT 30

# Since we're not doing PPP negotiation, this is sort of
# unnecessary, but chat wants to see it in order to get through the
# whole script. Since there's nothing else in the script, chat will
# send scary "alarm-failed" messages to the console, but the end
# result is that the script terminates and the modem resets and is
# ready for the next time.
CONNECT
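Working out how many commas to put in the dial string is simple arithmetic. The helper below is a small sketch, assuming two seconds per comma as stated in the script comments (your modem may be configured differently); the function name and the 14-second wait are illustrative only.

```python
import math

def dial_command(pager_number, wait_seconds, seconds_per_comma=2):
    """Build the ATDT string: number, enough commas to wait, then \\T."""
    commas = math.ceil(wait_seconds / seconds_per_comma)
    return "ATDT" + pager_number + "," * commas + "\\T"

# A 14-second wait for the pager greeting needs seven commas.
command = dial_command("5551212", 14)
print(command)
```

Time a manual call to your pager service first, then round the wait up rather than down; an extra pause is harmless, while a short one loses the pager code.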

That's the chat script. Normally, we would run this from the shell, using a command like the following -- to have Swatch do it for you, simply precede this with the "exec" command in swatch.conf:

chat -f /path_to_script/name_of_chat_script.txt -v -s -T xxx < /dev/modem > /dev/modem

An explanation of the syntax is as follows:

-f
File. In other words, run chat using the script you specify here.

-v
Verbose mode. From the man page: "The chat program will then log the execution state of the chat script as well as all text received from the modem and the output strings sent to the modem. The default is to log through the SYSLOG; the logging method may be altered with the -S and -s flags." Don't set up swatch to flag chat's log entries or you will go to Self-Referential Hell.

-s
Use stderr. All log messages from '-v' and all error messages will be sent to stderr. Obviously, the -v and -s flags are not necessary to the function of the script; they merely aid in debugging, as well as indicating to whoever happens to be in front of the console why the modem suddenly started dialing all by itself. ;)

-T
The variable you will pass to chat. In this context, xxx should be the pager code you want to use for that event.

< /dev/modem > /dev/modem
Establish a data stream to and from the modem.
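If you would rather drive chat from a wrapper than type the command by hand, the same invocation can be assembled programmatically. This is only a sketch under the assumptions already used in this document (modem at /dev/modem, script at /etc/chatscript.txt as in the appendix); the function names are hypothetical, and send_page is defined but deliberately not called here because it needs a real modem.

```python
import subprocess

MODEM = "/dev/modem"                 # device path used in the text
CHAT_SCRIPT = "/etc/chatscript.txt"  # script path used in the appendix

def chat_argv(pager_code, script=CHAT_SCRIPT):
    """Argument list equivalent to: chat -f SCRIPT -v -s -T CODE"""
    return ["chat", "-f", script, "-v", "-s", "-T", str(pager_code)]

def send_page(pager_code):
    """Run chat with the modem as both stdin and stdout."""
    with open(MODEM, "r+b", buffering=0) as modem:
        result = subprocess.run(chat_argv(pager_code), stdin=modem, stdout=modem)
    return result.returncode

print(" ".join(chat_argv(911)))
```

Opening the device once and passing it as both stdin and stdout plays the role of the "< /dev/modem > /dev/modem" redirections in the shell command.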

Conclusion:

In the end, the whole system functions like so:

A centralized syslog server logs activity from all hosts, including itself.
Time is synchronized between the server and clients using NTP, so logs are more accurate.
Swatch runs on the syslog server, constantly monitoring whichever log file(s) you have specified.
If the messages you have asked Swatch to flag appear in the log(s), Swatch takes action to alert you, which can include any and all of the following:
- email you and/or someone else
- email cell phone(s)
- page you and/or someone else, using chat and a modem
- make the server beep
- display messages to the console
- execute a shell command
You receive and can act on these alerts no matter where you are and when they happen. Theoretically. ;)

Swatch is an invaluable tool for administrators who need to be aware of what's happening on their network 24 hours a day. It is easy to use and flexible enough to fit just about any network environment.

Appendix
I. More detailed swatch.conf example script

# Swatch configuration file for constant monitoring

ignore /news/

# Server problems -- we consider this to be relatively low-priority, so we just get email.
watchfor /restart|panic|halt/
	bell
	mail=admin@foo.bar,subject=Log_Data_Server_Info

# SSH stuff -- also low-priority, but we still want to know when users are ssh'd in. 
watchfor /SSH|sshd2/
	mail=admin@foo.bar,subject=Log_Data_SSH

# We want to know if someone tries to SU root -- does everything (emails multiple people, 
# emails cell phone, pages Admin with pager code 911)
watchfor /root/
	bell
	mail=admin@foo.bar:other_admin@bar.foo:5551212@cell_messaging.com,subject=Log_Data_ROOT
	throttle 0:2:0,use=regex 	# so it doesn't confuse the modem and interrupt paging
	exec chat -f /etc/chatscript.txt -v -s -T 911 < /dev/modem > /dev/modem

# router problems - does everything (emails multiple people, emails cell phone, pages Admin 
# with pager code 444)
watchfor /router/
	bell 
	mail=admin@foo.bar:other_admin@bar.foo:5551212@cell_messaging.com,subject=Log_Data_Router
	throttle 0:2:0,use=regex
	exec chat -f /etc/chatscript.txt -v -s -T 444 < /dev/modem > /dev/modem

# If adduser is run, could be critical -- does everything (emails multiple people, emails cell 
# phone, pages Admin with pager code 666)
watchfor /addgrp|adduser/
	bell
	mail=admin@foo.bar:other_admin@bar.foo:5551212@cell_messaging.com,subject=Log_Data_Adduser
	throttle 0:2:0,use=regex
	exec chat -f /etc/chatscript.txt -v -s -T 666 < /dev/modem > /dev/modem

# Bad login attempts -- mails us and pages Admin with pager code 888
watchfor /authentication failure/
	bell
	mail=admin@foo.bar:other_admin@bar.foo:5551212@cell_messaging.com,subject=Log_Data_Bad_Login
	throttle 0:2:0,use=regex
	exec chat -f /etc/chatscript.txt -v -s -T 888 < /dev/modem > /dev/modem

# end of swatch.conf

II. Syslogd -- crash course in setting up a centralized logging server

Server setup:

Example "syslog.conf" file on the syslog server (i.e. the server receiving log data from all the other machines on the network)

This example is unchanged from the default file that's there to begin with. It lives in /etc. You can tweak it to split up your logs differently depending on your particular preferences; see the man page.

# example syslog.conf file 

# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* 					      /dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;news.none;authpriv.none            /var/log/messages

# The authpriv file has restricted access.
authpriv.* 				             /var/log/secure

# Log all the mail messages in one place.
mail.* 					              /var/log/maillog

# Everybody gets emergency messages, plus log them on another
# machine.
*.emerg 					      *

# Save mail and news errors of level err and higher in a
# special file.
uucp,news.crit 		                         /var/log/spooler

# Save boot messages also to boot.log
local7.* 				            /var/log/boot.log

# INN
news.=crit 			                  /var/log/news/news.crit
news.=err 			                  /var/log/news/news.err
news.notice 			                /var/log/news/news.notice

# end of syslog.conf

When you start up syslogd on the server, call it with the -r flag (for "receive") like so:

syslogd -m 30 -r

(-m 30 means put a mark in the logfile every 30 minutes for time sync purposes.)

To make this happen every startup, edit the rc call to syslog. Under Linux, open /etc/rc.d/init.d/syslog and find the line where syslog gets called (the default is "daemon syslogd -m 0") and change it to the line above.

Now the server is listening for log data from all clients who have been told to send their logs to the server. See below...

Client setup:

"syslog.conf" on the client machines is unchanged except for the added line at the top:

*.* @ip_of_logging_host

This tells the client to send its logs to the collecting server, as well as log them locally. Redundancy is good. Remember to restart syslogd on the server and on each client after making these changes.

To make logging happen on startup, each machine needs an rc call to syslog. This should be default. You may want to change the mark setting, however.

Log in as root and do a "which syslogd". This will tell you if it's on the system. Usually it's in /sbin. If it is on the system, do a "ps ax" (Linux) or "ps -e" (Sun) to see if it's already running.
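Besides "logger x", you can check that the logging server is really accepting remote messages by sending it a hand-built syslog datagram from any client. The sketch below is an assumption-laden illustration: it uses the classic BSD syslog format over UDP port 514, where the number in angle brackets is facility * 8 + severity, and it relies on the receiving syslogd filling in the timestamp and host itself, which is typical but worth verifying on your system. The function names and the "test" tag are made up.

```python
import socket

FACILITY_USER = 1    # syslog facility code for "user"
SEVERITY_NOTICE = 5  # syslog severity code for "notice"

def syslog_packet(message, facility=FACILITY_USER,
                  severity=SEVERITY_NOTICE, tag="test"):
    """Build a minimal BSD-style syslog datagram: <PRI>tag: message"""
    priority = facility * 8 + severity
    return f"<{priority}>{tag}: {message}".encode("ascii")

def send_test_message(host, message, port=514):
    """Send one test datagram to the logging server (not called here)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(syslog_packet(message), (host, port))

print(syslog_packet("x"))  # b'<13>test: x'
```

Calling send_test_message("ip_of_logging_host", "x") from a client should make the line show up in the server's log if syslogd was started with -r and UDP port 514 is reachable.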

References:

Bill Stearns...and I can't remember anything else.