Setting up Automatic Alerting in Your Unix Environment
by
Marion Bates <mbates at whoopis.com>
with much help from William Stearns <wstearns at pobox.com>
Latest revision: January 26, 2001
Introduction
System administrators are often faced with the burden of watching their networks
for potential security breaches and other critical events. Usually this is achieved
simply by analyzing log data on a regular basis, but what happens after hours
can be a real problem. No one can watch the logs 24 hours a day.
Administrators need a way to know if an important event has occurred, and they
need to know immediately. An automated monitoring system would fulfill this need.
Automatic alerting options can include:
- Send e-mail
- Send a message to the admin's cellular phone (if service and phone include
text messaging capability)
- Send a message or code to the admin's numeric pager
What you need to get started:
- Syslogd -- handles logging of traffic
- Swatch (Simple Watcher) -- parses logs and takes actions based on log content
- Chat -- a program included with most *nix distributions to dial a modem
- Modem -- for calling a pager (this is a good use for that old 14.4K)
- NTP (Network Time Protocol) -- not required but recommended for time synchronization
across the network
Syslogd
Syslogd, included in all distributions, allows an administrator to not only log
traffic on the local host, but also to have a centralized logging server for
the whole network, which is ideal for using Swatch. Such a server can be set
up by adding "*.* @ip_of_logging_host" in the syslog.conf file on each
host to be logged, and specifying the "-r" option (for "receive")
when calling syslogd on the logging server. The end result is that each host
logs its traffic to its own log directory, as well as to the logging server.
You can test the configuration by entering "logger x" at the command
line of each host, which should write x to the log. A more detailed explanation
of the syslog setup, along with examples, is included at the end of this document.
Check the man pages for further details on syslog and syslogd.
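To make the "logger x" test less ambiguous, you can log a string that is unlikely to show up by accident and then search for it. The following is only a minimal shell sketch: the function names make_marker and log_has_marker are made up for illustration, and which log file to search depends on your own syslog.conf.

```shell
#!/bin/sh
# Hypothetical helpers for testing a syslog setup (names are illustrative).

# make_marker: print a test string unlikely to occur in normal log traffic.
# It combines the node name and this shell's PID to keep it distinctive.
make_marker() {
    printf 'swatch-test-%s-%s\n' "$(uname -n)" "$$"
}

# log_has_marker FILE MARKER: succeed if MARKER appears anywhere in FILE.
# -F treats the marker as a fixed string; -q suppresses output.
log_has_marker() {
    grep -Fq -- "$2" "$1"
}
```

For example, run m=$(make_marker); logger "$m" on a host, then log_has_marker /var/log/messages "$m" && echo arrived on that host, and search for the same string in the logging server's copy of the log.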
NTP
A helpful accessory to this setup is to use NTP (Network Time Protocol) to synchronize
time between the logging host and network clients, thus increasing the accuracy
and precision of your log data. A typical NTP setup configures a master
server to sync with two off-site NTP servers; the other local hosts then sync
to that master. You'll need to open the NTP port (UDP/123). See <http://www.tcp-udp.net/NtpUsage/> for
details.
Swatch
Swatch is a program which can actively monitor log messages as they are written
to file via the UNIX syslog utility. Swatch consists of a configuration file,
a library of actions, and the controlling program. It can be configured to watch
for user-specified patterns and take action based on the occurrence of those
patterns (e.g., send email to an address whenever the system is rebooted). This
capability makes Swatch an attractive tool for 24-hour security monitoring.
Installation from source requires several Perl libraries you may not have. CPAN
should automatically retrieve these for you, and usually you can use all the
default or "I don't know" settings CPAN prompts for. However, if you
run into problems, you can manually retrieve them using ftp -- just read the
output from CPAN's failed attempt and you will see the URL you need. Or go to <http://www.cpan.org>.
Swatch can be configured to execute and run in the background at startup. To
achieve this, add a line to rc.local:
/usr/bin/swatch -c /etc/swatch.conf -t /var/log/messages &
If you wish to have Swatch monitor multiple files (besides /var/log/messages),
simply run multiple instances of Swatch, one on each file.
All of Swatch's configuration is stored in a single configuration file, which
you specify with the -c flag (in this example, the file is "swatch.conf" and
it lives in /etc). That config file is where the administrator defines patterns
to flag and what to do in each case. The -t flag tells Swatch which file it should
tail (monitor).
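If you watch several files, it can be handy to generate the start-up lines rather than type each one. Here is a small sketch under our own assumptions: swatch_lines is a hypothetical helper name, and it only prints the lines (using the -c and -t flags described above) so you can review them before adding them to rc.local.

```shell
#!/bin/sh
# Hypothetical helper: print one Swatch start-up line per log file.
# It only prints; nothing is started until you run the output yourself.
swatch_lines() {
    conf=$1
    shift
    for logfile in "$@"; do
        printf '/usr/bin/swatch -c %s -t %s &\n' "$conf" "$logfile"
    done
}
```

Calling swatch_lines /etc/swatch.conf /var/log/messages /var/log/secure prints two lines, one per file, both pointing at the same config file.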
Swatch's syntax is easy to use, yet provides a great deal of flexibility with
regard to the kinds of actions that can be taken. For example, Swatch can execute
shell commands preceded by the keyword exec in swatch.conf, in addition to the
built-in commands it knows. Full documentation on Swatch, as well as a link to
a download location, is available from <http://www.stanford.edu/~atkins/swatch/>.
Some important Swatch configuration keywords, listed alphabetically. See the sample config
files for specific usage examples:
bell
Make the console beep. You can change the number of beeps; default is 1.
continue
If multiple watchfor sections contain patterns which could match the same
single log entry, then "continue" will tell Swatch to continue comparing
its remaining patterns to the log entry even after it finds the first match. Take for
example two watchfor sections, one for /root/ and one for /login/, and suppose the log
line "root: login successful" is generated. Swatch would normally just
execute actions when it saw the first match -- "root" -- and then stop
parsing that line of the log, thus effectively ignoring the occurrence of "login" in
this case. "Continue" prevents this.
echo
Dump flagged log entries to the console.
exec
Execute whatever shell command follows. See section on chat for examples.
ignore
We can tell Swatch to ignore the things we don't care about. The syntax for this
is ignore /whatever/ where "whatever" is the pattern in question. The
command and the pattern should be separated by spaces or tabs. To ignore several
patterns at once, separate them with the pipe character |.
mail
Send email message to any number of addresses, separated by colons. For our convenience,
we can make the subject line reflect the items flagged (see sample config file
for syntax). The email message body will, by default, contain the full line of
the log entry that triggered Swatch.
throttle
The throttle command keeps the action (whatever follows on the next line) from
being executed too many times at once, which might happen if the event in question
generates multiple lines in the log, which each contain the string we're watching
for. So, we use throttle to suppress subsequent reportings of the same event
for a specified period of time after the first instance of the event. The format
is throttle HH:MM:SS (hours, minutes, and seconds) and the use=regex option tells
Swatch to use the pattern specified in the watchfor line, as opposed to using
the message body itself, which is the default. For example, the log messages: "sshd2[PID]:
Local disconnected: Connection closed." and "sshd2[PID]: connection
lost: 'Connection closed.'" use slightly different wording, so the default
throttle settings would not work here. But if we include use=regex, then sshd2
is the determining string, and thus throttle will work correctly.
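The idea behind throttle can be sketched in a few lines of shell. This is not Swatch's own code, only our illustration of the logic described above: should_fire is a hypothetical function, the state file is our invention, and the "key" stands for whatever identifies the event (the pattern when use=regex is set, the message text otherwise).

```shell
#!/bin/sh
# Illustrative only -- not Swatch's implementation.
# should_fire KEY NOW WINDOW STATE
#   KEY    what identifies "the same event" (with use=regex, the pattern)
#   NOW    current time in seconds
#   WINDOW throttle period in seconds (0:2:0 is 120)
#   STATE  file that remembers when each KEY last fired
# Returns 0 if the action should run, 1 if it should be suppressed.
should_fire() {
    key=$1
    now=$2
    window=$3
    state=$4
    [ -f "$state" ] || : > "$state"
    # Last recorded firing time for this key (empty if it never fired).
    last=$(awk -F '\t' -v k="$key" '$1 == k { t = $2 } END { print t }' "$state")
    if [ -n "$last" ] && [ $(( now - last )) -lt "$window" ]; then
        return 1
    fi
    printf '%s\t%s\n' "$key" "$now" >> "$state"
    return 0
}
```

With a 120-second window, a first sshd2 event at time 1000 fires, a second at 1060 is suppressed, and one at 1120 fires again. Suppressed events do not restart the timer in this sketch.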
watchfor
This is what we use to specify which log entries to watch for, and subsequently
what to do in the instance of such an entry. To watch for several patterns in one
section, separate them with the pipe character |.
A sample Swatch configuration file:
Expect to go back and tweak the config file a few times when you first begin
using Swatch. You'll find that some patterns you specify in the watchfor sections
will appear elsewhere in subtle ways, resulting in unintended flagging by Swatch.
For example, if you specify "watchfor /su/" intending to flag the su
command (someone trying to switch to another user) then Swatch will flag that,
but will also trigger on any log line that contains the word "succeeded" (which
appears in the event of a successful, legitimate login!). Try using "watchfor
/root/" instead. Remember that you can test to see if Swatch is listening
and behaving properly by using the "logger x" command.
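You can try out a pattern before putting it in swatch.conf by running it against a few known log lines. The sketch below uses grep -E as a stand-in for Swatch's Perl regular expressions, which is an assumption on our part (simple patterns like these behave the same, fancier ones may not). The sample lines are the log messages quoted elsewhere in this document; sample_lines and count_hits are hypothetical helper names.

```shell
#!/bin/sh
# sample_lines: the three log messages quoted in this document.
sample_lines() {
    cat <<'EOF'
root: login successful
sshd2[PID]: Local disconnected: Connection closed.
sshd2[PID]: connection lost: 'Connection closed.'
EOF
}

# count_hits PATTERN: print how many sample lines match the extended regex.
# grep -c prints the count (and exits non-zero when the count is 0).
count_hits() {
    sample_lines | grep -c -E -- "$1"
}
```

Here count_hits 'su' reports one hit even though nobody ran su, because "successful" contains "su". Requiring a non-alphanumeric character (or the line edge) on both sides, as in count_hits '(^|[^[:alnum:]])su([^[:alnum:]]|$)', reports none, and count_hits 'SSH|sshd2' matches both sshd2 lines.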
# Sample Swatch configuration file for constant monitoring.
ignore /news/
watchfor /restart|panic|halt/
    bell
    mail=admin@foo.bar:other_admin@bar.foo,subject=Log_Data_Crash
watchfor /SSH|sshd2/ # watch for both the session and the daemon
    echo
    bell
    throttle 0:2:0,use=regex # in case of multiple attempts in rapid succession;
                             # ssh also generates multiple log lines even for a
                             # single successful login
    mail=admin@foo.bar,subject=Log_Data_SSH
watchfor /blah|blah/ etc...
# End of script (a more detailed sample config file is included at the end of this document)
You can separate different events and have Swatch perform different alerting
routines, based on severity, type, etc. For example, server
problems like reboots, kernel panics, etc. could be set up to result in an email
with subject line "LogData-Server" while suspicious login attempts
could have subject line "LogData-AuthFail," etc. See the complete configuration
example at the end of this document for more examples. Breaking things down like
this makes it easier for you to sort the emails generated by Swatch, and you
can tell from one glance at your inbox what kinds of things have been happening
on your network.
A good idea is to have redundant alert methods available to Swatch, especially
for the more critical events. For example, specify multiple email addresses (preferably
belonging to multiple people who can find you) in the more important watchfor
sections.
Cellular Phone Text Messaging
Another useful method, if it's available, is to use cellular phone text messaging.
With this setup, Swatch can send a short message (some 100 characters or so)
to an email address, and this message will be relayed to your cell phone. Assuming
the phone and your provider have this feature enabled, the text of the message
will appear on the phone's display. The end result is that Swatch can make your
cell phone ring, then show you the flagged log data right on the phone. (We have
found that log messages fit well within the character limit of this service.)
One tip -- edit /etc/hosts such that there is a short entry for each host on
your network to shorten the total length of the log entry, e.g., alias "mail.yourdomain.net" to
just "mail." The text that appears will be the log entry (unless you
set "use=regex," see Swatch entry on "throttle" for details).
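If editing /etc/hosts is not an option, a different way to keep messages short is to trim them just before they are sent. This is an alternative technique, not the /etc/hosts approach described above, and both function names are hypothetical. It assumes plain ASCII log lines and a domain name without slashes.

```shell
#!/bin/sh
# strip_domain DOMAIN: read log lines on stdin and remove ".DOMAIN",
# so "mail.yourdomain.net" becomes "mail".
strip_domain() {
    # Escape dots so they match literally in the sed pattern.
    esc=$(printf '%s\n' "$1" | sed 's/\./\\./g')
    sed "s/\\.${esc}//g"
}

# fit_message LIMIT: read lines on stdin, keep at most LIMIT characters each.
fit_message() {
    cut -c "1-$1"
}
```

A line could then be passed through strip_domain yourdomain.net | fit_message 100 before it is mailed to the phone.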
Ideally, you want some method(s) besides email for Swatch to contact you, in
case the critical event happens to affect the machine's ability to send email.
For example, you could have Swatch page you. Which brings us to...
Chat
It is easy to have Swatch trigger a modem to dial a pager with incident-specific
pager codes for emergency situations. Swatch's authors, in their first paper on the program,
make reference to an included call_pager script for this very purpose, but we
found that a) this script was not included in the most recent Swatch distribution,
and b) it makes use of an application which was not part of the Linux distribution
we were using. So we needed to develop an alternate way.
We used the chat program, which exists mainly for negotiating PPP connections,
but it works just fine for this purpose. It is recommended that you have minicom
(or the equivalent) installed for modem testing and troubleshooting during the
initial setup.
The chat script should consist of one or more "expect-send" pairs of
strings. For more info, read the man page. Here's the script we used:
# Comments sit on their own lines: chat only treats a line as a comment
# when the # is in column 1.
# These are special cases; if we get a busy signal or no carrier,
# abort the script.
ABORT BUSY
ABORT 'NO CARRIER'
# '' is two single quotes with nothing in between -- an empty string.
# This line tells chat to expect nothing, and tells it to send the
# string ATH (hangup) to the modem.
'' ATH
# Tells chat that if it gets an OK from the modem (from our hangup
# signal), it should then reset the modem.
OK ATZ
# This tells the modem not to answer if someone calls the modem.
# We don't want a nasty back door into the network.
OK ATS0=0
# If chat gets another OK, it then sends the string ATDT (pick up and
# start dialing) and the pager number (555-1212 in this example) to
# the modem. Each comma after the number represents a two-second pause
# after dialing. The number of commas you need will vary depending on
# how quickly your pager service answers the call, the duration of the
# greeting message, etc., but the idea here is to give it time to get
# ready to receive your pager code. The \T is a special variable in
# chat, whose value is received from the command you use to run the
# script -- see below.
OK ATDT5551212,,,,,,,\T
# Tells chat to quit after 30 seconds. This gives it plenty of time to
# finish the call to the pager.
TIMEOUT 30
# Since we're not doing PPP negotiation, this is sort of unnecessary,
# but chat wants to see it in order to get through the whole script.
# Since there's nothing else in the script, the modem will send scary
# "alarm-failed" messages to the console, but the end result is that
# the script terminates and the modem resets and is ready for the next
# time.
CONNECT
That's the chat script. Normally, we would run this from the shell, using a
command like the following. To have Swatch do it for you, simply precede it
with the "exec" command in swatch.conf:
chat -f /path_to_script/name_of_chat_script.txt -v -s -T xxx < /dev/modem > /dev/modem
An explanation of the syntax is as follows:
-f
File. In other words, run chat using the script you specify
here.
-v
Verbose mode. From the man page: "The
chat program will then log the execution state of the chat script
as well as all text received from the modem and the output strings
sent to the modem. The default is to log through the SYSLOG; the
logging method may be altered with the -S and -s flags." Don't
set up swatch to flag chat's log entries or you will go to Self-Referential
Hell.
-s
Use stderr. All log messages from '-v' and
all error messages will be sent to stderr. Obviously, the -v and
-s flags are not necessary to the function of the script; they merely
aid in debugging, as well as indicating to whoever happens to be
in front of the console why the modem suddenly started dialing all
by itself. ;)
-T
The variable you will pass to chat. In this
context, xxx should be the pager code you want to use for that event.
< /dev/modem > /dev/modem
Establish a data stream to and from the modem.
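Since the same chat command gets repeated for every pager code, you may prefer to build it in one place. The following is a sketch, not something shipped with Swatch or chat: pager_cmd is a hypothetical function, the CHAT_SCRIPT and MODEM variables are our own names (defaulting to the paths used in this document's examples), and it prints the command as a dry run instead of touching the modem.

```shell
#!/bin/sh
# Hypothetical wrapper: build the chat command line for a given pager code.
CHAT_SCRIPT=${CHAT_SCRIPT:-/etc/chatscript.txt}
MODEM=${MODEM:-/dev/modem}

# pager_cmd CODE: print the command (a dry run) rather than running it,
# rejecting an empty code or one containing anything but digits.
pager_cmd() {
    case $1 in
        ''|*[!0-9]*)
            echo "pager code must be numeric: '$1'" >&2
            return 1
            ;;
    esac
    printf 'chat -f %s -v -s -T %s < %s > %s\n' \
        "$CHAT_SCRIPT" "$1" "$MODEM" "$MODEM"
}
```

Running pager_cmd 911 prints the full chat command with 911 as the -T value. Once the printed command looks right, it is the line that goes after "exec" in swatch.conf.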
Conclusion:
In the end, the whole system functions like so:
- A centralized syslog server logs activity from all hosts, including itself.
- Time is synchronized between the server and clients using NTP, so logs are
more accurate.
- Swatch runs on the syslog server, constantly monitoring whichever log
file(s) you have specified.
- If the messages you have asked Swatch to flag appear in the log(s), Swatch
takes action to alert you, which can include any and all of the following:
  - email you and/or someone else
  - email cell phone(s)
  - page you and/or someone else, using chat and a modem
  - make the server beep
  - display messages to the console
  - execute a shell command
- You receive and can act on these alerts no matter where you are and when
they happen. Theoretically. ;)
Swatch is an invaluable tool for administrators
who need to be aware of what's happening on their network 24 hours
a day. It is easy to use and flexible enough to fit just about any
network environment.
Appendix
I. More detailed swatch.conf example script
# Swatch configuration file for constant monitoring
ignore /news/
# Server problems -- we consider this to be relatively low-priority, so we just get email.
watchfor /restart|panic|halt/
    bell
    mail=admin@foo.bar,subject=Log_Data_Server_Info
# SSH stuff -- also low-priority, but we still want to know when users are ssh'd in.
watchfor /SSH|sshd2/
    mail=admin@foo.bar,subject=Log_Data_SSH
# We want to know if someone tries to SU root -- does everything (emails multiple people,
# emails cell phone, pages Admin with pager code 911)
watchfor /root/
    bell
    mail=admin@foo.bar:other_admin@bar.foo:5551212@cell_messaging.com,subject=Log_Data_ROOT
    throttle 0:2:0,use=regex # so it doesn't confuse the modem and interrupt paging
    exec chat -f /etc/chatscript.txt -v -s -T 911 < /dev/modem > /dev/modem
# router problems - does everything (emails multiple people, emails cell phone, pages Admin
# with pager code 444)
watchfor /router/
    bell
    mail=admin@foo.bar:other_admin@bar.foo:5551212@cell_messaging.com,subject=Log_Data_Router
    throttle 0:2:0,use=regex
    exec chat -f /etc/chatscript.txt -v -s -T 444 < /dev/modem > /dev/modem
# If adduser is run, could be critical -- does everything (emails multiple people, emails cell
# phone, pages Admin with pager code 666)
watchfor /addgrp|adduser/
    bell
    mail=admin@foo.bar:other_admin@bar.foo:5551212@cell_messaging.com,subject=Log_Data_Adduser
    throttle 0:2:0,use=regex
    exec chat -f /etc/chatscript.txt -v -s -T 666 < /dev/modem > /dev/modem
# Bad login attempts -- mails us and pages Admin with pager code 888
watchfor /authentication failure/
    bell
    mail=admin@foo.bar:other_admin@bar.foo:5551212@cell_messaging.com,subject=Log_Data_Bad_Login
    throttle 0:2:0,use=regex
    exec chat -f /etc/chatscript.txt -v -s -T 888 < /dev/modem > /dev/modem
# end of swatch.conf
II. Syslogd -- crash course in setting up a centralized logging server
Server setup:
Example "syslog.conf" file on the syslog server (i.e. the server receiving
log data from all the other machines on the network)
This example is unchanged from the default file that's there to begin with. It
lives in /etc. You can tweak it to split up your logs differently depending on
your particular preferences; see the man page.
# example syslog.conf file
# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.* /dev/console
# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;news.none;authpriv.none /var/log/messages
# The authpriv file has restricted access.
authpriv.* /var/log/secure
# Log all the mail messages in one place.
mail.* /var/log/maillog
# Everybody gets emergency messages, plus log them on another
# machine.
*.emerg *
# Save mail and news errors of level err and higher in a
# special file.
uucp,news.crit /var/log/spooler
# Save boot messages also to boot.log
local7.* /var/log/boot.log
# INN
news.=crit /var/log/news/news.crit
news.=err /var/log/news/news.err
news.notice /var/log/news/news.notice
# end of syslog.conf
When you start up syslogd on the server, call it with the -r flag (for "receive") like so:
syslogd -m 30 -r
(-m 30 means put a mark in the logfile every 30 minutes for time sync purposes.)
To make this happen every startup, edit the rc call to syslog. Under
Linux, open /etc/rc.d/init.d/syslog and find the line where syslog
gets called (the default is "daemon syslogd -m 0") and change it to the line above.
Now the server is listening for log data from all clients who have been told
to send their logs to the server. See below...
Client setup:
"syslog.conf" on the client machines is unchanged except for the added
line at the top:
*.* @ip_of_logging_host
This tells the client to send its logs to the collecting
server, as well as log them locally. Redundancy is good. Remember to restart
syslogd on the server and clients after making these changes.
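If you have many clients to set up, the forwarding line can be added with a small script. This is a sketch under our own assumptions: add_forward_rule is a hypothetical name, and it appends the rule to the end of the file you give it (rather than putting it at the top) and leaves restarting syslogd to you. Try it on a copy of syslog.conf first.

```shell
#!/bin/sh
# Hypothetical helper: add "*.* @LOGHOST" to a syslog.conf-style file
# unless exactly that line is already present.
add_forward_rule() {
    conf=$1
    loghost=$2
    rule="*.* @${loghost}"
    # -F: fixed string, -x: match the whole line, -q: no output
    if grep -Fqx -- "$rule" "$conf" 2>/dev/null; then
        echo "already present: $rule"
    else
        printf '%s\n' "$rule" >> "$conf"
        echo "added: $rule"
    fi
}
```

Running it twice with the same arguments adds the line only once, so it is safe to rerun.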
To make logging happen on startup, each machine needs an rc call to syslog. This
should be default. You may want to change the mark setting, however.
Log in as root and do a "which syslogd". This will tell you if it's
on the system. Usually it's in /sbin. If it is on the system, do a "ps ax" (Linux)
or "ps -e" (Sun) to see if it's already running.
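The "is it running" check is easy to get wrong when you pipe ps into grep, because the grep process itself can show up in the listing. Here is one hedged way to script it; is_listed is a hypothetical helper that reads a process listing on standard input, so it works with whichever ps invocation your system wants.

```shell
#!/bin/sh
# is_listed NAME: read a process listing on stdin and succeed if some line
# mentions NAME and is not itself a grep command.
is_listed() {
    grep -F -- "$1" | grep -qv 'grep'
}
```

For example, ps ax | is_listed syslogd && echo "syslogd is running". To find the binary, command -v syslogd does the same job as "which" on most systems.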
References:
- Bill Stearns...and I can't remember anything else.