What is Spread?
Spread is a toolkit for group communication developed at the Johns Hopkins University Center for Networking and Distributed Systems. It is both a protocol and a communication method. To use Spread, machines run a spread daemon which is aware, through its configuration, of all other spread daemons that might be talking to it. There is a notion of 'groups', implemented such that if machine 'A' has joined group 'x', machine 'B' has joined group 'x', and machine 'C' has joined group 'y', then 'A' and 'B' will see each other's messages but not those from 'C', and 'C' will see nothing but its own. When sending messages, various levels of reliability can be used, from unreliable (the Spread protocol is not TCP, so unreliable really is unreliable), to reliable, to various degrees of ordering (for example, guaranteeing that if machine 'A' sends a message before machine 'B' does, every member of the group sees those messages in that order). More information on Spread is available at the CNDS website.
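To make the group semantics concrete, here is a toy, in-process model of the A/B/C example. It is not Spread's API: the class ToyGroupBus and its methods are hypothetical. It also ignores reliability and ordering entirely (everyone sees the same order here only because it all runs in one process). It only shows who receives what.

```python
from collections import defaultdict


class ToyGroupBus:
    """Toy model of group-addressed delivery (hypothetical, not Spread's API)."""

    def __init__(self):
        self.members = defaultdict(set)  # group name -> set of member names
        self.inbox = defaultdict(list)   # member name -> [(sender, group, text)]

    def join(self, member, group):
        self.members[group].add(member)

    def multicast(self, sender, group, text):
        # Every current member of the group gets the message, sender included.
        for member in sorted(self.members[group]):
            self.inbox[member].append((sender, group, text))


bus = ToyGroupBus()
bus.join("A", "x")
bus.join("B", "x")
bus.join("C", "y")
bus.multicast("A", "x", "hello from A")
bus.multicast("B", "x", "hello from B")
bus.multicast("C", "y", "hello from C")

for name in "ABC":
    print(name, [text for _, _, text in bus.inbox[name]])
# A ['hello from A', 'hello from B']
# B ['hello from A', 'hello from B']
# C ['hello from C']
```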
What is mod_log_spread?
mod_log_spread is a module for the Apache webserver which uses Spread for reliable logging over the network. Some benefits of using Spread are:
It's reliable. syslog network logging is not, and would result in dropped logs.
It allows for redundancy. In Spread there are no privileged hosts, so the logging host is no longer a single point of failure: bringing up a new logging host is as simple as turning one on.
It's flexible. Since mod_log_spread was built on top of mod_log_config, the entire advanced feature set of Apache logging (including env masks and per-directory/per-vhost logging) is available. Further, splitting different services into different log files is as simple as changing the group each service logs to.
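On the receiving side, a logging host then only needs a mapping from groups to files. The following is a minimal sketch of that idea, not the actual collector used with mod_log_spread. The group names 'www' and 'auth', the file names, and write_by_group are all hypothetical. The two sample lines are the CLF lines from the user transcript further down.

```python
import os
import tempfile

# Hypothetical routing table: each service logs to its own Spread group,
# and the logging host writes each group to its own file.
GROUP_TO_FILE = {
    "www": "www_access_log",
    "auth": "auth_access_log",
}


def write_by_group(messages, log_dir):
    """Append each (group, line) pair to the file assigned to its group.

    Messages for groups missing from GROUP_TO_FILE are ignored.
    Returns the number of lines written.
    """
    written = 0
    for group, line in messages:
        filename = GROUP_TO_FILE.get(group)
        if filename is None:
            continue
        with open(os.path.join(log_dir, filename), "a") as log_file:
            log_file.write(line.rstrip("\n") + "\n")
        written += 1
    return written


log_dir = tempfile.mkdtemp()
count = write_by_group(
    [
        ("www", '143.231.34.236 - - [31/Jul/2000:13:52:47 -0400] "GET /Members/ HTTP/1.1" 200 1097'),
        ("auth", '205.188.198.36 - - [31/Jul/2000:13:52:48 -0400] "POST /auth.html HTTP/1.0" 302 0'),
        ("unknown", "this line is dropped"),
    ],
    log_dir,
)
print(count, sorted(os.listdir(log_dir)))
# 2 ['auth_access_log', 'www_access_log']
```

Moving a service to its own file is then a matter of pointing its CustomLog at a new group and adding one entry to the table.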
Information/source for mod_log_spread is available here.
What is wrong with the way things were ....
The reason I wrote mod_log_spread was that a popular commercial log-writing product was hard to support, non-scalable, and broke frequently. Its scalability problems stemmed from its basic design. The particular product I was saddled with was a (Java-based) packet sniffer: it sniffs for HTTP transactions and reconstructs them from TCP sessions. This presents immediate scalability concerns. How do you sniff a network pushing 70Mb/s of traffic with a single, non-clustering packet sniffer? You don't. mod_log_spread backs up this assertion by demonstrably recording 10-15% more traffic than the sniffer did. Sniffers drop logs; Spread, the underlying protocol behind mod_log_spread, is designed never to drop messages. This particular commercial sniffer is also a single point of failure, whereas mod_log_spread can run two (or any number of) logging hosts simultaneously with no additional network overhead. Further, it is not a black-box product: mod_log_spread is an open-source project.
So why not just write logs locally?
Writing logs locally costs a 20-30% performance hit, and you have never known pain until you have tried to manage local logging across 60 machines. Trust me.
Ok, so you've convinced me it's cool. How does it work?
Both Spread and mod_log_spread have decent documentation; check them out. Here's a brief rundown. Spread's main configuration file is /etc/spread.conf. The example below, from bp23, says that bp23 sees 1 spread ring, identified as having up to 120 members listening on port 7777 at multicast address 225.0.1.4. The possible members of the ring are listed in the lines that follow. All machines that listen on this IP/port need to have this exact configuration file. It looks something like:
#/etc/spread.conf from bp23
1
120 225.0.1.4 7777
bp1 192.168.1.1
bp2 192.168.1.2
bp3 192.168.1.3
bp4 192.168.1.4
bp5 192.168.1.5
bp6 192.168.1.6
bp7 192.168.1.7
bp8 192.168.1.8
bp9 192.168.1.9
bp10 192.168.1.10
bp11 192.168.1.11
bp12 192.168.1.12
bp13 192.168.1.13
bp14 192.168.1.14
bp15 192.168.1.15
bp16 192.168.1.16
bp17 192.168.1.17
bp18 192.168.1.18
....
bp119 192.168.1.119
bp120 192.168.1.120
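If you maintain this file across many machines, a small checker can catch mistakes before you push it out. The sketch below is hypothetical. It reads only the layout shown above: a segment count, then an 'N address port' header per segment, then 'name ip' lines. It treats '#' lines as comments and takes the header's first number as the segment's maximum size. (The '....' elision in the listing above is not part of a real file, and this parser would reject it.) Because every daemon needs exactly the same file, it also computes a SHA-256 fingerprint that you can compare between hosts.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Segment:
    max_members: int
    multicast_address: str
    port: int
    members: dict = field(default_factory=dict)  # daemon name -> IP address


def parse_spread_conf(text):
    """Parse the segment layout shown above (an assumed format, see lead-in)."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")]
    if not rows or len(rows[0]) != 1:
        raise ValueError("first line must be the number of segments")
    expected_segments = int(rows[0][0])
    segments = []
    for row in rows[1:]:
        if len(row) == 3:                  # segment header: size, address, port
            segments.append(Segment(int(row[0]), row[1], int(row[2])))
        elif len(row) == 2 and segments:   # daemon line: name, IP
            segments[-1].members[row[0]] = row[1]
        else:
            raise ValueError("unexpected line: " + " ".join(row))
    if len(segments) != expected_segments:
        raise ValueError("segment count does not match the first line")
    for seg in segments:
        if len(seg.members) > seg.max_members:
            raise ValueError("segment lists more daemons than its size allows")
    return segments


def fingerprint(raw_bytes):
    """SHA-256 of the file's exact bytes; identical files give identical digests."""
    return hashlib.sha256(raw_bytes).hexdigest()


SAMPLE = """\
#/etc/spread.conf (trimmed to three daemons)
1
3 225.0.1.4 7777
bp1 192.168.1.1
bp2 192.168.1.2
bp3 192.168.1.3
"""

ring = parse_spread_conf(SAMPLE)[0]
print(ring.multicast_address, ring.port, sorted(ring.members))
# 225.0.1.4 7777 ['bp1', 'bp2', 'bp3']
print(fingerprint(SAMPLE.encode()) == fingerprint(SAMPLE.encode()))
# True
```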
mod_log_spread is an Apache DSO: mod_log_spread.so must be in /web/XX/adm/libexec/,
and the module is enabled with lines like the following in httpd.conf. They tell mod_log_spread where to find the local spread daemon (you can contact a remote one, but you shouldn't) and tell it to log CLF (common log format) logs to the group 'test'.
#/path/to/apache/conf/httpd.conf from bp23
LoadModule spread_log_module libexec/mod_log_spread.so
AddModule mod_log_spread.c
SpreadDaemon 7777
CustomLog $test common
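With many vhosts and services, it is easy to lose track of which CustomLog lines go to Spread. Here is a rough, hypothetical checker that pulls them out of httpd.conf text. It relies on the convention from the example above, that a '$' prefix on the CustomLog target names a Spread group. Everything else is treated as an ordinary log target. shlex only approximates Apache's quoting rules.

```python
import shlex


def spread_logging_settings(httpd_conf_text):
    """Collect SpreadDaemon values and CustomLog lines whose target starts with '$'."""
    daemons, groups = [], []
    for line in httpd_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        # shlex keeps quoted format strings together as one word.
        words = shlex.split(stripped)
        directive = words[0].lower()  # Apache directive names are case-insensitive
        if directive == "spreaddaemon" and len(words) >= 2:
            daemons.append(words[1])
        elif directive == "customlog" and len(words) >= 3 and words[1].startswith("$"):
            groups.append((words[1][1:], words[2]))  # (group, format)
    return {"daemons": daemons, "groups": groups}


conf = """\
#/path/to/apache/conf/httpd.conf from bp23
LoadModule spread_log_module libexec/mod_log_spread.so
AddModule mod_log_spread.c
SpreadDaemon 7777
CustomLog $test common
CustomLog logs/access_log common
"""
print(spread_logging_settings(conf))
# {'daemons': ['7777'], 'groups': [('test', 'common')]}
```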
You can verify your configuration is working by looking in the Apache error log. You should get lines like:
[Sun Jul 30 05:53:19 2000] [notice] set_spread_daemon(7777)
[Sun Jul 30 05:53:19 2000] [notice] Apache/1.3.9 (Unix) PHP/3.0.11 configured -- resuming normal operations
If you get a bunch of these:
[Mon Jul 31 13:44:38 2000] [notice] SP_multicast error(-11) in config_log_transaction
[Mon Jul 31 13:44:44 2000] [notice] SP_multicast error(-11) in config_log_transaction
then something is wrong; perhaps your spread daemon is not listening on the port you specified in your httpd.conf. You may get a few of these, especially if spread restarts. Don't worry about those.
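On a busy farm, this check can be scripted. The sketch below looks for the two kinds of notices shown above and flags a log with no set_spread_daemon line, or with more SP_multicast errors than a tolerance. The threshold and the function name are hypothetical. Pick a tolerance that separates "a few during a spread restart" from "a steady stream".

```python
import re

SET_DAEMON = re.compile(r"set_spread_daemon\((?P<daemon>[^)]*)\)")
MULTICAST_ERROR = re.compile(r"SP_multicast error\((?P<code>-?\d+)\)")


def summarize_error_log(lines, tolerated_errors=5):
    """Summarize mod_log_spread notices found in Apache error-log lines.

    `tolerated_errors` is a hypothetical cut-off: a few SP_multicast errors
    (e.g. while spread restarts) are normal, a steady stream is not.
    """
    daemons, codes = [], []
    for line in lines:
        match = SET_DAEMON.search(line)
        if match:
            daemons.append(match.group("daemon"))
        match = MULTICAST_ERROR.search(line)
        if match:
            codes.append(int(match.group("code")))
    return {
        "daemons": daemons,
        "multicast_errors": len(codes),
        "error_codes": sorted(set(codes)),
        "suspicious": not daemons or len(codes) > tolerated_errors,
    }


log_lines = [
    "[Sun Jul 30 05:53:19 2000] [notice] set_spread_daemon(7777)",
    "[Sun Jul 30 05:53:19 2000] [notice] Apache/1.3.9 (Unix) PHP/3.0.11 configured -- resuming normal operations",
    "[Mon Jul 31 13:44:38 2000] [notice] SP_multicast error(-11) in config_log_transaction",
    "[Mon Jul 31 13:44:44 2000] [notice] SP_multicast error(-11) in config_log_transaction",
]
print(summarize_error_log(log_lines))
# {'daemons': ['7777'], 'multicast_errors': 2, 'error_codes': [-11], 'suspicious': False}
```

In practice you would pass an open file object for your error log (its path depends on your httpd.conf) instead of the list.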
There are other tools for evaluating the health and happiness of your spread daemons as well. /usr/local/bin/user (available on most machines and binary-portable) is a command line spread client. You can use it to monitor the raw spread traffic. An example session is:
Here, we connect user to the local spread daemon (your port may vary) and join the group test to see what's going on. Then we quit. If user hangs when you invoke it, something bad is going on; Spreadwatch.pl will take care of it for you (it will kill the spread daemon and re-invoke it). Never, under any circumstances, do an 's test': this will write trash into our logs! You can also make up your own group if you want.
[root@bp23 ~]# user -s 7777
Spread library version is 3.12
User: connected to 7777 with private group #user#bp23
==========
User Menu:
----------
j <group> -- join a group
l <group> -- leave a group
s <group> -- send a message
b <group> -- send a burst of messages
r -- receive a message (stuck)
p -- poll for a message
e -- enable asynchonous read (default)
d -- disable asynchronous read
q -- quit
User> j test
User>
============================
Received REGULAR membership for group test with 2 members, where I am member 0:
#user#bp23
#sld-09221#bp87
grp id is -1062731504 964738633 2
Due to the JOIN of #user#bp23
User>
============================
received RELIABLE message from #ap-30445#bp26, of type 1, (endian 0) to 1 groups
(82 bytes): 143.231.34.236 - - [31/Jul/2000:13:52:47 -0400] "GET /Members/ HTTP/1.1" 200 1097....
User>
============================
received RELIABLE message from #ap-08888#bp23, of type 1, (endian 0) to 1 groups
(81 bytes): 205.188.198.36 - - [31/Jul/2000:13:52:48 -0400] "POST /auth.html HTTP/1.0" 302 0
User> q
Bye.
[root@bp23 ~]#
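The message bodies in that session are ordinary Common Log Format lines, so watching raw Spread traffic can be paired with a small parser. This is a minimal sketch. parse_clf is a hypothetical helper, the regex covers plain CLF only (not combined logs with referer/agent), and strptime's %b assumes English month names (true in the default C locale).

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line):
    """Return a dict for one CLF line, or None if it does not match."""
    match = CLF_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    record["method"] = record["request"].split(" ", 1)[0]
    return record


payloads = [
    '143.231.34.236 - - [31/Jul/2000:13:52:47 -0400] "GET /Members/ HTTP/1.1" 200 1097',
    '205.188.198.36 - - [31/Jul/2000:13:52:48 -0400] "POST /auth.html HTTP/1.0" 302 0',
]
records = [parse_clf(p) for p in payloads]
for r in records:
    print(r["host"], r["method"], r["status"], r["size"])
# 143.231.34.236 GET 200 1097
# 205.188.198.36 POST 302 0
```

Incidentally, the byte counts user reports (82 and 81) are exactly these lines plus a trailing newline, so each Spread message carries one complete log line.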
There is also the utility /usr/local/bin/monitor. With the additional automated watchers, using this should be pretty unnecessary. I'll include a little transcript:
[root@bp23 ~]# monitor -c /etc/spread.conf -n bp23
/===========================================================================\
| The Spread Group Communication Toolkit. |
| Copyright (c) 1994-1999 Yair Amir, Michal Miskin-Amir, Jonathan Stanton. |
| All rights reserved. |
| |
| The Spread package is licensed under the Spread Non-Commercial License. |
| You may only use this software in compliance with the License. |
| A copy of the license can be found at http://www.spread.org/license |
| |
| This software is distributed on an "AS IS" basis, WITHOUT WARRANTY OF |
| ANY KIND, either express or implied. |
| |
| Spread is developed at the Center for Networking and Distributed Systems, |
| The Johns Hopkins University. |
| |
| Creators: |
| Yair Amir yairamir@cs.jhu.edu |
| Michal Miskin-Amir michal@spread.org |
| Jonathan Stanton jonathan@cs.jhu.edu |
| |
| Contributors: |
| Dan Schoenblum dansch@cnds.jhu.edu - Java Interface Developer. |
| John Schultz jschultz@cnds.jhu.edu - contribution to process group |
| membership. |
| |
| Special thanks to the following for providing ideas and/or code: |
| Ken Birman, Danny Dolev, David Shaw, Robbert VanRenesse. |
| |
| WWW : http://www.spread.org and http://www.cnds.jhu.edu |
| Contact: spread@spread.org |
| |
| Version 3.12, Built 27/Jul/1999 |
\===========================================================================/
=============
Monitor Menu:
-------------
0. Activate/Deactivate Status {all, none, Proc, CR}
1. Define Partition
2. Send Partition
3. Review Partition
4. Cancel Partition Effects
5. Define Flow Control
6. Send Flow Control
7. Review Flow Control
8. Terminate Spread Daemons {all, none, Proc, CR}
9. Exit
Monitor> 0
=============
Activate Status
-------------
Enter Proc Name: bp23
Enter Proc Name:
Monitor: send status query
Monitor>
============================
Status at bp23 V3.12 (state 1, gstate 1) after 488025 seconds :
Membership : 7 procs in 1 segments, leader is bp16
rounds : 352222065 tok_hurry : 127 memb change: 10
sent pack: 1323028 recv pack : 6816735 retrans : 165393
u retrans: 0 s retrans : 165393 b retrans : 0
My_aru : 4970943 Aru : 4970942 Highest seq: 4970943
Sessions : 39 Groups : 1 Window : 60
Deliver M: 8084658 Deliver Pk: 8139763 Pers Window: 15
Delta Mes: 8084658 Delta Pack: 4970942 Delta sec : 488025
==================================
Monitor> q
Bye.
[root@bp23 ~]#
There's lots of interesting info here; unfortunately, it's not terribly well documented. :)
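Since the fields aren't well documented, it can at least help to record them over time and watch for changes. Here is a minimal parser for status blocks like the one above that simply splits out 'key : number' pairs. The two derived figures are illustrative guesses at what might be worth watching, not documented metrics: retransmissions per packet sent, and messages delivered per second. They rest on assumed meanings of the 'retrans', 'sent pack', 'Deliver M' and 'Delta sec' counters, and on 'Delta sec' matching the uptime in the header line.

```python
import re

# "key : value" pairs; keys may contain spaces ("sent pack", "memb change").
PAIR = re.compile(r"([A-Za-z_][A-Za-z_ ]*?)[ \t]*:[ \t]*(\d+)")


def parse_monitor_status(text):
    """Map each 'key : number' pair in a monitor status block to an int."""
    stats = {}
    for line in text.splitlines():
        for key, value in PAIR.findall(line):
            stats[key.strip()] = int(value)
    return stats


STATUS = """\
Status at bp23 V3.12 (state 1, gstate 1) after 488025 seconds :
Membership : 7 procs in 1 segments, leader is bp16
rounds : 352222065 tok_hurry : 127 memb change: 10
sent pack: 1323028 recv pack : 6816735 retrans : 165393
u retrans: 0 s retrans : 165393 b retrans : 0
My_aru : 4970943 Aru : 4970942 Highest seq: 4970943
Sessions : 39 Groups : 1 Window : 60
Deliver M: 8084658 Deliver Pk: 8139763 Pers Window: 15
Delta Mes: 8084658 Delta Pack: 4970942 Delta sec : 488025
"""

stats = parse_monitor_status(STATUS)
# Hypothetical health figures; what each counter means exactly is an assumption.
retrans_ratio = stats["retrans"] / stats["sent pack"]
messages_per_sec = stats["Deliver M"] / stats["Delta sec"]
print(f"{retrans_ratio:.3f} {messages_per_sec:.1f}")
# 0.125 16.6
```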