Wow, I'm not an IM'er, never have been; but I think this borders on brilliance.
The thing that sucked me in the most about your idea is the channels. With the ability to name channels however you'd like, there are potentially infinite channel names (ok, pedantically, the number is finite but huge; I assume there is a character limit for the channel names). I imagine being able to create a channel for each network device type: say, one for Cisco Routers, another for Nortel Switches, and yet another for Checkpoint Firewall routers. Changing hats, as a server guy, I could create my own set of server channels to keep track of resources like drive space, memory, and cpu usage; and changing hats again, as an application developer/baby-sitter, I could create channels for the interoperation of the various applications that all work together, etc.
Now, some of these abilities are already in syslog, but we're pretty limited in the number of channels we can use, so trying to coordinate between all the groups to agree on the "standards" that keep us from "polluting" one another's syslog files could get pretty ugly.
I also like how lightweight the "broadcast" of the syslog information would be. I'm not very familiar with the actual IRC protocol implementations, but way back when, I think I recall that if you wanted to create an IRC 'server', that server just had to ask (and receive permission) to receive the IRC messages; similarly, a client simply had to ask a server to be able to receive the appropriate messages. This seems fairly lightweight, and things are even better if IRC can now actually use multicasting.
++ many times for this very cool idea.
-Scott
| [reply] |
On its face, it might seem like a good idea. The problem is that IRC was intentionally designed to accommodate delays in communication. The timestamp in a given log is the timestamp for when the client receives the message. Lag in the network, on the IRC server, or on the client machine could easily lead to inaccurate timestamp data -- even to the point of making events from different processes appear in a different order from the one in which they happened.
A partial solution would be sender-side timestamps, but then you have authority issues as well (how do you *know* someone doesn't accidentally duplicate a login ID for a given application? what about multiple instances?). Most of these are solvable, but rely heavily on the senders to do the right thing.
A solution which I have seen work well is implemented over a database, with a logging daemon running on each local host. It works sort of like this: an application performs IPC (in this case, it was an XML message to the local daemon using a telnet protocol), sending a few pieces of information (pid, status-code{1=warn, 2=err, etc.}, description). The local daemon timestamps each message in the order received, and creates DB transactions that log the relevant info, adding its own data (its timestamp, the host name, etc.).
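To make the ordering problem concrete, here is a minimal sketch, not anything from a real IRC client. The `Event` class, the `lag` field, and the two sample events are made up for illustration; the only point is that sorting by receipt time and sorting by send time can disagree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """A log message as seen by a logging client (hypothetical model)."""
    source: str     # which process emitted the message
    sent_at: float  # sender-side timestamp, in seconds
    lag: float      # assumed network/server/client delay, in seconds

    @property
    def received_at(self) -> float:
        # What a client that stamps on receipt would write to its log.
        return self.sent_at + self.lag


def order_by_receipt(events: List[Event]) -> List[str]:
    """Order sources the way a receive-time log would show them."""
    return [e.source for e in sorted(events, key=lambda e: e.received_at)]


def order_by_send(events: List[Event]) -> List[str]:
    """Order sources by the sender-side timestamps."""
    return [e.source for e in sorted(events, key=lambda e: e.sent_at)]


# app_a logs first but suffers 2.5 s of lag; app_b logs a second later
# with almost no lag, so it arrives first.
events = [Event("app_a", 10.0, 2.5), Event("app_b", 11.0, 0.1)]
```

With these numbers the receive-time log lists app_b before app_a, while the sender-side timestamps give the true order. Sender-side ordering still assumes the senders' clocks agree and that each sender identifies itself honestly, which is the authority issue mentioned above.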
In this setup, all applications log verbosely (not quite 'trace', but about 'debug' level), and the daemon can be configured to drop or forward messages at various levels. So, we can move to 'debug' on a given *machine* with one instruction to its daemon.
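A rough sketch of the daemon side, under heavy assumptions: this is not our actual code, the level names and their numeric order are invented (only 1=warn and 2=err come from the real status codes), and a plain list stands in for the database transactions. It shows the two ideas: the daemon stamps messages in the order it receives them, and one threshold decides what gets forwarded.

```python
import socket
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical severity scale, lowest to highest.
LEVELS = {"trace": 0, "debug": 1, "info": 2, "warn": 3, "err": 4}


@dataclass
class LocalLogDaemon:
    threshold: str = "warn"                    # forward this level and above
    clock: Callable[[], float] = time.time     # injectable for testing
    host: str = field(default_factory=socket.gethostname)
    forwarded: List[dict] = field(default_factory=list)  # stand-in for the DB
    seq: int = 0                               # order of receipt on this host

    def set_threshold(self, level: str) -> None:
        """The 'one instruction' that changes verbosity for the whole machine."""
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level}")
        self.threshold = level

    def handle(self, pid: int, level: str, description: str) -> Optional[dict]:
        """Stamp a message on receipt, then drop or forward it."""
        self.seq += 1
        stamped_at = self.clock()
        if LEVELS[level] < LEVELS[self.threshold]:
            return None  # dropped: below the configured level
        record = {
            "seq": self.seq,
            "timestamp": stamped_at,
            "host": self.host,
            "pid": pid,
            "level": level,
            "description": description,
        }
        self.forwarded.append(record)
        return record
```

Applications keep logging at 'debug' the whole time; calling `set_threshold("debug")` on one machine's daemon is what makes those messages start reaching the database.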
There are some problems with the whole thing, but it has served us well overall.
| [reply] |
Agreed that a simple protocol like IRC has issues with security & integrity. You would have to trust yourself and your colleagues not to be stupid or evil.
With the system you use, do you find that you have scaling problems with the db inserts? I assume that the local daemon will retry if the db becomes unavailable, but what does your app do if the local daemon becomes unavailable?
| [reply] |
We don't tend to have scaling issues with the DB because we have an HA database system. I don't know the details, but there are several "satellite" servers that accept queries, and together they form a sort of "logical database" that is replicated, in turn, to a more solid archive. I could be explaining it wrong, as I didn't set it up.
The local daemon dying *is* one of the issues. Until recently, apps dealt with this in undefined ways (by which I mean each author chose; there was no standard). Just recently, we decided that apps would write app-named files to a specific directory, which the daemon scans, uploads, and cleans at startup. I have reservations about this, though -- it seems like asking for trouble.
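For what it's worth, the fallback can be sketched roughly like this. Everything here is an assumption made for illustration: the `.spool` suffix, the one-JSON-record-per-line format, and the `upload` callback that stands in for the DB insert. The real files and daemon look different.

```python
import json
from pathlib import Path
from typing import Callable


def spool_message(spool_dir: Path, app_name: str, record: dict) -> Path:
    """App side: append a record to an app-named file when the daemon is down."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    path = spool_dir / f"{app_name}.spool"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


def drain_spool(spool_dir: Path, upload: Callable[[str, dict], None]) -> int:
    """Daemon side, run at startup: upload every spooled record, then clean up."""
    count = 0
    for path in sorted(spool_dir.glob("*.spool")):
        with path.open(encoding="utf-8") as fh:
            records = [json.loads(line) for line in fh if line.strip()]
        for record in records:
            upload(path.stem, record)  # path.stem is the app name
            count += 1
        # Delete only after every record in the file has been uploaded.
        path.unlink()
    return count
```

Some of the trouble is visible even in the sketch: if `upload` fails halfway through a file, the file stays and the next startup uploads its first records again, and an app that is still appending while the daemon drains can lose messages. Spooled records also carry no daemon timestamp, so their ordering depends on whatever the app wrote.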
Powers that be know there are issues with this too, but no one (including me) has come up with a better idea yet...
| [reply] |