
It's a lot like life

Assume for a moment that you have 148 hosts logging, via syslog-ng, to a central host. That host is recording all log entries into a MySQL database. Assume that, between them, these machines are producing a total of 4698816 lines per day.

(Crazy random numbers pulled from thin air; globviously).

Now the question: How do you process, read, or pay attention to those logs?

Here is what we've done so far:

syslog-ng

All the syslog-ng client machines are logging to a central machine, which inserts the records into a database.

This database may be queried using the php-syslog-ng script. Unfortunately this search is relatively slow, and the user interface is appallingly bad: it allows only searches, with no view of the most recent logs, no auto-refreshing via AJAX, etc.

rss feeds

To remedy the slowness and poor usability of the PHP front-end to the database, I wrote a quick hack which produces RSS feeds from queries against that same database, accessed via URIs such as:

  • http://example.com/feeds/DriveReady
  • http://example.com/feeds/host/host1

The first query returns an RSS feed of log entries containing the given term. The second shows all recent entries from the machine host1.
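A minimal sketch of how such a feed endpoint might work, not the actual hack. It assumes a hypothetical `logs` table with `host`, `logged_at` and `msg` columns, and uses Python's built-in `sqlite3` as a stand-in for the MySQL database; the function names are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_feed(conn, title, where, params, limit=50):
    """Return an RSS 2.0 document (as a string) for the newest matching rows."""
    # `where` is always one of the fixed fragments used below, never user
    # input; the user-supplied value only ever travels through `params`.
    rows = conn.execute(
        "SELECT host, logged_at, msg FROM logs WHERE " + where
        + " ORDER BY logged_at DESC LIMIT ?",
        (*params, limit),
    ).fetchall()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = "http://example.com/feeds/"
    ET.SubElement(channel, "description").text = "Recent syslog entries"
    for host, when, msg in rows:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{host}: {msg}"
        ET.SubElement(item, "description").text = f"{when} {host} {msg}"
    return ET.tostring(rss, encoding="unicode")


def feed_for_path(conn, path):
    """Map /feeds/<term> and /feeds/host/<name> onto the two query types."""
    parts = path.strip("/").split("/")
    if len(parts) == 3 and parts[:2] == ["feeds", "host"]:
        return build_feed(conn, f"Logs from {parts[2]}", "host = ?", (parts[2],))
    if len(parts) == 2 and parts[0] == "feeds":
        return build_feed(
            conn, f"Logs matching {parts[1]}", "msg LIKE ?", (f"%{parts[1]}%",)
        )
    raise ValueError(f"unrecognised feed path: {path}")
```

Calling `feed_for_path(conn, "/feeds/DriveReady")` would give the term feed, and `feed_for_path(conn, "/feeds/host/host1")` the per-host one. Either way the feed only ever contains what the query asks for.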

That works nicely for a fixed set of patterns, but the problem with this approach, and that of php-syslog-ng in general, is that it will only show you things that you look for - it won't volunteer trends, patterns, or news.

The fundamental problem is that neither system has any notion of "recent messages worth reading" (on a global or per-machine basis).

To put that into perspective: given a logfile from one host containing, say, 3740 lines, there are only approximately 814 unique lines if you ignore the date and timestamp.

Reducing log entries by that amount (a 78% decrease) is a significant saving, but even so you wouldn't want to read 22% of our original 4698816 lines of logs, as that is still over a million log entries.
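The unique-line count can be reproduced with something like the following. This is a rough sketch which assumes each line starts with the classic syslog "Mon DD HH:MM:SS" prefix; the function names are illustrative only.

```python
import re

# Assumes the classic syslog prefix "Mon DD HH:MM:SS " at the start of a line.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+")


def strip_timestamp(line):
    """Drop the leading date + time so otherwise identical lines compare equal."""
    return TIMESTAMP.sub("", line.rstrip("\n"))


def unique_ratio(lines):
    """Return (total, unique, percentage reduction), ignoring timestamps."""
    total = 0
    seen = set()
    for line in lines:
        total += 1
        seen.add(strip_timestamp(line))
    reduction = 100.0 * (total - len(seen)) / total if total else 0.0
    return total, len(seen), reduction
```

With the figures above, (3740 - 814) / 3740 works out at roughly 78%. Lines that differ only in a PID or an IP address still count as distinct here, so the real redundancy is probably higher.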

I guess we could trim the results down further via a pipe through logcheck or similar, but I can't help thinking that still isn't going to give us enough interesting things to view.

To reiterate I would like to see:

  • per-machine anomalies.
  • global anomalies.

To that end I've been working on something, but I'm not too sure yet if it will go anywhere... In brief you take the logfiles and tokenize, then you record the token frequencies as groups within a given host's prior records. Unique pairings == logs you want to see.

(i.e. token frequency analysis on things like "<auth.info> yuling.example.com sshd[28427]: Did not receive identification string from 1.3.3.4")
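One possible reading of that idea, as a sketch rather than the code I'm actually working on: split each message into tokens, count adjacent token pairs per host, and score a new line by how many of its pairs that host has never produced before. Replacing digit runs with a placeholder is my own assumption here, and the class and function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Replace digit runs so PIDs and IP addresses don't make every line unique.
# This normalisation is an assumption, not part of the description above.
NUMBER = re.compile(r"\d+")


def tokenize(message):
    return NUMBER.sub("N", message).split()


def pairs(tokens):
    """Adjacent token pairs, e.g. ('Did', 'not'), ('not', 'receive')."""
    return list(zip(tokens, tokens[1:]))


class HostHistory:
    """Per-host counts of token pairs seen in earlier log lines."""

    def __init__(self):
        self.seen = defaultdict(Counter)

    def learn(self, host, message):
        """Record the token pairs of a line this host has already logged."""
        self.seen[host].update(pairs(tokenize(message)))

    def novelty(self, host, message):
        """Fraction of this line's pairs never seen before on this host."""
        line_pairs = pairs(tokenize(message))
        if not line_pairs:
            return 0.0
        unseen = sum(1 for p in line_pairs if self.seen[host][p] == 0)
        return unseen / len(line_pairs)
```

After feeding a host's prior records through `learn()`, a line whose `novelty()` is close to 1.0 is a candidate per-machine anomaly. Learning every line a second time under one shared key (say `"*"`) would give a rough global view as well.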

What do other people do? There must be a huge market for this? Even amongst people who don't have more than 20 machines!

Comments On This Entry

  1. [gravitar] Thom May
    Sounds like you're almost describing splunk - www.splunk.org.
  2. [gravitar] Adrian Bridgett
    I use SEC (Simple Event Correlator) to watch log entries. Whilst I've had to have some ugly perl functions to do complex matching for java logs (would be best offloaded to log4j or one of the specialist java log analysers), most stuff can be done in SEC for free - such as "warn if you get more than 5 of these messages in a minute" or "do this if you see this message and you _don't_ see this other message within 30secs".
    You might also want to have a look at splunk - I've not tried it myself mind: http://www.splunk.com/
  3. [gravitar] Philipp Kern
    Do you know Splunk? (http://www.splunk.com)
  4. [author] Steve Kemp

    Thanks for the comments. I've heard of Splunk, but never used it.

    Ideally I'd not like to pay ..

  5. [gravitar] Warren Guy
    Check out Anton Chuvakin's blog, he's a bit of a logging evangelist: http://chuvakin.blogspot.com/
  6. [gravitar] James
    There is a huge market - see Splunk, Zenoss, Hyperic HQ, OpenNMS just off the top of my head. There's a huge list of free and non-free systems at http://www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
  7. [gravitar] Vincent Bernat
    There is software like splunk that will match your requirements, but it is not free. Maybe someone will program a clone of it.
  8. [gravitar] Wilfred
    They buy a counterpane contract. :-)
  9. [gravitar] Dale King
    For some time I've considered implementing something like logbayes and NBS for syslog (http://www.ranum.com/security/computer_security/code/index.html)
    on my home network, but like everything time gets in the way.
  10. [gravitar] nico
    Using something like splunk? (http://www.splunk.com)
  11. [gravitar] Scott Lamb
    Have you looked at Splunk?
  12. [gravitar] Sam
    If you're lazy enough to do a lazyweb post, maybe you can be lazy enough to let someone else do the work for you and try splunk. I keep seeing ads for it on slashdot, and I've even done a test install, but since my work environment is woefully light on syslog, I haven't had a chance to set it up.
    It seems like it does what you want, and maybe more.
  13. [gravitar] Carsten Aulbert
    Hi Steve, we are currently going the logcheck way, but performing logchecks on each box locally and only then transferring the remaining stuff to a central node - we will have close to 1400 servers soon and I don't think logging everything into a central DB makes much sense on that scale. An RSS feed looks really sexy, but of course there needs to be some kind of "tagging" mechanism which allows you to look only at "important" stuff - however this has yet to be decided upon. If you have a good solution, please post it :)
  14. [gravitar] santi
    I use moodss, but now it's uninstallable on Debian due to a dependency problem.
  15. [gravitar] Anonymous
    While I wouldn't suggest using it as a sole solution, some people use crm114 to do log analysis, classifying log entries by relevance.
  16. [gravitar] Alex
    ... and now you see why it's not a simple problem to solve ;) I'd have happily deployed Splunk for our uses if it didn't cost an absolute fortune in licensing; we'd be looking at between £15,000 and £20,000 for our current volume I believe!
    It's also got the "closed nature" feel about it, sure it's very pretty and seems to work well, but you can't get elbow deep in it and muck about :( If it doesn't suit your environment and you want a major-ish change, sod off.
    Regarding your post about not getting regularly refreshed logs with AJAX, you *can* actually tail our logs quite nicely - go to the "Input lots of criteria" page and instead of clicking the 'Search' button, click the 'Tail' button. ;)
    I'm convinced that inserting logs into MySQL or PostgreSQL works and is the most powerful solution. Anything is better than having >50GB of logfiles and using grep ;) As it stands we're able to filter easily and searching over ~24h of logs isn't too bad with response times <5s - from when we discussed this before I'm sure the issue pertained to both the database schema and user interface :'(
    I'd be up for improving the DB schema, but I think rewriting it in Ruby on Rails would be a bad idea. We don't need 10 more processes on loghost eating 50MB of RAM ;) Perhaps the current code written in PHP would be worth tweaking?
  17. [gravitar] João Carneiro
    I use zabbix (www.zabbix.org), which does the monitoring part, but I guess not syslog; it uses SNMP or agents... It's quite good actually, it even produces great graphics and trends, synoptic charts et al...
    have fun