I work with log files a lot.
Most of the logfiles I work with are in a standard format of some kind, and most often they are rotated on a daily basis. (Examples include syslog, qpsmtpd, and Apache logfiles.)
I wish there were a general purpose way to say "grep time-range pattern logfile".
Right now, for example, I've just deployed some changes to a cluster of hosts. Now I want to see only messages that refer to a particular area of the codebase, and only those that occurred after 23:00 - which is when I did the commit/push/pull dance.
I've written a quick hack - tgrep (time-grep) - which allows simple before/equal/after/range grepping:
    # show matching lines after 23:00
    tgrep \>23:00:00 -i subject /var/log/qpsmtpd/qpsmtpd.log

    # show matching lines in the interval 23:00 to 23:15
    tgrep 23:00:00-23:15:00 -i -r subject /var/log/qpsmtpd/
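For readers who want to see the idea rather than the tool, here is a minimal Python sketch of time-range grepping. It is not the tgrep source. The names `line_time` and `time_grep` are made up for illustration, and it assumes that each log line carries an HH:MM:SS timestamp and that range bounds are inclusive.

```python
import re
from datetime import time

# Assumption: the first HH:MM:SS group on a line is that line's timestamp.
# The lookarounds stop us matching inside longer digit runs such as "2009:23:05".
TIMESTAMP = re.compile(r"(?<!\d)(\d{2}):(\d{2}):(\d{2})(?!\d)")


def line_time(line):
    """Return the time of day found on a log line, or None if there is none."""
    m = TIMESTAMP.search(line)
    if m is None:
        return None
    hour, minute, second = (int(part) for part in m.groups())
    if hour > 23 or minute > 59 or second > 59:
        return None
    return time(hour, minute, second)


def time_grep(lines, pattern, start=None, end=None, ignore_case=False):
    """Yield lines that match `pattern` and fall inside [start, end].

    Either bound may be None, giving "after" or "before" behaviour.
    Treating both bounds as inclusive is an assumption of this sketch.
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for line in lines:
        stamp = line_time(line)
        if stamp is None:
            continue  # lines without a timestamp are skipped
        if start is not None and stamp < start:
            continue
        if end is not None and stamp > end:
            continue
        if regex.search(line):
            yield line
```

Something like `time_grep(open(path), "subject", start=time(23, 0), ignore_case=True)` would then play the role of the first command above. Comparing parsed times, rather than matching on the timestamp text, means the range still works when no line falls exactly on the boundary.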
If there is a common way of doing this "properly" then I'd love to be educated; failing that, take it if it is useful (moreutils?).
ObFilm: Chasing Amy
Is this helpful? I've not used it, but it looks like it should support time comparison.
It would be nice if there was a site where you could go look up the current 'best application' or 'best solution'.
Yes I'm familiar with that tool, I even commented on that entry as skx!
It had slipped my mind, and I had mostly filed it away as being useful for Apache, rather than more general purpose. It is probably the best thing out there at the moment.
I'm very familiar with Logwatch, but that's not the kind of examination I'm after at the moment - can you imagine getting a mail every day of your entire Apache logfile?
Mostly I'm making ad hoc searches for debug messages, or trying to collect statistics before the daily "make graphs" or "make summary" emails get fired off.
Thanks! I'm sure the tool will be useful pretty generally now I have it, and it definitely solved my immediate needs.
May not be as quick as the other tools, but sed is everywhere.
Justin
I think that splunk solves a different problem from the one I'm interested in. I've certainly centralised logging in the past via syslog-ng, but generally that isn't useful to me.
Thanks for the reminder about the flip-flop operator. The following perl is almost as good as what I wrote:
(That shows entries between 06:25 and 06:33)
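The flip-flop idea is easy to show in other languages too. The Python sketch below is an illustration of the technique, not the perl referred to above. The function name `between` and the sample patterns are assumptions. Output switches on at the first line matching the start pattern and off again after the first line matching the end pattern.

```python
import re


def between(lines, start_pattern, end_pattern):
    """Yield the lines from the first start match through the first end match.

    This mimics a flip-flop: a boolean that flips on when the start
    pattern is seen and flips off once the end pattern has been seen.
    """
    start = re.compile(start_pattern)
    end = re.compile(end_pattern)
    printing = False
    for line in lines:
        if not printing and start.search(line):
            printing = True
        if printing:
            yield line
            # The end test runs on the same line that switched us on, so a
            # line matching both patterns is a one-line range.
            if end.search(line):
                printing = False


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python between.py < /var/log/syslog
    for entry in between(sys.stdin, r" 06:25:", r" 06:33:"):
        sys.stdout.write(entry)
```

The weakness of this approach is that it matches on text, not on time: if no line is stamped within 06:25 the range never opens, and it closes at the first 06:33 line, not the last one.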
Steve, for doing these kinds of extraction jobs, I've found that it's often a good idea to run a first pass through the data with a very simple awk script that grabs the byte (not the line) offsets of the first *and the last* instances of 15-minute intervals. These indices end up being very small, and they can be easily used with dd to examine only specific time intervals in the logs.
This tactic falls down when logs are gzipped, unfortunately (unless the --rsyncable option is used, or something like dictzip, zsync or whatever). bzip2 is easier to work with because of the independently compressed blocks, but an index is a bit harder to generate because those blocks are packed bitwise instead of bytewise, and I don't know of a simple tool to bit-shift an entire stream. There are a few more pointers over at http://perldition.org/articles/Random%20seeking%20on%20gzip%20streams.sbc , if you're interested.
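For uncompressed logs, the byte-offset index described above can be sketched in Python as a stand-in for the awk pass and the dd extraction. The function names and the `(hour, minute)` index key are assumptions made for illustration. The sketch also assumes that the file covers a single day in chronological order and that each line carries an HH:MM:SS timestamp.

```python
import re

# Assumption: the first HH:MM:SS group on a line is its timestamp.
STAMP = re.compile(rb"(?<!\d)(\d{2}):(\d{2}):\d{2}(?!\d)")


def build_index(path, minutes=15):
    """Map each interval to (offset of its first line, offset just past its last line).

    Keys are (hour, minute) pairs naming the start of each interval,
    e.g. (6, 15) for 06:15:00 through 06:29:59.
    """
    index = {}
    offset = 0
    with open(path, "rb") as fh:
        for line in fh:
            end = offset + len(line)
            m = STAMP.search(line)
            if m:
                hour, minute = int(m.group(1)), int(m.group(2))
                key = (hour, minute - minute % minutes)
                first, _ = index.get(key, (offset, end))
                index[key] = (first, end)
            offset = end
    return index


def read_interval(path, index, key):
    """Return the raw bytes of one interval by seeking straight to it."""
    if key not in index:
        return b""
    first, end = index[key]
    with open(path, "rb") as fh:
        fh.seek(first)
        return fh.read(end - first)
```

The `(first, end)` pair holds the numbers you would otherwise hand to dd as a skip and a count. The index has one entry per interval, so at most 96 for a day of 15-minute intervals, however large the log is.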