Entries tagged "mysql".

Dwayne, I think you might be colorblind.

It is unfortunate that most server packages don't split their init scripts out into separate packages:

  • foo
    • Contains the server binary, associated config files, and libraries.
  • foo-run or foo-server
    • Contains the init script(s).

Right now it's a real pain to have to modify things like /etc/init.d/ssh to launch two daemons, running on two different ports, with two different configuration files.

Running multiple copies of SMTP daemons, databases, and similar things is more complex than it has to be, because our packages aren't set up for it.

If you maintain a daemon please do consider this. Failing that, honoring a flag such as "DISABLED=true" in /etc/default/foo would allow people to use their own /etc/init.d/foo.local initscript. (That's not perfect, but it is a step in the right direction.)

ObFilm: Little Miss Sunshine.

 

I'm fireproof, you're not

I've mostly been avoiding the computer this evening, but I did spend the last hour working on attempt #2 at distributed monitoring.

The more I plot, plan & ponder the more appealing the notion becomes.

Too many ideas to discuss, but in brief:

My previous idea of running tests every few minutes on each node scaled badly when the number of host+service pairs to be tested grew.

This led to the realisation that as long as some node tests each host+service pair you're OK. Every node need not check each host on every run - this was something I knew, and had discussed, but I assumed it would be a nice optimisation later rather than something which is almost mandatory.

My previous idea of testing for failures on other nodes after seeing a local failure was similarly flawed. It introduces too many delays:

  • Node 1 starts all tests - notices a failure. Records it.
    • Fetches current results from all neighbour nodes.
    • Sees they are OK - the remote server only just crashed. Oops.
  • Node 2 starts all tests - notices a failure. Records it.
    • Fetches current results from all neighbour nodes.
    • Sees they are OK - the remote server only just crashed. Oops.

In short you have a synchronisation problem which, coupled with the delay of making a large number of tests, soon grows. Given a testing period of five minutes, ten testing nodes, and 150 monitored hosts+services, you're looking at delays of 8-15 minutes. On average. (It largely depends on where in the cycle the failing host is, and how many nodes must see a failure prior to alerting.)

So round two has each node picking tests at "random" (making sure no host+service was tested more than 5 minutes ago) and at the point a failure is detected the neighbour nodes are immediately instructed to test and report their results (via XML::RPC).
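A minimal sketch of how one node's round could look, under stated assumptions: `pick_test`, `run_once`, and the `last_tested` map are hypothetical names, a node only knows its own view of when each pair was last tested, and each neighbour is modelled as a plain callable standing in for the remote call. It is an illustration of the idea, not the actual code.

```python
import random
import time

MAX_AGE = 5 * 60  # seconds; the five-minute window mentioned above


def pick_test(last_tested, now):
    """Choose one host+service pair to test.

    Pairs that have gone MAX_AGE or longer without a test take priority
    (most overdue first); otherwise the choice is random.
    """
    overdue = [pair for pair, when in last_tested.items() if now - when >= MAX_AGE]
    if overdue:
        return min(overdue, key=lambda pair: last_tested[pair])
    return random.choice(list(last_tested))


def run_once(last_tested, run_test, neighbours, now=None):
    """Run a single test; on failure ask every neighbour to re-test at once.

    run_test(pair) returns True when the service is healthy.
    neighbours is a list of callables (hypothetical stand-ins for the
    remote calls), each returning that neighbour's own result for the pair.
    """
    now = time.time() if now is None else now
    pair = pick_test(last_tested, now)
    ok = run_test(pair)
    last_tested[pair] = now
    if ok:
        return pair, True, []
    # Local failure: collect fresh results instead of reading stale ones.
    confirmations = [ask(pair) for ask in neighbours]
    return pair, False, confirmations
```

Because the neighbours test on demand, their answers reflect the state of the service at the moment of the failure rather than whatever they recorded on their last run.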

The new code is simpler, more reliable, and scales better. Plus it doesn't need Apache/CGI.

Anyway bored now. Hellish day. MySQL blows goats.

ObFilm: Hellboy

 

Hey, Ash, where are we?

I'm currently fighting with MySQL. The following takes too long:

mysql> SELECT COUNT(id) FROM q_archive;
+-----------+
| COUNT(id) |
+-----------+
|   2738048 |
+-----------+
1 row in set (17.95 sec)

I would like it to take significantly less time; even with memcached in use it gets hit too often. I've added an index to the table - but I didn't expect that to help, and I wasn't disappointed.

Ho hum.

Maybe another case where flat-files are best. Sure counting them would take a while, but once I've counted them I can just `cat sum`.

This is probably a case where tweaking MySQL's memory settings would help. But I'm fairly certain if I start messing with that I'll get into trouble with other parts of my site.

ObFilm: The Evil Dead

 

You think we just work at a comic book store for our folks, huh?

I'm only a minimal MySQL user, but I've got a problem with a large table full of data and I'm hoping for tips on how to improve it.

Right now I have a table which looks like this:

CREATE TABLE `books` (
  `id` int(11) NOT NULL auto_increment,
  `owner` int(11) NOT NULL,
  `title` varchar(200) NOT NULL,
  ....
  PRIMARY KEY  (`id`),
  KEY( `owner`)
)  ;

This allows me to look up all the BOOKS a USER has - because the user table has an ID and the books table has an owner attribute.

However I've got hundreds of users, and thousands of books. So I'm thinking I want to be able to find the list of books a user has.

Initially I thought I could use a view:

CREATE VIEW view_steve AS SELECT * FROM books WHERE owner=73;

But that suffers from a problem - the view has discontinuous IDs coming from the books table, and I'd love to be able to work with them in steps of 1. (Also having to create a view for each user is an overhead I could live without. Perhaps some stored procedure magic is what I need?)

Is there a simple way that I can create a view/subtable which would allow me to return something like:

| id | book_id | owner | title       | .... |
| 0  | 17      | Steve | Pies        | ..   |
| 1  | 32      | Steve | Fly Fishing | ..   |
| 2  | 21      | Steve | Smiles      | ..   |
| 3  | 24      | Steve | Debian      | ..   |

Where the "id" is a consecutive, incrementing number, such that "paging" becomes trivial?

ObQuote: The Lost Boys

Update: without going into details the requirement for known, static, and ideally consecutive identifiers is related to doing correct paging.
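One way to get consecutive numbers without a view per user is to number the rows at query time. The sketch below is an assumption-laden illustration rather than an answer to the question as posed: it uses Python's sqlite3 module as a stand-in for MySQL, `page_of_books` is a hypothetical helper, and the position it returns is simply the row's rank among that owner's books ordered by id.

```python
import sqlite3


def page_of_books(conn, owner, page, per_page=2):
    """Return (position, book_id, title) tuples for one page of an owner's books.

    position is a consecutive, zero-based number within that owner's
    books, ordered by the real book id.
    """
    offset = page * per_page
    rows = conn.execute(
        "SELECT id, title FROM books WHERE owner = ? ORDER BY id LIMIT ? OFFSET ?",
        (owner, per_page, offset),
    ).fetchall()
    return [(offset + i, book_id, title) for i, (book_id, title) in enumerate(rows)]
```

Note that these positions are only stable while the owner's list is unchanged: deleting a book shifts every later position down by one, so this does not give truly static identifiers.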

 

It's a lot like life

Assume for a moment that you have 148 hosts logging, via syslog-ng, to a central host. That host is recording all log entries into a MySQL database. Assume that between them these machines are producing a total of 4698816 lines per day.

(Crazy random numbers pulled from thin air, obviously.)

Now the question: How do you process, read, or pay attention to those logs?

Here is what we've done so far:

syslog-ng

All the syslog-ng client machines are logging to a central machine, which inserts the records into a database.

This database may be queried using the php-syslog-ng script. Unfortunately this search is relatively slow, and the user interface is appallingly bad: it allows only searches, with no view of the most recent logs, no auto-refreshing via AJAX, etc.

rss feeds

To remedy the slowness and poor usability of the PHP front-end to the database, I wrote a quick hack which produces RSS feeds via queries against that same database, accessed via URIs such as:

  • http://example.com/feeds/DriveReady
  • http://example.com/feeds/host/host1

The first query returns an RSS feed of log entries containing the given term. The second shows all recent entries from the machine host1.

That works nicely for a fixed set of patterns, but the problem with this approach, and that of php-syslog-ng in general, is that it will only show you things that you look for - it won't volunteer trends, patterns, or news.

The fundamental problem is a lack of notion in either system of "recent messages worth reading" (on a global or per-machine basis).

To put that into perspective, given a logfile from one host containing, say, 3740 lines, there are only approximately 814 unique lines if you ignore the date + timestamp.

Reducing log entries by that amount (a 78% decrease) is a significant saving, but even so you wouldn't want to read 22% of our original 4698816 lines of logs, as that is still over a million log entries.
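The "ignore the date + timestamp" step and the percentage above can be sketched as follows. This assumes the classic syslog prefix of the form "Mar  3 04:05:06 " at the start of each line; `unique_lines` and `reduction_percent` are hypothetical helper names, and a real logfile may need a different pattern.

```python
import re

# Assumed timestamp layout: month abbreviation, day, HH:MM:SS, then a space.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} ")


def unique_lines(lines):
    """Return the set of distinct lines once the leading timestamp is removed."""
    return {TIMESTAMP.sub("", line.rstrip("\n")) for line in lines}


def reduction_percent(total, unique):
    """Percentage of lines that disappear when duplicates are collapsed."""
    return 100.0 * (total - unique) / total
```

With the figures quoted above, `reduction_percent(3740, 814)` comes out at a little over 78.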

I guess we could trim the results down further via a pipe through logcheck or similar, but I can't help thinking that still isn't going to give us enough interesting things to view.

To reiterate I would like to see:

  • per-machine anomalies.
  • global anomalies.

To that end I've been working on something, but I'm not too sure yet if it will go anywhere... In brief you take the logfiles and tokenize, then you record the token frequencies as groups within a given host's prior records. Unique pairings == logs you want to see.

(i.e. token frequency analysis on things like "<auth.info> yuling.example.com sshd[28427]: Did not receive identification string from 1.3.3.4")
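A rough sketch of one reading of that idea, with the assumptions spelled out: here a "pairing" is taken to mean two adjacent tokens, digit runs are collapsed so that PIDs and IP addresses don't make every line look new, and `HostHistory` and `unseen_pairs` are hypothetical names. The real thing may group tokens quite differently.

```python
import re
from collections import Counter, defaultdict


def tokenize(message):
    """Split a log message into tokens, replacing each digit run with N."""
    return re.sub(r"\d+", "N", message).split()


class HostHistory:
    """Per-host counts of adjacent token pairs seen in earlier messages."""

    def __init__(self):
        self.pairs = defaultdict(Counter)  # host -> Counter of (token, token)

    def unseen_pairs(self, host, message):
        """Return the pairs in message never seen before for host, then record them."""
        tokens = tokenize(message)
        pairs = list(zip(tokens, tokens[1:]))
        seen = self.pairs[host]
        new = [pair for pair in pairs if seen[pair] == 0]
        seen.update(pairs)
        return new
```

A message that yields no unseen pairs for its host is one that host has effectively logged before; a message with several is a candidate for "worth reading". Keeping a second `HostHistory` keyed on a single shared name would give the global view.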

What do other people do? There must be a huge market for this? Even amongst people who don't have more than 20 machines!