Steve Kemp's Blog

Debian & Free Software

About This Site

This is a simple blog relating to Debian & Free Software issues.

Archive

Entries tagged "tools".

25th January 2007

I would like to have a simple way of mirroring a webpage, including any referenced .css, .js, and images.

However, to complicate matters, I wish to mandate that the file will be saved as “index.html”, regardless of what it was originally called.

This appears to rule wget out, as the --output-document=index.html option trumps the --page-requisites flag (which is used to download referenced images, etc.).

Is there a simple tool which will download a single webpage, save it to a user-defined local filename, and also download the referenced images, CSS files, and JavaScript files (rewriting the links in the saved file so that they still work)?

Using Perl I could pull down the page, and I guess I could parse the HTML manually, but that seems non-trivial, and I’d imagine there is already a tool out there to do the job.
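For illustration only, here is a minimal sketch of the "pull down the page and parse it manually" approach, written in Python rather than Perl. The names rewrite_page and mirror are hypothetical, the regular expression only looks at the first src/href attribute of img, script, and link tags, and it is not a substitute for a real HTML parser. Assets are saved under numbered local names so that two files with the same basename do not collide.

```python
import os
import re
import urllib.request
from urllib.parse import urljoin, urlparse

# Matches the first src="..." or href="..." on <img>, <script> and <link> tags.
# This is a deliberately naive pattern, not a full HTML parser.
ASSET_RE = re.compile(
    r'(<(?:img|script|link)\b[^>]*?\b(?:src|href)\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def rewrite_page(html, base_url):
    """Return (new_html, {absolute_url: local_name}) for referenced assets."""
    assets = {}

    def replace(match):
        prefix, quote, ref = match.groups()
        absolute = urljoin(base_url, ref)
        if absolute not in assets:
            name = os.path.basename(urlparse(absolute).path) or "asset"
            # Prefix a counter so identical basenames stay distinct.
            assets[absolute] = "%03d-%s" % (len(assets), name)
        return prefix + quote + assets[absolute] + quote

    return ASSET_RE.sub(replace, html), assets


def mirror(url, dest_dir):
    """Save `url` as dest_dir/index.html along with its referenced assets."""
    os.makedirs(dest_dir, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    new_html, assets = rewrite_page(html, url)
    for absolute, name in assets.items():
        urllib.request.urlretrieve(absolute, os.path.join(dest_dir, name))
    with open(os.path.join(dest_dir, "index.html"), "w", encoding="utf-8") as fh:
        fh.write(new_html)
```

Something like mirror("http://example.com/some/page.php", "snapshot") would then leave snapshot/index.html plus the fetched assets in one directory. Images referenced from inside CSS files, and anything loaded by JavaScript, are not handled, which is part of why this is non-trivial.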

So far I’ve looked at curl, httrack, and wget.

If I’m missing the obvious solution please point me at it ..

(Yes, this is so that I can take “snapshots” of links added to my bookmark server.)

Tags: tools, wget.

21st June 2007

The source-searching system I was talking about previously is progressing slowly.

So far I've synced the source of Etch to my local machine (total size 29Gb), and this evening I've started unpacking all of it.

I'm still in the "a" section at the moment, but thanks to caching I should be able to re-sync the source archive and unpack newer revisions pretty speedily.
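To show the kind of caching I mean, here is a rough Python sketch that unpacks only those source packages which have no unpacked directory yet. It is an illustration rather than the script I'm actually running: the function name, the directory layout (one directory per .dsc, named after it), and the run parameter are all assumptions. The only real command involved is dpkg-source -x, which extracts a source package into a directory that must not exist already.

```python
import subprocess
from pathlib import Path


def unpack_new_sources(mirror, unpacked, run=subprocess.run):
    """Unpack every .dsc under `mirror` that has not been unpacked yet.

    Returns the names of the .dsc files that were unpacked on this run.
    """
    mirror, unpacked = Path(mirror), Path(unpacked)
    done = []
    for dsc in sorted(mirror.rglob("*.dsc")):
        target = unpacked / dsc.stem  # foo_1.0-1.dsc -> unpacked/foo_1.0-1
        if target.exists():
            # Already unpacked on an earlier run, so skip it.
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        # dpkg-source -x <file.dsc> <output-dir> extracts the source package.
        run(["dpkg-source", "-x", str(dsc), str(target)], check=True)
        done.append(dsc.name)
    return done
```

Calling unpack_new_sources("/mnt/mirror/source", "/mnt/mirror/unpacked") after each re-sync would then only touch new revisions (the first path is made up). A failed extraction could leave a partial directory behind, which this sketch would wrongly treat as finished.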

The big problem at the moment is that the unpacking of all the archives is incredibly slow. Still, I do have one new bug to report: aatv: Buffer overflow in handling environmental variables.

That was found with:

rgrep getenv /mnt/mirror/unpacked | grep sprintf

(A very very very slow pair of greps. Hopefully once the unpacking has finished it will become faster. ha!)
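The same search can be done in a single pass over the tree. The following Python sketch (the function name find_lines is made up for this example) reports every line that mentions both getenv and sprintf. It is not exactly equivalent to the pipeline above, because the second grep also matches on the filename that rgrep prints, and no claim is made that it is any faster on a cold disk.

```python
import os


def find_lines(root, needles=("getenv", "sprintf")):
    """Yield (path, line_number, line) for lines containing every needle."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" lets us read files that are not valid text.
                with open(path, "r", errors="replace") as fh:
                    for number, line in enumerate(fh, 1):
                        if all(needle in line for needle in needles):
                            yield path, number, line.rstrip("\n")
            except OSError:
                # Unreadable files are skipped rather than aborting the scan.
                continue
```

Used as "for path, number, line in find_lines('/mnt/mirror/unpacked'): print(path, number, line)", it gives candidates that still need reading by hand, since a line using both functions is not necessarily a bug.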

The only issue I see at the moment is that I might not have the disk space to store an unpacked tree. I've got 100Gb allocated, with 29Gb taken up by the source. I'll just have to hope that the source is less than 70Gb unpacked, or do this in stages.

I've been working on a list of patterns and processes to run. I think pscan, rats, and its4 should be the first tools to run on the archive, followed by some directed use of grep.

If anybody else with more disk space and connectivity than me would be interested, I can post the script(s) I'm using to sync and unpack. Failing that I'll shut up now.

4th December 2007

If you're interested in working on your CV/Resume, as Otavio Salvador was recently, then I'd highly recommend the xml-resume-library.

It allows you to write your address, previous jobs, and skills as XML then generate PDF, HTML, and plain text format documents via a simple Makefile.

It won't help with clueless agencies that mandate the use of Microsoft Word Documents for submission, so they can butcher your submission and "earn" their fee(s), but otherwise it rocks.

5th December 2007

After mentioning the xml-resume-library package I was reminded that the English translation has been out of date for over a year.

With permission from the maintainer I've made a new upload which fixes this, and a couple of other bugs.

On a different topic it seems that many Debian-related websites are having their designs tweaked.

I'm not redesigning mine, but I'd love other people to have a go.

Here's hoping.

Created by Chronicle v3.1