HTML Hacking Scripts

Here are a few useful web-related programs I've written lately. You might also want to look at this interesting program by Abigail.

Shaking Up the Web

latro
Latro finds idiotic PC sites open to perl.exe?FMH.pl abuse and reports their little problem.

HTML Munging

churl
Extract URLs and verify validity; currently only looks for FTP:, HTTP:, and FILE: schemata, stored in A or IMG tags.

striphtml
Strip out all the HTML bits from a document, leaving (unformatted) plain text in its wake.
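
To give a feel for what tag stripping involves, here is a minimal Python sketch. It is not the striphtml script itself; the names `TextOnly` and `strip_html` are made up for illustration, and skipping the contents of script and style elements is an assumption about what "plain text" should mean.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect character data and ignore tags and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; into & etc.
        self.chunks = []
        self.skip_depth = 0  # greater than zero while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def strip_html(text):
    """Return the text content of an HTML string (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)
```

Calling `strip_html(open(path).read())` on a saved page would return its text with no attempt at layout, which matches the "unformatted" caveat above.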

htdecom
Strips out comments from an HTML document.
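
As a rough illustration, comment removal can be sketched in Python with one regular expression. This is not htdecom's own code, and it assumes only the common `<!-- ... -->` form; full SGML comment syntax has more corner cases than this handles. The name `strip_comments` is made up.

```python
import re

# Non-greedy match so each comment ends at its own closing "-->".
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(text):
    """Remove <!-- ... --> comments from an HTML string (hypothetical helper)."""
    return COMMENT_RE.sub("", text)
```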

htitle
Retrieve the title from a URL.
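
A minimal Python sketch of the same idea follows; it is not htitle's actual code. The helper names, the 64 KB read limit, and the latin-1 fallback charset are all assumptions made for the example.

```python
import re
from html import unescape
from urllib.request import urlopen

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title(html_text):
    """Return the whitespace-normalized title, or None if there is none."""
    match = TITLE_RE.search(html_text)
    if match is None:
        return None
    return " ".join(unescape(match.group(1)).split())


def fetch_title(url, timeout=10):
    """Fetch the start of a page and pull out its title (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "latin-1"
        # The title normally sits near the top, so read only the first 64 KB.
        head = resp.read(65536).decode(charset, errors="replace")
    return extract_title(head)
```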

URL Munging

surl
Given a list of URLs, sorts them by last-modified date.
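
One way to do this, sketched in Python, is to ask each server for its Last-Modified header with a HEAD request and sort on the parsed date. This is not surl's actual code. The function names are made up, and putting the oldest URL first with undated URLs at the end is an assumption, since the description does not say which order the script uses.

```python
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def last_modified(url, timeout=10):
    """Return the Last-Modified time as a datetime, or None if unavailable."""
    req = Request(url, method="HEAD")
    try:
        with urlopen(req, timeout=timeout) as resp:
            stamp = resp.headers.get("Last-Modified")
    except OSError:  # covers URLError, HTTPError, and timeouts
        return None
    if not stamp:
        return None
    try:
        return parsedate_to_datetime(stamp)
    except (TypeError, ValueError):
        return None


def sort_by_date(urls, lookup=last_modified):
    """Oldest first (an assumption); URLs with no usable date go last."""
    dated = [(lookup(u), u) for u in urls]
    known = sorted((d, u) for d, u in dated if d is not None)
    unknown = [u for d, u in dated if d is None]
    return [u for _, u in known] + unknown
```

The `lookup` parameter is there only so the sorting can be tried without touching the network.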

xurl
Given one URL, extract all URLs it contains. Uses the LWP library, and is pretty complete.
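
The interesting part is turning relative links into absolute ones. Here is a minimal Python sketch of that step, using the standard library rather than LWP, so it is not xurl's actual code. The attribute list in `URL_ATTRS` and the names `AbsoluteLinks` and `absolute_urls` are assumptions; the page text is assumed to have been fetched already.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Attributes assumed to carry URLs; a complete tool would know more of them.
URL_ATTRS = {"href", "src", "background", "action"}


class AbsoluteLinks(HTMLParser):
    """Collect every URL-valued attribute, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name not in URL_ATTRS or not value:
                continue
            if tag == "base":
                # <base href=...> changes what later links are relative to.
                self.base_url = urljoin(self.base_url, value)
            else:
                self.urls.append(urljoin(self.base_url, value))


def absolute_urls(html_text, base_url):
    """Return all links in html_text as absolute URLs (hypothetical helper)."""
    parser = AbsoluteLinks(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.urls
```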

qxurl
Somewhat like xurl (the name means "quick xurl"), but it expects to read from files, not URLs, and doesn't canonicalize relative links. It also runs about 100x faster and doesn't require an external library.
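
The quick approach amounts to pattern matching on the raw text instead of parsing it. A minimal Python sketch of that trade-off follows; it is not qxurl's actual code, the pattern only looks at `href` and `src` attributes (an assumption), and `quick_urls` is a made-up name.

```python
import re

# Match href=... or src=... with a double-quoted, single-quoted, or bare value.
ATTR_URL = re.compile(
    r"""\b(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def quick_urls(text):
    """Return attribute values exactly as written, with no canonicalization."""
    return [a or b or c for a, b, c in ATTR_URL.findall(text)]
```

Because nothing is resolved against a base URL, relative links come back exactly as they appear in the file.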

reltree

Fix up a tree's URLs to make them all relative instead of absolute.
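
The core of such a fix-up is computing, for one page, the relative form of a link that points at the same site. Here is a minimal Python sketch of that single step; it is not reltree's actual code, `relativize` is a made-up name, and leaving links to other hosts untouched is an assumption.

```python
import posixpath
from urllib.parse import urlsplit


def relativize(page_url, link_url):
    """Rewrite link_url relative to page_url when both are on the same site."""
    page, link = urlsplit(page_url), urlsplit(link_url)
    if (page.scheme, page.netloc) != (link.scheme, link.netloc):
        return link_url  # different site: leave the link absolute

    page_dir = posixpath.dirname(page.path) or "/"
    rel = posixpath.relpath(link.path or "/", start=page_dir)
    if link.path.endswith("/"):
        rel = rel.rstrip("/") + "/"  # relpath drops the trailing slash

    # Carry over the query string and fragment unchanged.
    if link.query:
        rel += "?" + link.query
    if link.fragment:
        rel += "#" + link.fragment
    return rel
```

Walking a whole tree would mean applying this to every link in every file, with each file's own location as `page_url`.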

Netscape Munging

ggh
Grovel global history. Search or dump out the Netscape global history file.