Friday 25 July 2008

extracting URLs

So recently I had the unenviable task of getting a load of files from a site. Not in the mood to do this by hand, I thought a simple scripted way must exist, and after a bit of faffing about and an idea from someone else, I ended up with a bloody simple solution!

grep -o 'http://[^"]*' htmlpage.html > urlsinthisfile.txt

I've added spurious file extensions so Windows users can follow along.
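
For anyone wondering how the pattern does its job: -o tells grep to print only the matched text rather than the whole line, and [^"]* keeps matching until it hits the closing double quote of the attribute. Here is a rough sketch of a slightly extended variant, not a definitive recipe. It assumes the links sit inside double quotes, and the file names sample.html and urls.txt and the example.com addresses are made up purely for illustration. It also picks up https links and drops duplicates.

```shell
# Build a tiny sample page (hypothetical content, for illustration only).
cat > sample.html <<'EOF'
<a href="http://example.com/files/a.zip">a</a>
<a href="https://example.com/files/b.zip">b</a>
<img src="http://example.com/files/a.zip">
EOF

# -o prints only the matching part of each line.
# -E switches to extended regex so "https?" matches http and https.
# [^"]* matches everything up to, but not including, the next double quote.
# sort -u removes duplicate URLs.
grep -oE 'https?://[^"]*' sample.html | sort -u > urls.txt

cat urls.txt
```

The resulting list could then be handed to a downloader, for example wget -i urls.txt, though links in single quotes or relative links would slip past this pattern.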

It's elegant and it works!