Friday, 25 July 2008

more scripting goodness

Following on from the post below about extracting files, I thought I should share the actual mirroring process too.

Once the URLs for the files had been extracted into their own file, it was simply a case of running:

wget -prl0 -i fileofurls
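For anyone not used to wget's short flags, here is the same command spelled out with long options. This is only a sketch: the `mirror_urls` wrapper and the `WGET` override variable are hypothetical names added for illustration and were not part of what I originally ran.

```shell
#!/bin/sh
# mirror_urls and WGET are illustrative names, not from the original post.
mirror_urls() {
    # --page-requisites (-p): also fetch images, CSS, etc. needed to display pages
    # --recursive (-r)      : follow links found in the downloaded pages
    # --level=0 (-l0)       : no limit on recursion depth
    # --input-file (-i)     : read the starting URLs from the given file
    "${WGET:-wget}" --page-requisites --recursive --level=0 --input-file="$1"
}

# Example call: mirror_urls fileofurls
```

With no depth limit, a recursive fetch can pull in far more than expected, so it is worth checking the URL list before running it.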

For the project I was working on, I needed to repeat the extraction of URLs from those files and redo the wget a few times, but in the end it was worth it, as I finally had all the files I needed (along with 4000+ other files I didn't want), all in separate directories.

So how do you find all the files you need from multiple directories all with different names and move them to a whole new folder?

Well, it turns out it's fairly simple:
find . -name "*.pdf" -size +50 -exec mv {} ../bar \;

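This got me all the files I needed (all the PDFs). I used the size option to make sure I wasn't getting files that merely ended in .pdf without being real documents (which this site had). With no unit suffix, -size +50 matches files larger than 50 blocks of 512 bytes, so roughly 25 KB.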
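A slightly more defensive variant of the command above, wrapped in a function so it can be reused, might look like this. It is a sketch under a few assumptions: `collect_pdfs`, `src` and `dest` are names made up for illustration, and the destination is assumed to live outside the tree being searched (as ../bar does here).

```shell
#!/bin/sh
# Sketch: gather every sizeable PDF under a source tree into one folder.
# collect_pdfs, src and dest are illustrative names, not from the original post.
collect_pdfs() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    # -type f      : regular files only
    # -name '*.pdf': quoted so the shell does not expand the pattern itself
    # -size +50    : more than 50 blocks of 512 bytes, i.e. over about 25 KB
    # -exec ... \; : run mv once per match; find passes each path as a single
    #                argument, so spaces in names are safe
    find "$src" -type f -name '*.pdf' -size +50 -exec mv {} "$dest" \;
}

# Example call, matching the layout in this post (run from the mirror directory):
# collect_pdfs . ../bar
```

One thing to watch: if two directories contain PDFs with the same name, the later move will overwrite the earlier one in the destination folder.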