FTP
Status | Online! |
Archiving status | In progress... |
Archiving type | Unknown |
Project source | https://github.com/ArchiveTeam/ftp-nab |
IRC channel | #effteepee (on hackint) |
Archiving a whole public FTP host/mirror is easy:
SketchCow> I use:
wget -r -l 0 -np -nc ftp://ftp.underscorporn.com
tar cvf 2014.01.ftp.underscorporn.com.tar ftp.underscorporn.com
tar tvf 2014.01.ftp.underscorporn.com.tar > 2014.01.ftp.underscorporn.com.tar.txt
Or, use this handy function in your .bashrc file; you can also remove the first and last lines to turn it into a standalone bash script. Made by SN4T14:
ftp-grab() {
    target="$1"
    wget -r -l 0 -np -nc "$target"
    # Strip the scheme so the tar is named after the hostname directory wget created.
    if [[ "$target" =~ ^ftp://.*$ ]]; then
        target="$(echo "$target" | cut -d '/' -f 3)"
    fi
    tar cvf "$(date +%Y).$(date +%m).$target.tar" "$target"
    tar tvf "$(date +%Y).$(date +%m).$target.tar" > "$(date +%Y).$(date +%m).$target.tar.txt"
}
Alternatively, you can use lftp:
SITE=ftp.somesite.com; lftp -c "debug 10 -o $SITE.debug.log; open $SITE; mirror --verbose=3 --log=$SITE.mirror.log / $SITE"
Note that this produces tons of debug output (roughly equivalent to the HTTP header information captured by wget-warc for HTTP). Check the logs for personal information (local paths and the like). If the server is older and the above does not work correctly, you may have to do the following:
SITE=ftp.somesite.com; lftp -c "debug 10 -o $SITE.debug.log; set ftp:use-feat no; open $SITE; mirror --verbose=3 --log=$SITE.mirror.log / $SITE"
If the site uses a nonstandard or foreign charset (common with older foreign servers), you will have to do the following (replace CHARSET with the correct charset identifier for the server):
SITE=ftp.somesite.com; lftp -c "debug 10 -o $SITE.debug.log; set ftp:charset CHARSET; open $SITE; mirror --verbose=3 --log=$SITE.mirror.log / $SITE"
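The three lftp invocations above differ only in the settings prepended to the command string. As a sketch, a small helper can assemble the string to pass to lftp -c (build_lftp_cmd is hypothetical, not part of lftp):

```shell
# Hypothetical helper: build the lftp command string for a site, optionally
# disabling FEAT (for old servers) and/or forcing a charset.
build_lftp_cmd() {
    site="$1"; nofeat="$2"; charset="$3"
    cmd="debug 10 -o $site.debug.log; "
    if [ "$nofeat" = "yes" ]; then cmd="${cmd}set ftp:use-feat no; "; fi
    if [ -n "$charset" ]; then cmd="${cmd}set ftp:charset $charset; "; fi
    cmd="${cmd}open $site; mirror --verbose=3 --log=$site.mirror.log / $site"
    printf '%s\n' "$cmd"
}

# Print the default command string for a site:
build_lftp_cmd ftp.somesite.com no ""
```

You would then run `lftp -c "$(build_lftp_cmd ftp.somesite.com yes "")"`, which is equivalent to the old-server variant above.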
Check the size of the site before you start, to make sure you have enough space to hold both the site and the tar afterwards; also account for large files on the site when using tar --remove-files:
lftp ftp://site.com -e 'du -h'
An alternative to try if the above does not work correctly (which happens more often on old servers):
lftp -c 'set ftp:use-feat no; du -h ftp://site'
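Once you have the size from du, you can compare it against local free space before mirroring. A minimal sketch (enough_space is a hypothetical helper, not part of lftp; it assumes you need roughly twice the site's size, for the mirror plus the tar):

```shell
# Hypothetical helper: succeed only if the current disk has room for
# about twice SIZE_KB kilobytes (mirror + tar).
enough_space() {
    size_kb="$1"
    # df -Pk: POSIX-format output in 1K blocks; column 4 is space available.
    avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((size_kb * 2)) ]
}

# Example: check room for a 1 GiB site before starting the mirror.
enough_space 1048576 && echo "ok to mirror"
```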
Now zip/tar it up and send it to the spacious Internet Archive![1] (If you're short on space: tar --remove-files deletes each file shortly after adding it to the tar, not waiting for the archive to be complete, unlike zip -rm.)
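A quick local demonstration of that --remove-files behavior, assuming GNU tar (demo-site is just a throwaway example directory, not one of the project's targets):

```shell
# Create a throwaway directory with one file in it.
mkdir -p demo-site
echo "hello" > demo-site/file.txt

# GNU tar deletes each file as soon as it has been archived, so the
# contents never occupy disk space twice.
tar --remove-files -cf demo-site.tar demo-site

# The archive exists; the original file is gone.
ls demo-site.tar
```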
The Project
- We're currently listing all FTP sites on the internet to download them all.
- We're archiving a list of some select FTP sites manually:
Midas | ftp.tu-chemnitz.de |
Midas | ftp.uni-muenster.de |
Midas | gatekeeper.dec.com |
Midas | ftp.uni-erlangen.de |
Midas | ftp.warwick.ac.uk |
University FTPs are massive; we are currently only grabbing DEC and Sweex.