Difference between revisions of "FTP"

From Archiveteam
Revision as of 21:26, 9 February 2016

FTP
Status: Online!
Archiving status: In progress...
Archiving type: Unknown
Project source: https://github.com/ArchiveTeam/ftp-nab
IRC channel: #effteepee (on hackint)

Archiving a whole public FTP host/mirror is easy:

SketchCow> I use wget -r -l 0 -np -nc ftp://ftp.underscorporn.com
tar cvf 2014.01.ftp.underscorporn.com.tar ftp.underscorporn.com
tar tvf 2014.01.ftp.underscorporn.com.tar > 2014.01.ftp.underscorporn.com.tar.txt

Or put this handy function in your .bashrc file. You can also remove the first and last lines to turn it into a standalone bash script. Made by SN4T14.

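ftp-grab(){
    local target="$1"
    # Mirror recursively: unlimited depth, never ascend to the parent, skip files already downloaded
    wget -r -l 0 -np -nc "$target"
    if [[ "$target" =~ ^ftp://.*$ ]]
        then
        # wget saves into a directory named after the host, so strip the scheme and path
        target="$(echo "$target" | cut -d '/' -f 3)"
        echo "ftp"
        echo "$target"
    fi
    # Compute the date stamp once so the tar and its listing always get the same name
    local stamp
    stamp="$(date +%Y.%m)"
    tar cvf "$stamp.$target.tar" "$target"
    # Save a listing of the archive contents next to it
    tar tvf "$stamp.$target.tar" > "$stamp.$target.tar.txt"
}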
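
The host-extraction and naming steps in the function above can also be written with bash parameter expansion instead of cut. This is only a minimal sketch: ftp_host and archive_name are hypothetical helper names, not part of SN4T14's function, and the optional date-stamp argument is an assumption added to make the naming easy to try out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host part of an FTP URL.
# A bare hostname (no scheme, no path) is printed as-is.
ftp_host() {
    local url="$1"
    url="${url#ftp://}"            # drop the scheme, if present
    printf '%s\n' "${url%%/*}"     # drop everything from the first slash on
}

# Hypothetical helper: build the YYYY.MM.host.tar name used on this page.
# The second argument (a date stamp) is optional and defaults to the current month.
archive_name() {
    local host stamp
    host="$(ftp_host "$1")"
    stamp="${2:-$(date +%Y.%m)}"
    printf '%s.%s.tar\n' "$stamp" "$host"
}
```

For example, archive_name ftp://ftp.somesite.com/pub 2014.01 prints 2014.01.ftp.somesite.com.tar.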

Alternatively, you can use lftp:

SITE=ftp.somesite.com; lftp -c "debug 10 -o $SITE.debug.log; open $SITE; mirror --verbose=3 --log=$SITE.mirror.log / $SITE"

Note that this produces a large amount of debug output (roughly equivalent to the HTTP header info captured by wget-warc for HTTP). Check the logs for personal information (local paths and such). If the server is older and the above does not work correctly, you may have to turn off lftp's use of the FEAT command:

SITE=ftp.somesite.com; lftp -c "debug 10 -o $SITE.debug.log; set ftp:use-feat no; open $SITE; mirror --verbose=3 --log=$SITE.mirror.log / $SITE"

If the site uses a nonstandard or non-English charset (common with older servers outside the English-speaking world), you will have to set it explicitly (replace CHARSET with the correct charset identifier for the server):

SITE=ftp.somesite.com; lftp -c "debug 10 -o $SITE.debug.log; set ftp:charset CHARSET; open $SITE; mirror --verbose=3 --log=$SITE.mirror.log / $SITE"
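
The three lftp variants above differ only in which set commands they include, so they can be generated from one place. The following is a rough sketch, not an official ArchiveTeam tool: lftp_mirror_script is a hypothetical helper name, and its argument order (site, charset, "nofeat") is an assumption made for illustration. It only prints the script; nothing is contacted until the result is passed to lftp -c.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lftp script for mirroring a site.
#   $1 = site hostname
#   $2 = charset identifier, or empty to leave lftp's default
#   $3 = "nofeat" to add "set ftp:use-feat no" for older servers
lftp_mirror_script() {
    local site="$1" charset="${2:-}" feat="${3:-}"
    local script="debug 10 -o $site.debug.log;"
    if [[ "$feat" == "nofeat" ]]; then
        script+=" set ftp:use-feat no;"
    fi
    if [[ -n "$charset" ]]; then
        script+=" set ftp:charset $charset;"
    fi
    script+=" open $site; mirror --verbose=3 --log=$site.mirror.log / $site"
    printf '%s\n' "$script"
}
```

Usage, with KOI8-R standing in as an example charset: lftp -c "$(lftp_mirror_script ftp.somesite.com KOI8-R nofeat)"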


Before you start, check the size of the site to make sure you have enough space to hold both the mirrored site and the tar afterwards. Even when using tar --remove-files, account for large files on the site.

lftp ftp://site.com -e 'du -h'

An alternative to try if the above does not work correctly (happens more often on old servers):

lftp -c 'set ftp:use-feat no; du -h ftp://site'
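
Once du has reported the site's size, it can be compared with the free space on the local disk. A minimal sketch, assuming a POSIX df: has_room is a hypothetical helper name, and the required size in kilobytes is typed in by hand from the du output (remember to roughly double it if the mirrored site and the tar will sit on disk at the same time).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if DIR has at least NEEDED_KB kilobytes free.
#   $1 = space needed, in kilobytes
#   $2 = directory to check (defaults to the current directory)
has_room() {
    local needed_kb="$1" dir="${2:-.}" avail_kb
    # df -P prints one POSIX-format line per filesystem;
    # with -k, column 4 is the available space in kilobytes.
    avail_kb="$(df -Pk "$dir" | awk 'NR==2 {print $4}')"
    [ "$avail_kb" -ge "$needed_kb" ]
}
```

For example: has_room 50000000 . && echo "enough space" checks for about 50 GB in the current directory.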

Now zip/tar it up and send it to the spacious Internet Archive (https://archive.org/details/ftpsites)! (If you're short on space: tar --remove-files deletes each file shortly after adding it to the tar, without waiting for the archive to be complete, unlike zip -rm.)
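
The space-saving variant can be sketched as follows. This assumes GNU tar (other tar implementations may not have --remove-files), and pack_and_remove is a hypothetical helper name. Because files are deleted as they are archived, an interrupted run leaves part of the site only inside the incomplete tar, so try it on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive DIR into STAMP.DIR.tar, deleting files as they
# are added, then write a listing next to the tar. Assumes GNU tar.
#   $1 = directory created by the mirror (e.g. the site's hostname)
#   $2 = optional date stamp, defaults to the current YYYY.MM
pack_and_remove() {
    local dir="$1" stamp="${2:-$(date +%Y.%m)}"
    local tarball="$stamp.$dir.tar"
    tar --remove-files -cf "$tarball" "$dir" || return 1
    tar tvf "$tarball" > "$tarball.txt"
}
```

Usage: pack_and_remove ftp.somesite.com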

The Project

Who is grabbing what?
Midas ftp.tu-chemnitz.de
Midas ftp.uni-muenster.de
Midas gatekeeper.dec.com
Midas ftp.uni-erlangen.de
Midas ftp.warwick.ac.uk

University FTP sites are massive; currently only DEC and Sweex are being grabbed.

External Links