Difference between revisions of "FTP"

From Archiveteam
(25 intermediate revisions by 15 users not shown)
Line 3: Line 3:
| image = Threeplaces.jpg
| image = Threeplaces.jpg
| description =  
| description =  
| project_status = {{online}}
| project_status = {{specialcase}}
| archiving_status = {{notsaved}}
| archiving_status = {{inprogress}}
| source = https://github.com/ArchiveTeam/ftp-nab
| source = https://github.com/ArchiveTeam/ftp-nab
| tracker = https://tracker.archiveteam.org/ftp/
| irc = effteepee
| irc = effteepee
| irc_network = hackint
}}
}}


Archiving a whole public '''FTP''' host/mirror is easy:
The '''File Transfer Protocol''' ('''FTP''') is a protocol for file transfer published as RFC 114 on 16 April 1971. In the early days of the internet, it was frequently used to upload and share files. Today it is used far less often, which led Archive Team to decide to grab all FTP servers.
SketchCow> I use wget -r -l 0 -np -nc ftp://ftp.underscorporn.com
tar cvf 2014.01.ftp.underscorporn.com.tar ftp.underscorporn.com
tar tvf 2014.01.ftp.underscorporn.com.tar > 2014.01.ftp.underscorporn.com.tar.txt


OR, use this handy-dandy collection of commands with input.
The FTP grab started 30 November 2015.
#!/bin/bash
# Prompt for an FTP host, mirror it, tar it up, and save a listing of the tar.
echo "Enter the FTP server you want to grab."
read -r FTP
wget -r -l 0 -np -nc ftp://"$FTP"
tar cvf $(date +%Y).$(date +%m)."$FTP".tar "$FTP"
tar tvf $(date +%Y).$(date +%m)."$FTP".tar > $(date +%Y).$(date +%m)."$FTP".tar.txt


== How can I help? ==
=== Running the script manually ===
If you use Linux and you're a bit familiar with it, you can try running the script directly.


Now zip/tar it up and [[Internet_Archive#Uploading_to_archive.org|send to the spacious Internet Archive]]![https://archive.org/details/ftpsites] (If you're short on space: <code>tar --remove-files</code> deletes the files shortly after adding them to the tar, not waiting for it to be complete, unlike <code>zip -rm</code>.)
The instructions can be found at https://github.com/ArchiveTeam/ftp-grab.


== The Project ==
{| class="mw-collapsible mw-collapsed" style="text-align:left;"
! Some additional information
|-
| Don't forget to replace YOURNICKHERE with your nickname.
 
The number after <code>--concurrent</code> determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, bandwidth) are sufficient. However, if you constantly see messages about rate limiting, there is no need to increase the concurrency.
 
If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named '''STOP''' in the folder of the script (terminal command: <code>touch STOP</code>). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) – Before starting the script again, don't forget to remove the STOP file.


* We're currently [https://github.com/ArchiveTeam/ftp-nab listing all FTP sites on the internet] to download them all.
If you see "Project code is out of date", kill the script, go to its folder (<code>cd ftp-grab</code>) and run <code><nowiki>git pull https://github.com/ArchiveTeam/</nowiki>ftp-grab</code>. After the update has finished, relaunch the script.
* We're auditing a list of some select FTP sites manually: https://www.piratepad.ca/p/old-ftp-list
|}


{| class="wikitable"
=== Discovery items ===
|+Who is grabbing what?
The project needs items in order to run. You can help by discovering these items.
|-
 
|Midas
Scripts for creating items for the grab can be found at https://github.com/ArchiveTeam/ftp-queue. Instructions on how to run the grab can be found in the README. A list of FTPs that need to be scanned can be found at [[FTP/List]].
|ftp.tu-chemnitz.de
 
|-
{| class="mw-collapsible mw-collapsed" style="text-align:left;"
|Midas
! Some additional information
|ftp.uni-muenster.de
|-
|Midas
|gatekeeper.dec.com
|-
|Midas
|ftp.uni-erlangen.de
|-
|Midas
|ftp.warwick.ac.uk
|-
|-
| [[User:Squidboy]] says:
It's worth noting that as of June 2019 <code>ftp-queue</code> has several [https://github.com/archiveteam/ftp-queue/issues issues] that may make it hard to use.
|}
|}
University FTP sites are massive; currently only DEC and Sweex are being grabbed.
 
=== Donating to the Internet Archive ===
Content downloaded by Archive Team will be uploaded to the [[Internet Archive]], where it will be stored and, hopefully, remain available forever. However, storing it costs thousands of dollars in the long run. If you can afford it, please consider donating to the Internet Archive so that this piece of history can be kept for us all: https://archive.org/donate/


== External Links ==
== External Links ==
 
* [https://www.ftp-sites.org Anonymous FTP Sites List]
* [http://www.ftp-sites.org Anonymous FTP Sites List]
* [https://twitter.com/textfiles/status/423243512256028672 @textfiles Talked it over with a few people. We decided to download all the FTP sites. All. of. Them. Smile for your photograph, FTP.]
* [https://twitter.com/textfiles/status/423243512256028672 @textfiles Talked it over with a few people. We decided to download all the FTP sites. All. of. Them. Smile for your photograph, FTP.]
* [https://www.ghacks.net/2019/08/16/google-chrome-82-wont-support-ftp-anymore/ Google uses its browser market dominance to speed up the demise of FTP]


{{navigation_box}}
{{navigation_box}}
[[Category:Web applications]]
[[Category:Web applications]]

Revision as of 07:17, 22 October 2020

FTP
Threeplaces.jpg
Status Special case
Archiving status In progress...
Archiving type Unknown
Project source https://github.com/ArchiveTeam/ftp-nab
Project tracker https://tracker.archiveteam.org/ftp/
IRC channel #effteepee (on hackint)

The File Transfer Protocol (FTP) is a protocol for file transfer published as RFC 114 on 16 April 1971. In the early days of the internet, it was frequently used to upload and share files. Today it is used far less often, which led Archive Team to decide to grab all FTP servers.

The FTP grab started 30 November 2015.

How can I help?

Running the script manually

If you use Linux and you're a bit familiar with it, you can try running the script directly.

The instructions can be found at https://github.com/ArchiveTeam/ftp-grab.

Some additional information
Don't forget to replace YOURNICKHERE with your nickname.

The number after --concurrent determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, bandwidth) are sufficient. However, if you constantly see messages about rate limiting, there is no need to increase the concurrency.

If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named STOP in the folder of the script (terminal command: touch STOP). The script finishes the current item(s) and stops only after that. (If you kill the script immediately, the items get broken, and they will need to be reassigned to another user.) – Before starting the script again, don't forget to remove the STOP file.

If you see "Project code is out of date", kill the script, go to its folder (cd ftp-grab) and run git pull https://github.com/ArchiveTeam/ftp-grab. After the update has finished, relaunch the script.
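
The stop and update steps described above could be wrapped in a few small shell helpers. This is only a sketch: the function names and the GRAB_DIR variable are hypothetical, and it assumes the script was cloned into a folder named ftp-grab. Only the STOP file name, the folder name, and the git URL come from the text above.

```shell
#!/bin/sh
# Hypothetical helpers; only "ftp-grab", "STOP", and the git URL come from the text above.
# GRAB_DIR is an assumed variable pointing at the folder the script runs from.
GRAB_DIR="${GRAB_DIR:-ftp-grab}"

# Ask the running script to finish its current item(s) and then exit.
request_stop() {
    touch "$GRAB_DIR/STOP"
}

# Remove the STOP file; this is needed before the script is started again.
clear_stop() {
    rm -f "$GRAB_DIR/STOP"
}

# Pull the latest project code after a "Project code is out of date" message.
update_code() {
    (cd "$GRAB_DIR" && git pull https://github.com/ArchiveTeam/ftp-grab)
}
```

A typical sequence would be to call request_stop, wait for the script to exit on its own, and then run clear_stop (and update_code, if needed) before launching the script again.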

Discovery items

The project needs items in order to run. You can help by discovering these items.

Scripts for creating items for the grab can be found at https://github.com/ArchiveTeam/ftp-queue. Instructions on how to run the grab can be found in the README. A list of FTPs that need to be scanned can be found at FTP/List.

Some additional information
User:Squidboy says:

It's worth noting that as of June 2019 ftp-queue has several issues that may make it hard to use.

Donating to the Internet Archive

Content downloaded by Archive Team will be uploaded to the Internet Archive, where it will be stored and, hopefully, remain available forever. However, storing it costs thousands of dollars in the long run. If you can afford it, please consider donating to the Internet Archive so that this piece of history can be kept for us all: https://archive.org/donate/

External Links