Yahoo! Answers

From Archiveteam
Revision as of 07:55, 23 May 2017 by Wickedplayer494 (talk | contribs) (Fix typo)
Yahoo! Answers
URL: http://answers.yahoo.com
Status: Endangered
Archiving status: In progress...
Archiving type: Unknown
Project source: yahooanswers-grab
Project tracker: yahooanswers
IRC channel: #noanswers (on hackint)

Yahoo! Answers is currently the biggest Q&A site on the internet. It is unlikely to go down soon, but given Yahoo!'s record of deleting sites (Yahoo! Messages, Ask Yahoo!), one would be better off using Answers.com or another Q&A site.

In July 2016, Verizon announced that it would acquire Yahoo!. Rumour has it that Answers might be closed as a consequence, so ArchiveTeam decided to download it.

Discovery

[1] and [2] are allegedly the oldest accounts on the site.

User:PurpleSymphony ran a discovery in July–August 2016; the ArchiveTeam project is based on its results.

How can I help?

Keep your number of concurrent threads low: use 4 threads at most, or 2 if you want to be safe. Yahoo applies temporary bans lasting 1 hour if you go too fast. A ban appears as response code 500 in the pipeline and as 999 in a browser. Bans do not happen immediately, because Yahoo monitors the number of requests over time. Keep an eye on your warrior, and if you get banned, terminate the pipeline and reduce your concurrency.
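The ban handling described above could be sketched as two small shell helpers. Both function names are hypothetical and are not part of yahooanswers-grab. The status codes come from this page, while halving the concurrency is only one possible way to "reduce" it and is an assumption of this sketch. A 500 can also be an ordinary server error, so a single occurrence is a hint rather than proof of a ban.

```shell
# Hypothetical helpers, not part of the yahooanswers-grab scripts.

# Succeeds if the status code matches how a ban shows up:
# 500 in the pipeline, 999 in a browser.
is_ban_code() {
    case "$1" in
        500|999) return 0 ;;
        *)       return 1 ;;
    esac
}

# Prints half of the given concurrency, but never less than 1.
# (Halving is an assumption; any reduction follows the advice above.)
reduced_concurrency() {
    n=$(( $1 / 2 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}
```

For example, `is_ban_code 999 && reduced_concurrency 4` prints 2.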

Running a Warrior

You can start up a Warrior and select Yahoo! Answers there. (If you don't mind what you archive, select ArchiveTeam's Choice instead, since ArchiveTeam may at times prioritize another project.)

Running the script manually

If you use Linux and you're a bit familiar with it, you can try running the script directly.

The instructions can be found at github.com/ArchiveTeam/yahooanswers-grab.

Some additional information
Don't forget to replace YOURNICKHERE with your nickname.

The number after --concurrent determines how many threads run at the same time. You can increase this number if your resources (RAM, CPU, bandwidth) are sufficient, but stay within the limit of 4 threads given under "How can I help?". If you constantly see messages about rate limiting, do not increase the concurrency any further.
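As an illustration, a small wrapper could refuse concurrency values above the 4-thread ceiling recommended on this page before printing the launch command. The function name `grab_command` is hypothetical, and the `run-pipeline pipeline.py --concurrent N NICK` form is an assumption based on how ArchiveTeam grab scripts are usually launched. The README in the repository remains the authoritative source for the exact command.

```shell
# Hypothetical wrapper, not part of yahooanswers-grab.
# Usage: grab_command NICK [CONCURRENCY]   (concurrency defaults to 2)
grab_command() {
    nick=${1:-}
    concurrent=${2:-2}
    if [ -z "$nick" ]; then
        echo "usage: grab_command NICK [CONCURRENCY]" >&2
        return 1
    fi
    # Reject anything that is not a plain non-negative integer.
    case "$concurrent" in
        ''|*[!0-9]*)
            echo "concurrency must be a number" >&2
            return 1
            ;;
    esac
    # This page recommends 4 threads at most to avoid temporary bans.
    if [ "$concurrent" -lt 1 ] || [ "$concurrent" -gt 4 ]; then
        echo "concurrency must be between 1 and 4" >&2
        return 1
    fi
    # Assumed command form; check the README for the exact invocation.
    echo "run-pipeline pipeline.py --concurrent $concurrent $nick"
}
```

Calling `grab_command YOURNICKHERE` prints the command with the safe default of 2 threads, while `grab_command YOURNICKHERE 8` fails with an error message instead.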

If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named STOP in the folder of the script (terminal command: touch STOP). The script finishes the current item(s) and only then stops. (If you kill the script immediately, the items get broken and need to be reassigned to another user.) Before starting the script again, don't forget to remove the STOP file.
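The stop-and-resume sequence might look like this in a terminal. The sketch assumes it is run from the folder of the script; the only file involved is the STOP marker described above.

```shell
# Run from the folder of the script (for example, yahooanswers-grab).

# Ask the running script to finish its current item(s) and then exit.
touch STOP

# ... wait here until the script has stopped on its own ...

# Remove the marker so the script can be started again.
rm -f STOP
```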

If you see "Project code is out of date", kill the script, go to its folder (cd yahooanswers-grab) and run git pull https://github.com/ArchiveTeam/yahooanswers-grab. After the update has finished, re-launch the script.
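The update step can be wrapped in a small function so that it is easy to repeat. The function name `update_grab` is hypothetical; the folder name and repository URL are the ones given above. Kill the script before calling it, and re-launch the script afterwards with the command from the README.

```shell
# Hypothetical helper; run it only after the script has been killed.
# Usage: update_grab [FOLDER]   (defaults to ./yahooanswers-grab)
update_grab() {
    dir=${1:-yahooanswers-grab}
    # The subshell keeps the caller's working directory unchanged.
    (
        cd "$dir" || exit 1
        git pull https://github.com/ArchiveTeam/yahooanswers-grab
    )
}

# Example call (needs network access and an existing checkout):
# update_grab
```

The function returns a non-zero status if the folder does not exist or if the pull fails, so a failed update is easy to notice before re-launching.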

Donating to the Internet Archive

Content downloaded by ArchiveTeam will be uploaded to the Internet Archive, where it will be stored and, hopefully, remain available forever. However, storing it costs thousands of dollars in the long run. So, if you can afford it, please consider donating to the Internet Archive, so that this piece of history can be kept for us all. http://archive.org/donate

Do you like our cause?

If you want to help with other projects, learn more about ArchiveTeam, or even help with development in general, navigate to the Main Page of this wiki; from there you can reach a lot of information. The Team consists of volunteers working on the projects in their free time, so helping hands (and resources) are always welcome.