Difference between revisions of "URLs"

From Archiveteam
(Add urls-sources)
(remove CTA for now)
Line 17:


If a website is particularly valuable and should be occasionally requeued (e.g. news sites), it can be added to the [https://github.com/ArchiveTeam/urls-sources/ urls-sources] repository (documentation is in the README).
{{CTA URL lists}}

Revision as of 20:31, 2 January 2024

URLs
URL: https://url.spec.whatwg.org/
Status: Special case
Archiving status: In progress...
Archiving type: DPoS
Project source: urls-grab, urls-sources
Project tracker: urls
IRC channel: #// (on hackint)
Project lead: arkiver
Data: archiveteam_urls

The URLs project is a continuous, generic project to archive random URLs from various sources, e.g. external links discovered in other projects or in older archives. As of mid-2023, projects that send outlinks to URLs include the Reddit and Telegram projects. URLs can also be queued manually in the project channel. However, please keep in mind that this project runs at a very high speed, and you may accidentally DDoS someone if you queue a lot of URLs on one host. If the URL list consists mostly or entirely of URLs from one website, it might be a better idea to submit it to ArchiveBot (or make a new Warrior project designed specifically for it). Also note that we have no way of tracking whether any given URL is actually archived successfully; failed URLs are retried many times, but this project is more of a "throw whatever spare URLs you have in" approach than a structured method of archiving a website.
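Before queuing a list manually, it can help to check how concentrated it is on a single host. The following is a minimal sketch of such a check, not part of the project's tooling: the function names `host_counts` and `dominant_host` and the 50% threshold are assumptions made for illustration, and the project itself does not define a cutoff.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_counts(urls):
    """Count URLs per hostname, skipping entries with no parseable host."""
    counts = Counter()
    for url in urls:
        try:
            host = urlsplit(url.strip()).hostname
        except ValueError:
            # urlsplit raises ValueError on some malformed inputs,
            # e.g. an unterminated IPv6 literal.
            continue
        if host:
            counts[host] += 1
    return counts


def dominant_host(urls, threshold=0.5):
    """Return (host, share) if one host makes up more than `threshold`
    of the list, otherwise None. The threshold is an illustrative
    assumption, not a project rule."""
    counts = host_counts(urls)
    total = sum(counts.values())
    if total == 0:
        return None
    host, n = counts.most_common(1)[0]
    share = n / total
    return (host, share) if share > threshold else None


if __name__ == "__main__":
    sample = [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.org/c",
    ]
    result = dominant_host(sample)
    if result:
        host, share = result
        print(f"{host} accounts for {share:.0%} of the list; "
              "consider ArchiveBot or a dedicated project instead.")
    else:
        print("No single host dominates the list.")
```

A list flagged by a check like this is probably better suited to ArchiveBot or a dedicated Warrior project. A list that passes can still hit one host hard if it is very large, so the absolute per-host counts from `host_counts` are worth a look as well.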

Important note: If you run this project, you'll likely see your IP get banned from Facebook, Instagram, YouTube, etc., and using those sites may become difficult (e.g. constant captchas, forced login). Also, if you run at significant speed, you'll likely see abuse notices, IP blacklists, and so on.

If a website is particularly valuable and should be occasionally requeued (e.g. news sites), it can be added to the urls-sources repository (documentation is in the README).