Difference between revisions of "URLs"

From Archiveteam
(Add URL lists CTA)
m (Minor fixups)
Line 4: Line 4:
| project_status = {{specialcase}}
| project_status = {{specialcase}}
| archiving_status = {{inprogress}}
| archiving_status = {{inprogress}}
| archiving_type = DPoS
| source = [https://github.com/ArchiveTeam/urls-grab urls-grab]<br />[https://github.com/ArchiveTeam/urls-sources urls-sources]
| source = [https://github.com/ArchiveTeam/urls-grab urls-grab]<br />[https://github.com/ArchiveTeam/urls-sources urls-sources]
| tracker = [https://tracker.archiveteam.org/urls/ urls]
| tracker = [https://tracker.archiveteam.org/urls/ urls]
| irc = //
| irc = //
| data = {{IA collection|archiveteam_urls}}
| data = {{IA id|archiveteam_urls}}
| lead = [[User:Arkiver|arkiver]]
}}
}}


The '''URLs project''' is a continuous generic project to archive random URLs from various sources, e.g. external links discovered in other projects or in older archives. Current projects as of early 2021 that send outlinks to URLs include the [[Reddit]] and [[Yahoo! Answers]] projects. [[User:TheTechRobo]]'s Discord dumps usually have extracted URLs and attachments get sent there too, and lists of URLs can be submitted.
The '''URLs project''' is a continuous generic project to archive random URLs from various sources, e.g. external links discovered in other projects or in older archives. Some current projects as of mid 2023 that send outlinks to URLs include the [[Reddit]] and [[Telegram]] projects. [[User:TheTechRobo|TheTechRobo]]'s Discord dumps usually have extracted URLs and attachments get sent there too, and lists of URLs can be submitted manually.


Important note: If you run this project, you'll likely see your IP get banned from Facebook, Instagram, YouTube, etc., and using those sites may become difficult (e.g. constant captchas, forced login). Also, if you run at significant speed, you'll likely see abuse notices, IP blacklists, and so on.
'''Important note''': If you run this project, you'll likely see your IP get banned from Facebook, Instagram, YouTube, etc., and using those sites may become difficult (e.g. constant captchas, forced login). Also, if you run at significant speed, you'll likely see abuse notices, IP blacklists, and so on.


{{CTA URL lists}}
{{CTA URL lists}}

Revision as of 00:50, 4 August 2023

URLs
URL: https://url.spec.whatwg.org/
Status: Special case
Archiving status: In progress...
Archiving type: DPoS
Project source: urls-grab, urls-sources
Project tracker: urls
IRC channel: #// (on hackint)
Project lead: arkiver
Data: archiveteam_urls

The URLs project is a continuous, generic project to archive random URLs from various sources, e.g. external links discovered in other projects or in older archives. As of mid 2023, projects that send outlinks to URLs include the Reddit and Telegram projects. URLs and attachments extracted from TheTechRobo's Discord dumps are usually sent there too, and lists of URLs can be submitted manually.

Important note: If you run this project, you'll likely see your IP get banned from Facebook, Instagram, YouTube, etc., and using those sites may become difficult (e.g. constant captchas, forced login). Also, if you run at significant speed, you'll likely see abuse notices, IP blacklists, and so on.

How to help if you have lists of URLs

For other ArchiveTeam projects that can use this kind of help, see Projects requiring URL lists.

This project requires lists of URLs for content on the target website. If you have a source of URLs, please:

  1. If the list exceeds a few megabytes, compress it, preferably using zstd -10.
  2. Give the file a descriptive name and upload it to https://transfer.archivete.am/.
  3. Share the resulting URL in the project IRC channel.
    • If you wish your list to remain private, please get in touch with a channel op (e.g. arkiver or JustAnotherArchivist). Items generated from your list will still be processed publicly, but they will be mixed in with all other items and channel logs will not associate them with you.
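
Step 1 above can be sketched in Python as follows. This is only an illustration, not part of the project's tooling: the helper names are hypothetical, the 5 MiB cutoff is an assumed reading of "a few megabytes", and the zstd command-line tool is assumed to be installed. Uploading and sharing the link (steps 2 and 3) are left to be done by hand.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: "a few megabytes" is read as 5 MiB; the page gives no exact figure.
COMPRESS_THRESHOLD_BYTES = 5 * 1024 * 1024


def needs_compression(size_bytes: int, threshold: int = COMPRESS_THRESHOLD_BYTES) -> bool:
    """Return True when a list is large enough that compressing it is worthwhile."""
    return size_bytes > threshold


def zstd_command(path: Path) -> list[str]:
    """Build the `zstd -10` invocation; by default zstd writes `<path>.zst` and keeps the input."""
    return ["zstd", "-10", str(path)]


def prepare_url_list(path: Path) -> Path:
    """Compress `path` if it is large, and return the file that should be uploaded."""
    if not needs_compression(path.stat().st_size):
        return path
    if shutil.which("zstd") is None:
        raise RuntimeError("zstd is not installed or not on PATH")
    # check=True raises CalledProcessError if zstd exits with an error.
    subprocess.run(zstd_command(path), check=True)
    return path.with_name(path.name + ".zst")
```

Calling `prepare_url_list(Path("example-forum-thread-urls.txt"))` (a made-up file name) returns the path to upload: the original file for small lists, or the `.zst` file for large ones.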