reddit home page as seen on December 14, 2019 | |
URL | https://www.reddit.com/ https://old.reddit.com/ |
Status | Endangered |
Archiving status | In progress... (new content as it's posted since January 2021) In progress... (everything else) |
Archiving type | DPoS |
Project source | reddit-grab |
Project tracker | |
IRC channel | #shreddit (on hackint) |
Data[how to use] | archiveteam_reddit |
Reddit is a content aggregator and social bookmarking service similar to the likes of Digg. Users can submit links, text posts, images and videos, vote and comment on submissions in communities called "subreddits". It received considerable attention from its twelve-hour SOPA blackout early in January 2012.
Reddit "quarantines" some controversial subreddits. Many quarantined subreddits have since been deleted, and to date no quarantined subreddit has emerged unscathed, so it is important to back them up. Here is a list of quarantined subreddits.
Reddit hosts some subreddits with goals similar to ArchiveTeam's, including /r/AbandonedWebsites, /r/ForgottenWebsites, and /r/DataHoarder, which are worth checking for material to add to ArchiveBot or that would otherwise benefit from the team's attention.
In addition to the old.reddit.com subdomain, old Reddit can also be accessed from www.reddit.com by setting the cookie redesign_optout to true.
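As a rough illustration, the cookie can be attached to a request by hand. This is a minimal sketch using only Python's standard library; the helper name old_reddit_request and the User-Agent string are made up for the example, and it assumes Reddit still honours the cookie as described above.

```python
import urllib.request


def old_reddit_request(path: str = "/") -> urllib.request.Request:
    """Build (but do not send) a www.reddit.com request that opts out of the redesign."""
    request = urllib.request.Request("https://www.reddit.com" + path)
    # The cookie name and value come from the text above.
    request.add_header("Cookie", "redesign_optout=true")
    # Hypothetical User-Agent; replace it with one that identifies your own script.
    request.add_header("User-Agent", "example-archive-script/0.1")
    return request


if __name__ == "__main__":
    req = old_reddit_request("/r/Archiveteam/")
    print(req.full_url, req.get_header("Cookie"))
```

Passing the returned object to urllib.request.urlopen would send the request; the sketch stops short of that so it can be inspected without touching the network.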
If you are seeing RSYNC errors: If the error is about max connections (either -1 or 400), then this is normal. This is our (not amazingly intuitive) method of telling clients to try another target server (we have many of them). Just let it retry, it'll work eventually. If the error is not about max connections, please contact us on IRC (see the infobox).
If you are seeing HOSTERRs, check that your system or network doesn't block or intercept traffic to other DNS servers from the machine running the warrior or container. We require Quad9 for our containers. Some firewalls or networks might not allow that traffic.
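One way to check whether DNS traffic to Quad9 gets through is to send it a single query from the affected machine. The sketch below is only an assumption about how such a check could look: it uses Quad9's public address 9.9.9.9, the function names are invented for the example, and it tests reachability only, so it cannot tell whether a network is transparently intercepting the traffic.

```python
import socket
import struct

QUAD9 = "9.9.9.9"  # Quad9's public resolver address


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def quad9_reachable(hostname: str = "example.com", timeout: float = 3.0) -> bool:
    """Return True if a UDP reply carrying our query id comes back from Quad9."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (QUAD9, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable-network errors
            return False
    return len(reply) >= 12 and reply[:2] == query[:2]


if __name__ == "__main__":
    print("Quad9 reachable:", quad9_reachable())
```

Run it on the same machine or in the same network as the warrior or container; a False result suggests that outbound DNS to Quad9 is being blocked.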
If you need support or wish to discuss, contact us on IRC (see the infobox).
Vital signs
Appears stable, though a small to medium size team is a concern.
- 2015-10-06: The admins carried out bannings of several subreddits claiming they were harassing people, the most notable of which was /r/fatpeoplehate. This has instilled some fear, uncertainty, and doubt in some part of the userbase, with a few claiming that reddit will soon become what Digg is now: nearly dead.
- Extremely endangered - many subreddits were picketing after the firing of a reddit employee named Victoria by turning themselves private or restricting submissions.
- 'Caution' - Reddit seems to have calmed down and returned to normal functionality after Ellen Pao's firing, and the Reddit team is making serious reforms (reducing shadowbanning, more mod tools). However, the revolt left unresolved issues and sour grapes within the community, and it seems Reddit was only saved by the lack of a practical alternative (Voat.co was crushed and went offline due to floods of refugees). It would be wise to preemptively archive the site before another crisis occurs.
- On July 3rd, 2015, Jason Baumgartner completed his 14-month effort to archive Reddit's entire publicly available textual content, just in time before the onset of the Reddit revolt. The archive is still updated monthly. The files are available here. However, images and videos hosted by Reddit are not archived.
- In 2017-2018, Reddit banned several subreddits, including r/incels and r/maleforeveralone, which had tens of thousands of subscribers each. Other subreddits, including r/Braincels, r/foreveralone, r/TheRedPill and r/MGTOW, are endangered. Discussions and petitions about banning those subreddits are currently taking place.[1][2]
- In 2018, a new, redesigned website became the default version of Reddit. This redesigned version has numerous usability issues. It heavily relies on JS and is essentially uncrawlable without dedicated code. The pre-redesign version of Reddit continues to be available at old.reddit.com.
- In March 2019, /r/watchpeopledie, /r/Gore, and some other subs were banned after the Christchurch shooting – this was clearly not due to the video recording of that shooting getting shared (that was forbidden on WPD at least) but due to the negative press coverage, just like for previous bans.
- Also in March 2019, /r/Piracy got threatened by Reddit's legal team with a ban due to the mods allegedly doing too little against copyright infringement.[1]
- Reddit has quarantined manosphere subreddits including /r/Braincels and /r/TheRedPill. /r/Braincels was banned on October 30, 2019.
- Users began to spot in December 2019 that comment threads, at least on the "new" version of the site, were being locked behind a registration wall in an apparent A/B test.[2][3]
- On 29 June 2020, Reddit banned about 2,000 subreddits, most of them inactive.[4] The ten most popular subreddits (by daily users) were r/DarkHumorAndMemes, r/ChapoTrapHouse (Wikipedia), r/ConsumeProduct, r/DarkJokeCentral, r/GenderCritical, r/Cumtown, r/Wojak, r/The_Donald (Wikipedia), r/imgoingtohellforthis2, and r/TheNewRight.[5]
- Subreddits may be banned merely for lack of moderation, as happened in May 2021 to some subreddits such as /r/SlivkiShow (popular Russian YouTuber), /r/AndroidQuestion (without "s" at the end), and /r/FlashGaming. Posts are no longer visible; instead the error "This subreddit was banned due to being unmoderated." appears.
- In 2021, Reddit communities went private after Reddit hired a controversial person.[6]
- On April 18, 2023, Reddit announced that they would be adding new restrictions to their API terms, begin enforcing rate limits, and introduce a paid API tier starting June 19, 2023.[7][8]
- On May 1, 2023, Reddit banned Pushshift; "ingest ceased May 1st around 17:02 GMT" [9] Archives of previous data were made inaccessible on May 8, 2023.
- On 19 May, Pushshift began returning various errors when accessing its public API. Later, the API page was replaced by a notice: "Check back in the next few weeks for updates."[10] A torrent containing submissions and comments from June 2005 to December 2022 is available on Academic Torrents.[11][12][13] Data is also available for January 2023,[14] February 2023,[15] and March 2023.[16]
- There is currently a blackout protest that started on June 12th,[17] in which some subreddits will stay private for 48 hours while others will close permanently.
- June 15/16 2023: Reddit changed the rules for mods partaking in the blackout. If all mods of a subreddit strike, the subreddit can be handed over to new mods; if one mod objects, that mod can take over.[18][19]
- June 15/16 2023: Some subreddits decided to burninate their content by setting it to "removed", which hides it from regular users; mods can still unremove it.[20]
- June 20 2023: Reddit began banning protesting moderators[21] and possibly mass-deleting content.[22]
Textual Archive (Without Images or Videos)
On July 3rd, 2015, Jason Baumgartner completed his 14-month effort to archive Reddit's entire publicly available textual content, just in time before the onset of the Reddit revolt. The archive is still being updated monthly. The files are available here.
- Does not include images and videos hosted by Reddit
- Reddit JSON API output. Posts are archived incrementally in real-time.
- Some comments are not accessible due to private subreddits, comment deletion, or other API issues
- Reddit /r/datasets - I have every publicly available Reddit comment for research. ~ 1.7 billion comments @ 250 GB compressed. Any interest in this?
- Google BigQuery Analysis of Reddit
The scripts used to generate this API dump were not made public, but they likely used PRAW, and it would probably be better to rewrite them from scratch.
Also, this only preserves textual submissions and comments. Images and videos hosted on Reddit are not archived, and sidebar, wiki, and live thread data are not retrieved, so these should be scraped in an expansion pack.
API
Jason Baumgartner also provides an API for accessing Reddit's textual archive available here. The archive is updated in real-time. This API does not have the limitations of Reddit's API. For example, it does not impose limits on the number of submissions or comments that are retrieved.
To search for submissions of a subreddit (500 limit):
https://api.pushshift.io/reddit/search/submission/?subreddit=Archiveteam&size=500
To retrieve all comments for a submission (with tens of thousands of comments):
https://api.pushshift.io/reddit/submission/comment_ids/6uey5x
Note that posts are archived in real-time after they are created. Newer versions of edited posts are not archived. One may have to re-fetch the content on Reddit's site to get the latest revision of an edited post.
One may also have to fetch the images and videos separately, as they are not archived by the API.
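The two request URLs above can be assembled programmatically. The following is a small sketch in Python; the function names are invented for the example, and since the vital signs above note that the public Pushshift API was taken down in May 2023, the URLs are only built here, not fetched.

```python
from urllib.parse import urlencode

# Base address taken from the example URLs above.
PUSHSHIFT_BASE = "https://api.pushshift.io/reddit"


def submission_search_url(subreddit: str, size: int = 500) -> str:
    """URL for searching the submissions of one subreddit (size is capped at 500)."""
    query = urlencode({"subreddit": subreddit, "size": size})
    return f"{PUSHSHIFT_BASE}/search/submission/?{query}"


def comment_ids_url(submission_id: str) -> str:
    """URL listing the IDs of all comments on one submission."""
    return f"{PUSHSHIFT_BASE}/submission/comment_ids/{submission_id}"


if __name__ == "__main__":
    print(submission_search_url("Archiveteam"))
    print(comment_ids_url("6uey5x"))
```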
Data liberation
As of March 26, 2013, users can only see up to 1,000 posts and comments on a profile page. However, admin "spladug" stated that older comments and posts are still in the database. spladug also stated that the team is in favor of providing dumps of a user's data, but that the task would be taxing on the servers. Since this comment was posted, there appears to have been no progress on a dump system. Because of this limitation, archiving the old-fashioned way (without wget) would be nearly impossible if things do wind up FUBAR in the future.
Instead, any archival method should scrape from the Reddit API (a job that would have to run over several months). The API returns all nested comments, including those that the HTML pages do not show. It also significantly reduces server load.
To comply with the EU GDPR, Reddit was forced to make progress, and the site now has a request form. Users can specify that they want a copy of all of their data, or data from specific date ranges. The site says requests may take up to 30 days to be processed.
Project details
As of January 2021, outlinks found through the ArchiveTeam Reddit archiving project are passed to the generic URLs project.
In late April 2024, the ArchiveTeam Reddit archives were made undownloadable on the IA, due to people using them to train AIs.
/r/place
/r/place is a collaborative art event. Any Reddit user can change the colour of a pixel on the canvas with a cooldown of several minutes.
2017
The event was first hosted on April Fools' Day in 2017.
Available archives of this event include:
- Official data release after the event concluded:
- https://web.archive.org/web/20170403034523/https://abra.me/place-snaps/
- PLACE-SNAPSHOTS
- https://web.archive.org/web/20170403172246/http://spacescience.tech/place/
2022
Reddit re-hosted the r/place event for four days starting on April 1st, 2022.
Available archives of this occurrence include:
- Official data release:
- https://old.reddit.com/r/place/comments/txvk2d/rplace_datasets_april_fools_2022/
- This data was replaced a day after the initial release to fix some coordinates related to a moderation tool.
- http://archive.fart.website/archivebot/viewer/domain/placedata.reddit.com
- official-rplace-dataset-2017-2022
- User:TheTechRobo archived a major portion of it. The raw data can be found at thetechrobo-rplace-archive.
- place2022-opl-raw
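For anyone working with the official 2022 dataset listed above, the coordinate fix matters because a coordinate field can describe either a single pixel or a rectangle drawn with the moderation tool. The sketch below is an assumption-laden illustration: it assumes the CSV columns are timestamp, user_id, pixel_color and coordinate, that pixels are stored as "x,y" and rectangles as "x1,y1,x2,y2", and the sample rows are invented rather than taken from the real data.

```python
import csv
import io


def parse_coordinate(field: str) -> dict:
    """Interpret a coordinate field as a single pixel or a moderation rectangle."""
    parts = [int(part) for part in field.split(",")]
    if len(parts) == 2:
        return {"kind": "pixel", "x": parts[0], "y": parts[1]}
    if len(parts) == 4:
        return {"kind": "rectangle", "x1": parts[0], "y1": parts[1],
                "x2": parts[2], "y2": parts[3]}
    raise ValueError(f"unexpected coordinate field: {field!r}")


# Invented sample rows in the assumed column layout.
SAMPLE_CSV = "\n".join([
    "timestamp,user_id,pixel_color,coordinate",
    '2022-04-01 12:00:00.000 UTC,exampleuser1,#FF4500,"42,17"',
    '2022-04-03 08:30:00.000 UTC,exampleuser2,#FFFFFF,"10,20,30,40"',
])


def parse_rows(text: str) -> list:
    """Parse CSV text and return one parsed coordinate dict per row."""
    reader = csv.DictReader(io.StringIO(text))
    return [parse_coordinate(row["coordinate"]) for row in reader]


if __name__ == "__main__":
    for parsed in parse_rows(SAMPLE_CSV):
        print(parsed)
```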
Lists
Potentially endangered subreddits
- https://old.reddit.com/r/WatchRedditDie/ | Anti-Reddit
- https://old.reddit.com/r/opendirectories/ | Piracy
- https://old.reddit.com/r/DeadorVegetable/ | Gore and death | Banned in March 2021
- https://old.reddit.com/r/FiftyFifty/ | Gore and death
- https://old.reddit.com/r/Piracy/ | Piracy
- https://old.reddit.com/r/MakeMyCoffin/ | Gore and death | Banned in April 2022
References
- https://old.reddit.com/r/Piracy/comments/b28d9q/rpiracy_has_received_a_notice_of_multiple/
- https://news.ycombinator.com/item?id=21780092
- https://old.reddit.com/r/mobileweb/comments/e7yivg/join_reddit_to_keep_reading_an_account_is_now/?sort=top
- Update to Our Content Policy (posted by spez to r/announcements on 29 June 2020)
- https://www.redditstatic.com/banned-subreddits-june-2020.txt
- https://www.reddit.com/user/Blank-Cheque/comments/mbmthf/why_is_this_subreddit_private_see_here_for_answers/
- https://old.reddit.com/r/reddit/comments/12qwagm/an_update_regarding_reddits_api/
- https://old.reddit.com/r/apolloapp/comments/12ram0f/had_a_few_calls_with_reddit_today_about_the/
- https://old.reddit.com/r/pushshift/comments/13508r9/pushshift_no_longer_has_access_to_the_reddit_api/
- API has been taken down (posted by signalhunter to r/pushshift on 20 May 2023)
- Reddit comments/submissions 2005-06 to 2022-12
- Separate dump files for the top 20k subreddits
- Subreddit comments/submissions 2005-06 to 2022-12: "This is the top 20,000 subreddits from reddit's history in separate files. You can use your torrent client to only download the subreddit's you're interested in."
- Reddit comments/submissions 2023-01
- Reddit comments/submissions 2023-02
- Pushshift Reddit 2023-03
- https://www.reddit.com/r/Save3rdPartyApps/comments/13yh0jf/dont_let_reddit_kill_3rd_party_apps/
- https://old.reddit.com/r/ModCoord/comments/14ahqjo/mods_will_be_removed_one_way_or_another_spez/
- https://old.reddit.com/r/ModCoord/comments/14aeq5j/new_admin_post_if_a_moderator_team_unanimously/
- https://old.reddit.com/r/ModCoord/comments/14ahqjo/mods_will_be_removed_one_way_or_another_spez/joayub6/
- https://old.reddit.com/r/ModCoord/comments/14eq8ip/the_entire_rmildlyinteresting_mod_team_has_just/
- https://old.reddit.com/r/ModCoord/comments/14eppe8/uhhhh_what_the_fuck_is_happening_at/jow6dhu/