Imgur

From Archiveteam
Imgur
URL https://imgur.com
Status Endangered
Archiving status In progress...
Archiving type DPoS
Project source imgur-grab
Project tracker imgur
IRC channel #imgone (on hackint)
Data archiveteam_imgur

Imgur is an image sharing community and former image host.

Vital signs

Seems stable for now. Images submitted by free users are deleted after not being accessed for 6 months.

Imgur serves a massive amount of traffic. In 2012 alone, 42 petabytes of data were transferred. Fortunately, the volume of images uploaded is much smaller, albeit still large. In 2012, around 300,000,000 images were uploaded; assuming an average size of 120 KB, that's 36 TB in one year. As of 2014, there were 650 million images, with 1.5 million being added each day, according to one source[1]. An analysis in 2015, based on extrapolation from a sample of random image IDs, estimated about 2 billion images with a total raw full-resolution image size of 376 TiB[2].
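The 2012 estimate above can be reproduced with a short calculation. This is only a sketch: the 120 KB average is the assumption stated in the text, the units are taken to be decimal (1 TB = 10^9 KB), and the function name is made up for illustration.

```python
# Back-of-the-envelope estimate using the 2012 figures quoted above.
IMAGES_2012 = 300_000_000  # images uploaded in 2012
AVG_SIZE_KB = 120          # assumed average image size, in decimal kilobytes


def yearly_upload_tb(images: int, avg_kb: int) -> float:
    """Return the total upload volume in decimal terabytes (1 TB = 10**9 KB)."""
    return images * avg_kb / 10**9


print(yearly_upload_tb(IMAGES_2012, AVG_SIZE_KB))  # 36.0
```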

Imgur was originally created as a gift to the Reddit community because of problems experienced with other image hosting services. It was used extensively on Reddit for many years until Reddit introduced their own native image host (at http://i.redd.it/) in 2016, causing a significant decrease in submissions to Imgur.

In 2018, there were reports that Imgur had started to enforce their terms of service, specifically that using Imgur for image hosting is not permitted and may result in deletion of those images[3].

In 2021, Imgur was acquired by MediaLab.[4][5]

In 2023, Imgur announced in a page titled "Imgur Terms of Service Update [April 19, 2023]": "Our new Terms of Service will go into effect on May 15, 2023. We will be focused on removing old, unused, and inactive content that is not tied to a user account from our platform as well as nudity, pornography, & sexually explicit content. You will need to download/save any images that you wish to save if they no longer adhere to these Terms. Most notably, this would include explicit/pornographic content."[6] Like Tumblr, Imgur will remove porn and similar content on 2023-05-15; they will also remove "old, unused, and inactive content", which makes Imgur much less of a persistent image host.

On 2024-02-26, Imgur started redirecting direct image links to the corresponding page. They had already been doing this based on the referrer (for hotlinking protection), but since this date they also redirect when no referrer is sent, e.g. because you open an i.imgur.com URL directly in a new tab. The detection seems to be based on the Accept header; if it contains text/html anywhere, regardless of the weight or any prefixes and suffixes, the server returns a redirect.
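The observed rule can be modelled as a plain substring check. The following is a rough sketch of that observation, not Imgur's actual implementation; the function name is hypothetical, and the case-insensitive comparison is an assumption, since the behaviour for differently capitalised headers is not documented above.

```python
def would_redirect(accept_header: str) -> bool:
    """Guess whether i.imgur.com would redirect a request to the HTML page.

    Models the observation that any occurrence of "text/html" in the Accept
    header triggers the redirect, whatever surrounds it. Lowercasing the
    header first is an assumption, not an observed behaviour.
    """
    return "text/html" in accept_header.lower()


# A typical browser navigation header contains text/html.
print(would_redirect("text/html,application/xhtml+xml,*/*;q=0.8"))  # True
# A low weight does not matter, only the substring does.
print(would_redirect("image/png, text/html;q=0.1"))                 # True
# An image-only header does not contain the substring.
print(would_redirect("image/avif,image/webp,*/*;q=0.8"))            # False
```

If this model holds, a client that wants the image itself would send an Accept header without text/html, such as the image-only value in the last example.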

Archiving galleries

When archiving a larger gallery, e.g. https://imgur.com/gallery/PTFfu, a simple ArchiveBot !ao won't suffice because of the JavaScript "load more" button. However, you can append /zip to the URL to get all pictures in full, like so: https://imgur.com/a/PTFfu/zip (the old URL format was https://imgur.com/gallery/PTFfu/zip).
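For a list of gallery or album URLs, the rewrite to the /zip form can be scripted. This is a minimal sketch, with assumptions: the function name and the regular expression are illustrative, only the plain /a/ID and /gallery/ID path shapes shown above are handled, and the hostname is not checked.

```python
import re
from urllib.parse import urlsplit

# Matches /a/<id> or /gallery/<id>, optionally followed by further path parts.
_ALBUM_PATH = re.compile(r"^/(?:a|gallery)/([A-Za-z0-9]+)(?:/.*)?$")


def album_zip_url(url: str) -> str:
    """Return the /zip download URL for an Imgur album or gallery URL."""
    match = _ALBUM_PATH.match(urlsplit(url).path)
    if match is None:
        raise ValueError(f"not a recognised album or gallery URL: {url}")
    return f"https://imgur.com/a/{match.group(1)}/zip"


print(album_zip_url("https://imgur.com/gallery/PTFfu"))
# https://imgur.com/a/PTFfu/zip
```

URLs in any other shape raise ValueError rather than being guessed at, so they can be reviewed by hand.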

Sitemap

A sitemap from May 2017 onwards can be found at https://imgur.com/imgur-assets/sitemap_gallery/gallery_images.xml. This sitemap only covers galleries, i.e. albums shared publicly on the platform.
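Once a sitemap file has been downloaded, the gallery URLs can be pulled out of it with a small parser. The sketch below assumes the file follows the common sitemap layout, in which every URL sits in a loc element; that has not been verified against Imgur's file, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def sitemap_locations(xml_text: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}loc"; compare the local name only.
        if element.tag.rsplit("}", 1)[-1] == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://imgur.com/gallery/PTFfu</loc></url>"
    "</urlset>"
)
print(sitemap_locations(sample))  # ['https://imgur.com/gallery/PTFfu']
```

Because it only looks at the local tag name, the same function also works on a sitemap index file, where the loc entries point to further sitemap files instead of gallery pages.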

Further observations

While the Wayback Machine is able to save individual Imgur posts as of 2022, the "trending" section and the home page are not saved properly, as indicated by archives from November and December displaying the same trends on the front page.[7][8]

Using Save Page Now alone is not sufficient to archive an image or album/gallery page completely as of early 2023. The created snapshot must also be loaded in a browser to trigger the retrieval and archival of the images themselves via JavaScript. For larger albums, this can often fail on some or all images due to SPN's rate limits, requiring retries.

How to help if you have lists of URLs

For other ArchiveTeam projects that can use this kind of help, see Projects requiring URL lists.

This project requires lists of URLs for content on the target website. If you have a source of URLs, please:

  1. Use the PCRE regular expression \S*imgur\S* for filtering.
    • Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern, as it may miss valid URLs. We can always filter or transform the results as needed later.
    • Enable case-insensitive matching (e.g. grep's -i) to catch URLs containing uppercase letters.
    • If using grep or similar, enable text matching (-a or --text) to catch URLs in files with apparent binary data.
    • Example command (GNU grep): grep -Pahoi '\S*imgur\S*' FILENAME FILENAME...
  2. If the output exceeds a few megabytes, compress it, preferably using zstd -10.
  3. Give the file a descriptive name and upload it to https://transfer.archivete.am/.
  4. Share the resulting URL in the project IRC channel.
    • If you wish your list to remain private, please get in touch with a channel op (e.g. arkiver or JustAnotherArchivist). Items generated from your list will still be processed publicly, but they will be mixed in with all other items and channel logs will not associate them with you.
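Where GNU grep is not available, the filtering in step 1 can be approximated in Python. This is a sketch rather than an exact equivalent of the grep command: it applies the same pattern case-insensitively to raw bytes so that files with binary data are still searched, but Python's \S on bytes only treats ASCII whitespace as a separator, which may differ slightly from PCRE. The function names are hypothetical.

```python
import re
import sys

# Same broad pattern as in step 1, applied to bytes and ignoring case.
IMGUR_PATTERN = re.compile(rb"\S*imgur\S*", re.IGNORECASE)


def extract_candidates(data: bytes) -> list[bytes]:
    """Return every whitespace-delimited token that contains 'imgur'."""
    return IMGUR_PATTERN.findall(data)


def main(paths: list[str]) -> None:
    # Print one match per line, as grep's -o and -h options would.
    for path in paths:
        with open(path, "rb") as handle:
            for token in extract_candidates(handle.read()):
                sys.stdout.buffer.write(token + b"\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a name of your choice, it takes the input files as arguments and writes matches to standard output. Each file is read into memory in one go, so very large inputs are better handled with grep or by reading in chunks.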

References