Archive.today
| Archive.today | |
| URL | https://archive.today/ and others |
| Status | Online! |
| Archiving status | Not saved yet |
| Archiving type | Unknown |
| IRC channel | #archiveteam-bs (on hackint) |
Archive.today is a privately funded on-demand archiving site, similar to WebCite. It gained traction as an alternative to the Wayback Machine, particularly for webpages whose JavaScript would fail to replay, or for domains that have been or may become excluded or censored from the Wayback Machine.
On some popular news sites and magazines, archive.today is able to save a copy of the article despite paywalls (preservation that is arguably piracy).[1] There seems to be an ethos of archiving content even when that conflicts with replaying web pages completely authentically, unlike the Wayback Machine's "view from nowhere". This is underscored by the fact that archive.today appears to avoid saving the ads that come with pages,[2] and also gets to the content behind annoyances or login walls (using dedicated accounts) on popular social media such as Twitter,[3] GitHub,[4] and Reddit.[5]
Search engines are able to index Archive.today.
The website shot up significantly in popularity in the second half of 2014, primarily due to the GamerGate controversy [citation needed]. As of 2021-02-12, it holds 700 TB of content.
Archive.is, archive.ph, archive.md and a few other alternative domains are aliases used by archive.today with the purpose of circumventing blocks by some ISPs.[6][7]
Vital Signs
Note that the site is a commercial enterprise, and as such can go kaputt at any given point, especially if it does not find a lucrative business model. Although this is not a strong indication of long-term issues, in October 2016 the site "made transparent"[IA•Wcite•.today] the server costs[IA•Wcite•.today] and started to accept donations. A weekly crowdfunded target of $800[8] is set to maintain the site.
Prior to this, the site actively refused donations. A donation link took the user to an animal shelter donation page[9].
In January 2017, the administrator commented in response to a censorship query that the site had "just run out of CPU for the browsers"[IA•Wcite•.today]. With the problems capturing pages, it is unclear if this is a temporary issue.
In February 2026, Wikipedia blacklisted the service from being used in reference links, after the site was used to DDoS a blog, and altered past snapshots of webpages.[10]
Funding
According to their FAQ[IA•Wcite•.today]:
It is privately funded, there in no complex finance behind it. It may look more or less reliable compared to the startup-style funding or a university project, depending on which risks are taken into account.
My death can cause interruption of service, but something like new market condition or changing head of a department can not.
As of October 2016 the site has a 'liberapay'[11] donation link at the top-right corner of the page.
According to a January 2017 statement, through donations the site only receives "more than $1.50 every day, enough for a bowl of phở".[IA•Wcite•.today]
As of March 2021, archived pages have started to show an advert at the top of the screen; however, the owner has confirmed[IA•Wcite•.today] that it is a test run and that the adverts will likely not stay.
Content type
Archive.today takes snapshots of webpages with a JavaScript-capable headless browser. The maximum size of a webpage it will archive (including images) is 50 MB. An image screenshot of the webpage is also taken. Unlike the Wayback Machine, it does not store arbitrary file types: it won't save PDFs, binary files, Adobe Flash content, videos, or audio.
Archive.today represents captured pages as a static snapshot, rendered by the Archive.today server, and uses a fixed-width layout. Page resources such as JavaScript and CSS files are not retained separately. That is, styling from a separate CSS file is converted to inline CSS styling, embedded in the HTML source code. For more details on functionality, see the Wikipedia page.
Snapshots from Wayback Machine and Google Cache are searchable by the original URLs.
Site structure
A list of all domains currently archived used to be available here.
- List of all domains from https://archive.today/alldomains (as of 2014/02/20) = 7,255,826 domains
Sadly, the URL counts from /alldomains were out of date.
All sitemaps (as of 2014/02/17)
Searches
Lists or indexes of captures can be viewed per site with 20 results per page; however, only the latest ~3000[12] captures are shown for a site (older links may also be unfindable[13]). As of 2026-04-11, some domains, including www.nytimes.com[14] and www.wsj.com[15], do not have an index of captures; their index page instead displays the default web page for the nginx web server.
- Index
https://archive.today/offset=<number>/<domain name>
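The index pattern above can be paged through programmatically. The following is only a minimal sketch: it assumes that `offset` counts captures (not pages), that it advances in steps of 20 to match the page size, and that `offset=0` is accepted for the first page. The helper name `indexPageUrls` is hypothetical, and the base domain can be swapped for any of the aliases.

```javascript
// Hypothetical helper: list the index page URLs for one domain.
// Assumptions: `offset` counts captures, advances by the page size (20),
// and offset=0 is accepted for the first page.
const PAGE_SIZE = 20;

function indexPageUrls(domain, totalCaptures, base = 'https://archive.today') {
  const urls = [];
  for (let offset = 0; offset < totalCaptures; offset += PAGE_SIZE) {
    urls.push(`${base}/offset=${offset}/${domain}`);
  }
  return urls;
}

// Example: a domain whose index reports 45 captures needs three pages.
console.log(indexPageUrls('example.com', 45).join('\n'));
```

Because only the latest ~3000 captures are listed, a loop like this cannot be expected to reach older captures.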
Archived webpage URLs
In the wild, archived pages may be found in their short URL format, whose identifier has five case-sensitive alphanumeric characters, or four characters on early captures from 2012. A long, or canonical, URL format can be obtained by clicking "share" in the top menu or appending "/share" to the URL, but it is not widely used. The top-level domain may be any of the #Aliases (.today|.is|.ph|...).
- Short URL
https://archive.today/<XXXXX>
- JS API
https://archive.today/cse.js?id=<XXXXX>
- Long URL
https://archive.today/<date>/<original url>
- Latest URL
https://archive.today/timegate/<original url>
- All archives
https://archive.today/<original url>
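The URL formats listed above can be assembled with a few small functions. This is a sketch under stated assumptions: the helper names are hypothetical, and the `<date>` format (YYYY.MM.DD-HHMMSS, taken to be UTC) is inferred from long URLs cited in the references of this article, such as 2023.02.09-022747, rather than from any documentation.

```javascript
// Hypothetical helpers for the archive.today URL formats.
const BASE = 'https://archive.today'; // any alias domain could be used instead

// Assumed <date> format: YYYY.MM.DD-HHMMSS in UTC, e.g. 2023.02.09-022747.
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, '0');
  const day = `${date.getUTCFullYear()}.${pad(date.getUTCMonth() + 1)}.${pad(date.getUTCDate())}`;
  const time = `${pad(date.getUTCHours())}${pad(date.getUTCMinutes())}${pad(date.getUTCSeconds())}`;
  return `${day}-${time}`;
}

const shortUrl = (id) => `${BASE}/${id}`;
const longUrl = (date, originalUrl) => `${BASE}/${formatTimestamp(date)}/${originalUrl}`;
const latestUrl = (originalUrl) => `${BASE}/timegate/${originalUrl}`;
const allArchivesUrl = (originalUrl) => `${BASE}/${originalUrl}`;

console.log(shortUrl('OXq7u'));
console.log(latestUrl('https://example.com/'));
```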
Issues
Domain availability
As of 17 February 2016, the archive.today domain name has been unavailable since 16 February, likely due to "fake DMCA requests" [1].
As of September 2019, archive.today, .fo etc. resolve to 127.0.0.3 from a few DNS servers (including in Finland), while they continue to work elsewhere, where they resolve to 130.0.234.124, 134.119.220.26 etc. The archive.fo domain was revoked on 2019-10-26.[16]
Indefinite loading
Sometimes, the page indicates “loading”[IA•Wcite•.today] when trying to access the page, instead of showing the page itself.
Ditching unsuccessful archivals
When the archival of a page has not been successful (e.g. “Error: time out.”, “Error: Network error.”), the existing information (network transfers and already downloaded resources) is discarded and the target URL of the page archival indicates “Not Found (yet?)”, the same message it shows for pages that have never been archived, similar to how YouTube behaves.
The ditch page also has no indicator of the target URL, and the submission URL removes itself from the tab history using document.location.replace(). However, the browser may have recorded the submission URL in the global history, and because the ditch page has no <title>, the submission URL may be used as the tab title and thus be extractable from browser data. Automating the saves can also avoid losing the submission URLs entirely.
There are some URLs that always result in a ditched archival:
https://truthsocial.com/@realDonaldTrump/posts/116250203405172359
Dismissed information
Unlike Google Cache, Archive.today does not store the original web page source code. It also discards the list of network transfers (shown during the archival process), which shows the HTTP status, MIME type, object size (bytes) and URL of page elements. File names of saved (embedded) auxiliary page elements are changed into the SHA-1 hash of the file itself, discarding the original file names of images.
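To illustrate the renaming described above, here is a minimal sketch of deriving a name from a file's contents with SHA-1, using the built-in crypto module of Node.js. How archive.today actually formats the resulting name (for example, whether a file extension is kept) is not stated here, so the helper `hashedName` and its plain hexadecimal output are assumptions.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hex SHA-1 digest of a resource's bytes.
// The original file name plays no part, which is why it cannot be recovered.
function hashedName(contents) {
  return crypto.createHash('sha1').update(contents).digest('hex');
}

console.log(hashedName(Buffer.from('abc')));
```

Two files with identical bytes get the same name under such a scheme, regardless of what they were called on the original site.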
Since 2016, the Wayback Machine has been unable to access Archive.today due to a CAPTCHA.
Quota limits
Each IP address accessing the site apparently gets only a limited access quota of unknown size. When too many pages are archived, the server eventually stops responding to the IP address for the next few hours.
Constant reCAPTCHAs
People using a VPN or proxy, or on a mobile device, report having to go through reCAPTCHAs every time they use the site. When the CAPTCHA is completed, it gives you 5 minutes of access before asking for a CAPTCHA again. Previously, the CAPTCHA could not be solved on mobile devices because the reCAPTCHA was clipped to half the page, but this has since been fixed.
YouTube comment archival
Archive.today used to be able to capture YouTube comments[17] and load more comments automatically, capturing more comments than were loaded on the initial AJAX load.
That only worked when archiving directly from the YouTube watch page, e.g. “https://www.youtube.com/watch?v=0mQW9aWkKl0”. When redirected from YouTu.be, it failed to archive the YouTube comments.
Because the way YouTube loads comments has been altered over time, Archive.today's ability to archive YouTube comments has been restricted since approximately late 2017.
Since then, to archive YouTube comments using Archive.today, one needs to link directly to a specific comment, which causes comments to be pre-loaded.
- Example linked comment URL: https://www.youtube.com/watch?v=W3GrSMYbkBE&lc=UgxC238Gea0KGOditl54AaABAg[IA•Wcite•.today]
- Archived with linked comment: https://archive.today/OXq7u
- Archived without linked comment: https://archive.today/Uih0b
Rearchiving
When archiving a URL that has already been archived, the initial archive attempt will redirect to the existing archive, and if enough time has passed, there will be a save button that can be used to get a new archive. The button will be shown after one day, or possibly less. Adding &anyway=1 to the URL usually avoids having to click through.
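A submission URL with the `anyway=1` parameter can be built as follows. This is a minimal sketch: it assumes the `/submit/?url=` endpoint that appears in the ditched tab titles described under Automation, and that `anyway=1` is simply appended as a second query parameter. The helper name `submitUrl` and its options are hypothetical.

```javascript
// Hypothetical helper: build a submission URL, optionally forcing a re-archive.
// Assumptions: the /submit/?url= endpoint and the anyway=1 query parameter.
function submitUrl(targetUrl, { anyway = false, base = 'https://archive.today' } = {}) {
  const url = new URL('/submit/', base);
  url.searchParams.set('url', targetUrl); // percent-encodes the target URL
  if (anyway) url.searchParams.set('anyway', '1');
  return url.toString();
}

console.log(submitUrl('https://example.com/a?b=1', { anyway: true }));
```

Since a new archive is only offered after enough time has passed, the parameter should not be expected to help with a URL that was archived moments ago.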
Automation
Using Chromium or Firefox, jq, bash and an archive.today script, it is possible to automate saving a list of URLs to the archive.today service. The script fully solves the ditching unsuccessful archivals issue: for Chromium, the script adds to the tab history a data: URL that redirects to the submission URL, accessible using the back button on ditch pages; for Firefox, the script loads the submission URL in an iframe and adds a link above it that can be used to reload the iframe. You should retry the ditched URLs at least once. Some sites will not work at all. Some sites will work, but the page will indicate something went wrong; those will need to be retried at a later time once rearchiving of the URL is enabled again.
Saving (lots of) URLs may trigger the constant reCAPTCHAs issue.
If you aren't using the script above, then you will need to use the 'unreliable' mechanisms below to handle the ditching unsuccessful archivals issue. You can retry the ditched URLs, either by reloading them or by extracting the ditched URLs from your browser data and then starting a new run of just the ditched URLs. If you are using Chromium, the submission URL might be in the tab history, so just click back to resubmit. If you are using Firefox, the ditched URLs sometimes remain (albeit mangled) in the tab titles (at least while ditched pages do not contain a <title> tag) and there are a few mechanisms to deal with that.
Reload ditched URLs tabs
First use Ctrl+click to select the tabs that were ditched. Then open Tools > Browser Tools > Browser Console and run the following JavaScript, which will reload all the selected tabs using the correct submit URLs. This will only work once; after the first reload you will lose the tab label.
p = Services.scriptSecurityManager.getSystemPrincipal();
gBrowser.selectedTabs.map((tab) => gBrowser.getBrowserForTab(tab).loadURI(Services.io.newURI('https://' + tab.label), {triggeringPrincipal: p}))
Load ditched URLs in new tabs
To avoid losing the tab titles, you can load new tabs and close the old ones using this JavaScript code:
p = Services.scriptSecurityManager.getSystemPrincipal();
gBrowser.selectedTabs.map((tab) => {gBrowser.addTab('https://' + tab.label, {triggeringPrincipal: p}); gBrowser.removeTab(tab); })
Extract ditched URLs
There are a few mechanisms to extract the tab title data: browser extensions, the recent browsing tab, the browser history and the browser console.
- Browser extensions
- Any browser extension that can extract the titles of multiple tabs can be used; you can then manually convert the titles to URLs by adding a "https://" prefix and extracting the "url" query parameter.
- Recent browsing tab
- The recent browsing tab has them in the "Open Tabs" list in the "Recent browsing" section. First close all tabs except for the ones with ditched URLs. Right click one of the "archive.today/submit/?url=" text pieces, go up to the <virtual-list issublist=""> tag, right click it and choose "Use in Console", then run the following JavaScript to log the URLs, then select and copy them.
console.log(Array.from(temp0.children).map(child => new URL('https://' + child.shadowRoot.children[2].children[1].textContent.trim()).searchParams.get('url')).join('\n'))
- Browser history
- Searching for url= in the browser history will list all of the submission URLs, and you can deal with the URLs from there.
- Browser console
- The data is also accessible from Firefox's internal tab data in the Browser Console. First use Ctrl+click to select the tabs that were ditched. Then open the Tools > Browser Tools > Browser Console and run the following JavaScript, which will copy all the ditched URLs to the clipboard.
Components.classes["@mozilla.org/widget/clipboardhelper;1"].getService(Components.interfaces.nsIClipboardHelper).copyString(PlacesCommandHook.getUniquePages(gBrowser.selectedTabs).map(page => new URL('https://' + page.title).searchParams.get('url')).join('\n'))
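The title-to-URL conversion used by the mechanisms above can also be done outside the browser, for example on a list of tab titles exported by an extension. A minimal sketch, assuming the titles look like "archive.today/submit/?url=..." as described above; the helper name `titlesToUrls` is hypothetical, and titles that are too mangled to parse are simply skipped.

```javascript
// Hypothetical helper: recover target URLs from ditched tab titles.
// Each title is expected to look like "archive.today/submit/?url=<target>".
function titlesToUrls(titles) {
  const urls = [];
  for (const title of titles) {
    try {
      // Same conversion as in the console snippets: add a scheme, read "url".
      const target = new URL('https://' + title.trim()).searchParams.get('url');
      if (target) urls.push(target);
    } catch (err) {
      // Title could not be parsed as a URL; skip it.
    }
  }
  return urls;
}

console.log(titlesToUrls([
  'archive.today/submit/?url=https%3A%2F%2Fexample.com%2Fpage',
  'New Tab',
]).join('\n'));
```

The resulting list can then be fed into a new run of just the ditched URLs.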
Aliases
Besides archive.today[IA•Wcite•.today], the site has been or is available at the following domains:
- archive.is[IA•Wcite•.today]
- archive.li[IA•Wcite•.today]
- archive.vn[IA•Wcite•.today]
- archive.fo[IA•Wcite•.today]
- archive.md[IA•Wcite•.today]
- archive.ph[IA•Wcite•.today]
- archive.ec[IA•Wcite•.today] (As far as is known, the archive.ec domain was only used in 2016.[18])
- archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion[IA•Wcite•.today]
Other features
It also has the ability to select certain portions of the page and embed the selection into the URL for sharing a specific portion.[19] This works by using a JavaScript handler to convert the selector element to a specific portion of the page. That seems to be the only part of an archived page that needs JavaScript; other than that, the site is completely accessible without it (provided you get past the CAPTCHA).
Archives
References
- ↑ https://gyrovague.com/2023/08/05/archive-today-on-the-trail-of-the-mysterious-guerrilla-archivist-of-the-internet/
- ↑ (Compare the same page on the Wayback Machine and Archive.today; you may need to turn off your ad blocker.)
- ↑ (Only users with a twitter account could access this page: https://archive.today/2023.02.09-022747/https://twitter.com/000627234/likes)
- ↑ (This page was saved as a logged in user: https://archive.today/2024.01.10-225837/https://github.com/Alex313031/thorium/issues/168)
- ↑ (The Drugs subreddit could be archived by archive.today, but the Wayback Machine's archive attempt around the same time could not get a foot in the door because of a couple of Reddit's warnings)
- ↑ https://blog.archive.today/post/82775187091/curious-why-the-move-in-domain-names-from-archive-is[IA•Wcite•.today]
- ↑ https://twitter.com/archiveis/status/455710701948903424
- ↑ https://liberapay.com/archiveis/donate
- ↑ https://web.archive.org/web/20160808113809/https://archive.is/
- ↑ https://arstechnica.com/tech-policy/2026/02/wikipedia-bans-archive-today-after-site-executed-ddos-and-altered-web-captures/[IA•Wcite•.today]
- ↑ https://liberapay.com/archiveis/donate
- ↑ 2026-04-11 https://archive.today/factfinder.census.gov/ shows "1..20 of 2999 urls" but Wikipedia:Archive.today_guidance says it should have 14566 captures
- ↑ 2025-04-12 https://archive.today/nitter.poast.org had "1..20 of 2995 urls" but on https://archive.today/offset=2540/nitter.poast.org there was no next page link. 2026-04-11 nitter.poast.org now shows "1..20 of 2556 urls", so this may have just been a bug or other glitch
- ↑ https://web.archive.org/web/20260411075459/https://archive.today/www.nytimes.com
- ↑ https://web.archive.org/web/20260412012317/https://archive.today/www.wsj.com
- ↑ https://twitter.com/archiveis/status/1188222460598116353
- ↑ Sample Archive.Today crawl with YouTube comment loading
- ↑ http://archive.ec/, former (2016) domain of Archive.today, did not block self-archival.[IA•Wcite•.today]
- ↑ Example of a selection: #selection-95.2-95.49