Difference between revisions of "Talk:Internet Archive"

From Archiveteam





Revision as of 12:57, 13 May 2019

Hi guys,

I was thinking of updating the links in the "There's also an unofficial bookmarklet and shell function" part of the Uploading to archive.org/Tools page, as they don't seem to be working. Here is what I get when I click on them:

http://paste.archivingyoursh.it/raw/fotikimule =
 function ia-save() {
   curl -s -m 60 -I https://web.archive.org/save/$* | grep Content-Location | awk '{print "https://web.archive.org"$2}'
 }

http://paste.archivingyoursh.it/raw/yovabepuxa =
 javascript:void(open('https://web.archive.org/save/'+document.location))

So I have found this: http://www.bitsgalore.org/2014/08/02/How-to-save-a-web-page-to-the-Internet-Archive/ It works pretty well and has helped archive many pages on the Internet Archive. I'll go ahead and make the change. Let me know if you know of any additional add-ons!
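For anyone who would rather not depend on the paste links, here is a rough Python sketch of the same two steps the shell function performs: build the https://web.archive.org/save/ URL, then turn a Content-Location response header into a full snapshot URL. The function names are made up for illustration, and whether the save endpoint still returns a Content-Location header is an assumption taken from the quoted shell function, not something verified here. The sketch does no network access; it only handles the strings.

```python
from typing import Mapping, Optional

WAYBACK = "https://web.archive.org"


def save_request_url(page_url: str) -> str:
    """Build the save URL for page_url, like the quoted shell function does."""
    return f"{WAYBACK}/save/{page_url}"


def snapshot_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the full snapshot URL from a Content-Location header, if any.

    Assumption: the header value is a path such as /web/<timestamp>/<url>,
    which is what the awk step in the shell function expects.
    """
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "content-location":
            return WAYBACK + value.strip()
    return None
```

The headers argument can be any mapping of response headers, for example the ones returned by whatever HTTP client you use for the HEAD request.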

Torrent

When you upload an item via torrent, does IA keep seeding the torrent "forever"? HadeanEon (talk) 17:27, 7 February 2019 (UTC)

It leeches (tries to download) it for a maximum of 7 days. Also, IIRC, it stops downloading after being idle (no seeds) for 24 hours. (I used to upload stuff via BitTorrent.)
The torrent file you upload won't be seeded. The Archive creates a new torrent, which contains all files of the item. That one is probably seeded forever. bzc6p (talk) 07:17, 5 May 2019 (UTC)

robots.txt immunity?

Apparently, the Wayback Machine is now taking revenge on robots.txt and increasingly ignoring it.
Is it just me, or have any of you noticed it too? (I am extremely glad about it!) --ATrescue (talk) 18:12, 4 May 2019 (UTC)

Not just you. bzc6p (talk) 07:18, 5 May 2019 (UTC)
Good to know. Well done, Internet Archive! Don't let robots.txt censor us. --ATrescue (talk) 10:47, 6 May 2019 (UTC)

YouTube URL parameters.

When archiving a YouTube URL with any parameters after it such as “&feature=youtu.be”, the Wayback Machine jumps back to “This page is available on the web!”, apparently without archiving the page. --ATrescue (talk) 20:22, 12 May 2019 (UTC)

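Not sure if this is the right place for discussing these things, but I think Wayback displays YouTube pages in a special manner compared to other websites. An example from when I'm capturing (a limited amount of) comments:
  1. I use 'https://web.archive.org/save/' on the URL for a comment (https://www.youtube.com/watch?v=jNQXAC9IVRw&lc=UgxCvmtU6kiDybbaeNF4AaABAg.8aP-DIeUPoe8aQKikPAJNN&disable_polymer=1), with &disable_polymer=1 added to make things actually visible. Wayback proceeds to capture the page.
  2. Wayback then goes to the "This page is available on the web!" page (https://web.archive.org/web/20190513122135/https://www.youtube.com/watch?v=jNQXAC9IVRw&lc=UgxCvmtU6kiDybbaeNF4AaABAg.8aP-DIeUPoe8aQKikPAJNN&disable_polymer=1), like you say. However, the timestamp in the URL corresponds to the capture time.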
  3. Erase the parameters in the URL until you just have the video ID, and leave the timestamp alone. You'll be taken to the capture (in my case, with the limited comments and old YouTube interface): https://web.archive.org/web/20190513122135/https://www.youtube.com/watch?v=jNQXAC9IVRw
YouTube in Wayback appears to be a special case, with the captures of a video with and without the extra parameters being lumped together in the same record. You can see this a little more easily on this page (warning, sort of large since I'm using Me at the zoo as an example): https://web.archive.org/cdx/search/cdx?url=https://www.youtube.com/watch?v=jNQXAC9IVRw --Amerepheasant (talk) 12:57, 13 May 2019 (UTC)
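Step 3 above can be done by hand, but here is a small Python sketch of it in case it helps: drop every query parameter except the video ID, then put the capture timestamp back in front. The function names are hypothetical, and the only thing assumed about Wayback is the /web/<timestamp>/<url> shape already visible in the links above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def strip_to_video_id(youtube_url: str) -> str:
    """Return the watch URL with only the v= parameter kept."""
    parts = urlsplit(youtube_url)
    video_ids = parse_qs(parts.query).get("v")
    if not video_ids:
        raise ValueError("no 'v' parameter in URL")
    query = urlencode({"v": video_ids[0]})
    # Rebuild the URL without the other parameters and without a fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def wayback_capture_url(timestamp: str, original_url: str) -> str:
    """Combine a 14-digit capture timestamp with the original URL."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"
```

With the comment URL and the timestamp 20190513122135 from the example, this yields the same capture link as in step 3.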

Multiple captures in one second

I have noticed that the Wayback Machine sometimes captures the same page multiple times in one second, so that the same capture timestamp is listed more than once. I have no example for it at the moment (I will put one here as soon as I find one).

I wonder how the Wayback Machine differentiates between those URLs. --ATrescue (talk) 20:22, 12 May 2019 (UTC)
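One way to look for an example might be to scan CDX output for timestamps that occur more than once. The Python sketch below assumes the plain-text CDX format has space-separated fields with the 14-digit timestamp in the second position; that field order is an assumption here, and the function name is made up. It works on text you have already downloaded and does not contact the Wayback Machine itself.

```python
from collections import Counter
from typing import Dict


def repeated_timestamps(cdx_text: str) -> Dict[str, int]:
    """Map each timestamp that appears on more than one CDX line to its count.

    Assumption: fields are whitespace-separated and the timestamp is field 2.
    """
    counts = Counter()
    for line in cdx_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            counts[fields[1]] += 1
    return {ts: n for ts, n in counts.items() if n > 1}
```

Any timestamp returned by this function would be a candidate for the kind of same-second capture described above.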

Archive desktop website.

If you try to archive a website into the Wayback Machine through your mobile phone browser, you will notice that it tries to archive the mobile version of the website.

Therefore, switch your browser to “Desktop Mode” before archiving websites. --ATrescue (talk) 11:49, 13 May 2019 (UTC)