Difference between revisions of "Talk:Internet Archive"

Latest revision as of 05:01, 26 May 2021

hi guys,

I was thinking of updating the links under Uploading to archive.org > Tools ("There's also an unofficial bookmarklet and shell function"), as they don't seem to be working. Here is what I get when I click on them:
  http://paste.archivingyoursh.it/raw/fotikimule = function ia-save() { curl -s -m 60 -I https://web.archive.org/save/$* | grep Content-Location | awk '{print "https://web.archive.org"$2}' }
  http://paste.archivingyoursh.it/raw/yovabepuxa = javascript:void(open('https://web.archive.org/save/'+document.location))
I have found http://www.bitsgalore.org/2014/08/02/How-to-save-a-web-page-to-the-Internet-Archive/, which works pretty well and has helped archive many pages on the Internet Archive. I'll go ahead and make the change. Let me know if you know of any additional add-ons!
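For anyone who would rather script this: both snippets above work by prefixing a page's address with https://web.archive.org/save/. Here is a rough Python sketch of the same idea. The names save_url and request_capture are made up for illustration, and I am assuming the server may still answer with a Content-Location header as the shell function expects; if it does not, the sketch just returns whatever URL the request ended up at.

```python
import urllib.request

# Prefix taken from the shell function and bookmarklet quoted above.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_url(page_url: str) -> str:
    """Build the 'save this page' URL for page_url (no network access)."""
    return SAVE_PREFIX + page_url.strip()


def request_capture(page_url: str, timeout: float = 60.0) -> str:
    """Ask the Wayback Machine to capture page_url.

    Assumption: the response may carry a Content-Location header pointing
    at the new capture, as the quoted shell function expects. If the header
    is missing, fall back to the final URL of the request.
    """
    request = urllib.request.Request(
        save_url(page_url),
        headers={"User-Agent": "ia-save-sketch"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        location = response.headers.get("Content-Location")
        if location:
            return "https://web.archive.org" + location
        return response.geturl()
```

Calling request_capture("https://example.com/") would make one request with the same 60 second limit as the shell function; save_url on its own only builds the string.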

Torrent

When you upload an item via torrent, does IA keep seeding the torrent "forever"? HadeanEon (talk) 17:27, 7 February 2019 (UTC)

It leeches (tries to download) the torrent for a maximum of 7 days. Also, IIRC, it stops downloading after being idle (no seeds) for 24 hours. (I used to upload stuff via BitTorrent.)
The torrent file you upload won't be seeded. The Archive creates a new torrent, which contains all files of the item. That one is probably seeded forever. bzc6p (talk) 07:17, 5 May 2019 (UTC)

robots.txt immunity?

Apparently, the Wayback Machine is now taking revenge on robots.txt and increasingly ignoring it.
Is it just me or did any of you also notice it? (I am extremely glad about it!) --ATrescue (talk) 18:12, 4 May 2019 (UTC)

Not just you. bzc6p (talk) 07:18, 5 May 2019 (UTC)
Good to know. Well done, Internet Archive! Don't let robots.txt censor us. --ATrescue (talk) 10:47, 6 May 2019 (UTC)

YouTube URL parameters.

When archiving a YouTube URL with any parameters after it such as “&feature=youtu.be”, the Wayback Machine jumps back to “This page is available on the web!”, apparently without archiving the page. --ATrescue (talk) 20:22, 12 May 2019 (UTC)

Not sure if this is the right place for discussing these things, but I think Wayback displays YouTube pages in a special manner compared to other websites. Here is an example from when I was capturing (a limited number of) comments:
  1. I use 'https://web.archive.org/save/' on the URL for a comment, with the &disable_polymer=1 to make things actually visible. Wayback proceeds to capture the page.
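  2. Wayback then goes to the "This page is available on the web!" page (https://web.archive.org/web/20190513122135/https://www.youtube.com/watch?v=jNQXAC9IVRw&lc=UgxCvmtU6kiDybbaeNF4AaABAg.8aP-DIeUPoe8aQKikPAJNN&disable_polymer=1), like you say. However, the timestamp in the URL corresponds to the capture time.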
  3. Erase the parameters in the URL until you just have the video ID, and leave the timestamp alone. You'll be taken to the capture (in my case, with the limited comments and old YouTube interface): https://web.archive.org/web/20190513122135/https://www.youtube.com/watch?v=jNQXAC9IVRw
YouTube in Wayback appears to be a special case, with the videos with and without the extra parameters being lumped together in the same record. You can see this a little more easily on this page (warning, sort of large since I'm using Me at the zoo as an example): https://web.archive.org/cdx/search/cdx?url=https://www.youtube.com/watch?v=jNQXAC9IVRw. --Amerepheasant (talk) 12:57, 13 May 2019 (UTC)
Interesting. And many thanks, User:Amerepheasant, for the “https://web.archive.org/cdx/search/cdx?url= ” link. I did not know that it existed. It shows all the records as a plain text table. How practical! --User:ATrescue, 2019-05-13
Cheers! There's more info on using that here: https://github.com/internetarchive/wayback/tree/master/wayback-cdx-server (from the External Links section) --Amerepheasant (talk) 05:12, 14 May 2019 (UTC)
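The steps in this thread (drop everything except the video ID, keep the timestamp, then look at the cdx listing) can be sketched in Python using only the standard library. The function names are hypothetical, and the URL prefixes are simply the ones quoted in the comments above. Note that the cdx link comes out percent-encoded, which should be equivalent to the hand-typed form but is not something I have verified against the server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Prefixes copied from the URLs quoted in this thread.
WAYBACK_PREFIX = "https://web.archive.org/web/"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def strip_to_video_id(youtube_url: str) -> str:
    """Drop every query parameter except v= (the video ID)."""
    parts = urlsplit(youtube_url)
    video_ids = parse_qs(parts.query).get("v")
    if not video_ids:
        raise ValueError("no v= parameter in URL")
    query = urlencode({"v": video_ids[0]})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def capture_url(timestamp: str, youtube_url: str) -> str:
    """Keep the capture timestamp, but point it at the stripped URL."""
    return f"{WAYBACK_PREFIX}{timestamp}/{strip_to_video_id(youtube_url)}"


def cdx_query_url(youtube_url: str) -> str:
    """Build the cdx listing link for the stripped URL (percent-encoded)."""
    return f"{CDX_ENDPOINT}?{urlencode({'url': strip_to_video_id(youtube_url)})}"
```

For example, capture_url("20190513122135", "https://www.youtube.com/watch?v=jNQXAC9IVRw&disable_polymer=1") gives the capture link from step 3.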

Multiple captures in one second

I have noticed that the Wayback Machine sometimes captures the same page multiple times in one second, so that the same capture timestamp is listed more than once. I have no example of it at the moment (I will put one here as soon as I find one).

I wonder how the Wayback Machine differentiates between those URLs. --ATrescue (talk) 20:22, 12 May 2019 (UTC)
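One way to look for such cases is to scan the plain text cdx listing for timestamps that occur more than once. The sketch below assumes the default cdx field order (urlkey, timestamp, original, mimetype, statuscode, digest, length, separated by spaces); the function name is made up. It does not explain how Wayback tells the captures apart, it only shows whether the repeated rows differ in original URL or digest.

```python
from collections import defaultdict


def captures_sharing_a_second(cdx_text: str) -> dict:
    """Group cdx rows by timestamp and keep timestamps seen more than once.

    Assumes the default space-separated cdx layout:
    urlkey timestamp original mimetype statuscode digest length
    Returns {timestamp: [(original_url, digest), ...]}.
    """
    rows_by_timestamp = defaultdict(list)
    for line in cdx_text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        timestamp, original, digest = fields[1], fields[2], fields[5]
        rows_by_timestamp[timestamp].append((original, digest))
    return {
        timestamp: rows
        for timestamp, rows in rows_by_timestamp.items()
        if len(rows) > 1
    }
```

Feeding it the text of a cdx listing returns only the timestamps that repeat, with the original URL and digest of each row, so any difference between the rows is visible at a glance.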

Archive desktop website.

If you try to archive a website into the Wayback Machine through your mobile phone browser, you will notice that it tries to archive the mobile version of the website.

Therefore, archive websites in “Desktop Mode”. --ATrescue (talk) 11:49, 13 May 2019 (UTC)