Google Drive

From Archiveteam
Revision as of 01:34, 22 December 2023

Google Drive is a file-hosting service, à la Dropbox, run by Google (not to be confused with Google Cloud Storage and similar, more technical storage solutions). It is popular both for personal storage and for sharing files.

2021 grab

Google Drive IDs are not random (anecdotally, IDs of folders in the same tree often share long common sections), which makes them predictable. Google had been trying to fix this problem across its products (several of which have similar issues) throughout 2021[1]. To address it, on September 13, 2021, Google began requiring that, in order to access files and folders, users either have permissions tied to their signed-in Google Accounts or access the item through a URL containing a random per-item parameter called resourceKey, apparently introduced in 2021.[2] As a result, at least millions of links across the Web were expected to effectively break. Docs, Sheets and Slides were to be exempted from this update.[3] Apart from the lengthened links, the main threat was users deleting files, usually to stay within their 15 GB storage limit.
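
When sorting through collected links, it helps to know which ones carry a resource key. Older links carry only the item ID, while links shared after the change may also carry the key. The sketch below pulls both out of a link. It assumes the common shapes .../file/d/<ID>/... and .../drive/folders/<ID>, with the key in a resourcekey query parameter as it appears in shared links. These patterns are assumptions rather than a documented spec, and any other link form simply returns None.

```python
from urllib.parse import urlparse, parse_qs


def parse_drive_link(url):
    """Return (kind, item_id, resource_key) for a Drive file or folder link.

    Assumed link shapes: https://drive.google.com/file/d/<ID>/...
    and https://drive.google.com/drive/.../folders/<ID>, with the key
    (if any) in a 'resourcekey' query parameter. Returns None otherwise.
    """
    parts = urlparse(url)
    if parts.netloc != "drive.google.com":
        return None
    segments = [s for s in parts.path.split("/") if s]
    kind = item_id = None
    if len(segments) >= 3 and segments[:2] == ["file", "d"]:
        kind, item_id = "file", segments[2]
    elif "folders" in segments:
        i = segments.index("folders")
        if i + 1 < len(segments):
            kind, item_id = "folder", segments[i + 1]
    if item_id is None:
        return None
    # parse_qs maps each parameter to a list of values; a missing key means
    # the link predates resourceKey or was shared without one.
    keys = parse_qs(parts.query).get("resourcekey")
    return kind, item_id, (keys[0] if keys else None)


if __name__ == "__main__":
    link = "https://drive.google.com/file/d/abc123/view?usp=sharing&resourcekey=0-XYZ"
    print(parse_drive_link(link))  # ('file', 'abc123', '0-XYZ')
```

A link that comes back with None as its third element is one that, after the change, would only work for signed-in users with permission, unless the item was exempted.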

Grab

The grab script had three item types: folder:, file:, and user:. The plan was to run all folder: items first, building up (through backfeed) a pool of file: items that could be randomly sampled to determine a size threshold the Internet Archive would accept; the file: items would then be run. The user: items contain some user metadata, but no links to files or folders.
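
The sampling plan can be sketched as follows. Item names are assumed to look like type:ID, following the folder:/file:/user: prefixes. Everything else is hypothetical. get_size stands in for however a sampled file's size would be looked up. "What the Internet Archive will accept" is modelled as a total byte budget, which this page does not actually specify. Under those assumptions, the threshold is the largest per-file size cap whose estimated total (the sample's sum scaled up to the whole pool) stays within the budget.

```python
import random

ITEM_TYPES = ("folder", "file", "user")


def split_item(item):
    """Split an item name assumed to look like 'file:<ID>' into (type, id)."""
    item_type, sep, item_id = item.partition(":")
    if not sep or item_type not in ITEM_TYPES or not item_id:
        raise ValueError(f"unrecognised item: {item!r}")
    return item_type, item_id


def size_threshold(pool, get_size, budget_bytes, sample_size=1000, seed=0):
    """Largest per-file size cap whose estimated total fits budget_bytes.

    pool: file: items gathered (via backfeed) from the folder: run.
    get_size: hypothetical lookup returning one item's size in bytes.
    Only a random sample is sized; its sum is scaled up to the whole pool.
    """
    if not pool:
        return 0
    rng = random.Random(seed)
    sample = rng.sample(pool, min(sample_size, len(pool)))
    sizes = sorted(get_size(item) for item in sample)
    scale = len(pool) / len(sizes)
    best, running = 0, 0
    for i, size in enumerate(sizes):
        running += size
        if running * scale > budget_bytes:
            break
        # Accept a cap only once every sampled file of that size is counted.
        if i + 1 == len(sizes) or sizes[i + 1] != size:
            best = size
    return best


if __name__ == "__main__":
    pool = [f"file:{n}" for n in range(4)]
    sizes = {"file:0": 1, "file:1": 2, "file:2": 2, "file:3": 5}
    print(size_threshold(pool, sizes.__getitem__, budget_bytes=5))  # 2
```

Sampling keeps the number of size lookups bounded no matter how large the backfed pool grows, at the cost of an estimate rather than an exact total.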

Playback is theoretically possible with a more flexible, POST-capable Wayback Machine (WBM), but no such version exists yet. In the meantime, it may be possible to retrieve files from the WBM with plain wget or a similar tool.
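
Two standard Wayback Machine URL forms help with the plain-wget route. The first is a CDX index query, which shows which captures of a URL exist. The second is the id_ timestamp modifier, which returns the archived bytes without the Wayback toolbar or link rewriting. This page does not say which Drive URL (for example, a direct-download form) the 2021 grab captured for a given file. Treat that as an assumption and check it against the CDX results first. The timestamp would normally come from the timestamp column of a CDX row.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def cdx_query_url(original_url, limit=20):
    """CDX API query listing captures of original_url as JSON rows."""
    params = urlencode({"url": original_url, "output": "json", "limit": limit})
    return "https://web.archive.org/cdx/search/cdx?" + params


def wayback_raw_url(original_url, timestamp):
    """Capture URL with the id_ modifier, i.e. the original bytes unmodified."""
    return f"https://web.archive.org/web/{timestamp}id_/{original_url}"


def fetch_capture(original_url, timestamp, dest_path):
    """Stream one capture to dest_path (needs network access)."""
    url = wayback_raw_url(original_url, timestamp)
    with urlopen(url) as resp, open(dest_path, "wb") as out:
        while chunk := resp.read(1 << 16):
            out.write(chunk)
```

The URL returned by wayback_raw_url can equally be handed to wget directly.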

Results

(This is based on OrIdow6's very vague memories.) There appeared to be two types of Google Drive items: those that automatically redirected to a version with a resourceKey, and those that did not. There was speculation that the latter, which had more random-looking IDs, would not be affected by the quasi-removal.

Getting your files

The rudimentary downloader for the 2021 grab is now on GitHub (https://github.com/OrIdow6/google-drive-downloader).

Notes

Of the native types (https://en.wikipedia.org/wiki/Google_Drive#File_viewing):

  • Docs may be public; this post (https://digital-archiving.blogspot.com/2017/04/how-can-we-preserve-google-documents_35.html) gives a good description of the formats available for download.
  • Sheets, Slides, Drawings and Jamboard files likewise, each with its own set of download formats.
  • Forms cannot be downloaded in full, but public ones can be accessed at https://docs.google.com/forms/d/e/[ID]/viewform (along with some other, near-identical pages). If a form's results are public, they are visible at https://docs.google.com/forms/d/e/[ID]/viewanalytics.
  • My Maps display only a preview image by default. What the Drive interface does not seem to indicate, at least without an account, is that they can be viewed in full at https://www.google.com/maps/d/viewer?mid=[ID].
  • Sites are apparently not "published" in a way that is connected to their Drive entry[4], but it is unclear to me at this point whether they may be publicly listed in a folder, and what happens if you click on them.
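
The viewer URLs in these notes are simple to build from an item ID. The Forms and My Maps patterns come straight from the notes. For Forms, the ID is the one that follows /forms/d/e/. The Docs export?format= pattern is an assumption based on commonly used export links; the Docs entry describes which formats are available.

```python
from urllib.parse import quote, urlencode


def forms_urls(form_id):
    """Public form page and (if results are public) its results page."""
    base = "https://docs.google.com/forms/d/e/" + quote(form_id, safe="")
    return {"form": base + "/viewform", "results": base + "/viewanalytics"}


def my_maps_viewer_url(map_id):
    """Full My Maps viewer rather than the preview image shown in Drive."""
    return "https://www.google.com/maps/d/viewer?" + urlencode({"mid": map_id})


def docs_export_url(doc_id, fmt="pdf"):
    """Assumed Docs export endpoint; fmt e.g. 'pdf', 'docx', 'odt', 'txt'."""
    return ("https://docs.google.com/document/d/" + quote(doc_id, safe="")
            + "/export?" + urlencode({"format": fmt}))
```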

Additionally, the following are not native formats (for our purposes):

  • Colab, which is just a static file with a special editor, e.g. in this folder (https://drive.google.com/drive/folders/1tFYAo9POrdaXzp0D_cjbofHldfpin3U8)
  • Google Keep, which is "part of the... Google Docs Editors suite"[5] but does not seem to be accessible from Drive.


References