From Archiveteam
Revision as of 19:08, 17 December 2017 by Vitorio (talk | contribs) (Details about preserving additional files from the Lytro hosting shutdown which are still present on their CDN)

Lytro manufactured "light field" cameras, and offered free hosting for the exported "living picture" images on their service. They recently discontinued the hosting service, breaking all embeds. See e.g.

For an example of broken embeds, see:

Lytro at one point announced plans to open-source their viewer, but that never happened:

The Internet Archive Wayback Machine captured many Lytro embeds and galleries, but none of these currently work. The embedded Lytro web player references several JSON files, which IA captured but did not parse, so none of the URLs referenced in the JSON were retrieved. contains three sets of WARCs which capture many of the missing files.

The lfes-not-in-ia-4.txt file in that item contains a list of ~1.2M URLs which are the image assets referenced in the JSON files.

Worklog follows: and were downloaded from the Wayback Machine based on the date prior to the shutdown:

$ ~/.gem/ruby/2.0.0/bin/wayback_machine_downloader -t 20171129
$ ~/.gem/ruby/2.0.0/bin/wayback_machine_downloader -t 20171129

A regex search inside of all of the files to find lfe-cdn references:
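
A minimal sketch of such a search, not the pattern actually used in this worklog. It assumes the CDN URLs appear as plain `http(s)://` links whose host name starts with `lfe-cdn`; the function names and the directory argument are hypothetical. URLs stored with JSON-escaped slashes (`\/`) would need unescaping first.

```python
import re
from pathlib import Path

# Assumption: CDN URLs are plain links whose host name begins with "lfe-cdn".
# The character class stops at whitespace, quotes, angle brackets,
# backslashes and closing parentheses.
LFE_CDN_URL = re.compile(r"https?://lfe-cdn[^\s\"'<>\\)]+")


def find_cdn_urls(text):
    """Return every lfe-cdn URL found in a string, in order of appearance."""
    return LFE_CDN_URL.findall(text)


def scan_tree(root):
    """Collect lfe-cdn URLs from every regular file under root."""
    urls = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Downloaded pages may not be valid UTF-8; replace bad bytes.
            text = path.read_text(encoding="utf-8", errors="replace")
            urls.extend(find_cdn_urls(text))
    return urls
```

Writing the result of `scan_tree()` out one URL per line would give a list in the form that `wget -i` expects.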


Downloaded all the URLs from that list which weren't already present:

$ ~/bin/wget --ca-certificate=$HOME/Downloads/curl-7.57.0/lib/ca-bundle.crt -x --warc-file=lfes-not-in-ia-1 --warc-cdx --wait=1 --random-wait -i lfes-not-in-ia-1.txt 

That list wasn't deduplicated; there were ~10k duplicate URLs out of ~40k. The CDN also returned a lot of 500 errors, mostly for URLs without a "v2" in the URL, e.g. these fail:

but if I rewrite the second one to:

it exists.
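
The duplicate URLs mentioned above could be dropped before handing the list to wget. A small sketch, assuming the list is one URL per line; `dedupe_urls` is a hypothetical helper name, not something used in the original worklog.

```python
def dedupe_urls(lines):
    """Drop blank lines and repeated URLs, keeping first-seen order."""
    cleaned = (line.strip() for line in lines)
    # dict keys keep insertion order (Python 3.7+), so the first
    # occurrence of each URL survives and later repeats are ignored.
    return list(dict.fromkeys(url for url in cleaned if url))
```

Keeping the original order (rather than sorting) means the deduplicated list can still be compared line by line against the earlier one.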

The v2 (I guess) player is their "new" WebGL-based one, which asks for these paths in the JS, e.g.:


Went through all the URLs and pulled out all the UUID keys to see if any of those additional paths are still to be fetched.
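
One way this step might look, as a sketch only. It assumes the keys are standard hyphenated UUIDs embedded in the URL path; the `{uuid}` path templates passed to `candidate_urls` are placeholders, since the real paths requested by the player are not reproduced here.

```python
import re

# Assumption: keys are standard 8-4-4-4-12 hexadecimal UUIDs.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_uuids(urls):
    """Return the distinct UUIDs found in the URLs, lowercased, in first-seen order."""
    seen = {}
    for url in urls:
        for match in UUID_RE.findall(url):
            seen.setdefault(match.lower(), None)
    return list(seen)


def candidate_urls(uuids, templates, already_have):
    """Expand hypothetical path templates per UUID, skipping URLs already fetched."""
    have = set(already_have)
    result = []
    for uuid in uuids:
        for template in templates:
            url = template.format(uuid=uuid)
            if url not in have:
                result.append(url)
    return result
```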

$ ~/bin/wget --ca-certificate=/Users/vitorio/Downloads/curl-7.57.0/lib/ca-bundle.crt -x --warc-file=lfes-not-in-ia-2 --warc-cdx --warc-max-size=1G --wait=1 --random-wait -i lfes-not-in-ia-2.txt

Could also back up, an S3 bucket which stores all the JSON schemas for the JSON files.

Note that the JSON files include an `asset_base` URL reference to, but that isn't actually part of the JSON schema, and it doesn't appear to be checked by the last version of the player JS, so no need to rewrite it.

Now that we have all the UUIDs, let's check all the folders in the CDN directory to make sure we haven't missed any JSON downloads.

$ ~/bin/wget --ca-certificate=/Users/vitorio/Downloads/curl-7.57.0/lib/ca-bundle.crt -x --warc-file=lfes-not-in-ia-3 --warc-cdx --warc-max-size=1G --wait=1 --random-wait -i lfes-not-in-ia-3.txt

At least some of the JSON files were erroneously served by IA without being gzip-decompressed:
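
A sketch of how such files could be handled when parsing them, assuming the affected responses are ordinary gzip streams saved as-is. It checks for the gzip magic bytes rather than trusting the file name or headers; `load_json_maybe_gzipped` is a hypothetical helper.

```python
import gzip
import json

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def load_json_maybe_gzipped(raw):
    """Parse JSON from bytes, decompressing first if the bytes are gzip data."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # Assumption: the JSON payload itself is UTF-8 encoded.
    return json.loads(raw.decode("utf-8"))
```

Plain JSON can never begin with byte 0x1f, so the magic-byte check does not misfire on files that were served correctly.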

The filenames generated by Lytro's processes seem to be fairly unique per embed, so no point in generating URLs not represented in the JSON files.