Difference between revisions of "Sketch"

From Archiveteam

Revision as of 22:58, 25 September 2019

Sketch
Explore Sketch
URL https://sketch.sonymobile.com
Status Closing
Archiving status In progress...
Archiving type Unknown
Project source sketch-grab
Project tracker sketch
IRC channel #SketchedOut (on hackint)

Sketch is an image drawing and editing app for smartphones made by Sony. The online parts of Sketch will be discontinued on 2019-09-30[1].

User counts

The corporate user account, https://sketch.sonymobile.com/u/sonysketch, which new accounts automatically follow, showed "1795890 FOLLOWERS" as of 2019-05-08 01:40 UTC and "2073078 FOLLOWERS" as of 2019-07-08 20:51 UTC.

Browser URLs

Browsing pages:

By tag:

By user:

Sketch pages:

https://sketch.sonymobile.com/explore/featured/sketch/a72ed347-7606-4a99-8b3e-934fb809d5a1

Images

https://sketch-cloud-storage.s3.amazonaws.com/23135ca9-16a7-424c-a376-93004cd05782/40517d90-48a0-4be8-a3e1-f049437d5d62_s?AWSAccessKeyId=AKIAIVK24H6RLSWCC7OA&Signature=Mz1nTU887qV2COZNWsIpRSaC4fw%3D&Expires=1556691065

API

The following endpoints can be used to list sketches and power the explore view:

The last one seems to list all (23 million) available sketches. Example usage:

Other APIs

Additional APIs are needed for the mobile app to function but have not yet been located.

  • Who's following whom
  • Related tags to currently viewed tag

AWS S3 Expiration issue

Almost all the images live in AWS S3. Many of these signed URLs expire 1 hour after being generated. You can check whether a link has expired because its Expires parameter is a Unix timestamp after which the link is no longer valid. This is particularly problematic for thumbnail URLs: the thumbnail images on any webpage need to be grabbed within the hour, or they will be missing and cannot be repaired later.
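As a rough illustration of the expiry check, the Python sketch below reads the Expires query parameter from a signed S3 URL and compares it with the current time. The function names expiry_time and is_expired are made up for this example and are not part of sketch-grab; it assumes Expires holds seconds since the Unix epoch, as in the image URL shown under Images.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def expiry_time(url):
    """Return the expiry of a signed URL as an aware UTC datetime, or None.

    Assumes the URL carries an `Expires` query parameter holding seconds
    since the Unix epoch.
    """
    values = parse_qs(urlsplit(url).query).get("Expires")
    if not values:
        return None
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url, now=None):
    """Return True if the URL's Expires timestamp is already in the past."""
    expires = expiry_time(url)
    if expires is None:
        return False  # no Expires parameter, so nothing to check
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires


if __name__ == "__main__":
    url = (
        "https://sketch-cloud-storage.s3.amazonaws.com/"
        "23135ca9-16a7-424c-a376-93004cd05782/"
        "40517d90-48a0-4be8-a3e1-f049437d5d62_s"
        "?AWSAccessKeyId=AKIAIVK24H6RLSWCC7OA"
        "&Signature=Mz1nTU887qV2COZNWsIpRSaC4fw%3D&Expires=1556691065"
    )
    print(expiry_time(url).isoformat(), is_expired(url))
```

A crawler could run a check like this before fetching a queued thumbnail and re-request the parent page when the link has already lapsed.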

Saving with the Internet Archive

Saving URLs with https://web.archive.org/save/ actually works great, even though it looks like it fails.

The page will always show "404 Not Found", such as this random example.

But the page html contains: \"imageUrl\":\"https:\u002F\u002Fsketch-cloud-storage.s3.amazonaws.com\u002Fac16b85e-f285-44f3-ae2d-4c3ccdc1f413\u002F2ad6cf73-41f7-4048-aa2f-2eaca80b1ebc_s?AWSAccessKeyId...

And that link works, at least as a prefix search: https://web.archive.org/web/*/https://sketch-cloud-storage.s3.amazonaws.com/ac16b85e-f285-44f3-ae2d-4c3ccdc1f413/2ad6cf73-41f7-4048-aa2f-2eaca80b1ebc*

In fact, it gets both PNG and JPEG formats!
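The lookup described above can be sketched in Python: pull the escaped imageUrl values out of the saved page HTML, undo the \u002F escaping, and build the Wayback Machine prefix search URL. The function names are invented for this example. It assumes the HTML embeds the URL exactly in the escaped form quoted above, and that the size suffix (such as _s) follows an underscore at the end of the object name.

```python
import re

# Matches \"imageUrl\":\"...\" as it appears inside the page's escaped JSON.
IMAGE_URL_RE = re.compile(r'\\"imageUrl\\":\\"(.*?)\\"')
# Matches literal \uXXXX escape sequences such as \u002F.
UNICODE_ESCAPE_RE = re.compile(r"\\u([0-9a-fA-F]{4})")


def extract_image_urls(html):
    """Return the unescaped imageUrl values found in the page HTML."""
    urls = []
    for raw in IMAGE_URL_RE.findall(html):
        # Turn each \uXXXX escape into the character it stands for.
        urls.append(
            UNICODE_ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), raw)
        )
    return urls


def wayback_prefix_query(image_url):
    """Build a Wayback Machine prefix search URL for an image.

    Drops the signed query string and the size suffix (assumed to follow
    an underscore in the last path segment) so all variants are matched.
    """
    base = image_url.split("?", 1)[0]
    head, _, tail = base.rpartition("/")
    tail = tail.split("_", 1)[0]
    return "https://web.archive.org/web/*/" + head + "/" + tail + "*"


if __name__ == "__main__":
    # Sample input; the AWSAccessKeyId value here is a placeholder.
    sample = (
        r'{\"imageUrl\":\"https:\u002F\u002Fsketch-cloud-storage'
        r'.s3.amazonaws.com\u002Fac16b85e-f285-44f3-ae2d-4c3ccdc1f413'
        r'\u002F2ad6cf73-41f7-4048-aa2f-2eaca80b1ebc_s'
        r'?AWSAccessKeyId=EXAMPLE\"}'
    )
    for url in extract_image_urls(sample):
        print(wayback_prefix_query(url))
```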

We saved a random sample of about 50,000 pages (in addition to whatever the "real" code has saved).

There might be about 18 million pages: the /feed/global/list/ URLs seemed to run from about 180817470 to about 199228198. If so, we saved roughly 1 in 370 pages.
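The estimate can be reproduced with a few lines of arithmetic. This assumes every ID in the observed range corresponds to one page and that exactly 50,000 pages were saved, both of which are approximations.

```python
# Approximate ID range observed in the /feed/global/list/ URLs.
first_id = 180_817_470
last_id = 199_228_198

# Approximate size of the random sample that was saved.
saved_pages = 50_000

# Assumption: every ID in the range is one page.
id_span = last_id - first_id
pages_per_saved = id_span / saved_pages

print(f"ID span: {id_span:,}")
print(f"About 1 in {pages_per_saved:.0f} pages saved")
```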

References