Difference between revisions of "500px"

From Archiveteam

Revision as of 13:06, 29 June 2018

500px
High-quality photo sharing & selling site
URL http://www.500px.com
Status Online!, but Creative Commons-licensed images Endangered
Archiving status Upcoming...
Archiving type Unknown
Project source 500px-grab
IRC channel #500pieces (on hackint)

500px is a photo sharing site that caters to high-quality photos. It gives photographers ways to sell their images and offers a large collection of images to view.

Creative Commons image massacre

500px announced that they would no longer license images directly through 500px Marketplace, in favor of outsourcing distribution duties to Getty Images (and Visual China Group inside China). One consequence is that all Creative Commons-licensed images, as well as images whose owners have opted out of distribution, will disappear by June 30th.

A Warrior project will begin at or around 4 AM PDT/6 AM CDT/1100 UTC in a mad dash to grab as many images as we can. Start your Warriors and be in #500pieces (on hackint).

Archival

API Responses:

https://archive.org/details/AttributionLicense3APISeverResponses.7z

https://archive.org/details/AttributionNoDerivativesLicense3APIServerResponses.7z

https://archive.org/details/AttributionShareAlikeLicense3APIServerResponses.7z

https://archive.org/details/AttributionNonCommercialLicense3APIServerResponses.7z

Example of one of the responses: https://pastebin.com/TygNSTSu


Unfortunately, the official API is dead, but fear not, fellow Archivists! I have found an (admittedly bodged-together) method of getting API info. Using the Burp Suite Pro network security tools, I set up a MITM attack between a VM with a custom SSL CA certificate installed and the server. (It seems that 500px uses a combination of browser cookies, a device UUID, and a few other keys to block widespread use of their API.) After intercepting a request to api.500px.com, I cloned the request and sent it to the "Intruder" tool, where I set the page parameter in the GET request to the API as a 'payload', then had it auto-increment that number while sending the requests and saving the responses. I set the limit to 1000, although I ended up stopping at around 900 because I noticed the responses were coming back empty (and there's a total-pages number in the API info). I 7zipped all of the responses and threw them up on the IA.
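
The paging logic described above (increment the page number, save each response, stop once the responses come back empty or the reported page total is passed) can be sketched in Python. This is only an illustration, not the Burp Suite workflow itself: `crawl_pages` and `fetch_page` are hypothetical names, and the `photos` and `total_pages` field names are assumptions about the response layout rather than confirmed API fields. The caller has to supply `fetch_page`, including whatever cookies and keys the site requires.

```python
import json
from typing import Callable, Dict, Iterator, Optional


def crawl_pages(fetch_page: Callable[[int], str],
                max_pages: int = 1000) -> Iterator[Dict]:
    """Yield parsed JSON responses for pages 1, 2, ... until results run out.

    fetch_page is a caller-supplied (hypothetical) function that returns the
    raw JSON body for a given page number.
    """
    total_pages: Optional[int] = None
    for page in range(1, max_pages + 1):
        # Stop once we are past the page total reported by an earlier response.
        if total_pages is not None and page > total_pages:
            break
        data = json.loads(fetch_page(page))
        # Assumed field name: an empty or missing "photos" list means the
        # responses have started coming back empty.
        if not data.get("photos"):
            break
        # Assumed field name: remember the page total if the response has one.
        total_pages = data.get("total_pages", total_pages)
        yield data
```

Each yielded dictionary could then be written to disk with `json.dump` before archiving. Passing the fetcher in as an argument keeps the loop testable without touching the network.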

I also had a go at writing a Python script that, given a list of URLs, parses and downloads all of the metadata and photos from those URLs: https://github.com/adinbied/500pxBU
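
For readers who want the general shape of such a downloader without opening the repository, here is a minimal sketch using only the Python standard library. It is not the code from the linked script: `local_name`, `fetch_bytes` and `download_all` are hypothetical helpers, and the sketch assumes each URL points directly at a file to save.

```python
import os
from typing import Callable, Iterable, List
from urllib.parse import urlparse
from urllib.request import urlopen


def local_name(url: str) -> str:
    """Derive a file name from the last path segment of a URL."""
    name = os.path.basename(urlparse(url).path)
    return name or "index"  # fall back when the path ends in a slash


def fetch_bytes(url: str) -> bytes:
    """Download one URL and return its body (default fetcher)."""
    with urlopen(url) as resp:
        return resp.read()


def download_all(urls: Iterable[str], out_dir: str,
                 fetch: Callable[[str], bytes] = fetch_bytes) -> List[str]:
    """Save every URL into out_dir and return the paths that were written."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(out_dir, local_name(url))
        with open(path, "wb") as fh:
            fh.write(fetch(url))
        saved.append(path)
    return saved
```

Note that two URLs ending in the same file name would overwrite each other in this sketch; a real run would need unique names, for example by prefixing a photo ID.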

~adinbied