500px


{{Infobox project | title = 500px | image = 500pxdotcom screenshot.png | description = High-quality photo sharing & selling site | URL = http://www.500px.com | project_status = {{endangered}} | archiving_status = Not saved yet | irc = 500pieces }}

500px is a photo sharing site that caters to high-quality photos. It gives photographers a way to sell their images, as well as providing a large collection of images to view. On June 30th, they are removing all Creative Commons images from their site (see https://support.500px.com/hc/en-us/articles/360005097533).


Archival

My method of getting API info: using the BurpSuite Pro network security tools, I set up a MITM proxy between a VM (with a custom SSL CA certificate installed) and the server. After intercepting a request to api.500px.com, I cloned the request and sent it to the "Intruder" tool, where I set the page parameter in the GET request as a 'payload', then had it auto-increment the page number while processing the requests and saving the responses. I set the limit to 1000, although I ended up stopping at around 900 because I noticed the responses were turning up empty (and there's a total pages number in the API info).

I 7zipped all of the responses and threw them up on the IA for someone to have a go at if they want, because after writing this I'm heading to bed. All Attribution License 3.0 API info: https://archive.org/details/AttributionLicense3APISeverResponses.7z Example of one of the responses: https://pastebin.com/TygNSTSu
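For anyone who wants to reproduce the page-incrementing part without Burp, here is a minimal Python sketch of the same idea. The endpoint, query parameters, and auth headers are assumptions (in practice they were copied from the intercepted request to api.500px.com), so treat this as a starting point rather than the exact method used above.

 import json
 import requests
 
 BASE_URL = "https://api.500px.com/v1/photos/search"  # assumed endpoint
 PARAMS = {"license_type": "4", "rpp": 100}           # assumed: CC license filter, results per page
 HEADERS = {}                                          # fill in auth/consumer key from an intercepted request
 
 def fetch_all_pages(max_pages=1000):
     """Increment the page number until responses come back empty,
     saving each raw response to disk (mirrors the Intruder run)."""
     for page in range(1, max_pages + 1):
         resp = requests.get(BASE_URL, params={**PARAMS, "page": page}, headers=HEADERS)
         resp.raise_for_status()
         data = resp.json()
         if not data.get("photos"):  # empty page: we've gone past the last page
             break
         with open(f"response_page_{page}.json", "w") as f:
             json.dump(data, f)
         # The API also reports total_pages, so we can stop there instead of guessing
         if page >= data.get("total_pages", max_pages):
             break
 
 if __name__ == "__main__":
     fetch_all_pages()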

I also had a go at writing a Python script that, given a list of URLs, parses and downloads all of the metadata and photos from those URLs: https://github.com/adinbied/500pxBU
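The real script is in the repo above; the sketch below only illustrates the general shape of that approach (fetch each URL, save its metadata, then grab the image). The JSON field names ("image_url", "id") are assumptions, not what 500pxBU actually uses.

 import json
 import os
 import requests
 
 def download_photos(url_list_file, out_dir="500px_backup"):
     """For each URL in the list, save its metadata as JSON and download the image."""
     os.makedirs(out_dir, exist_ok=True)
     with open(url_list_file) as f:
         urls = [line.strip() for line in f if line.strip()]
 
     for url in urls:
         resp = requests.get(url)
         resp.raise_for_status()
         meta = resp.json()  # assumes the URL returns API JSON for one photo
         photo_id = meta.get("id", url.rstrip("/").split("/")[-1])
 
         # Keep the metadata alongside the image file
         with open(os.path.join(out_dir, f"{photo_id}.json"), "w") as f:
             json.dump(meta, f, indent=2)
 
         image_url = meta.get("image_url")  # assumed field name
         if image_url:
             img = requests.get(image_url)
             with open(os.path.join(out_dir, f"{photo_id}.jpg"), "wb") as f:
                 f.write(img.content)
 
 if __name__ == "__main__":
     download_photos("urls.txt")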

Hopefully someone can pick up where I left off using what I've posted - I should be back around 3PM UTC on 6/29/18.

~adinbied