MS Paint Fan Adventures

From Archiveteam
Revision as of 17:17, 6 March 2018 by Riking (add instructions for running, add more status entries)

Stub for writing down notes about a project in progress.

This project archives the contents of https://mspfa.com/, which uses a JavaScript app to display text-based stories with embedded images and Flash content. Stories often link to YouTube alternates for the Flash animations.

The custom archiver code is at https://github.com/riking/mspfa-archiver. It is a giant, tangled mess of Go scripts.

Contact @riking to get upload permissions for the Internet Archive collection.

Operating the custom archiver

Prep

  1. Clone into $GOPATH; there will probably be updates to the script.
  2. Get wpull: pip3 install wpull==1.2.3; pip3 install -r https://raw.githubusercontent.com/chfoo/wpull/v1.2.3/requirements.txt
  3. Symlink it into the current directory: ln -s $(which wpull)
  4. Get youtube-dl; update it with youtube-dl -U as needed.
  5. If needed, symlink ./target to a bigger drive, or pass -o=folder on every run.
  6. Build the code (TODO: patches to datatogether/warc): go build -v .
  7. Put your IAS3 credentials into ./ias3.json: {"AccessKeyID": "...", "SecretAccessKey": "..."}
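The ./ias3.json file from step 7 is plain JSON. As a minimal sketch (assuming the archiver decodes it with encoding/json; its real loader may differ), the file maps onto a struct like the one below. The Internet Archive's S3-like API takes these two keys in an "Authorization: LOW <access key>:<secret key>" header. The names ias3Creds, parseCreds, and authHeader are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// ias3Creds mirrors the two fields in ./ias3.json.
// Hypothetical type; the archiver's own struct may be named differently.
type ias3Creds struct {
	AccessKeyID     string
	SecretAccessKey string
}

// parseCreds decodes the JSON and checks that both keys are present.
func parseCreds(data []byte) (ias3Creds, error) {
	var c ias3Creds
	if err := json.Unmarshal(data, &c); err != nil {
		return c, err
	}
	if c.AccessKeyID == "" || c.SecretAccessKey == "" {
		return c, errors.New("ias3.json: AccessKeyID and SecretAccessKey are both required")
	}
	return c, nil
}

// authHeader formats the credentials as the IA S3 API expects them:
// "LOW <access key>:<secret key>".
func (c ias3Creds) authHeader() string {
	return "LOW " + c.AccessKeyID + ":" + c.SecretAccessKey
}

func main() {
	data, err := os.ReadFile("ias3.json")
	if err != nil {
		fmt.Println("could not read ias3.json:", err)
		return
	}
	c, err := parseCreds(data)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Print only the access key; never echo the secret.
	fmt.Println("credentials loaded for key", c.AccessKeyID)
}
```

Running this from the archiver directory is a quick way to catch a malformed ias3.json before a long download.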

Running

While testing, include both -test and -ident MSPFA_Test_12345 so that the uploaded Archive items go into test_collection. When testing changes to the script, use -devScript.

Basic usage: ./mspfa-archiver -dl -ident auto -s 1234

If a download step fails (e.g. a broken URL passed to the Photobucket step), run the archiver again with the -fu ("Force Upload") flag.

If you encounter a dead domain and don't want to wait for wpull to time out on it, include -wpullArgs '--exclude-domains majhost.com,g0m.yore.ma', listing whichever domains are dead.
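When several dead domains pile up, the -wpullArgs value can be built from a list. A minimal sketch: wpull's --exclude-domains takes a comma-separated list, as in the example above. The helper excludeDomainsArg is hypothetical and only formats the string; it assumes the archiver splits the -wpullArgs value on spaces the way the quoted shell example implies.

```go
package main

import (
	"fmt"
	"strings"
)

// excludeDomainsArg formats a -wpullArgs value that makes wpull skip the
// given dead domains. Blank entries and duplicates are dropped; an empty
// result means there is nothing to exclude.
func excludeDomainsArg(domains []string) string {
	seen := map[string]bool{}
	var kept []string
	for _, d := range domains {
		d = strings.TrimSpace(d)
		if d == "" || seen[d] {
			continue
		}
		seen[d] = true
		kept = append(kept, d)
	}
	if len(kept) == 0 {
		return ""
	}
	return "--exclude-domains " + strings.Join(kept, ",")
}

func main() {
	dead := []string{"majhost.com", "g0m.yore.ma"}
	// With exec.Command, pass "-wpullArgs" and this value as two separate
	// arguments; no shell quoting is needed.
	fmt.Println(excludeDomainsArg(dead))
}
```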

TODO: a script to automatically run on each story ID and save a list of failures.

Work Division

1-999

Archiver: riking

Stage: Initial archiving

Problematic story IDs:

(Note: don't take this table as an example; it's translated from my bad notes.)

Story ID | Archived OK? | Problems
1 | Yes | Single failed Photobucket URL
4 | Yes | Photobucket
12 | Yes | Photobucket
14 | Yes | 404s
17 | No | Dead domain: forum-files2.fobby.net
19 | Mostly | Broken URL: pasted twice in a row
21 | No | Contact author; "410 Gone" from thefelt.webs.com; dead domain: windowchronicles.com
22 | Mostly | Broken URL: pasted twice in a row
24 | No | Photobucket
25 | No | 404s
26 | No | 404s: ImageShack
33 | No | Dead domain: yore.ma
35 | No | Broken URL
45 | No | 404s
46 | No | Dead domain: myfrogbag
48 | No | 404s: ImageShack
53 | No | Dead domain: ardekantur.com
55 | No | 404s: ImageShack
59 | No | Photobucket
61 | No | 404s
63 | No | 404s
66 | No | Dead domain: myfrogbag
67 | No | 404s: Photobucket
76 | No | 404s: Photobucket
77 | No | 404s: ImageShack
78 | No | 404s: ImageShack
81 | No | Dead domain: TBD
87 | No | 404s; pulling HTML pages
89 | Yes | 404s: ImageShack
93 | No | Dead domain: TBD
104 | No | Dead domain: suspended webhost
106 | No | Dead domain: TBD
108 | No | 404s: imagebin
109 | No | 404s; dead domain
111 | No | 404s: ImageShack
135 | No | 404s: Photobucket
158 | No | 404s
160 | Yes | 404s: ImageShack
227 | No | Uses blob: URLs??
241 | No | Dropbox public folder
263 | No | HTML: ImageShack homepage
270 | No | Dropbox public folder
277 | No | Dead domain:
285 | No | HTML
307 | No | 404s
308 | No | 404s
314 | No | Dead domain: blacktourney.com
319 | No | Dropbox public folder
323 | No | i.minus.com
325 | No | Dead domain: majhost.com
331 | No | Dropbox public folder
339 | Mostly | A single Photobucket BWE
341 | No | Dropbox public folder
350 | No | Dead domain: majhost.com
351 | Some | ImageShack 404s
352 | Mostly | SWFs hosted at files.myfrogbag.com
353 | Mostly | SWFs hosted at files.myfrogbag.com
728 | No | Dropbox public folder

1000-1999

2000-2999

3000-3999