MS Paint Fan Adventures
Stub to write down notes about project in progress
Archiving the contents of https://mspfa.com/. The site uses a JS app to display text-based stories with embedded images and Flash content. Stories often link to YouTube alternates for the Flash content.
The custom archiver code is at https://github.com/riking/mspfa-archiver. It is a giant tangled mess of Go scripts.
Contact @riking to get upload permissions for the Archive collection
Operating the custom archiver
Prep
- Clone into $GOPATH - there will probably be updates to the script
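- Get wpull (the requirements file must be fetched as raw text; the GitHub /blob/ URL returns an HTML page that pip cannot parse):
pip3 install wpull==1.2.3; pip3 install -r https://raw.githubusercontent.com/chfoo/wpull/v1.2.3/requirements.txt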
- Symlink it into the current directory:
ln -s $(which wpull)
- Get youtube-dl, and update it with youtube-dl -U as needed
- If needed, symlink ./target to a bigger drive or specify -o=folder every time you run it
- Build the code (TODO: patches to datatogether/warc):
go build -v .
- Put IAS3 credentials into ./ias3.json:
{"AccessKeyID": "...", "SecretAccessKey": "..."}
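The prep steps above can be sanity-checked before a run. The following is a minimal sketch, not part of mspfa-archiver: the function names check_ias3 and check_prep are made up for illustration, and the credentials check only looks for the two key names shown above rather than validating the JSON.

```shell
#!/bin/sh
# Hypothetical pre-flight checks for the prep list above.
# check_ias3 and check_prep are illustrative names, not archiver features.

# check_ias3 FILE: succeed only if FILE exists and mentions both key names.
check_ias3() {
    [ -f "$1" ] &&
    grep -q '"AccessKeyID"' "$1" &&
    grep -q '"SecretAccessKey"' "$1"
}

# check_prep: report every missing prerequisite, return non-zero if any.
check_prep() {
    ok=0
    # The wpull symlink is expected in the current directory.
    [ -e ./wpull ] || { echo "missing ./wpull symlink" >&2; ok=1; }
    command -v youtube-dl >/dev/null 2>&1 ||
        { echo "youtube-dl not on PATH" >&2; ok=1; }
    check_ias3 ./ias3.json ||
        { echo "ias3.json missing or incomplete" >&2; ok=1; }
    return "$ok"
}
```

Run check_prep from the directory that holds the built archiver; it prints one line per missing prerequisite.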
Running
While testing, make sure to include both -test and -ident MSPFA_Test_12345 so that the uploaded Archive items go into test_collection.
When testing changes to the script, use -devScript
Basic usage: ./mspfa-archiver -dl -ident auto -s 1234
If a download step fails (e.g. a broken URL passed to the photobucket step), run the archiver again with the -fu ("F"orce "U"pload) flag.
If you encounter a dead domain and don't want to wait for wpull to time out, include -wpullArgs '--exclude-domains majhost.com,g0m.yore.ma', listing whichever domains are dead.
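When several dead domains pile up, the exclusion list gets tedious to type. A small sketch of a helper that builds the -wpullArgs value from a list of domains; exclude_arg is a made-up name, and the only flag it emits is the --exclude-domains option already shown above.

```shell
#!/bin/sh
# Hypothetical helper: join the given domains with commas and print
# the wpull option string to pass via -wpullArgs.
exclude_arg() {
    # "$*" joins the arguments using the first character of IFS;
    # the subshell keeps the IFS change from leaking out.
    list=$(IFS=,; echo "$*")
    printf '%s\n' "--exclude-domains $list"
}
```

Example use: ./mspfa-archiver -dl -ident auto -s 1234 -wpullArgs "$(exclude_arg majhost.com g0m.yore.ma)"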
TODO - Script to automatically run on each story ID and save a list of failures
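One possible shape for that script, as a sketch only. It assumes the archiver exits non-zero when a story fails, which has not been verified against the code; ARCHIVER, FAILURES, and archive_range are made-up names, and the flags are the ones from the basic usage line.

```shell
#!/bin/sh
# Hypothetical batch wrapper: run the archiver over a range of story IDs
# and append the IDs whose run exited non-zero to a failures file.
# Assumption: mspfa-archiver signals failure through its exit status.
ARCHIVER="${ARCHIVER:-./mspfa-archiver}"
FAILURES="${FAILURES:-failures.txt}"

# archive_range FIRST LAST: process every story ID from FIRST to LAST.
archive_range() {
    id=$1
    last=$2
    while [ "$id" -le "$last" ]; do
        if ! "$ARCHIVER" -dl -ident auto -s "$id"; then
            # Record the failing ID and keep going with the next story.
            echo "$id" >> "$FAILURES"
        fi
        id=$((id + 1))
    done
}
```

For example, archive_range 1 999 would cover the first work division block and leave the failing IDs, one per line, in failures.txt.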
Work Division
1-999
Archiver: riking
Stage: Initial archiving
Problematic story IDs:
(Note: this table should not be taken as an example; it's translated from my bad notes.)
Story ID | Archived OK? | Problems |
---|---|---|
1 | Mostly | Single failed Photobucket URL. |
4 | Yes | Photobucket |
12 | Yes | Photobucket |
14 | Yes | 404s |
17 | No | Dead domain: forum-files2.fobby.net |
19 | Mostly | broken URL: pasted twice in a row |
21 | No, Contact Author | "410 Gone" from thefelt.webs.com; Dead domain: windowchronicles.com |
22 | Mostly | broken URL: pasted twice in a row |
24 | Yes | Photobucket |
25 | Yes | 404s: imageshack, 22/22 rescued from Wayback |
26 | Mostly | 404s: imageshack, -3(41)/44 rescued from Wayback |
33 | No | Dead domain: yore.ma |
35 | Mostly | broken URL: single photobucket image ends in "89.pngp" |
45 | No | Dead webhost customer: http://armada.lostsignalweb.com; Dead domain: imageplay.net; 404s: imageshack |
46 | No | Dead domain: myfrogbag |
48 | Mostly | 404s: imageshack, -10(105)/115 rescued from Wayback |
53 | No | Dead domain: ardekantur.com |
55 | No | 404s: imageshack |
59 | No | Photobucket |
61 | No | 404s |
63 | No | 404s |
66 | No | Dead domain: myfrogbag |
67 | No | 404s: photobucket |
76 | No | 404s: photobucket |
77 | No | 404s: imageshack |
78 | No | 404s: imageshack |
81 | No | Dead domain: TBD |
87 | No | 404s; Pulling HTML pages |
89 | Yes | 404s: imageshack |
93 | No | Dead domain: TBD |
104 | No | Dead domain: suspended webhost |
106 | No | Dead domain: TBD |
108 | No | 404s: imagebin |
109 | No | 404s; Dead domain |
111 | No | 404s: imageshack |
135 | No | 404s: photobucket |
158 | No | 404s |
160 | Yes | 404s: imageshack |
227 | No | uses blob: urls?? |
241 | No | Dropbox public folder |
263 | No | HTML: imageshack homepage |
270 | No | Dropbox public folder |
277 | No | Dead domain: |
285 | No | HTML |
307 | No | 404s |
308 | No | 404s |
314 | No | Dead domain: blacktourney.com |
319 | No | Dropbox public folder |
323 | No | i.minus.com |
325 | No | Dead domain: majhost.com |
331 | No | Dropbox public folder |
339 | Mostly | A Single Photobucket BWE |
341 | No | Dropbox public folder |
350 | No | Dead domain: majhost.com |
351 | Some | Imageshack 404s |
352 | Mostly | SWFs hosted at files.myfrogbag.com |
353 | Mostly | SWFs hosted at files.myfrogbag.com |
357 | No | Deleted imgur files |
367 | Yes | Single photobucket BWE, retrieved from Wayback |
728 | No | Dropbox public folder |
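To track progress, the table can be tallied by its "Archived OK?" column. This is a rough sketch that assumes the table has been saved to a text file in exactly the layout above (header row, separator row, then one story per line); count_status is a made-up name.

```shell
#!/bin/sh
# Hypothetical tally: count table rows per "Archived OK?" value.
# $1 is a file holding the table as laid out above.
count_status() {
    awk -F'|' '
        NR > 2 {                          # skip header and separator rows
            s = $2
            gsub(/^[ \t]+|[ \t]+$/, "", s)  # trim surrounding whitespace
            n[s]++
        }
        END { for (k in n) print k, n[k] }
    ' "$1" | LC_ALL=C sort
}
```

Each output line is a status followed by the number of stories with that status.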