MS Paint Fan Adventures
Revision as of 17:17, 6 March 2018
Stub page for writing down notes about a project in progress.
This project archives the contents of https://mspfa.com/, which uses a JavaScript app to display text-based stories with embedded images and Flash content. Stories often link to YouTube alternates for their Flash animations.
The custom archiver code is at https://github.com/riking/mspfa-archiver (a giant tangled mess of Go scripts).
Contact @riking to get upload permissions for the Archive collection
Operating the custom archiver
Prep
- Clone the repository into $GOPATH (there will probably be updates to the script)
- Get wpull:
  pip3 install wpull==1.2.3; pip3 install -r https://raw.githubusercontent.com/chfoo/wpull/v1.2.3/requirements.txt
- Symlink it into the current directory:
  ln -s $(which wpull)
- Get youtube-dl, and run youtube-dl -U to update it as needed
- If needed, symlink ./target to a bigger drive, or pass -o=folder every time you run it
- Build the code (TODO: patches to datatogether/warc):
  go build -v .
- Put IAS3 credentials into ./ias3.json:
  {"AccessKeyID": "...", "SecretAccessKey": "..."}
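A quick sanity check on the credentials file can save a failed upload at the end of a long run. The Go sketch below parses ./ias3.json using the two field names shown above and rejects a half-filled file. The IAS3Credentials type and parseIAS3 function are hypothetical; they are not taken from mspfa-archiver, which may load the file differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// IAS3Credentials mirrors the documented ./ias3.json layout.
// Hypothetical type for illustration; not the archiver's own struct.
type IAS3Credentials struct {
	AccessKeyID     string `json:"AccessKeyID"`
	SecretAccessKey string `json:"SecretAccessKey"`
}

// parseIAS3 decodes the credentials and fails if either field is empty,
// so a placeholder file is caught before any upload is attempted.
func parseIAS3(data []byte) (IAS3Credentials, error) {
	var c IAS3Credentials
	if err := json.Unmarshal(data, &c); err != nil {
		return c, fmt.Errorf("ias3.json: %w", err)
	}
	if c.AccessKeyID == "" || c.SecretAccessKey == "" {
		return c, errors.New("ias3.json: AccessKeyID and SecretAccessKey must both be set")
	}
	return c, nil
}

func main() {
	data, err := os.ReadFile("ias3.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not read credentials:", err)
		return
	}
	creds, err := parseIAS3(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Print only the key ID; never echo the secret.
	fmt.Println("loaded IAS3 credentials for key", creds.AccessKeyID)
}
```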
Running
While testing, make sure to include both -test and -ident MSPFA_Test_12345 so that the uploaded Archive items go into test_collection.
When testing changes to the script, use -devScript.
Basic usage: ./mspfa-archiver -dl -ident auto -s 1234
If a download step fails (e.g. a broken URL passed to the Photobucket step), run the archiver again with the -fu ("F"orce "U"pload) flag.
If you encounter a dead domain and don't want to wait for wpull to give up on it, include -wpullArgs '--exclude-domains majhost.com,g0m.yore.ma', listing whichever domains are dead.
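When several dead domains pile up, the -wpullArgs value can be built from a list instead of typed by hand. A Go sketch, assuming (not verified against the code) that the archiver forwards the whole -wpullArgs string to wpull as extra options; excludeDomainsArgs is a hypothetical helper. It trims, lowercases and de-duplicates the domains while keeping their order. Because os/exec passes each argument directly without a shell, the single quotes from the shell example are not needed.

```go
package main

import (
	"fmt"
	"strings"
)

// excludeDomainsArgs turns a list of dead domains into the two argument
// entries "-wpullArgs" and "--exclude-domains a,b".
// Hypothetical helper; assumes -wpullArgs is passed through to wpull.
func excludeDomainsArgs(domains []string) []string {
	seen := make(map[string]bool)
	var clean []string
	for _, d := range domains {
		d = strings.ToLower(strings.TrimSpace(d))
		if d == "" || seen[d] {
			continue // skip blanks and repeats
		}
		seen[d] = true
		clean = append(clean, d)
	}
	if len(clean) == 0 {
		return nil // nothing to exclude, so add no flag at all
	}
	// A single argument: equivalent to the quoted string in the shell example.
	return []string{"-wpullArgs", "--exclude-domains " + strings.Join(clean, ",")}
}

func main() {
	args := excludeDomainsArgs([]string{"majhost.com", "g0m.yore.ma", "majhost.com"})
	fmt.Printf("%q\n", args)
}
```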
TODO: a script to automatically run the archiver on each story ID and save a list of failures.
Work Division
1-999
Archiver: riking
Stage: Initial archiving
Problematic story IDs:
(Note: this table should not be taken as an example; it's translated from my bad notes.)
Story ID | Archived OK? | Problems |
---|---|---|
1 | Yes | Single failed Photobucket URL. |
4 | Yes | Photobucket |
12 | Yes | Photobucket |
14 | Yes | 404s |
17 | No | Dead domain: forum-files2.fobby.net |
19 | Mostly | broken URL: pasted twice in a row |
21 | No (contact author) | "410 Gone" from thefelt.webs.com; Dead domain: windowchronicles.com |
22 | Mostly | broken URL: pasted twice in a row |
24 | No | Photobucket |
25 | No | 404s |
26 | No | 404s: imageshack |
33 | No | Dead domain: yore.ma |
35 | No | broken URL |
45 | No | 404s |
46 | No | Dead domain: myfrogbag |
48 | No | 404s: imageshack |
53 | No | Dead domain: ardekantur.com |
55 | No | 404s: imageshack |
59 | No | Photobucket |
61 | No | 404s |
63 | No | 404s |
66 | No | Dead domain: myfrogbag |
67 | No | 404s: photobucket |
76 | No | 404s: photobucket |
77 | No | 404s: imageshack |
78 | No | 404s: imageshack |
81 | No | Dead domain: TBD |
87 | No | 404s; Pulling HTML pages |
89 | Yes | 404s: imageshack |
93 | No | Dead domain: TBD |
104 | No | Dead domain: suspended webhost |
106 | No | Dead domain: TBD |
108 | No | 404s: imagebin |
109 | No | 404s; Dead domain |
111 | No | 404s: imageshack |
135 | No | 404s: photobucket |
158 | No | 404s |
160 | Yes | 404s: imageshack |
227 | No | uses blob: urls?? |
241 | No | Dropbox public folder |
263 | No | HTML: imageshack homepage |
270 | No | Dropbox public folder |
277 | No | Dead domain: |
285 | No | HTML |
307 | No | 404s |
308 | No | 404s |
314 | No | Dead domain: blacktourney.com |
319 | No | Dropbox public folder |
323 | No | i.minus.com |
325 | No | Dead domain: majhost.com |
331 | No | Dropbox public folder |
339 | Mostly | A single Photobucket BWE (bandwidth exceeded) |
341 | No | Dropbox public folder |
350 | No | Dead domain: majhost.com |
351 | Some | Imageshack 404s |
352 | Mostly | SWFs hosted at files.myfrogbag.com |
353 | Mostly | SWFs hosted at files.myfrogbag.com |
728 | No | Dropbox public folder |