Difference between revisions of "MS Paint Fan Adventures"

(add instructions for running, add more status entries)

Revision as of 17:17, 6 March 2018

Stub for writing down notes about a project in progress.

This project archives the contents of https://mspfa.com/, a site that uses a JS app to present text-based stories with embedded images and Flash content. Stories often link to YouTube alternates for their Flash animations.

The custom archiver code is at https://github.com/riking/mspfa-archiver. It is a giant tangled mess of Go scripts.

Contact @riking to get upload permissions for the Archive collection.

Operating the custom archiver

Prep

  1. Clone into $GOPATH - there will probably be updates to the script
  2. Get wpull: pip3 install wpull==1.2.3; pip3 install -r https://raw.githubusercontent.com/chfoo/wpull/v1.2.3/requirements.txt (pip needs the raw file; the github.com/.../blob/... address serves an HTML page)
  3. Symlink it into the current directory: ln -s $(which wpull)
  4. Get youtube-dl, and update it with youtube-dl -U as needed
  5. If needed, symlink ./target to a bigger drive or specify -o=folder every time you run it
  6. Build the code: go build -v . (TODO: patches to datatogether/warc)
  7. Put IAS3 credentials into ./ias3.json - {"AccessKeyID": "...", "SecretAccessKey": "..."}
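
The prep steps above can be sanity-checked before a long run. The following is a minimal sketch, not part of the archiver itself: the function names are made up for illustration, and it assumes that wpull and youtube-dl are expected on PATH and that ./ias3.json uses exactly the two keys shown in step 7.

```python
import json
import shutil
from pathlib import Path

# Tools named in the prep steps; assumed to be reachable on PATH.
REQUIRED_TOOLS = ("wpull", "youtube-dl")
# Key names copied from the ias3.json example in step 7.
REQUIRED_KEYS = ("AccessKeyID", "SecretAccessKey")


def missing_tools(which=shutil.which):
    """Return the required tools that are not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def missing_credential_keys(path="ias3.json"):
    """Return the credential keys that are absent or empty in the JSON file."""
    file = Path(path)
    if not file.is_file():
        return list(REQUIRED_KEYS)
    data = json.loads(file.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not data.get(key)]


def report(path="ias3.json"):
    """Print one line per problem found; print nothing if all looks fine."""
    for tool in missing_tools():
        print(f"missing tool on PATH: {tool}")
    for key in missing_credential_keys(path):
        print(f"{path} is missing or has an empty value for: {key}")
```

Calling report() from the checkout directory prints nothing when both tools and both credential keys are present.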

Running

While testing, make sure to include both -test and -ident MSPFA_Test_12345 so that the uploaded Archive items go into test_collection. When testing changes to the script, use -devScript.

Basic usage: ./mspfa-archiver -dl -ident auto -s 1234

If a download step fails (e.g. a broken URL passed to the photobucket step), run the archiver again with the -fu ("F"orce "U"pload) flag.

If you encounter a dead domain and don't want to wait for wpull to time out, include -wpullArgs '--exclude-domains majhost.com,g0m.yore.ma', listing whichever domains are dead.
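
The flag combinations described above can be collected in one helper. This is only a sketch: the function and its parameter names are hypothetical, it uses only the flags mentioned on this page, and it assumes (without confirmation from the archiver's source) that -test, -devScript, -fu and -wpullArgs can be combined with the basic -dl invocation in any order.

```python
import shlex


def build_archiver_command(story_id, *, test_ident=None, dev_script=False,
                           force_upload=False, exclude_domains=()):
    """Build an argv list for ./mspfa-archiver from the options on this page."""
    cmd = ["./mspfa-archiver", "-dl"]
    if test_ident is not None:
        # Test runs: -test plus an explicit test identifier.
        cmd += ["-test", "-ident", test_ident]
    else:
        cmd += ["-ident", "auto"]
    if dev_script:
        cmd.append("-devScript")
    if force_upload:
        cmd.append("-fu")
    if exclude_domains:
        # Passed to wpull as a single argument, as in the example above.
        cmd += ["-wpullArgs", "--exclude-domains " + ",".join(exclude_domains)]
    cmd += ["-s", str(story_id)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_archiver_command(
        1234, exclude_domains=["majhost.com", "g0m.yore.ma"])))
```

Returning a list rather than a string keeps the quoted -wpullArgs value as a single argument when the list is handed to subprocess.run.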

TODO - Script to automatically run on each story ID and save a list of failures
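
One possible shape for that script is sketched below. It is not the author's implementation: the names run_story and archive_range and the failures.txt filename are invented for illustration, and it assumes the archiver exits with a nonzero status when a story fails, which this page does not confirm.

```python
import subprocess


def run_story(story_id):
    """Run the archiver for one story and return its exit code."""
    cmd = ["./mspfa-archiver", "-dl", "-ident", "auto", "-s", str(story_id)]
    return subprocess.run(cmd).returncode


def archive_range(story_ids, run=run_story, failures_path="failures.txt"):
    """Run every story ID in turn and save the IDs that failed, one per line."""
    failures = []
    for story_id in story_ids:
        try:
            code = run(story_id)
        except OSError:
            # The binary is missing or not executable; count it as a failure.
            code = -1
        # Assumption: a nonzero exit code means the story was not archived.
        if code != 0:
            failures.append(story_id)
    with open(failures_path, "w", encoding="utf-8") as handle:
        for story_id in failures:
            handle.write(f"{story_id}\n")
    return failures
```

Calling archive_range(range(1, 1000)) would cover the 1-999 block. The run parameter exists so the loop can be tried with a stand-in function before pointing it at the real binary.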

Work Division

1-999

Archiver: riking

Stage: Initial archiving

Problematic story IDs:

(Note: this table should not be taken as an example; it's translated from my bad notes.)

Story ID | Archived OK?      | Problems
1        | Yes               | Single failed Photobucket URL.
4        | Yes               | Photobucket
12       | Yes               | Photobucket
14       | Yes               | 404s
17       | No                | Dead domain: forum-files2.fobby.net
19       | Mostly            | broken URL: pasted twice in a row
21       | No Contact Author | "410 Gone" from thefelt.webs.com; Dead domain: windowchronicles.com
22       | Mostly            | broken URL: pasted twice in a row
24       | No                | Photobucket
25       | No                | 404s
26       | No                | 404s: imageshack
33       | No                | Dead domain: yore.ma
35       | No                | broken URL
45       | No                | 404s
46       | No                | Dead domain: myfrogbag
48       | No                | 404s: imageshack
53       | No                | Dead domain: ardekantur.com
55       | No                | 404s: imageshack
59       | No                | Photobucket
61       | No                | 404s
63       | No                | 404s
66       | No                | Dead domain: myfrogbag
67       | No                | 404s: photobucket
76       | No                | 404s: photobucket
77       | No                | 404s: imageshack
78       | No                | 404s: imageshack
81       | No                | Dead domain: TBD
87       | No                | 404s; Pulling HTML pages
89       | Yes               | 404s: imageshack
93       | No                | Dead domain: TBD
104      | No                | Dead domain: suspended webhost
106      | No                | Dead domain: TBD
108      | No                | 404s: imagebin
109      | No                | 404s; Dead domain
111      | No                | 404s: imageshack
135      | No                | 404s: photobucket
158      | No                | 404s
160      | Yes               | 404s: imageshack
227      | No                | uses blob: urls??
241      | No                | Dropbox public folder
263      | No                | HTML: imageshack homepage
270      | No                | Dropbox public folder
277      | No                | Dead domain:
285      | No                | HTML
307      | No                | 404s
308      | No                | 404s
314      | No                | Dead domain: blacktourney.com
319      | No                | Dropbox public folder
323      | No                | i.minus.com
325      | No                | Dead domain: majhost.com
331      | No                | Dropbox public folder
339      | Mostly            | A Single Photobucket BWE
341      | No                | Dropbox public folder
350      | No                | Dead domain: majhost.com
351      | Some              | Imageshack 404s
352      | Mostly            | SWFs hosted at files.myfrogbag.com
353      | Mostly            | SWFs hosted at files.myfrogbag.com
728      | No                | Dropbox public folder

1000-1999

2000-2999

3000-3999