User:Vitzli
Saved stuff
- JBG Travels YouTube channel, partial download, 847 videos total: part 1, part 2, part 3. Several videos were either marked private or removed at the request of his employer, although they contained only road video.
- Encyclopedia Astronautica snapshot (2015-10-22); according to Alive... OR ARE THEY, the site is on the watchlist.
- Pole shift survival library: not updated since 2013, but it was quite popular among survival/prepping folks. Not endangered, since the website is still online, but the torrent is decaying.
- Amazon reviews webdata 1995-2013 — still available, but links were hidden.
- CGP Grey YouTube channel, tar archive per year: 2010, 2011, 2012, 2013, 2014, 2015
- SmarterEveryDay YouTube channel, tar archive per year: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
Prospecting IA.BAK collections
Tools required: the Python 3 libraries/modules internetarchive and ia-mine; jq for JSON processing; GNU parallel for running a command once per input, for-each style.
An archive.org account (its S3 keys) is required for the ia-mine and internetarchive (ia) tools.
2016-02-03 census
- 10 shards
- 79 collections
- 142462 items total, 106054 unique items (my mistake: deduplicate the list, e.g. with sort -u, before running a large batch)
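Counting total versus unique items across shards only needs coreutils. A minimal sketch, assuming each item list holds one identifier per line (like the *.items.json files from the jq step below); the function names census and dedupe are hypothetical. Plain uniq only collapses adjacent duplicates, so the input has to be sorted first, which sort -u does in one step:

```shell
# Hypothetical helper: report total vs. unique identifiers across item lists.
# Assumes every input file holds one identifier per line.
census() {
  local total unique
  total=$(cat -- "$@" | wc -l)
  unique=$(cat -- "$@" | sort -u | wc -l)
  # $(( )) strips the padding some wc implementations add
  echo "total=$((total)) unique=$((unique))"
}

# Hypothetical helper: print one sorted, deduplicated list of identifiers
# (uniq alone would miss duplicates that are not on adjacent lines).
dedupe() {
  sort -u -- "$@"
}
```

For example, census "$F_PREFIX"/*.items.json prints both counts, and dedupe "$F_PREFIX"/*.items.json > items.unique.txt (a hypothetical file name) gives a single list to feed to the large batch.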
jq code
Remove 'collection' items, keeping one bare identifier per line ({} is a collection name that parallel reads from standard input):
parallel --jobs 4 "jq -r 'select(.mediatype != \"collection\") | .identifier' $F_PREFIX/{}.col.json > $F_PREFIX/{}.items.json"
Remove the 'uploader' field from the mined metadata (the output directory must already exist):
parallel --jobs 4 "jq -c 'del(.metadata.uploader)' $F_PREFIX/{}.mined.json > SHARDS-20160203-cleaned/$F_PREFIX/{}.cleaned.json"