User:Vitzli
Latest revision as of 15:19, 3 February 2016
Saved stuff
- JBG Travels YouTube channel, partial download, 847 videos total: part 1, part 2, part 3. Several videos were either marked private or removed at the request of his employer, even though they contained only road video.
- Encyclopedia Astronautica snapshot (2015-10-22): according to Alive... OR ARE THEY, it is on the watchlist.
- Pole shift survival library (https://archive.org/details/poleshift-survival-library): not updated since 2013 and was quite popular among survival/prepping folks. Not endangered, since the website is still online, but the torrent is decaying.
- Amazon reviews webdata 1995-2013 (https://archive.org/details/amazon-reviews-1995-2013): still available, but the links were hidden.
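- CGP Grey YouTube channel, one tar archive per year: 2010, 2011, 2012, 2013, 2014, 2015 (archive.org items CGPGrey-tar-2010 through CGPGrey-tar-2015)
- SmarterEveryDay YouTube channel, one tar archive per year: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015 (archive.org items SmarterEveryDay-tar-2007 through SmarterEveryDay-tar-2015)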
Prospecting IA.BAK collections
Tools required: the Python 3 modules internetarchive and ia-mine; jq for JSON processing; GNU parallel to run a command once for each input, several at a time.
An archive.org account (with its S3 keys) is required for the ia-mine and internetarchive (ia) tools.
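Before a long run it can help to confirm that every tool is actually on PATH. A minimal sketch in POSIX shell; `check_tools` is a hypothetical helper, and the executable names `ia` and `ia-mine` are assumed to be the ones installed by the internetarchive and ia-mine packages.

```shell
# check_tools: hypothetical helper that reports every command in its
# argument list that is not found on PATH.
# Returns 0 if all commands are present, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_tools ia ia-mine jq parallel || exit 1`. The archive.org credentials (and thus the S3 keys) can be stored with `ia configure`, which prompts for them interactively.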
2016-02-03 census
- 10 shards
- 79 collections
- 142462 items total, 106054 unique items (my mistake: deduplicate the list, e.g. with sort -u, before running a large batch)
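The deduplication step can be done once over all per-shard item lists before mining. A minimal sketch, assuming each list holds one identifier per line; `census` and its output-file argument are hypothetical, not part of any of the tools above.

```shell
# census: hypothetical helper.
#   $1    output file for the deduplicated identifier list
#   $2... per-shard identifier lists, one identifier per line
# Prints the total and unique counts; sort -u removes duplicates across
# all input files at once, not just within each file.
census() {
    out="$1"
    shift
    # tr strips the leading spaces some wc implementations print
    total=$(cat "$@" | wc -l | tr -d ' ')
    sort -u "$@" > "$out"
    unique=$(wc -l < "$out" | tr -d ' ')
    echo "total: $total, unique: $unique"
}
```

For example, `census all-items.txt "$F_PREFIX"/*.items.json`; choose an output path that the input glob does not match, because the redirection truncates the output file before sort reads its inputs.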
jq code
Remove 'collection' items:
# Runs once per input line read from stdin; jq -r prints bare identifiers, so no tr step is needed.
parallel --jobs 4 "jq -r 'select(.mediatype != \"collection\") | .identifier' \"$F_PREFIX\"/{}.col.json > \"$F_PREFIX\"/{}.items.json"
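The same filter can be tried on a single file to see what it keeps. A minimal sketch, assuming each `.col.json` file is a stream of JSON objects carrying `identifier` and `mediatype` fields; `items_from_search` is a hypothetical helper.

```shell
# items_from_search: hypothetical helper. Reads one search-result file
# (assumed: a stream of JSON objects with "identifier" and "mediatype"),
# drops collection records and prints bare identifiers, one per line.
# -r makes jq print strings without surrounding quotes.
items_from_search() {
    jq -r 'select(.mediatype != "collection") | .identifier' "$1"
}
```

Given three hypothetical records (a texts item, a collection and a movies item), it prints only the two non-collection identifiers. Records with no `mediatype` at all are kept, since null is not equal to "collection".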
Remove 'uploader' field:
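# The SHARDS-20160203-cleaned/$F_PREFIX directory must already exist, since the shell redirection does not create it.
parallel --jobs 4 "jq -c 'del(.metadata.uploader)' \"$F_PREFIX\"/{}.mined.json > \"SHARDS-20160203-cleaned/$F_PREFIX\"/{}.cleaned.json"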
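In archive.org item metadata, the `uploader` field holds the uploading account's email address. A minimal sketch of the same cleanup applied to one file, assuming one JSON document per line as in the `.mined.json` files; `strip_uploader` is a hypothetical helper.

```shell
# strip_uploader: hypothetical helper. Deletes .metadata.uploader from every
# record in one mined file (assumed: one JSON document per line).
# -c keeps each cleaned record on a single line.
strip_uploader() {
    jq -c 'del(.metadata.uploader)' "$1"
}
```

`del` on a key that is not present leaves the record unchanged, so items without an uploader field pass through as they are.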