Software (Archive Team wiki, revision of 2021-11-21 by Iniba)
<hr />
<div>__NOTOC__<br />
== WARC Tools ==<br />
[[The WARC Ecosystem]] has information on tools to create, read and process WARC files.<br />
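<br />
Tools aside, the WARC container format itself is simple enough to illustrate in a few lines of standard-library Python. The sketch below hand-builds one minimal WARC/1.0 response record (header names follow the WARC specification; the target URI, date, and payload are invented examples). In practice you should use one of the libraries listed on [[The WARC Ecosystem]] rather than rolling your own writer.<br />
<syntaxhighlight lang="python">
from uuid import uuid4

def make_warc_record(target_uri, payload, date="2021-11-21T00:00:00Z"):
    """Build one minimal WARC/1.0 response record as bytes."""
    body = payload.encode("utf-8")
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {date}",
        f"WARC-Record-ID: <urn:uuid:{uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(body)}",
    ]
    head = "\r\n".join(headers).encode("ascii") + b"\r\n\r\n"
    # Per the WARC spec, each record is terminated by two CRLFs.
    return head + body + b"\r\n\r\n"

# Hypothetical capture of an HTTP response:
record = make_warc_record("http://example.com/", "HTTP/1.1 200 OK\r\n\r\nhello")
</syntaxhighlight>
A real WARC file is just a concatenation of such records (usually gzip-compressed per record), which is why simple tools can stream through multi-gigabyte archives.<br />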
<br />
== General Tools ==<br />
<br />
* [[Wget|GNU Wget]]<br />
** Backing up a WordPress site: <code>wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=&lt;username&gt; --password=&lt;password&gt; &lt;path&gt;</code><br />
* [https://curl.haxx.se/ cURL]<br />
* [https://www.httrack.com/ HTTrack] - [[HTTrack options]]<br />
* [http://pavuk.sourceforge.net/ Pavuk] -- a bit flaky, but very flexible<br />
* [https://github.com/oduwsdl/warrick Warrick] - Tool to recover lost websites using various online archives and caches.<br />
* [https://www.crummy.com/software/BeautifulSoup/ Beautiful Soup] - Python library for web scraping<br />
* [https://scrapy.org/ Scrapy] - Fast Python framework for web scraping<br />
* [https://github.com/JustAnotherArchivist/snscrape snscrape] - Tool to scrape social networking services.<br />
* [https://splinter.readthedocs.io/ Splinter] - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places<br />
* [https://sourceforge.net/projects/wilise/ WiLiSe] '''Wi'''ki'''Li'''nk '''Se'''arch - Python script that gets links to specific pages of a site via the search feature of a wiki ([[wikipedia:MediaWiki|MediaWiki]]-type) that has [http://www.mediawiki.org/wiki/Api.php api.php] accessible or the [http://www.mediawiki.org/wiki/Extension:LinkSearch LinkSearch extension] enabled (the project is still very immature; at the moment the code is only available in [http://sourceforge.net/p/wilise/code/1/tree/code/trunk/ this SVN repository]).<br />
* [[Mobile Phone Applications]] -- some notes on preserving old versions of mobile apps<br />
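<br />
As a rough illustration of what scraping libraries such as Beautiful Soup and Scrapy automate, Python's standard library alone can already pull links out of HTML. A minimal sketch using only the built-in <code>html.parser</code> module (the sample page and its URLs are made up for the example):<br />
<syntaxhighlight lang="python">
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Invented sample page:
page = '<html><body><a href="/wiki/Wget">Wget</a> <a href="https://curl.haxx.se/">cURL</a></body></html>'
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/wiki/Wget', 'https://curl.haxx.se/']
</syntaxhighlight>
Beautiful Soup and Scrapy layer tolerant parsing, CSS/XPath selectors, and (in Scrapy's case) crawling and scheduling on top of this basic idea.<br />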
<br />
== Hosted tools ==<br />
* [https://pinboard.in/ Pinboard] is a convenient social bookmarking service that will [http://pinboard.in/blog/153/ archive copies of all your bookmarks] for online viewing. The catch is that it costs $22/year ($39/year if you want the archival feature), and you can only download archives of your 25 most recent bookmarks in a given category. This may pose problems if you ever need to get your data out in a hurry.<br />
* [https://freeyourstuff.cc/ freeyourstuff.cc] -- Extensible open-source ([https://github.com/eloquence/freeyourstuff.cc source]) Chrome plugin allowing users to export their own content (reviews, posts, etc.). Exports to JSON format and can optionally publish to freeyourstuff.cc & mirrors under the Creative Commons CC0 license. Supports Yelp, [[IMDB]], TripAdvisor, [[Amazon]], GoodReads, and [[Quora]] as of July 2019.<br />
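<br />
For Pinboard specifically, the API documented at pinboard.in/api offers a <code>posts/all</code> endpoint that returns every bookmark in one response, which is a faster escape hatch than the 25-bookmark archive downloads. A sketch, using only the standard library, that builds the request URL (the username and token below are placeholders, and actually fetching requires an account):<br />
<syntaxhighlight lang="python">
from urllib.parse import urlencode

def pinboard_export_url(username, token):
    """Build the URL for Pinboard's posts/all bulk-export endpoint."""
    base = "https://api.pinboard.in/v1/posts/all"
    # auth_token takes the form username:TOKEN; format=json avoids the XML default.
    query = urlencode({"auth_token": f"{username}:{token}", "format": "json"})
    return f"{base}?{query}"

# Placeholder credentials for illustration only:
url = pinboard_export_url("alice", "HYPOTHETICALTOKEN")
</syntaxhighlight>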
<br />
== Site-Specific ==<br />
<br />
* [[Google]]<br />
* [[Livejournal]]<br />
* [[Twitter]]<br />
* [http://code.google.com/p/somaseek/ SomaFM]<br />
* https://www.allmytweets.net/ - Download the last 3,200 tweets from any user.<br />
<br />
== Format Specific ==<br />
<br />
* [http://www.shlock.co.uk/Utils/OmniFlop/OmniFlop.htm OmniFlop]<br />
* [https://youzim.it/ ZIM it] ([https://openzim.org/ ZIM format] for [https://www.kiwix.org Kiwix])<br />
<br />
== Proposed ==<br />
<br />
* [https://solidproject.org/ Solid project] attempts to make data portability a reality<br />
* [https://datatransferproject.dev/ Data Transfer Project] is (the promise of) a quick implementation of [[wikipedia:GDPR|GDPR]] data portability by the [[wikipedia:GAFA|GAFA]] companies plus Twitter<br />
<br />
== Web scraping ==<br />
<br />
* See [[Site exploration]]<br />
<br />
{{Navigation pager<br />
| previous = Why Back Up?<br />
| next = Formats<br />
}}<br />
{{Navigation box}}<br />
<br />
[[Category:Tools| ]]</div>