Difference between revisions of "Software"
Revision as of 22:45, 9 August 2015
WARC Tools
The WARC Ecosystem page covers wget, Heritrix, and many small but handy tools for creating, reading, and processing WARC files.
General Tools
- GNU WGET
- Backing up a WordPress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>"
- cURL
- HTTrack - HTTrack options
- Pavuk -- a bit flaky, but very flexible
- http://warrick.cs.odu.edu/warrick.html
- Beautiful Soup - Python library for web scraping
- Scrapy - Fast Python library for web scraping
- Splinter - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places
- WiLiSe WikiLink Search - Python script that finds links to specific pages of a site by searching a MediaWiki-type wiki that has api.php accessible or the LinkSearch extension enabled (the project is still very immature, and at the moment the code is only available in this SVN repository).
- Mobile Phone Applications -- some notes on preserving old versions of mobile apps
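The wget command for backing up a WordPress site, listed under GNU WGET above, can be wrapped in a small script so the flags do not have to be retyped. The following is only a sketch: the function names, the example site, and the credentials are made up for illustration, and it assumes wget is installed and on the PATH.

```python
import shlex
import subprocess

# Flags copied verbatim from the WordPress backup command in the list above.
WGET_FLAGS = [
    "--no-parent",
    "--no-clobber",
    "--html-extension",
    "--recursive",
    "--convert-links",
    "--page-requisites",
]


def build_wget_command(path, username=None, password=None):
    """Return the wget argument list for mirroring `path`.

    Credentials are optional; they are only appended when given.
    """
    cmd = ["wget", *WGET_FLAGS]
    if username is not None:
        cmd.append(f"--user={username}")
    if password is not None:
        cmd.append(f"--password={password}")
    cmd.append(path)
    return cmd


def run_backup(path, username=None, password=None):
    """Run wget and return its exit code (non-zero means wget reported a problem)."""
    result = subprocess.run(build_wget_command(path, username, password))
    return result.returncode


if __name__ == "__main__":
    # Hypothetical site and credentials; this prints the command instead of running it.
    command = build_wget_command("https://blog.example.com/", "alice", "s3cret")
    print(shlex.join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the password contains spaces or shell metacharacters. Note that a password given on the command line may be visible to other users of the same machine.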
Hosted tools
Pinboard is a convenient social bookmarking service that will archive copies of all your bookmarks for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature, and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.
Site-Specific
- Livejournal
- SomaFM
- http://www.allmytweets.net/ - Download the last 3,200 tweets from any user.
Format Specific
Web scraping
- See Site exploration