Software
Revision as of 01:39, 20 October 2013
WARC Tools
The WARC Ecosystem page has information on wget and Heritrix.
General Tools
- GNU Wget
- Backing up a WordPress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>" (in Wget 1.12 and later, --html-extension is named --adjust-extension)
- cURL
- HTTrack - HTTrack options
- Pavuk -- a bit flaky, but very flexible
- Warrick (http://warrick.cs.odu.edu/warrick.html) - recovers lost websites from web archives and search-engine caches
- Beautiful Soup - Python library for web scraping
- Scrapy - Fast Python library for web scraping
- Splinter - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places
- WiLiSe (WikiLink Search) - Python script that finds links to specific pages of a site by searching a MediaWiki-type wiki that has api.php accessible or the LinkSearch extension enabled. The project is still very immature, and at the moment the code is only available in this SVN repository.
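The Wget invocation listed above for backing up a WordPress site can also be assembled from a script, which keeps the password out of a hand-typed command line and makes the backup repeatable. The following is only a sketch: the function name `build_wget_command` and the example URL and credentials are hypothetical, and the flags are copied unchanged from the command above.

```python
import shlex


def build_wget_command(url, username=None, password=None):
    """Return the wget argument list for mirroring a site below `url`.

    Hypothetical helper; the flags mirror the command quoted in the list above.
    """
    args = [
        "wget",
        "--no-parent",        # never ascend above the starting directory
        "--no-clobber",       # skip files that already exist locally
        "--html-extension",   # save HTML pages with an .html suffix
                              # (newer Wget calls this --adjust-extension)
        "--recursive",        # follow links within the site
        "--convert-links",    # rewrite links so the copy works offline
        "--page-requisites",  # also fetch images, CSS and scripts
    ]
    if username is not None:
        args.append(f"--user={username}")
    if password is not None:
        args.append(f"--password={password}")
    args.append(url)
    return args


if __name__ == "__main__":
    # Placeholder URL and credentials, for illustration only.
    cmd = build_wget_command("http://example.com/blog/", "alice", "s3cret")
    print(shlex.join(cmd))
    # To actually run the backup: subprocess.run(cmd, check=True)
```

Returning a list rather than a single string means the command can be handed to `subprocess.run` without a shell, so usernames or passwords containing spaces or quotes need no extra escaping.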
Hosted tools
Pinboard is a convenient social bookmarking service that archives copies of all your bookmarks for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature, and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.
Site-Specific
- Livejournal
- SomaFM
- http://www.allmytweets.net/ - Download the last 3,200 tweets from any user.
Format Specific
Web scraping
- See Site exploration
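A common first step when exploring a site before scraping it is to list the links on a page. The sketch below does this with only the Python standard library; Beautiful Soup or Scrapy, listed under General Tools, would handle messy real-world HTML more robustly. The names `LinkCollector` and `extract_links` are hypothetical, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Anchors without an href (e.g. <a name="top">) are skipped.
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for all links found in `html`, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a href="/about">About</a> <a href="post/1.html">Post</a>'
    for link in extract_links(sample, "http://example.com/blog/"):
        print(link)
```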