Software
Revision as of 03:28, 17 May 2011
General tools
- GNU WGET
  - Backing up a WordPress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>"
- cURL
- HTTrack (see also: HTTrack options)
- Heritrix -- the web crawler archive.org uses
- Pavuk -- a bit flaky, but very flexible
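- Warrick -- reconstructs a lost website from copies held in web archives such as the Internet Archive: http://warrick.cs.odu.edu/warrick.html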
- Beautiful Soup - Python library for web scraping
- Scrapy - Fast Python library for web scraping
- Splinter - Web app acceptance testing library for Python -- could be combined with a scraping library to extract data from hard-to-reach places
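The recursive flags in the wget recipe (--recursive, --no-parent, --page-requisites) come down to one loop: parse each page, collect its links, and keep only the ones inside the starting directory. Below is a minimal sketch of that step in Python. It uses the standard library's html.parser rather than Beautiful Soup or Scrapy, so it has no dependencies. The function names are hypothetical. The directory check only approximates --no-parent and is not wget's actual implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href/src targets: the pages and page requisites a crawler would fetch."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page they appear on
                self.links.append(urljoin(self.base_url, value))


def under_start_dir(url, start_url):
    """Rough stand-in for --no-parent: same host, path at or below the start directory."""
    start = urlparse(start_url)
    target = urlparse(url)
    start_dir = start.path.rsplit("/", 1)[0] + "/"
    return target.netloc == start.netloc and target.path.startswith(start_dir)


def links_to_follow(html, page_url, start_url):
    """Return de-duplicated, fragment-free links from html that stay inside start_url's directory."""
    parser = LinkExtractor(page_url)
    parser.feed(html)
    result = []
    for link in parser.links:
        link = link.split("#", 1)[0]  # "page.html#top" and "page.html" are the same file
        if under_start_dir(link, start_url) and link not in result:
            result.append(link)
    return result


if __name__ == "__main__":
    page = (
        '<a href="post1.html">One</a>'
        '<a href="/about/">About</a>'          # outside /blog/, skipped
        '<img src="img/a.png">'                # page requisite, kept
        '<a href="http://other.example/x">x</a>'  # other host, skipped
        '<a href="post1.html#comments">c</a>'  # duplicate once the fragment is dropped
    )
    print(links_to_follow(page, "http://example.com/blog/", "http://example.com/blog/"))
```

For the sample page, the sketch keeps the blog post and its image. It drops the /about/ page, the off-site link, and the duplicate post link. A real crawler would also need to fetch pages, queue new links, rewrite links for offline viewing (what --convert-links does), and respect rate limits. The tools listed above already handle all of that.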
Hosted tools
Pinboard is a convenient social bookmarking service that can archive copies of all your bookmarks for online viewing. The catch is cost and access. It charges $9.25 just to join, plus $25/year for the archival feature. You can also only download archives of your 25 most recent bookmarks in a given category. This could be a problem if you ever need to get your data out in a hurry.