HTTrack options
Good options to use for httrack to mirror a large-ish site.
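As a rough sketch, the options described on this page can be combined into a single invocation like the one below. The target URL, the output directory and the user agent string are placeholders assumed for illustration; they do not come from the original page. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Placeholder values (assumptions, not from the original page).
URL="https://example.com/"
OUT_DIR="./mirror"
USER_AGENT="Mozilla/5.0 (compatible; site-mirror)"

# Collect the httrack arguments as positional parameters so that
# values containing spaces (the user agent) stay intact.
set -- "$URL" -O "$OUT_DIR" \
    --connection-per-second=50 \
    --sockets=80 \
    --disable-security-limits -A100000000 \
    -s0 \
    -F "$USER_AGENT" \
    '-#L500000000' \
    -n

# Show the command for review (display only; quoting is not preserved here).
echo "httrack $*"

# Start the crawl only when explicitly requested with RUN=1.
if [ "${RUN:-0}" = "1" ]; then
    httrack "$@"
fi
```

Run it once without RUN=1 to check the printed command, then again with RUN=1 to start the mirror. If the socket count causes errors, lower --sockets as described in the rundown.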
A rundown of the previous options
--connection-per-second=50
: Allows up to 50 connections per second.
--sockets=80
: Opens up to 80 sockets. If this gives you errors, lower it to 48.
--disable-security-limits, -A100000000
: By default, HTTrack attempts to play nicely with web servers and tries not to overload them by limiting the download speed to 25kbps. On text-based sites this is normally fine, but it becomes a hassle when the site is image-heavy. The first option disables the forced limit and the second raises the maximum transfer rate to 100,000,000 bytes per second.
-s0
: Tells HTTrack to ignore robots.txt.
-F
: Sets the user agent, given as the argument that follows the option.
-#L500000000
: Raises the maximum number of links HTTrack fetches to 500M. Raise if needed.
-n
: Also gets all "near" files (all files shown on a page), rather than only those on the same domain name, which is HTTrack's default behavior.
Other options
- A full rundown of all possible options (including those on site structure) can be found at https://www.httrack.com/html/fcguide.html
NOTE: httrack appears to be limited to about 2GB of RAM. HTTrack is written in C rather than Java, so this is more likely the address-space limit of a 32-bit build. Not sure if a 64-bit version will allow for a larger crawl queue.