Difference between revisions of "HTTrack options"

From Archiveteam

Revision as of 11:42, 17 January 2017

Good options to use for httrack to mirror a large-ish site.

A rundown of the options

  • --connection-per-second=50: This allows for up to 50 connections per second.
  • --sockets=80: Opens up to 80 sockets. If this gives you errors, lower this to 48.
  • --disable-security-limits, -A100000000: By default, HTTrack attempts to play nicely with webservers, and tries not to overload them by limiting the download speed to 25kbps. On text-based sites this is normally fine, but it becomes a hassle when the site is image-heavy. The first option disables the forced limit, and the second raises the maximum transfer rate to 100,000,000 bytes per second.
  • -s0: Tells HTTrack to ignore robots.txt.
  • -F: Sets the user agent.
  • -#L500000000: Raises the maximum number of links HTTrack fetches to 500 million. Raise if needed.
  • -n: This gets all nearby files (all files shown on a page), rather than only those on the domain name, which is HTTrack's default behavior.
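
The options above can be combined into a single invocation. What follows is a minimal sketch, not the page's original command line: the URL, output directory, user agent string, and the `mirror` helper name are all hypothetical placeholders, and `-O` (output path) is added only so the example has somewhere to write. Passing `echo` as a prefix prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the site to mirror and your own paths.
URL="${URL:-https://example.com/}"
OUT_DIR="${OUT_DIR:-./mirror}"
USER_AGENT="${USER_AGENT:-Mozilla/5.0 (compatible; archive crawl)}"

# mirror [prefix...]: runs httrack with the options from the rundown.
# Any arguments are used as a command prefix, so `mirror echo` is a dry run
# and plain `mirror` starts the crawl.
mirror() {
  "$@" httrack "$URL" -O "$OUT_DIR" \
    --connection-per-second=50 \
    --sockets=80 \
    --disable-security-limits \
    -A100000000 \
    -s0 \
    -F "$USER_AGENT" \
    '-#L500000000' \
    -n
}

# Dry run: print the assembled command line without contacting the site.
mirror echo
```

Once the printed command looks right, call `mirror` with no arguments to start the crawl. If the socket count causes errors, lower `--sockets` to 48 as noted above.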

Other options


NOTE: httrack appears to be limited to 2GB of RAM. HTTrack is written in C rather than Java, so this is probably the address-space limit of a 32-bit build. Not sure if a 64-bit version of it will allow for a larger crawl queue.