Wget with Lua hooks
Revision as of 18:40, 12 October 2013
- New idea: add Lua scripting to wget.
- Work in progress: https://github.com/alard/wget-lua/tree/lua
  - The Lua scripting patch lives on the "lua" branch. You can use the compare branch feature on GitHub to see the differences.
  - Alternative location: https://github.com/ArchiveTeam/wget-lua/tree/lua
- Documentation: https://github.com/alard/wget-lua/wiki/Wget-with-Lua-hooks
Example usage:
wget http://www.archiveteam.org/ -r --lua-script=lua-example/print_parameters.lua
Why would this be useful?
Custom error handling
What should wget do when a request fails? Sometimes you want it to retry the URL after a server error instead of giving up.
wget.callbacks.httploop_result = function(url, err, http_stat)
  if http_stat.statcode == 500 then
    -- try again
    return wget.actions.CONTINUE
  elseif http_stat.statcode == 404 then
    -- stop
    return wget.actions.EXIT
  else
    -- let wget decide
    return wget.actions.NOTHING
  end
end
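Retrying every 500 response without a limit can loop forever on a URL that is permanently broken. The following is a minimal sketch of one way to cap retries per URL. MAX_TRIES, the tries table, and the stand-in wget table (used only when the script runs outside wget-lua) are assumptions made for illustration, not part of wget-lua itself.

```lua
-- Stand-in for the table wget-lua provides, so the sketch also runs in a
-- plain Lua interpreter. The action values here are placeholders (assumption).
local wget = wget or {
  callbacks = {},
  actions = { NOTHING = "NOTHING", CONTINUE = "CONTINUE", EXIT = "EXIT" },
}

local MAX_TRIES = 3  -- hypothetical limit on attempts per URL
local tries = {}     -- URL key -> number of server errors seen so far

wget.callbacks.httploop_result = function(url, err, http_stat)
  -- Accept either a plain string or a table with a url field (assumption).
  local key = type(url) == "table" and url.url or tostring(url)
  local code = http_stat.statcode

  if code >= 500 and code <= 599 then
    tries[key] = (tries[key] or 0) + 1
    if tries[key] < MAX_TRIES then
      -- try again
      return wget.actions.CONTINUE
    end
    -- too many server errors: stop retrying and let wget decide
    return wget.actions.NOTHING
  end

  -- any other status: forget the counter and let wget decide
  tries[key] = nil
  return wget.actions.NOTHING
end
```

Keeping the counter in a table keyed by URL means one failing page does not use up the retry budget of another.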
Custom decide rules
Should wget download this URL or not?
wget.callbacks.download_child_p = function(urlpos, parent, depth, start_url_parsed, iri, verdict)
  if string.find(urlpos.url, "textfiles.com") then
    -- always download
    return true
  elseif string.find(urlpos.url, "archive.org") then
    -- never!
    return false
  else
    -- follow wget's advice
    return verdict
  end
end
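When the list of sites grows, the same decision can be driven by a table. This is a sketch only: the rules table and the decide helper are hypothetical names, urlpos.url is treated as a string as in the example above, and the stand-in wget table exists just so the script also loads in a plain Lua interpreter.

```lua
-- Stand-in for the table wget-lua provides (assumption, for standalone runs).
local wget = wget or { callbacks = {} }

-- Hypothetical rule list; the first matching rule wins.
local rules = {
  { match = "textfiles.com", download = true },
  { match = "archive.org",   download = false },
}

-- Returns true/false for a matching rule, otherwise wget's own verdict.
local function decide(url, verdict)
  for _, rule in ipairs(rules) do
    -- The fourth argument (plain = true) makes string.find treat rule.match
    -- as a literal substring, so "." is not a pattern wildcard.
    if string.find(url, rule.match, 1, true) then
      return rule.download
    end
  end
  return verdict
end

wget.callbacks.download_child_p = function(urlpos, parent, depth, start_url_parsed, iri, verdict)
  return decide(urlpos.url, verdict)
end
```

Separating decide from the callback makes it possible to check the rules against sample URLs before starting a crawl.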
Custom url extraction/generation
Sometimes it is useful to write your own URL extraction code, for example to add URLs that do not actually appear on the page.
wget.callbacks.get_urls = function(file, url, is_css, iri)
  if string.find(url, ".com/profile/[^/]+/$") then
    -- make sure wget downloads the user's photo page
    -- and custom profile photo
    return {
      { url=url.."photo.html", link_expect_html=1, link_expect_css=0 },
      { url=url.."photo.jpg", link_expect_html=0, link_expect_css=0 }
    }
  else
    -- no new urls to add
    return {}
  end
end
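Lua patterns are easy to get subtly wrong, so it can help to try the matching logic in a plain Lua interpreter before running a crawl. The sketch below pulls the logic into a helper that can be called directly. PROFILE_PATTERN and profile_extras are hypothetical names, the stand-in wget table is only there for standalone runs, and the pattern escapes the dot as "%." so that it matches a literal "." rather than any character.

```lua
-- Stand-in for the table wget-lua provides (assumption, for standalone runs).
local wget = wget or { callbacks = {} }

-- "%." matches a literal dot; a bare "." would match any character.
local PROFILE_PATTERN = "%.com/profile/[^/]+/$"

-- Returns the extra URLs to queue for a profile page, or an empty table.
local function profile_extras(url)
  if not string.find(url, PROFILE_PATTERN) then
    return {}
  end
  return {
    { url = url .. "photo.html", link_expect_html = 1, link_expect_css = 0 },
    { url = url .. "photo.jpg",  link_expect_html = 0, link_expect_css = 0 },
  }
end

wget.callbacks.get_urls = function(file, url, is_css, iri)
  return profile_extras(url)
end

-- Quick manual check with made-up URLs.
for _, u in ipairs({ "http://example.com/profile/alice/",
                     "http://example.com/profile/alice/photos/" }) do
  print(u, #profile_extras(u))
end
```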
More Examples
Archive Team publishes real-world scripts under the Archive Team GitHub organization. Look for recent -grab projects. The Lua scripts range from simple checks to complex URL scraping.