Difference between revisions of "Template:CTA URL lists"
(Add grep -a option)
(Add grep command example)
Line 5: | Line 5:
  #* Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern as it may miss valid URLs. We can always filter or transform the results as needed later.}}
  #* If you use <code>grep</code>, remember to include the <code>-a</code> (aka <code>--text</code> on GNU grep) option to ensure it will continue searching for matches when encountering binary data.
+ #* Example command (GNU grep): <code>grep -Pahoi '{{{regex}}}' FILENAME FILENAME...</code>
  # If the output exceeds a few megabytes, please compress it, preferably using <code>zstd -10</code>.
  # Upload the file to https://transfer.archivete.am/.
Line 13: | Line 14:
  Options:
- * <code>regex</code>, required, the PCRE
+ * <code>regex</code>, required, the PCRE regular expression to use for filtering, will get wrapped in single quotes for the grep command
  * <code>broad</code>, optional, adding an extra bit about the regex being intentionally broad if non-empty
Revision as of 02:04, 23 April 2023
Options:
- regex, required, the PCRE regular expression to use for filtering, will get wrapped in single quotes for the grep command
- broad, optional, adding an extra bit about the regex being intentionally broad if non-empty
Example:
{{CTA URL lists|regex = <nowiki>\S*(foo|bar)\S*</nowiki>|broad = yes}}
renders as:
How to help if you have lists of URLs
This project requires lists of URLs for content on the target website. If you have a source of URLs, please:
- Use the regular expression \S*(foo|bar)\S* for filtering.
  - Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern as it may miss valid URLs. We can always filter or transform the results as needed later.
  - If you use grep, remember to include the -a (aka --text on GNU grep) option to ensure it will continue searching for matches when encountering binary data.
  - Example command (GNU grep): grep -Pahoi '\S*(foo|bar)\S*' FILENAME FILENAME...
- If the output exceeds a few megabytes, please compress it, preferably using zstd -10.
- Upload the file to https://transfer.archivete.am/.
- Share the resulting URL in the project IRC channel.
- If you would like to keep the list non-public instead, e.g. for privacy reasons or for not wanting to be publicly associated with it, please get in touch with a channel op (e.g. User:Arkiver or User:JustAnotherArchivist). Note that the items generated from your list would still be processed publicly, of course, but they would be mixed with everything else.
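The filtering and compression steps above could be scripted roughly as follows. This is a minimal sketch, assuming bash and GNU grep built with PCRE support. The function names extract_urls and compress_if_large, the sample data, and the 5 MiB threshold (one possible reading of "a few megabytes") are illustrative assumptions and not part of the template.

```shell
#!/usr/bin/env bash
# Sketch only: the function names and the 5 MiB threshold are assumptions.

# Print every match of a PCRE pattern found in the given files (GNU grep).
#   -P  PCRE syntax
#   -a  treat binary data as text so the search continues past it
#   -h  omit file names from the output
#   -o  print only the matched text
#   -i  ignore case
extract_urls() {
  local regex=$1 status=0
  shift
  grep -Pahoi -e "$regex" -- "$@" || status=$?
  # grep exits 1 when nothing matched; only a status above 1 is a real error.
  [ "$status" -le 1 ]
}

# Compress a file with zstd -10 if it is larger than a limit given in bytes.
# The default limit of 5 MiB is an assumed value for "a few megabytes".
compress_if_large() {
  local file=$1 limit=${2:-5242880}
  local size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt "$limit" ]; then
    zstd -10 -- "$file"   # writes "$file.zst" and keeps the original
  fi
}

# Demo on throwaway data.
workdir=$(mktemp -d)
printf '%s\n' \
  'see https://example.com/foo/1 now' \
  'no match on this line' \
  'HTTP://EXAMPLE.COM/BAR?x=1' > "$workdir/dump.txt"

extract_urls '\S*(foo|bar)\S*' "$workdir/dump.txt" > "$workdir/urls.txt"
compress_if_large "$workdir/urls.txt"   # tiny file, so nothing is compressed
cat "$workdir/urls.txt"
```

For real data, the demo section would be replaced by a call on the actual input files, with the resulting list (or its .zst file, if one was created) uploaded as described in the steps above.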
See also Category:Projects requiring URL lists for other ArchiveTeam projects that necessitate URL lists.