Difference between revisions of "Template:CTA URL lists"

Revision as of 02:04, 23 April 2023

Options:

regex, required, the PCRE regular expression to use for filtering, will get wrapped in single quotes for the grep command
broad, optional, adding an extra bit about the regex being intentionally broad if non-empty
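
Because the regex is placed between single quotes in the grep command, a pattern that itself contains a single quote would end the quoted string early in a POSIX shell. The following is a minimal Python sketch of that pitfall, not part of the template: the helper name `build_grep_command` is hypothetical, and `shlex.quote` from the standard library is used here only as one way to produce a safely quoted argument.

```python
import shlex
from typing import List


def build_grep_command(regex: str, filenames: List[str]) -> str:
    """Build the example grep command with every argument quoted for a POSIX shell."""
    args = ["grep", "-Pahoi", regex, *filenames]
    # shlex.quote leaves safe words untouched and single-quotes everything else,
    # escaping any embedded single quote so it cannot terminate the string.
    return " ".join(shlex.quote(arg) for arg in args)


# A regex without single quotes ends up wrapped in plain single quotes.
print(build_grep_command(r"\S*(foo|bar)\S*", ["urls.txt"]))

# A regex containing a single quote needs extra escaping to stay one argument.
print(build_grep_command(r"\S*it's\S*", ["urls.txt"]))
```

In practice this suggests keeping single quotes out of the `regex` parameter, since the template inserts the value between single quotes verbatim.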

Example:

{{CTA URL lists|regex = <nowiki>\S*(foo|bar)\S*</nowiki>|broad = yes}}

renders as:

This project requires lists of URLs for content on the target website. If you have a source of URLs, please:

1. Use the regular expression \S*(foo|bar)\S* for filtering.
   - Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern as it may miss valid URLs. We can always filter or transform the results as needed later.
   - If you use grep, remember to include the -a (aka --text on GNU grep) option to ensure it will continue searching for matches when encountering binary data.
   - Example command (GNU grep): grep -Pahoi '\S*(foo|bar)\S*' FILENAME FILENAME...
2. If the output exceeds a few megabytes, please compress it, preferably using zstd -10.
3. Upload the file to https://transfer.archivete.am/.
4. Share the resulting URL in the project IRC channel.
   - If you would like to keep the list non-public instead, e.g. for privacy reasons or for not wanting to be publicly associated with it, please get in touch with a channel op (e.g. User:Arkiver or User:JustAnotherArchivist). Note that the items generated from your list would still be processed publicly, of course, but they would be mixed with everything else.
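
For readers without GNU grep, the filtering step above can be approximated in Python. This is a rough sketch under stated assumptions, not part of the template: the function name `extract_matches` and the script name in the usage comment are hypothetical, and Python's `re` module is not PCRE, so patterns that rely on PCRE-only syntax may behave differently. It mirrors the flags of the example command: `-a` (treat binary data as text), `-o` (print only the matched part), `-i` (ignore case) and `-h` (no filename prefix).

```python
import re
import sys
from typing import List


def extract_matches(pattern: str, data: bytes) -> List[bytes]:
    """Return every case-insensitive match of `pattern` in `data`, line by line."""
    # A bytes pattern means binary data never raises a decode error (the role
    # of grep's -a); IGNORECASE plays the role of -i (ASCII letters only).
    regex = re.compile(pattern.encode("utf-8"), re.IGNORECASE)
    matches = []
    for line in data.splitlines():
        # Keep only the matched parts, like grep's -o, skipping empty matches.
        matches.extend(m.group(0) for m in regex.finditer(line) if m.group(0))
    return matches


if __name__ == "__main__":
    # Hypothetical usage: python extract.py FILENAME FILENAME... > urls.txt
    for filename in sys.argv[1:]:
        with open(filename, "rb") as handle:
            for match in extract_matches(r"\S*(foo|bar)\S*", handle.read()):
                sys.stdout.buffer.write(match + b"\n")
```

The resulting output file can then be compressed and uploaded as described in the remaining steps.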

See also Category:Projects requiring URL lists for other ArchiveTeam projects that need URL lists.

@@ Line 5: / Line 5: @@
 #* Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern as it may miss valid URLs. We can always filter or transform the results as needed later.}}
 #* If you use <code>grep</code>, remember to include the <code>-a</code> (aka <code>--text</code> on GNU grep) option to ensure it will continue searching for matches when encountering binary data.
+#* Example command (GNU grep): <code>grep -Pahoi '{{{regex}}}' FILENAME FILENAME...</code>
 # If the output exceeds a few megabytes, please compress it, preferably using <code>zstd -10</code>.
 # Upload the file to https://transfer.archivete.am/.
@@ Line 13: / Line 14: @@
 Options:
-* <code>regex</code>, required, the PCRE-ish regular expression to use for filtering
+* <code>regex</code>, required, the PCRE regular expression to use for filtering, will get wrapped in single quotes for the grep command
 * <code>broad</code>, optional, adding an extra bit about the regex being intentionally broad if non-empty