Difference between revisions of "Template:CTA URL lists"

Revision as of 02:07, 23 April 2023

Options:

- regex, required, the PCRE regular expression to use for filtering, will get wrapped in single quotes for the grep command
  - Technically, this isn't actually required, but only for use on URLs.
- broad, optional, adding an extra bit about the regex being intentionally broad if non-empty

Example:

{{CTA URL lists|regex = <nowiki>\S*(foo|bar)\S*</nowiki>|broad = yes}}

renders as:

How to help if you have lists of URLs

This project requires lists of URLs for content on the target website. If you have a source of URLs, please:

1. Use the regular expression \S*(foo|bar)\S* for filtering.
   - Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern as it may miss valid URLs. We can always filter or transform the results as needed later.
   - If you use grep, remember to include the -a (aka --text on GNU grep) option to ensure it will continue searching for matches when encountering binary data.
   - Example command (GNU grep): grep -Pahoi '\S*(foo|bar)\S*' FILENAME FILENAME...
2. If the output exceeds a few megabytes, please compress it, preferably using zstd -10.
3. Upload the file to https://transfer.archivete.am/.
4. Share the resulting URL in the project IRC channel.
   - If you would like to keep the list non-public instead, e.g. for privacy reasons or for not wanting to be publicly associated with it, please get in touch with a channel op (e.g. User:Arkiver or User:JustAnotherArchivist). Note that the items generated from your list would still be processed publicly, of course, but they would be mixed with everything else.

See also Category:Projects requiring URL lists for other ArchiveTeam projects that necessitate URL lists.
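
For readers unsure what the flags in the example grep command do, the following is a rough Python sketch of the same filtering: -P selects Perl-style regex syntax, -a treats binary data as text, -h suppresses file names, -o prints only the matching part, and -i ignores case. The function names `grep_only_matching` and `extract_from_files` are hypothetical and not part of the template, and Python's `re` module only approximates PCRE, so this is an illustration under those assumptions rather than a replacement for the documented command.

```python
import re
from typing import Iterable, Iterator


def grep_only_matching(pattern: str, lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each non-empty, case-insensitive match found in the given lines.

    Working on bytes mirrors -a: binary data never stops the search.
    Yielding only the matched text mirrors -o, and re.IGNORECASE mirrors -i.
    """
    compiled = re.compile(pattern.encode(), re.IGNORECASE)
    for line in lines:
        for match in compiled.finditer(line):
            if match.group(0):  # skip empty matches, as grep -o does
                yield match.group(0)


def extract_from_files(pattern: str, paths: Iterable[str]) -> Iterator[bytes]:
    """Apply grep_only_matching to every file, without file name prefixes (-h)."""
    for path in paths:
        with open(path, "rb") as handle:
            # grep treats "\n" as the line separator, so split on it only.
            yield from grep_only_matching(pattern, handle.read().split(b"\n"))
```

Calling `extract_from_files(r"\S*(foo|bar)\S*", ["FILENAME"])` and writing each result on its own line would produce a list comparable to the grep output, which can then be compressed and uploaded as described in the steps above.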

@@ Line 1: / Line 1: @@
 <includeonly>== How to help if you have lists of URLs ==
 This project requires lists of URLs for content on the target website. If you have a source of URLs, please:
+{{ #if: {{{regex|}}} |
 # Use the regular expression <code>{{{regex}}}</code> for filtering.{{ #if: {{{broad|}}} |
 #* Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern as it may miss valid URLs. We can always filter or transform the results as needed later.}}
 #* If you use <code>grep</code>, remember to include the <code>-a</code> (aka <code>--text</code> on GNU grep) option to ensure it will continue searching for matches when encountering binary data.
-#* Example command (GNU grep): <code>grep -Pahoi '{{{regex}}}' FILENAME FILENAME...</code>
+#* Example command (GNU grep): <code>grep -Pahoi '{{{regex}}}' FILENAME FILENAME...</code>}}
-# If the output exceeds a few megabytes, please compress it, preferably using <code>zstd -10</code>.
+# If the {{ #if: {{{regex|}}} | output | list }} exceeds a few megabytes, please compress it, preferably using <code>zstd -10</code>.
 # Upload the file to https://transfer.archivete.am/.
 # Share the resulting URL in the project IRC channel.
@@ Line 15: / Line 15: @@
 * <code>regex</code>, required, the PCRE regular expression to use for filtering, will get wrapped in single quotes for the grep command
+** Technically, this isn't actually required, but only for use on [[URLs]].
 * <code>broad</code>, optional, adding an extra bit about the regex being intentionally broad if non-empty
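
The effect of the change above can be modelled outside MediaWiki. The sketch below is a simplified Python approximation of the new revision's conditional logic, assuming only what the diff shows: the regex step and its sub-items appear only when `regex` is non-empty, and the compression step says "output" or "list" accordingly. The names `is_non_empty` and `render_steps` are hypothetical, the note texts are shortened, and the sub-item under the final step is omitted because it lies outside the diff hunks.

```python
from typing import List, Tuple


def is_non_empty(value: str) -> bool:
    """Mimic {{ #if: ... }}: the test string counts as empty after trimming."""
    return value.strip() != ""


def render_steps(regex: str = "", broad: str = "") -> List[Tuple[str, List[str]]]:
    """Return the numbered steps, each paired with its list of sub-items."""
    steps: List[Tuple[str, List[str]]] = []
    if is_non_empty(regex):
        notes: List[str] = []
        if is_non_empty(broad):
            notes.append("Note that this regex is intentionally broad.")
        notes.append("If you use grep, remember to include -a.")
        notes.append(
            f"Example command (GNU grep): grep -Pahoi '{regex}' FILENAME FILENAME..."
        )
        steps.append((f"Use the regular expression {regex} for filtering.", notes))
    # Without a regex there is no grep output, so the step refers to the list.
    noun = "output" if is_non_empty(regex) else "list"
    steps.append(
        (f"If the {noun} exceeds a few megabytes, please compress it, "
         "preferably using zstd -10.", [])
    )
    steps.append(("Upload the file to https://transfer.archivete.am/.", []))
    steps.append(("Share the resulting URL in the project IRC channel.", []))
    return steps
```

In this model, `render_steps()` yields three steps starting with the compression step, while `render_steps(r"\S*(foo|bar)\S*", "yes")` yields four, the first carrying the broad-pattern note, the grep reminder, and the example command.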
