Difference between revisions of "Template:CTA URL lists"

Latest revision as of 22:28, 4 January 2024

Options:

regex, required, the PCRE regular expression to use for filtering; it is wrapped in single quotes in the example grep command
- Technically, this parameter can be omitted, but doing so is intended only for use on URLs.
broad, optional, adds a note that the regex is intentionally broad if non-empty

Example:

{{CTA URL lists|regex = <nowiki>\S*(foo|bar)\S*</nowiki>|broad = yes}}

renders as:

For other ArchiveTeam projects that can use this kind of help, see Projects requiring URL lists.

This project requires lists of URLs for content on the target website. If you have a source of URLs, please:

Use the PCRE regular expression \S*(foo|bar)\S* for filtering.
- Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a more narrow pattern, as it may miss valid URLs. We can always filter or transform the results as needed later.
- Enable case-insensitive matching (e.g. grep's -i) to catch URLs with capitalization.
- If using grep or similar, enable text matching (-a or --text) to catch URLs in files with apparent binary data.
- Example command (GNU grep): grep -Pahoi '\S*(foo|bar)\S*' FILENAME FILENAME...
If the output exceeds a few megabytes, compress it, preferably using zstd -10.
Give the file a descriptive name and upload it to https://transfer.archivete.am/.
Share the resulting URL in the project IRC channel.
- If you wish your list to remain private, please get in touch with a channel op (e.g. arkiver or JustAnotherArchivist). Items generated from your list will still be processed publicly, but they will be mixed in with all other items and channel logs will not associate them with you.
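
The filtering and compression steps above can be sketched as a small script. This is only an illustration under stated assumptions, not part of the template: the sample input, the scratch directory, the output file name and the 5 MiB threshold (standing in for "a few megabytes") are all hypothetical. It uses the example regex from this page and assumes GNU grep built with PCRE support.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; file names, sample data and the size
# threshold are assumptions, not part of the template.
set -euo pipefail

regex='\S*(foo|bar)\S*'          # example pattern from this page
workdir=$(mktemp -d)             # hypothetical scratch directory
out="$workdir/example-urls.txt"  # hypothetical descriptive file name

# Hypothetical sample input standing in for your own source files.
printf '%s\n' \
    'see https://example.com/FOO/1 and http://example.org/baz' \
    'https://example.net/bar?x=1' > "$workdir/sample.txt"

# -P: PCRE, -a: treat binary data as text, -h: no file name prefixes,
# -o: print only the matching part, -i: case-insensitive.
grep -Pahoi "$regex" "$workdir/sample.txt" > "$out"

# Compress only when the output is large; 5 MiB is an assumed cut-off.
size=$(wc -c < "$out")
if [ "$size" -gt $((5 * 1024 * 1024)) ] && command -v zstd >/dev/null; then
    zstd -10 "$out"              # writes "$out.zst" next to the original
fi

# Show the extracted URLs; uploading and sharing remain manual steps.
cat "$out"
```

With the sample input, only the two URLs containing "foo" or "bar" (in any capitalization) end up in the output file; the URL containing "baz" is skipped.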

@@ Line 8: / Line 8: @@
 #* If using grep or similar, enable text matching (<code>-a</code> or <code>--text</code>) to catch URLs in files with apparent binary data.
 #* Example command (GNU grep): <code>grep -Pahoi '{{{regex}}}' FILENAME FILENAME...</code>}}
-# If the {{ #if: {{{regex|}}} | output | list }} exceeds a few megabytes, please compress it, preferably using <code>zstd -10</code>.
+# If the {{ #if: {{{regex|}}} | output | list }} exceeds a few megabytes, compress it, preferably using <code>zstd -10</code>.
-# Upload the file to https://transfer.archivete.am/.
+# Give the file a descriptive name and upload it to https://transfer.archivete.am/.
 # Share the resulting URL in the project IRC channel.
 #* If you wish your list to remain private, please get in touch with a channel op (e.g. [[User:Arkiver|arkiver]] or [[User:JustAnotherArchivist|JustAnotherArchivist]]). Items generated from your list will still be processed publicly, but they will be mixed in with all other items and channel logs will not associate them with you.{{ #if: {{{suppresscategory|}}} ||[[Category:Projects requiring URL lists]]}}</includeonly><noinclude>