Difference between revisions of "Template:CTA URL lists"
Switchnode (move category link to hat position; tighten up prose)
Switchnode, minor edit (additional fixes)
Line 8:
#* If using grep or similar, enable text matching (<code>-a</code> or <code>--text</code>) to catch URLs in files with apparent binary data.
#* Example command (GNU grep): <code>grep -Pahoi '{{{regex}}}' FILENAME FILENAME...</code>}}
# If the {{ #if: {{{regex|}}} | output | list }} exceeds a few megabytes, compress it, preferably using <code>zstd -10</code>.
# Give the file a descriptive name and upload it to https://transfer.archivete.am/.
# Share the resulting URL in the project IRC channel.
#* If you wish your list to remain private, please get in touch with a channel op (e.g. [[User:Arkiver|arkiver]] or [[User:JustAnotherArchivist|JustAnotherArchivist]]). Items generated from your list will still be processed publicly, but they will be mixed in with all other items and channel logs will not associate them with you.{{ #if: {{{suppresscategory|}}} ||[[Category:Projects requiring URL lists]]}}</includeonly><noinclude>
Latest revision as of 22:28, 4 January 2024
Options:
regex: required. The PCRE regular expression to use for filtering. It is wrapped in single quotes in the grep command, so it must not itself contain a single quote. Technically, this parameter can be omitted, but only when the source already consists solely of URLs.
broad: optional. If non-empty, adds a note that the regex is intentionally broad.
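Before settling on a value for regex, it can help to preview what a pattern actually extracts. The sketch below assumes a grep with PCRE support (such as GNU grep); the helper name preview_regex and the sample lines are made up for illustration, and the pattern is the placeholder used in the example that follows.

```shell
#!/bin/sh
# Sketch: preview what a candidate regex would extract before putting it in
# the template's regex parameter. preview_regex is a hypothetical helper name.
# Requires a grep built with PCRE support (-P), e.g. GNU grep.

preview_regex() {
  # $1 = candidate pattern; input is read from stdin.
  # Same flags as the template's example command:
  # -P (PCRE), -a (treat binary as text), -h (no file names),
  # -o (print only the matched part), -i (ignore case)
  grep -Pahoi -e "$1"
}

# Made-up sample input: two lines contain a match, one does not.
printf '%s\n' \
  'see https://example.com/FOO/page.html for details' \
  'nothing relevant here' \
  'mirror: http://bar.example.org/x?id=1' |
  preview_regex '\S*(foo|bar)\S*'
```

With GNU grep this prints https://example.com/FOO/page.html and http://bar.example.org/x?id=1, keeping their original capitalization; each match stops at whitespace because of the surrounding \S*.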
Example:
{{CTA URL lists|regex = <nowiki>\S*(foo|bar)\S*</nowiki>|broad = yes}}
renders as:
How to help if you have lists of URLs
- For other ArchiveTeam projects that can use this kind of help, see Projects requiring URL lists.
This project requires lists of URLs for content on the target website. If you have a source of URLs, please:
- Use the PCRE regular expression \S*(foo|bar)\S* for filtering.
  - Note that this regex is intentionally broad to cover many different URL formats. Please do not try to use a narrower pattern, as it may miss valid URLs. We can always filter or transform the results as needed later.
  - Enable case-insensitive matching (e.g. grep's -i) to catch URLs containing uppercase or mixed-case letters.
  - If using grep or similar, enable text matching (-a or --text) to catch URLs in files with apparent binary data.
  - Example command (GNU grep): grep -Pahoi '\S*(foo|bar)\S*' FILENAME FILENAME...
- If the output exceeds a few megabytes, compress it, preferably using zstd -10.
- Give the file a descriptive name and upload it to https://transfer.archivete.am/.
- Share the resulting URL in the project IRC channel.
  - If you wish your list to remain private, please get in touch with a channel op (e.g. arkiver or JustAnotherArchivist). Items generated from your list will still be processed publicly, but they will be mixed in with all other items and channel logs will not associate them with you.
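The filter, compress and upload steps above can be scripted roughly as follows. This is a minimal sketch, not an official ArchiveTeam tool: it assumes GNU grep with PCRE support plus the zstd and curl command-line tools, treats 5 MB as "a few megabytes", and assumes transfer.archivete.am accepts transfer.sh-style HTTP PUT uploads and replies with the download URL. The function names extract_urls, maybe_compress and upload are hypothetical.

```shell
#!/bin/sh
# Sketch of the filter -> compress -> upload steps as small functions.
# Assumptions (not project requirements): GNU grep with PCRE support,
# the zstd and curl CLIs, and a 5 MB cut-off for "a few megabytes".

# Write every case-insensitive match of pattern $1 found in files $3...
# to output file $2. grep exits 1 when nothing matches; treat that as an
# empty result rather than an error (exit status 2 still fails).
extract_urls() {
  pattern=$1; out=$2; shift 2
  grep -Pahoi -e "$pattern" "$@" > "$out" || [ $? -eq 1 ]
}

# Print the name of the file to upload: the original if it is at most
# $2 bytes (default 5000000, an assumed threshold), otherwise a copy
# compressed with zstd -10 (the original file is kept).
maybe_compress() {
  file=$1; limit=${2:-5000000}
  size=$(wc -c < "$file")
  if [ "$size" -gt "$limit" ]; then
    zstd -10 -q -f "$file" -o "$file.zst" && echo "$file.zst"
  else
    echo "$file"
  fi
}

# Upload with an HTTP PUT and print the URL the server returns.
# Assumes transfer.archivete.am behaves like a transfer.sh instance.
upload() {
  curl --silent --show-error --upload-file "$1" \
    "https://transfer.archivete.am/$(basename "$1")"
}

# Example usage (file names are placeholders):
#   extract_urls '\S*(foo|bar)\S*' foo-bar-urls.txt dump1.html dump2.bin
#   upload "$(maybe_compress foo-bar-urls.txt)"
```

The printed link is what would then be shared in the project IRC channel. Keeping each step in its own function makes it easy to inspect the extracted list before anything is uploaded.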