WordPress

From Archiveteam
This page is about the open-source software (WordPress, WordPress.org). For the commercial blog host, see WordPress.com.

WordPress is a PHP-based content management system, best known as the software behind a large number of blogs.

Ignore Patterns

From WordPress/Ignore Patterns
If you want to archive a WordPress blog in ArchiveBot, you should add the ignore set "blogs", and possibly ignore:

  • ^https?://SITE/wp-(content|includes)/(.*/)?($|\?) and ^https?://SITE/wp-(content|includes)/.*\.php$ if they cause problems by being detected as open directories.
  • xmlrpc.php if it causes bans.
  • ^https?://{primary_netloc}/.*/(udata\.vst|current\.cmp|current\.src|current_add\.ep|gtm\.js)/?$ since some plugins generate junk URLs ending in these filenames.
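To check which URLs these patterns would drop before adding them to a job, they can be compiled for a specific host and tested against sample URLs. The sketch below assumes the patterns are applied as Python regular expressions with `re.search` semantics (an assumption about how ArchiveBot uses them), and it writes `SITE` for the host everywhere, including in the last pattern, which above uses ArchiveBot's `{primary_netloc}` placeholder. `compile_ignores` and `is_ignored` are hypothetical helper names, not part of ArchiveBot.

```python
import re

# Ignore patterns from the list above. "SITE" stands in for the host name;
# the last one was written with {primary_netloc} in the original list.
IGNORE_TEMPLATES = [
    r"^https?://SITE/wp-(content|includes)/(.*/)?($|\?)",
    r"^https?://SITE/wp-(content|includes)/.*\.php$",
    r"xmlrpc\.php",
    r"^https?://SITE/.*/(udata\.vst|current\.cmp|current\.src|current_add\.ep|gtm\.js)/?$",
]


def compile_ignores(host: str) -> list[re.Pattern]:
    """Substitute the escaped host name into each template and compile it."""
    escaped = re.escape(host)  # so "." in the host only matches a literal dot
    return [re.compile(t.replace("SITE", escaped)) for t in IGNORE_TEMPLATES]


def is_ignored(url: str, patterns: list[re.Pattern]) -> bool:
    """Return True if any pattern matches somewhere in the URL (re.search)."""
    return any(p.search(url) for p in patterns)
```

With `pats = compile_ignores("example.com")`, directory pages such as `https://example.com/wp-content/uploads/` and their sort links (`https://example.com/wp-content/?C=N;O=D`) are ignored, while a file like `https://example.com/wp-content/uploads/2020/01/photo.jpg` is still fetched.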

Useful pages

Some WordPress sites leave directory listings enabled on their asset subdirectories, and these listings can expose files that are not linked from anywhere on the main site. For a more complete grab, it helps to check these pages; if they are open, they can be saved fairly easily.
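Whether a fetched page is an open directory can usually be guessed from its HTML: the automatic listings generated by Apache and nginx typically have a title of the form "Index of /path". The sketch below is a heuristic only. It assumes that title format and double-quoted `href` attributes, and `looks_like_open_directory` and `listed_entries` are hypothetical names used for illustration.

```python
import re

# Server-generated listings usually carry a title like "Index of /wp-content/".
_INDEX_TITLE = re.compile(r"<title>\s*Index of /[^<]*</title>", re.IGNORECASE)
# Double-quoted hrefs with no "?" or "#", which skips "?C=N;O=D" sort links.
_HREF = re.compile(r'href="([^"?#]+)"', re.IGNORECASE)


def looks_like_open_directory(html: str) -> bool:
    """Guess whether an HTML page is a server-generated directory listing."""
    return bool(_INDEX_TITLE.search(html))


def listed_entries(html: str) -> list[str]:
    """Return the relative links in a listing (files and subdirectories)."""
    entries = []
    for href in _HREF.findall(html):
        if href.startswith(("/", "http://", "https://", "../")):
            continue  # absolute links and the parent-directory link
        entries.append(href)
    return entries
```

Entries ending in `/` are subdirectories (for example `2020/` under `uploads/`) that can be fetched and checked the same way.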

Here is a summary of the interesting pages from WordPress' file structure.[1]

  • http://example.com/wp-content/: Typically contains images, videos, and other files used throughout a WordPress-powered website. This page may be entirely blank.
    • http://example.com/wp-content/uploads/: This directory can contain the files themselves, usually organized into a YYYY/MM/ folder structure.
    • http://example.com/wp-content/plugins/: This directory can contain uploaded plugin files used throughout the site.
  • http://example.com/wp-includes/: Typically contains PHP, CSS, and JavaScript files that are used for more-advanced site formatting and actions.
  • http://example.com/wp-admin/: This doesn't appear to be an open directory and is almost always blocked, but it can still have some useful outlinks, and checking it is another step toward a thorough archive.
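The pages listed above can be turned into a checklist of URLs for a given site. Because uploads are usually split into YYYY/MM/ folders, the month folders for a range of years can be added as well, since a month folder may be open even when `uploads/` itself is not. This is a sketch under assumptions: `candidate_urls` and its year-range parameters are hypothetical, the site's real upload layout may differ, and every generated URL still has to be fetched to see whether it exists and is open.

```python
from urllib.parse import urljoin

# Directories from the list above, relative to the WordPress root.
WP_DIRECTORIES = [
    "wp-content/",
    "wp-content/uploads/",
    "wp-content/plugins/",
    "wp-includes/",
    "wp-admin/",
]


def candidate_urls(base: str, first_year: int, last_year: int) -> list[str]:
    """List WordPress directories to check, plus uploads/YYYY/MM/ folders."""
    # A trailing slash makes urljoin append paths instead of replacing the
    # last segment, which matters for sites installed under e.g. /blog.
    root = base if base.endswith("/") else base + "/"
    urls = [urljoin(root, d) for d in WP_DIRECTORIES]
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            urls.append(urljoin(root, f"wp-content/uploads/{year:04d}/{month:02d}/"))
    return urls
```

For example, `candidate_urls("http://example.com", 2020, 2021)` returns the five directories above followed by 24 month folders, from `http://example.com/wp-content/uploads/2020/01/` to `http://example.com/wp-content/uploads/2021/12/`.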

Archiving tools

References