WordPress
- This page is about the open-source software (WordPress, WordPress.org). For the commercial blog host, see WordPress.com.
WordPress is a PHP-based content management system, best known as the software behind many blogs.
Ignore Patterns
If you want to archive a WordPress blog in ArchiveBot, you should add the ignore set "blogs", and possibly ignore:
- ^https?://SITE/wp-(content|includes)/(.*/)?($|\?) and ^https?://SITE/wp-(content|includes)/.*\.php$ if these paths are detected as open directories and cause problems.
- xmlrpc.php if it causes bans.
- ^https?://{primary_netloc}/.*/(udata\.vst|current\.cmp|current\.src|current_add\.ep|gtm\.js)/?$ since some plugins generate junk.
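Before adding the patterns above to a job, it can help to check locally which URLs they would actually exclude. The following is a minimal sketch, not part of ArchiveBot: it assumes that SITE and {primary_netloc} can both be filled in by hand with the escaped hostname, and the helper names compile_ignores and is_ignored are hypothetical.

```python
import re

# Pattern templates copied from the list above. SITE and {primary_netloc}
# are placeholders; this sketch substitutes them manually (an assumption,
# ArchiveBot itself may expand {primary_netloc} differently).
TEMPLATES = [
    r"^https?://SITE/wp-(content|includes)/(.*/)?($|\?)",
    r"^https?://SITE/wp-(content|includes)/.*\.php$",
    r"^https?://{primary_netloc}/.*/(udata\.vst|current\.cmp|current\.src"
    r"|current_add\.ep|gtm\.js)/?$",
]


def compile_ignores(host):
    """Fill the placeholders with the regex-escaped host and compile."""
    escaped = re.escape(host)
    return [
        re.compile(t.replace("SITE", escaped).replace("{primary_netloc}", escaped))
        for t in TEMPLATES
    ]


def is_ignored(url, patterns):
    """Return True if any ignore pattern matches the URL."""
    return any(p.search(url) for p in patterns)


if __name__ == "__main__":
    patterns = compile_ignores("example.com")
    for url in [
        "http://example.com/wp-content/uploads/",
        "http://example.com/wp-content/uploads/2020/01/photo.jpg",
        "https://example.com/wp-includes/js/foo.php",
    ]:
        print(url, is_ignored(url, patterns))
```

Note that the first pattern only matches directory listings (URLs ending in a slash or carrying a query string directly after one), so individual uploaded files under wp-content are still fetched.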
Useful pages
Some WordPress sites leave their asset subdirectories open, and these can contain files that are not referenced anywhere on the main site. For a more complete grab, it helps to check the pages below; if they are open, they can be saved fairly easily.
Here is a summary of the interesting pages from WordPress' file structure.[1]
- http://example.com/wp-content/: Typically contains images, videos, and other files that are used throughout a WordPress-powered Web site. This page may be entirely blank.
- http://example.com/wp-content/uploads/: This directory can contain the actual aforementioned files, usually broken down into a YYYY/MM/ folder structure.
- http://example.com/wp-content/plugins/: This directory can contain uploaded plugin files used throughout the site.
- http://example.com/wp-includes/: Typically contains PHP, CSS, and JavaScript files that are used for more-advanced site formatting and actions.
- http://example.com/wp-admin/: This doesn't appear to be an open directory, and is almost always blocked, but can still have some useful outlinks and is another step in achieving thorough archival.
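The list above can be turned into candidate URLs for any given site. This is a small sketch under stated assumptions: the helper names candidate_urls and upload_month_urls are hypothetical, the base URL is whatever address the WordPress install is served from, and the YYYY/MM/ layout is only the usual default, so some of the generated URLs may not exist.

```python
from urllib.parse import urljoin

# Directory paths taken from the list above, relative to the site root.
WP_PATHS = [
    "wp-content/",
    "wp-content/uploads/",
    "wp-content/plugins/",
    "wp-includes/",
    "wp-admin/",
]


def candidate_urls(base):
    """Build the list of directory URLs worth checking for one site."""
    if not base.endswith("/"):
        base += "/"
    return [urljoin(base, path) for path in WP_PATHS]


def upload_month_urls(base, first_year, last_year):
    """Guess uploads subfolders, assuming the usual YYYY/MM/ layout."""
    if not base.endswith("/"):
        base += "/"
    uploads = urljoin(base, "wp-content/uploads/")
    return [
        f"{uploads}{year:04d}/{month:02d}/"
        for year in range(first_year, last_year + 1)
        for month in range(1, 13)
    ]


if __name__ == "__main__":
    for url in candidate_urls("http://example.com"):
        print(url)
```

The resulting URLs can be checked by hand or fed to an archiving job as extra seeds; whether a given directory is actually open still has to be verified per site.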