Revision as of 22:05, 14 August 2013
Patch.com | |
Your neighborhood. Your news. | |
URL | http://www.patch.com/ |
Status | Closing |
Archiving status | In progress... |
Archiving type | Unknown |
Project source | https://github.com/ArchiveTeam/patch-grab |
Project tracker | here |
IRC channel | #cabbagepatch (on hackint) |
Patch.com is a "hyperlocal" news community which is being downsized from its current ~900 sites to ~500.
Current status
antomatic has prepared (what appears to be) a complete list of sites. A prototype seesaw project (no Warrior integration yet) also exists.
There also appears to be a master sitemap (with links to sub-sitemaps) at http://www.patch.com/sitemaps.xml.
Most, possibly all, Patches seem to share similar directories and content structures, e.g. /news, /blogs, /boards, /events, /directory, /jobs, etc.
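If the master sitemap turns out to be a standard sitemaps.org index (an assumption, not something confirmed here), the sub-sitemap URLs could be pulled out with a rough sketch like the one below. The function name `sub_sitemap_urls` and the sample XML with its example.com URLs are made up for illustration and are not taken from patch.com.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol. ASSUMPTION: patch.com's
# sitemap index follows this format; that has not been verified.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sub_sitemap_urls(index_xml):
    """Return the text of every <loc> element found in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Hypothetical sample data, NOT real patch.com content.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://example.com/sitemap-news.xml</loc></sitemap>
  <sitemap><loc>http://example.com/sitemap-events.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    for url in sub_sitemap_urls(SAMPLE_INDEX):
        print(url)
```

The resulting list could then be compared against the prepared site list to see whether anything is missing.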
Next steps
Patch subdomains (1) are big and (2) appear to enforce some sort of request cap per IP per unit time (you'll start getting HTTP 420s after a while). We need to investigate whether to implement a more complicated mechanism that splits individual sites into pieces and then megawarcs them together, or to just take each site slowly (e.g. n requests every hour).
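As a rough illustration of the "take each site slowly" option, here is a minimal sketch of a throttled fetch loop that backs off when it sees an HTTP 420. It is not the patch-grab code. The function `fetch_slowly`, its parameters, and the delay values are all hypothetical, and the real cap and the right delays are unknown. The `fetch` argument is any callable returning `(status, body)`, so the loop can be tried without touching the network.

```python
import time


def fetch_slowly(urls, fetch, delay=1.0, backoff=60.0, max_retries=3,
                 sleep=time.sleep):
    """Fetch URLs one at a time, pausing between requests.

    fetch(url) must return a (status, body) tuple. On HTTP 420 the loop
    waits backoff * 2**attempt seconds and tries again, giving up on that
    URL after max_retries retries. All numbers are guesses (ASSUMPTION).
    """
    results = {}
    for url in urls:
        for attempt in range(max_retries + 1):
            status, body = fetch(url)
            if status != 420:
                results[url] = (status, body)
                break
            # Rate limited: wait longer after each consecutive 420.
            sleep(backoff * (2 ** attempt))
        else:
            # Every attempt returned 420; record the failure and move on.
            results[url] = (420, None)
        # Fixed pause between URLs to stay under the (unknown) cap.
        sleep(delay)
    return results
```

Passing `sleep` and `fetch` in as parameters keeps the pacing logic separate from the HTTP client, so the same loop could wrap whatever downloader the project ends up using.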
Pop in the IRC channel if you want to help.