Difference between revisions of "Google Groups Files"

From Archiveteam

Revision as of 21:58, 27 August 2011

Googleparty.jpg

Google is challenging AT again...

This notice appears on Google Groups pages:


Zipped versions of the pages and files associated with this group will be available for download until August 31, 2011. After this date, this feature and the zip file downloads will be turned off permanently.


A script is available that searches Google Groups directories and downloads the ZIP files of individual groups. The script uses a Google App Engine hosted app for coordination.

Status

We are currently doing a second pass through the index to make absolutely sure we got everything (and grab new groups that have been founded in the meantime). But for now, we claim victory!


2011-08-27:

directories: 577199
groups: 1526946

2011-06-04:

directories: new: 59926, done: 25585
groups: new: 552075, done: 70804

2011-06-14:

directories: done: 76080
groups: done: 163749

2011-06-16:

directories: NEW: 80942, PROCESSING: 7, DONE_DIR: 81644
groups: NEW: 880230, PROCESSING: 97, ERROR: 5825, ADULT: 3893, DONE_GRP: 172148

2011-06-20:

directories: NEW: 71364, PROCESSING: 6, DONE_DIR: 101141
groups: NEW: 778080, PROCESSING: 290, ERROR: 10249, ADULT: 4177, DONE_GRP: 298122
completion rate: directories: 170/hr, groups: 2213/hr

2011-06-28:

directories: TOTAL: 243898, NEW: 105872, PROCESSING: 15, DONE_DIR: 138011
groups: TOTAL: 1245968, NEW: 767342, PROCESSING: 44, ERROR: 10944, ADULT: 4236, DONE_GRP: 463402
completion rate: directories: 337/hr, groups: 865/hr

2011-07-19:

directories: TOTAL: 443013, NEW: 159064, DONE: 270836
groups: TOTAL: 1505795, NEW: 558656, DONE + ERROR + ADULT: 947139
completion rate: directories: 231/hr, groups: 1727/hr
new discovery rate: dirs: 0/hr, grps: 2/hr
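
As a rough illustration of how the counts and completion rates above relate, the remaining work can be turned into a time estimate by dividing the NEW count by the hourly rate. The sketch below is not part of the project's script; the helper name eta_hours is made up for this example, and it assumes the rate stays constant, which the entries above show it did not (the group rate ranged from 865/hr to 2213/hr).

```shell
#!/bin/sh
# Hypothetical helper (not from ggroups_zipdl.sh):
# eta_hours REMAINING RATE_PER_HOUR -> whole hours, rounded up.
eta_hours() {
    remaining=$1
    rate=$2
    if [ "$rate" -le 0 ]; then
        # No progress means no meaningful estimate.
        echo "unknown"
        return 0
    fi
    # Integer ceiling division.
    echo $(( (remaining + rate - 1) / rate ))
}

# Figures taken from the 2011-07-19 entry.
grp_hours=$(eta_hours 558656 1727)
dir_hours=$(eta_hours 159064 231)
echo "groups: about $grp_hours hours ($(( grp_hours / 24 )) days)"
echo "directories: about $dir_hours hours ($(( dir_hours / 24 )) days)"
```

With the 2011-07-19 figures this gives about 324 hours for groups and 689 hours for directories. Treat these as order-of-magnitude estimates only, since discovery was still adding new entries at that point.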

Script

Requirements

(ba)sh, wget, grep, curl
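
A quick way to confirm that the tools listed above are installed before starting is sketched below. This check is not part of the script itself; the function name missing_tools is an assumption made for this example, and it relies only on the standard command -v lookup.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not from ggroups_zipdl.sh):
# print the name of each listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the Requirements list.
missing=$(missing_tools sh wget grep curl)
if [ -n "$missing" ]; then
    echo "Missing required tools:" $missing >&2
else
    echo "All required tools found."
fi
```

If anything is reported missing, install it with the system's package manager before running the script.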

Usage

  • Normal operation
./ggroups_zipdl.sh
  • Discover only (no downloads to store)
./ggroups_zipdl.sh discover
  • Download only (no discovery of new groups)
./ggroups_zipdl.sh download

Issues

-