Difference between revisions of "IRC Quotes"

(ibash.de finished)
(Scraped 13 more QDBs, added info about a QdbS scraper)

Revision as of 13:10, 9 April 2011

What's this, then?

Auguste, BlueMax and Dr-Spangle are currently scraping IRC quote databases (e.g. Bash.org). If you can help out or suggest other quote databases to scrape, please join them in #bashup.

Auguste is currently writing a generic Perl script to grab any QdbS-powered QDB (e.g. Auguste's ArchiveTeam QDB). It already works on any website that uses the default QdbS template without major modifications. He'll release the script publicly once it also handles the other common QdbS templates.

Project Hosting

Auguste is currently hosting scrapes here. Everybody is encouraged to help mirror.

Helping Out

Scraping doesn't take a lot of work; the QDBs are all more or less the same. You only need to write one script, then make a few changes to adapt it to any other QDB you want to scrape. The actual scraping process should easily take under 10 minutes.

If you do want to help with the scraping, please follow the existing scrape format:

  • Each quote has its own file
  • Each file is named 'n.txt', where 'n' is the quote's ID number
  • All quotes should be compressed into an archive
  • The archive name should identify the original location (URL) and date of scraping (e.g. 'QuoteIRC.com Quote Collection 2011-04-04.7z', or 'DOMAIN.TLD Quote Collection YYYY-MM-DD.EXT').
    • If the original location (URL) has subdirectories (e.g. 'Foobar.com/baz'), replace forward slashes with hyphens: 'Foobar.com-baz'.
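The naming rules above can be sketched in a few lines of Python. This is only an illustration, not the script any of the scrapers actually use: the function names `archive_name` and `write_quotes` are made up for this example, and stripping a trailing slash from the location (as in 'DeadDyingDamned.com/QDB/') is an assumption the rules do not spell out.

```python
import datetime
import pathlib


def archive_name(location, scraped_on, ext="7z"):
    """Build 'DOMAIN.TLD Quote Collection YYYY-MM-DD.EXT' for a scraped site."""
    # Assumption: leading/trailing slashes are dropped before the
    # remaining forward slashes are replaced with hyphens.
    safe_location = location.strip("/").replace("/", "-")
    return f"{safe_location} Quote Collection {scraped_on.isoformat()}.{ext}"


def write_quotes(quotes, out_dir):
    """Write each quote to its own 'n.txt' file, where n is the quote's ID."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for quote_id, text in quotes.items():
        (out_dir / f"{quote_id}.txt").write_text(text, encoding="utf-8")


if __name__ == "__main__":
    print(archive_name("QuoteIRC.com", datetime.date(2011, 4, 4)))
    print(archive_name("Foobar.com/baz", datetime.date(2011, 4, 9), ext="zip"))
```

Compressing the resulting directory is left to whatever archiver you prefer; the format only asks that the archive name follow the pattern.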

Tips

  • Scrape from the browse page (e.g. http://bash.org/?browse). This way you can scrape 10-50 quotes per page request, rather than cycling through thousands of individual quote pages.
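To show why the browse page is the cheaper target, here is a rough Python sketch that pulls every quote out of a single page of HTML in one pass. The markup in `SAMPLE_PAGE` is hypothetical and only stands in for a browse page; real QDBs differ, so the regular expression would need adjusting per site. `parse_browse_page` is a name invented for this example, and no network request is made.

```python
import html
import re

# Assumed layout: a header paragraph carrying '#ID', followed by a
# paragraph holding the quote text. Adapt this pattern to the real site.
QUOTE_RE = re.compile(
    r'<p class="quote">.*?<b>#(\d+)</b>.*?</p>\s*<p class="qt">(.*?)</p>',
    re.DOTALL,
)


def parse_browse_page(page_html):
    """Return {quote_id: quote_text} for every quote found on one page."""
    quotes = {}
    for match in QUOTE_RE.finditer(page_html):
        quote_id = int(match.group(1))
        # Turn <br> tags into newlines, then decode entities such as &lt;
        body = re.sub(r"<br\s*/?>\s*", "\n", match.group(2))
        quotes[quote_id] = html.unescape(body).strip()
    return quotes


SAMPLE_PAGE = """
<p class="quote"><a href="?101"><b>#101</b></a></p>
<p class="qt">&lt;alice&gt; hello<br />
&lt;bob&gt; hi</p>
<p class="quote"><a href="?102"><b>#102</b></a></p>
<p class="qt">&lt;carol&gt; one page, many quotes</p>
"""

quotes = parse_browse_page(SAMPLE_PAGE)
```

Each entry in the returned dictionary maps directly onto one 'n.txt' file, so one page request yields as many files as the page lists quotes.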

Project Status

| Database | Has been scraped | Scraper | Notes |
|---|---|---|---|
| Bash.org | Yes | Dr-Spangle | The quote database that pretty much created all others. |
| DeadDyingDamned.com/QDB/ | No | | The unofficial ArchiveTeam QDB. I'll have the server automatically save these somewhere. --Auguste 13:36, 7 April 2011 (UTC) |
| I-Rox.com | Yes | Auguste | |
| Mandaliet.com/furcqdb/ | Yes | Auguste | The Furcadia quote database |
| QDB.MIT.edu | Yes | Auguste | The MIT quote database |
| QDB.us | Yes | Auguste | |
| QuoteIRC.com | Yes | Auguste | |
| Quotes.BurntElectrons.org | Yes | Auguste | The IRC.Mozilla.org quote database |
| WarpDrive.se | Yes | Auguste | Quotes are in Swedish |
| WQDB.org | Yes | Auguste | The Worms quote database |
| xkcdb.com | Yes | Auguste | The xkcd quote database |
| german-bash.org | Yes (here) | Darkstar | German version of bash.org |
| ibash.de | Yes (here) | Darkstar | Another German quotes DB |
| JDL.Host.HK-DIY.net/Quote | Yes | Auguste | Generic QdbS QDB |
| MoarPupr.com/Quotes | Yes | Auguste | Generic QdbS QDB |
| Frostfall-Guild.com/FF/QDB | Yes | Auguste | Generic QdbS QDB |
| FreqBase.com/QDB | Yes | Auguste | Generic QdbS QDB |
| LolImBanned.com | Yes | Auguste | Generic QdbS QDB |
| BombLol.net/OrNot | Yes | Auguste | Generic QdbS QDB |
| NotSafeForSanity.com/Quotes | Yes | Auguste | Generic QdbS QDB |
| QDB.PesterChum.net | Yes | Auguste | Generic QdbS QDB |
| DeanyDerkheiser.net/QDB | Yes | Auguste | Generic QdbS QDB |
| Pilkipedia.co.uk/QDB | Yes | Auguste | Generic QdbS QDB |
| LinuxCult.org/Quotes | Yes | Auguste | Generic QdbS QDB |
| QDBS.ChanOps.org | Yes | Auguste | Generic QdbS QDB |
| QDB.Honk-Honk.org | Yes | Auguste | Generic QdbS QDB |