https://wiki.archiveteam.org/api.php?action=feedcontributions&user=Soult&feedformat=atomArchiveteam - User contributions [en]2024-03-29T07:55:15ZUser contributionsMediaWiki 1.37.1https://wiki.archiveteam.org/index.php?title=Early_projects&diff=17491Early projects2013-09-07T15:32:52Z<p>Soult: 4th urlteam release</p>
<hr />
<div>[[File:Archiveteamlogo.png|right|link=http://www.archive.org/details/archiveteam|Look at Archive Team Collection at Internet Archive too]]<br />
Some '''archives''' available for download, created by [[Archive Team]] or by other volunteers and groups. Sorted by size.<br />
<br />
Look at [http://www.archive.org/details/archiveteam Archive Team Collection] at Internet Archive too.<br />
<br />
If you have archived any site, you can add a link to the table by [http://archiveteam.org/index.php?title={{PAGENAMEE}}&action=edit editing this page] (or just drop a line in [http://chat.efnet.org:9090/?nick=&channels=%23archiveteam&Login=Login our IRC channel] and we will add it).<br />
__NOTOC__<br />
== Available for download ==<br />
<center><br />
{| width=1000px class="wikitable" style="text-align: center;"<br />
|-<br />
! width=300px | Title/Download link<br />
! Description<br />
! width=80px | Size<br />
|-<br />
| [http://thepiratebay.org/torrent/6353395/Geocities_-_The_PATCHED_Torrent Geocities - The PATCHED Torrent] ([http://www.archive.org/search.php?query=archive%20team%20geocities%20snapshot IA]) || The [[Geocities|popular web hosting]] service founded in 1994. It was closed by Yahoo! in 2009 || 641.4 GB<br />
|-<br />
| [http://urlte.am/releases/2013-07-20/urlteam.torrent URL Shortener Backup Torrent v4] || [[URLTeam]] compressed backups of various URL shorteners ([http://urlte.am/releases/2013-07-20/README.txt README]) || 75 GB<br />
|-<br />
| [http://urlte.am/releases/2013-01-02/urlteam.torrent URL Shortener Backup Torrent v3] '''outdated, use v4''' || [[URLTeam]] compressed backups of various URL shorteners ([http://urlte.am/releases/2013-01-02/README.txt README]) || 50 GB<br />
|-<br />
| [http://urlte.am/releases/2011-12-31/urlteam.torrent URL Shortener Backup Torrent v2] '''outdated, use v4''' || [[URLTeam]] compressed backups of various URL shorteners ([http://urlte.am/releases/2011-12-31/README.txt README]) || 48 GB<br />
|-<br />
| [http://urlte.am/releases/2011-05-31/urlteam.torrent URL Shortener Backup Torrent v1] '''outdated, use v4''' || [[URLTeam]] compressed backups of various URL shorteners ([http://urlte.am/releases/2011-05-31/README.txt README]) || 41.1 GB<br />
|-<br />
| [http://thepiratebay.org/torrent/6554331/Papers_from_Philosophical_Transactions_of_the_Royal_Society__fro Papers from Philosophical Transactions of the Royal Society] || This archive contains 18,592 scientific publications totaling 33 GiB, all from Philosophical Transactions of the Royal Society. They should be available to everyone at no cost, but most have previously only been available at high prices through paywall gatekeepers like JSTOR. || 32.48 GB<br />
|-<br />
| [http://www.archive.org/details/2011-05-calufa-twitter-sql The May 2011 Calufa Twitter Scrape] || 90+ million [[tweets]] from more than 6 million users || 14.9 GB<br />
|-<br />
| [http://torrent.ibiblio.org/doc/181 Internet Gopher Archive 2007] ([http://www.archive.org/details/2007-gopher-mirror IA]) || Archive of [[gopher]] sites || 14.8 GB<br />
|-<br />
| [http://www.archive.org/details/2010-01-encyclopedia-dramatica Encyclopedia Dramatica January 2010 Mirror] || [[lulz]] || 11.7 GB<br />
|-<br />
| [http://www.archive.org/details/textfiles-dot-com-2011 The TEXTFILES.COM Time Capsule] || This collection comprises all the major text-based sets of the [[TEXTFILES.COM]] site || 11 GB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-tabletalk-panic Salon Table Talk] || Threads of this talk site || +6.0 GB<br />
|-<br />
| [http://www.archive.org/details/utzoo-wiseman-usenet-archive Usenet Archive of UTZOO Tapes] || Collection of .TGZ files of very early USENET posted data || 2.0 GB<br />
|-<br />
| [http://torrent.ibiblio.org/doc/182 Quux.org Gopher Mirror Collection 2006] ([http://www.archive.org/details/quux-gopher-mirror IA]) || This is a collection of mirrors maintained by gopher.quux.org. These mirrors were taken offline in 2006 due to bandwidth constraints || 1.5 GB<br />
|-<br />
| [http://burnbit.com/torrent/174605/full_history_linux_git_tar full-history-linux.git.tar] || GIT repository of Linux Kernel from 1991 to 2010 ([http://lwn.net/Articles/285366/ details]) || 594 MB<br />
|-<br />
| [http://www.archive.org/details/twitter_cikm_2010 Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape] || Almost 10 million [[tweets]] || 425 MB<br />
|-<br />
| [http://www.archive.org/details/2010-reddit-research The 2010 Reddit Research Project] || Dataset on affinities of 60,000+ [[Reddit]] users, recorded in 2010 || ~360 MB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-starwars-yahoo Archive Team Starwars.Yahoo.Com Panic Download] || This is a panic download of the [[starwars.yahoo.com]] forums and profiles, done before the closure of same by Yahoo on December 15, 2009. This includes as many messages, profiles, and pages related to the site as could be easily brought in. || ~250 MB<br />
|-<br />
| [http://www.archive.org/details/oxford-2005-facebook-matrix Social Structure of Facebook Networks Facebook Data Scrape] || [[Facebook]] data scrape related to paper "The Social Structure of Facebook Networks", by Amanda L. Traud, Peter J. Mucha, Mason A. Porter || 197 MB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-etherpad-timecapsule Archive Team's Etherpad Time Capsule] || This archive contains roughly 6,400 [[Etherpad]]s, in their final state || 125 MB<br />
|-<br />
| [http://code.google.com/p/wikiteam/downloads/list WikiTeam archives] || Archives about [[wikis]]. See [[WikiTeam]] || +100 MB<br />
|-<br />
| [http://www.archive.org/details/ArchiveTeamsiteRip Archive Team] || Archive Team.org Site Rip from August 03, 2011 || 75 MB<br />
|-<br />
| [http://www.archive.org/details/boingboing-2000-2005 Boing Boing Posts Archive (2000-2011)] || Two collections of [[Boing Boing]] postings provided by the cultural website boingboing.net on its 5th and 11th anniversaries || 42 MB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-quotes-archive-2011-04 Archive Team Quotes Database Backup] || Amusing snatches of conversation from [[IRC]] and other online gathering places || 5 MB<br />
|- <br />
| [https://sites.google.com/site/archiveofstuff/home/localroger.com.7z Mirror of Revelation Passage Series Website] || wget mirror of a small author's website. || ~500 KB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-powerblogs-2010-11-snapshot Archive Team Powerblogs Shutdown Snapshot] || This is a 108-blog snapshot of the final month of [[Powerblogs]], before their shutdown || ? <br />
|-<br />
| [http://www.archive.org/details/bbc-panic-closing-archives BBC Closing Panic Archives] || Some [[BBC]] sites || ? <br />
|-<br />
| [http://archive.org/details/stillflying.net-20120905-mirror stillflying.net] || A Firefly fan fiction site that made PDF scripts for the rest of season 1 and a season 2, as they might have been had Firefly not been canceled. || 408.1 MB <br />
|-<br />
| [http://archive.org/details/archiveteam_greader Google Reader] || Text for 46M feeds, per-feed statistics, Reader Directory search results || ~8800 GB<br />
|-<br />
| colspan=2 | '''Total size''' || ~9492 GB<br />
|}<br />
</center><br />
<br />
== Archived but not available ==<br />
* [[Google Video]]<br />
* [[Yahoo! Videos]]<br />
<br />
== See also ==<br />
* [[Projects]]<br />
** [[:Category:Rescued Sites]]<br />
<br />
== External links ==<br />
* http://www.archive.org/details/archiveteam<br />
* http://thepiratebay.org/user/archiveteam<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Archive Team]]<br />
[[Category:Rescued Sites]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam/History&diff=17490URLTeam/History2013-09-07T15:29:57Z<p>Soult: 4th release</p>
<hr />
<div>This is a history of the [[URLTeam]], written using wiki history, IRC chatlogs, git commit logs and news articles.<br />
<br />
== History ==<br />
=== Beginning: 2009/01 - 2009/08 ===<br />
On January 21st, 2009 the URLTeam wiki page was created by [[User:Scumola|swebb]].<ref name="wiki-urlteam"/> He started crawling ''tinyurl.com'' and ''ff.im'' (Friendfeed) and at the end of April had already backed up a couple million URLs.<br />
<br />
=== fetcher.pl: 2009/08 - 2010/08 ===<br />
A second scraper was created by [[User:Chronomex|chronomex]] in August 2009 in Perl.<ref name="git-urlteam"/> The output format pioneered by his script is still used by the URLTeam for releases. He used the scraper to save various smaller shorteners such as ''4url.cc'' and ''surl.ws'', but also bigger shorteners like ''is.gd''. Back then ''is.gd'' was still using sequential shortcodes and had no rate limiting.<br />
<br />
At the same time, on August 12th, 2009 the domain ''urlte.am'' was registered by [[User:Jscott|SketchCow]].<ref name="whois-urlte.am"/> The website only displayed the logo and the tag line "url shortening was a fucking awful idea".<ref name="archive.org-urlte.am"/><br />
<br />
=== tinyback: 2010/08 - 2011/01 ===<br />
Almost exactly a year after chronomex created his Perl-based scraper, [[User:Soult|soultcer]] decided to write a scraper in Ruby.<ref name="git-tinyback"/> Development was slow and sporadic due to his day job, with the first runnable but buggy version ready on October 26th, 2010. Tinyback originally used MessagePack as its output format and only supported ''tinyurl.com''. Support for ''is.gd'', ''bit.ly'' and ''tr.im'' was added in the following months, along with many other improvements and a change to the same output format as chronomex' fetcher.pl.<br />
<br />
One goal of tinyback was to handle special cases and bugs in many URL shorteners, like ''tinyurl.com''<nowiki>'</nowiki>s [http://tinyurl.com/2zln TinyURL redirects to a TinyURL] error page or ''bit.ly''<nowiki>'</nowiki>s [https://bit.ly/YNUkZB STOP] page. On one hand this led to more complete backups of a URL shortener, but on the other hand it made adding support for new shorteners more difficult.<br />
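As a hedged illustration of the kind of special case described above, the sketch below (function names are mine, not tinyback's) flags results where a shortener redirects back to its own domain, as in the "TinyURL redirects to a TinyURL" error page:<br />

```python
# Illustrative sketch only -- not tinyback code. Detects when a
# shortener's redirect points back at the shortener itself, which
# signals an error page rather than a real saved URL.
from urllib.parse import urlparse

def classify(shortener_host, location):
    """Classify the Location header returned for a shortcode."""
    if location is None:
        return "missing"
    host = urlparse(location).hostname or ""
    if host == shortener_host or host.endswith("." + shortener_host):
        return "self-redirect"  # error page, do not record as a backup
    return "ok"
```

Recording such self-redirects as real data would pollute a backup, so classifying them separately is the safer design.<br />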
<br />
Also on October 26th, 2010, when tinyback development was only getting started, soultcer asked SketchCow about putting up the scraped data for download on the urlte.am website. SketchCow approved, but it would still be a long time until the first release.<br />
<br />
File handling and shortcode assignment were done manually back then, with the resulting output files being copied around using scp before being sorted and merged with some rather slow C tools. Following the creation of tinyback, both chronomex and soultcer did lots of scraping using their respective tools. To get around IP bans, cheap low-end VPSes were used for scraping.<br />
<br />
=== First release: 2011/01 - 2011/06 ===<br />
==== is.gd changes ====<br />
On January 12th, 2011 ''is.gd'' migrated to a new architecture<ref name="isgd-news"/>. Shortcodes changed from sequential to random and a very strict 60 requests per minute limit was added. This made scraping more difficult, but luckily chronomex had fetched almost all the sequential shortcodes before the switch was made.<br />
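A minimal sketch of scraping under such a per-minute cap, assuming a hypothetical fetch() callable that performs the HTTP request and returns the long URL or None:<br />

```python
# Hedged sketch: space requests evenly to stay under a requests-per-
# minute limit like is.gd's 60/min. fetch() is an assumed callable.
import time

def scrape(fetch, codes, per_minute=60):
    delay = 60.0 / per_minute  # seconds between requests
    results = {}
    for code in codes:
        target = fetch(code)   # fetch() would perform the HTTP request
        if target is not None:
            results[code] = target
        time.sleep(delay)
    return results
```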
<br />
==== Release preparation ====<br />
In January 2011 soultcer started an effort to combine the scraped data from chronomex and himself, which he finished in February. On February 9th, 2011 soultcer made the first commit to the urlteam-stuff repository, which holds the urlte.am website.<br />
<br />
Also in 2011 [[User:Jeroenz0r|Jeroenz0r]] and [[User:Underscor|underscor]] joined the urlteam and helped with scraping and various other stuff. In March 2011 swebb also discovered the IRC channel and uploaded his scraped data for soultcer to merge. His scrape of tr.im was especially useful, but more on that later.<br />
<br />
==== 301works cooperation ====<br />
In March 2011 SketchCow arranged for Jeff Kaplan from [http://301works.org 301Works.org] to give soultcer an upload slot to the [http://archive.org/details/301works 301works collection on archive.org]. From March 15th to July 6th soultcer uploaded all the data he had scraped from ''bit.ly'' so far, in a CSV-based format, to the [http://archive.org/details/301utm 301utm collection]. The list of files and the codes they contain is also [https://github.com/ArchiveTeam/urlteam-stuff/blob/master/301works stored] in the urlteam-stuff git repository. Unfortunately no further updates to the data have been made since.<br />
<br />
==== Release ====<br />
In April 2011 soultcer stated on IRC that he wanted to create the release torrent, but he had trouble merging swebb's data from ''tr.im'' with his own scrapes. As it turned out, ''tr.im'' was very broken and returned some bad data, which is why the URLTeam settled on only putting parts of the ''tr.im'' scrape in the torrent. After the contents were finalized in May 2011, underscor provided a server to upload the 40 GB of compressed data files that had been collected. The upload finished on May 31st, and underscor created a torrent from it, marking our very first release. It took another couple of days to update the homepage, but after over 2 years of scraping, the first results were finally available for download.<br />
<br />
=== Second release: 2011/06 - 2012/01 ===<br />
After the first release, most people were rather busy with other stuff, so when the self-imposed deadline of December 2011 approached, the only new files were a couple gigabytes of scraped data from ''tinyurl.com'' and the merged data from ''tr.im'' (see below). The release was created on the last day of 2011.<br />
<br />
==== tr.im (a short digression) ====<br />
In August 2009, when the popular shortener ''bit.ly'' became the default shortener for Twitter, Eric Woodward, owner of the not-quite-so-popular shortener ''tr.im'', was rather butthurt and decided to shut down ''tr.im'' out of spite.<ref name="trim-shutdown"/> This caused a massive uproar, because it made people realize that once ''tr.im'' shut down, millions of URLs would just stop working, or even worse, redirect to some spam site. This not only affected ''tr.im'', but undermined the claim to legitimacy of every other URL shortener as well. ''bit.ly'' offered to continue hosting ''tr.im'', but Eric Woodward was having none of it.<ref name="trim-offer"/><ref name="trim-refused-offer"/> In the end he reopened ''tr.im'' and a few days later announced it would live on as an open source project.<ref name="trim-reopen"/><ref name="trim-opensource"/> While he did release the source, nothing ever came of his "community-owned" URL shortener idea.<ref name="trim-source"/> Shortening of new URLs was disabled and redirecting barely worked, breaking when too many requests (more than about 5 per minute) were made. In March 2011 soultcer removed the Trim class from tinyback, effectively ceasing any further backup efforts.<ref name="git-tinyback"/><br />
<br />
Since the first release had not included the full scraped data from ''tr.im'', soultcer decided to do it right for the second release. In May 2011 he tried to merge the scrapes he and swebb had done, which turned out to be a complicated process. Using the source code released by Woodward, he was able to understand some of the weird quirks that tr.im had: shorturl codes could either be autogenerated or custom. Autogenerated codes were case-sensitive, custom ones were not. If a new autogenerated code was the same as an existing custom code, it might overwrite the custom code. Also, URLs would be randomly truncated for no understandable reason.<ref name="trim-source"/> With that (half-)knowledge, he pieced together a final backup of the ''tr.im'' shortener, which was included in the second release.<br />
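The case-sensitivity quirk makes merging non-obvious; a hedged sketch of the normalization it implies (function names and the later-entries-win policy are my assumptions, not the actual merge tool):<br />

```python
# Illustrative sketch of merging tr.im scrapes: autogenerated codes are
# case-sensitive, custom codes are not, so custom codes must be folded
# before comparing. Later entries win, mirroring how a new autogenerated
# code could overwrite an existing custom one.
def canonical(code, is_custom):
    # Custom codes compare case-insensitively; autogenerated codes
    # must be kept exactly as scraped.
    return code.lower() if is_custom else code

def merge(scrapes):
    # scrapes: iterable of (code, is_custom, url) tuples
    merged = {}
    for code, is_custom, url in scrapes:
        merged[canonical(code, is_custom)] = url
    return merged
```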
<br />
=== A new approach: 2012/01 - 2013/01 ===<br />
After the second release, work on the URLTeam slowed down once more. Soultcer became unhappy with his Ruby-based scraper and rewrote it in Python 3, which was not yet widely used at the time and made handling character encodings more difficult. In August 2012 he rewrote tinyback again, based on his Python 3 version, but this time for Python 2. To distinguish the new version from the old Ruby tinyback, it was called tinyback v2.<br />
<br />
==== Tinyarchive database and tracker ====<br />
The process of creating the release was rather cumbersome and error-prone: URL shorteners and ranges were assigned manually to scraping hosts, using a text file for coordination. The results were tracked with git-annex, and then merged using unintuitive command-line tools. With the release of tinyback v2, soultcer also created a small tracker. It was written in September 2012, also using Python 2, and sqlite3 as database backend. The tracker was responsible for handing out tasks to tinyback instances using a simple HTTP API, making sure that only one task for each URL shortener was handed out per IP address, to avoid IP blocks. The results were then uploaded back to the tracker.<br />
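The one-task-per-shortener-per-IP rule can be sketched like this (an in-memory stand-in; the real tracker used an HTTP API backed by sqlite3, and all names here are illustrative):<br />

```python
# Hedged sketch of the tracker's assignment invariant: at most one
# outstanding task per (shortener, client IP) pair, to avoid IP blocks.
class Tracker:
    def __init__(self, tasks):
        # tasks: dict mapping shortener name -> list of pending task ids
        self.tasks = tasks
        self.assigned = set()  # (shortener, ip) pairs with a task out

    def get_task(self, shortener, ip):
        key = (shortener, ip)
        if key in self.assigned or not self.tasks.get(shortener):
            return None  # client must wait or pick another shortener
        self.assigned.add(key)
        return self.tasks[shortener].pop(0)

    def put_result(self, shortener, ip, data):
        # Uploading results frees the client's slot for that shortener.
        self.assigned.discard((shortener, ip))
        return len(data)  # the real tracker stored the upload
```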
<br />
Previously all data was stored in sorted and unsorted text files, often compressed to save space. Under the name tinyarchive, soultcer created some tools to manage all scraped URLs in a database instead, using Python 2 and BerkeleyDB. The resulting database was from then on used as the canonical store for URL shortener backups, with releases being generated directly from the database.<br />
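The idea can be sketched with a plain dict standing in for the BerkeleyDB store (the key layout and function names are my assumptions; only the pipe-delimited line format matches the releases):<br />

```python
# Hedged sketch of the tinyarchive idea: one canonical key-value store
# of shortener/code -> URL, with release files generated directly from
# it instead of merging loose text files.
def store(db, shortener, code, url):
    db[shortener + "|" + code] = url

def dump_release(db, shortener):
    # Releases are sorted "code|url" lines, one set per shortener.
    prefix = shortener + "|"
    lines = []
    for key in sorted(db):
        if key.startswith(prefix):
            lines.append(key[len(prefix):] + "|" + db[key])
    return lines
```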
<br />
==== Third release ====<br />
The planned release cycle of 6 months put the next release in June 2012, but work on the new tinyback and tinyarchive only started in August 2012. Almost no scraping was done beforehand, so we needed some time to scrape new data for the third release. The new tools sped up scraping, especially since the barrier to entry was lowered by the automatic task assignment. The tinyback project was also added to the [[ArchiveTeam Warrior]].<br />
<br />
Preparation for the next release started mid-December and the release was created on January 1st, 2013.<br />
<br />
=== Fourth release: 2013/01 - now ===<br />
The new tinyback code made scraping easier, and as a result more people started doing so, even outside the Warrior. The code also received patches by [[User:Alard|alard]], [[User:Chfoo|chfoo]], [[User:Ersi|ersi]] and [[User:Joepie91|joepie91]] that fixed bugs, improved log output, and added support for new URL shorteners.<br />
<br />
==== tr.im relaunches ====<br />
In December 2012 soultcer discovered a posting on a job board looking for a programmer to work on a relaunch of ''tr.im''. The domain name had been bought by a domain name "investment company" (= domain squatter), but it was unclear if they had also acquired the database. On January 30th, 2013, ''tr.im'' was back online, and it turned out that the database had not been part of the deal. Shortcode generation was sequential, but with weird gaps in between, probably in an effort to leave many codes unused. Unused codes would automatically redirect to an advertisement after a couple of seconds. To keep at least some of the original ''tr.im'' links alive, soultcer started submitting the data he had from the old ''tr.im'' into the new ''tr.im''. It was not a perfect solution, but at least some of the links were preserved that way.<br />
<br />
==== Fourth Release ====<br />
On May 16th, 2013 soultcer announced that he would be stepping down from his role in the URLTeam after the following release. Chronomex and ersi volunteered to take over the duty of running the tracker, the tinyarchive database and creating the releases. The fourth release was released on July 20th, 2013.<br />
<br />
== Sources ==<br />
<references><br />
<ref name="trim-shutdown">http://mashable.com/2009/08/09/trim-shuts-down/</ref><br />
<ref name="trim-offer">http://mashable.com/2009/08/09/shorturl-savior/</ref><br />
<ref name="trim-refused-offer">http://techzinglive.com/page/101/techzing-13-tr-im-bit-ly-and-twitter-the-real-story</ref><br />
<ref name="trim-reopen">http://mashable.com/2009/08/11/trim-reopened/</ref><br />
<ref name="trim-opensource">http://mashable.com/2009/08/17/tr-im-community-owned/</ref><br />
<ref name="trim-source">While not available on Github anymore, soultcer still has a copy of the open sourced tr.im code</ref><br />
<ref name="isgd-news">http://is.gd/news.php</ref><br />
<ref name="archive.org-urlte.am">http://web.archive.org/web/20091227074236/http://urlte.am/</ref><br />
<ref name="whois-urlte.am">http://whois.domaintools.com/urlte.am</ref><br />
<ref name="wiki-urlteam">http://www.archiveteam.org/index.php?title=URLTeam&limit=500&action=history</ref><br />
<ref name="git-urlteam">https://github.com/chronomex/urlteam/commits/master</ref><br />
<ref name="git-tinyback">https://github.com/ArchiveTeam/tinyback/commits/master</ref><br />
</references></div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=17217URLTeam2013-07-06T21:46:05Z<p>Soult: ow.ly: Onle done to lyZZZ, not lzZZZ</p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam-logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; the small URL is visited by a consumer and their web browser is redirected to the long URL.<br />
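Resolving a short URL therefore just means reading the redirect's Location header. A hedged Python sketch (not URLTeam code) that captures the redirect instead of following it:<br />

```python
# Illustrative sketch: read where a short URL points without following
# the redirect. Any URL passed in here is an assumption for the example.
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None tells urllib not to follow the redirect, so the
    # 3xx response surfaces as an HTTPError we can inspect ourselves.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def resolve(short_url):
    opener = urllib.request.build_opener(NoRedirect)
    try:
        opener.open(short_url, timeout=10)
    except urllib.error.HTTPError as e:
        if e.code in (301, 302, 303, 307, 308):
            return e.headers.get("Location")
        raise
    return None  # no redirect: unused code or an error page
```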
<br />
Such services are a ticking timebomb. If they go away, get hacked or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but it relies on URL shorteners to actually hand over their databases. Even 301Works founding member ''bit.ly'' does not actually share its database, and most other big shorteners don't share theirs either.<br />
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. They unfortunately do not want to make the uploads downloadable, but the same data is in our torrents too, just in a different format (we use pipe-delimited, xz-compressed files while 301works uses comma-delimited uncompressed files).<br />
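Converting between the two formats is mechanical; a hedged sketch (the exact field layout beyond "code then URL" is an assumption):<br />

```python
# Illustrative sketch: turn a pipe-delimited, xz-compressed release
# file into comma-delimited CSV like the 301works uploads.
import csv
import io
import lzma

def pipe_xz_to_csv(xz_bytes):
    text = lzma.decompress(xz_bytes).decode("utf-8")
    out = io.StringIO()
    writer = csv.writer(out)
    for line in text.splitlines():
        # Split on the first pipe only: codes contain no pipes,
        # but the target URL might.
        code, url = line.split("|", 1)
        writer.writerow([code, url])
    return out.getvalue()
```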
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/ArchiveTeam/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/ArchiveTeam/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com/ Tinyurl.com]<br />
| 10,000,000,000<br />
| [[Warrior]]<br />
| scraping: sequential, done up to azzzzz<br />
| new shorturls: non-sequential, 7 characters<br />
|-<br />
| [http://bit.ly/ Bit.ly]<br />
| 50,000,000,000<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://goo.gl Goo.gl]<br />
| ?<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 934,134,706 (2013-05-20)<br />
| [[Warrior]]<br />
| scraping: sequential, done up to kZZZZ<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096 (2010-04-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| Old tr.im<br />
| 1990425<br />
| -<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| [http://tr.im/ New tr.im]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 42pzz<br />
| new shorturls: sequential<br />
|-<br />
| visibli (hex)<br />
| 16777216<br />
| [[User:Chfoo]]<br />
| Salami at 55%. [https://dl.dropboxusercontent.com/u/672132/urlteam/visiblihex_incomplete_20130705.xz Incomplete ~8.3mil 176MB ]<br />
| Using links.sharedby.co/links/ as URL prefix.<br />
|-<br />
| [http://ur1.ca ur1.ca]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to dzzzz<br />
| new shorturls: sequential<br />
|-<br />
| [http://ow.ly ow.ly]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to lyZZZ<br />
| new shorturls: sequential<br />
|-<br />
| [http://snipurl.com/ snipurl.com]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to <nowiki>271~~~~</nowiki><br />
| new shorturls: sequential, starting from 20wa5rt<br />
|-<br />
| [http://post.ly post.ly] (Posterous)<br />
| ?<br />
| [[Warrior]]/EC2<br />
| done<br />
| dead<br />
|-<br />
| [http://vbly.us vbly.us] (formerly vb.ly)<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 2hba<br />
| new shorturls: sequential<br />
|-<br />
| [http://arseh.at arseh.at]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 4fv3<br />
| new shorturls: sequential<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
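Many of the sequential shorteners above use base-36-style codes, so ranges like "done up to azzzzz" can be enumerated by treating codes as numbers. A sketch, assuming a digits-then-lowercase alphabet (each service's real ordering may differ):<br />

```python
# Illustrative sketch: enumerate sequential shortcodes by converting
# between codes and integers. The alphabet ordering is an assumption.
import string

ALPHABET = string.digits + string.ascii_lowercase  # assumed base-36

def code_to_int(code):
    n = 0
    for ch in code:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n

def int_to_code(n, length):
    chars = []
    for _ in range(length):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

def code_range(start, end):
    # Every code from start to end inclusive (same length assumed).
    for n in range(code_to_int(start), code_to_int(end) + 1):
        yield int_to_code(n, len(start))
```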
<br />
=== Alive ===<br />
<br />
Last verified 2013-04-17. Original list last updated 2009-08-14 <ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref>.<br />
<br />
* adf.ly<br />
* adjix.com<br />
* ask.fm - ask.fm/a/40k05kgp<br />
* awe.sm<br />
* biglnk.com<br />
* budurl.com - Appears non-incremental<br />
* buff.ly - Buffer App<br />
* burl.se<br />
* cli.gs - Appears non-incremental<br />
* cl.ly - CloudApp<br />
* decenturl.com - Not at all easy to scrape.<br />
* dld.bz - "private URL shortening service"<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* flip.it - Flipboard<br />
* fnd.us (See official shorteners)<br />
* fwdurl.net<br />
* go.to<br />
* go2.me - Appears incremental: http://u.go2.me/6YK http://u.go2.me/6YL<br />
* ilix.in - HTML redirect<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* korta.nu<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* ph.ly - Related to the pond called Philadelphia, where links are born and raised<br />
* po.st<br />
* prsm.tc - getprismatic.com<br />
* r.ebay.com<br />
* rod.gs<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX<br />
* shar.es (See official shorteners)<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinkurl.us - Always reports that the URL is malformed<br />
* shrd.by - see sharedby.co<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* soa.li - Gigya inc.<br />
* spnsr.tw - sponsoredtweets.com<br />
* surl.co.uk - Many shortening options.<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* tinyarrows.com / ta.gd / ri.ms / ➡.ws / ➨.ws / ➯.ws / ➔.ws / ➞.ws / ➽.ws / ➹.ws / ✩.ws / ✿.ws / ❥.ws / ›.ws / ⌘.ws / ‽.ws / ☁.ws - Appears non-incremental: uses user-defined words for URLs (e.g. http://➡.ws/URLTEAM)<br />
* trib.al<br />
* tr.im - Appears incremental: http://tr.im/44tn2 http://tr.im/44tn4<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitthis.com<br />
* u.mavrev.com - Not accepting new urls.<br />
* urlcut.com<br />
* vimeo.com<br />
* xrl.us - see metamark.net<br />
* yatuc.com - Not accepting new urls.<br />
* yep.it<br />
<br />
==== "Official" shorteners ====<br />
<br />
* upl.nu - Ung Pirat (Youth Pirate Party, Sweden)<br />
* bull.hn - Bullhorn Reach (format: bull.hn/l/19JQE/)<br />
* CokeURL.com - Coca-Cola<br />
* db.tt - DropBox<br />
* di.sn - Disney<br />
* fb.me - Facebook<br />
* flic.kr - Flickr<br />
* fnd.us - [http://fundrazr.com Fundrazr.com]<br />
* goo.gl - Google<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
* gu.com - The Guardian (weird format - https://gu.com/p/3f7ca )<br />
* hub.me - HubPages<br />
* igg.me - Indiegogo<br />
* lnkd.in - LinkedIn<br />
* post.ly - Posterous<br />
* shar.es - [http://sharethis.com ShareThis] - 404 on homepage, otherwise ok<br />
* skfb.ly - Sketchfab<br />
* spoti.fi - [http://spotify.com Spotify]<br />
* stanford.io - Stanford University<br />
* su.pr - StumbleUpon<br />
* t.co - Twitter<br />
* tmblr.co - Tumblr<br />
* wapo.st - Washington Post<br />
* wp.me - Wordpress.com<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
<br />
===== bit.ly aliases =====<br />
<br />
* 1.usa.gov - USA Government<br />
* 4sq.com - Foursquare<br />
* aje.me - Aljazeera<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* bzfd.it - Buzzfeed<br />
* chzb.gr - Cheezburger<br />
* cnet.co - CNET<br />
* cnnmon.ie - CNN Money<br />
* conta.cc - Constant Contact Inc.<br />
* dennysd.in - Denny's Restaurants<br />
* dtoid.it - Destructoid<br />
* econ.st - The Economist<br />
* engri.sh - Engrish.com<br />
* es.pn - ESPN<br />
* gaw.kr - Gawker<br />
* grd.to - The Grid TO<br />
* huff.to - Huffington Post<br />
* j.mp - bit.ly<ref>http://blog.bitly.com/post/179664996/go-ahead-and-j-mp</ref><br />
* jrnl.to - thejournal.ie<br />
* kck.st - Kickstarter<br />
* marsdd.it - MaRS Discovery District<br />
* nyti.ms - New York Times<br />
* onforb.es - Forbes<br />
* onion.com - The Onion<br />
* read.bi - Business Insider<br />
* rseo.co - realseo<br />
* sbn.to - sbnation<br />
* slackers.co - slackers.com<br />
* s.shr.lc - shareaholic - Naive, redirects any shortcode to bit.ly<br />
* stjo.es - St. Joseph Media<br />
* squid.us - Laughing Squid<br />
* tcrn.ch - Techcrunch<br />
* theatln.tc - The Atlantic<br />
* usat.ly - USA Today Newspaper<br />
* vrge.co - The Verge<br />
* s831.us - whatever that is<br />
<br />
=== Dead or Broken ===<br />
* 1link.in - Website dead<br />
* 6url.com - HTML redirect, Error 500<br />
* ad.vu - mirror of adjix.com, application not found<br />
* canurl.com - Website dead<br />
* chod.sk - Appears non-incremental, not resolving<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* gonext.org - not resolving<br />
* imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve. <br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* jump.to - dead as of February 1, 2013<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kl.am - "kl.am Closes its Shell"<br />
* kurl.us - Parked.<br />
* lnkurl.com - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..."<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.<br />
* shortlinks.co.uk - Working again. Maybe not.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.<br />
* traceurl.com - DNS fails to resolve.<br />
* tr.im (1st generation) - "Be back soon!"<br />
* twitpwr.com - Domain parked.<br />
* u.nu - "The shortest URLs. period." Website dead since at least October 1st, 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."<br />
* urlborg.com - 404 Not Found.<br />
* urlcover.com - Domain parked.<br />
* urlhawk.com - Domain parked.<br />
* url-press.com - Suspended by web host.<br />
* urlsmash.com - DNS not resolving.<br />
* urltea.com - Dreamhost's coming soon page.<br />
* urlvi.be - Domain parked.<br />
* urlx.org - Owner has agreed to share his database<br />
* vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string. See sharedby.co.<br />
* w3t.org - 403 Forbidden.<br />
* wlink.us - Domain parked.<br />
* xaddr.com - Domain parked.<br />
* xil.in - Under construction.<br />
* x.se - Cannot resolve, but www.x.se works.<br />
* xym.kr - Gibberish (?) Korean text blog.<br />
* yweb.com - Suspicious iframe with a long URL and a fake loading GIF image.<br />
* zi.ma - DNS not resolving.<br />
<br />
==== Discontinued ====<br />
<br />
* urlbrief.com - co-operates with 301Works.org<br />
<br />
=== Hueg list ===<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
* [http://rield.com/faq/why-url-shorteners-are-bad Why URL shortening services and shortURLs are bad]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam/History&diff=17216URLTeam/History2013-07-06T18:24:48Z<p>Soult: fix some typos</p>
<hr />
<div>This is a history of the [[URLTeam]], written using wiki history, IRC chatlogs, git commit logs and news articles.<br />
<br />
== History ==<br />
=== Beginning: 2009/01 - 2009/08 ===<br />
On January 21st, 2009, the URLTeam wiki page was created by [[User:Scumola|swebb]].<ref name="wiki-urlteam"/> He started crawling ''tinyurl.com'' and ''ff.im'' (Friendfeed) and by the end of April had already backed up a couple million URLs.<br />
<br />
=== fetcher.pl: 2009/08 - 2010/08 ===<br />
A second scraper was created by [[User:Chronomex|chronomex]] in August 2009 in Perl.<ref name="git-urlteam"/> The output format pioneered by his script is still used by the URLTeam for releases. He used the scraper to save various smaller shorteners such as ''4url.cc'' and ''surl.ws'', but also for bigger shorteners like ''is.gd''. Back then ''is.gd'' was still using sequential shortcodes and had no rate limiting.<br />
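Sequential shortcodes are what made this early scraping feasible: the whole code space can simply be enumerated in order. A minimal sketch of such an enumerator (the 0-9a-z alphabet and its ordering are assumptions for illustration; each shortener uses its own alphabet):<br />
<br />
```python
import itertools
import string

# Assumed 36-symbol alphabet; real services differ (some add uppercase).
ALPHABET = string.digits + string.ascii_lowercase

def shortcodes(length):
    """Yield every shortcode of the given length, in sequential order."""
    for combo in itertools.product(ALPHABET, repeat=length):
        yield "".join(combo)
```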
<br />
At the same time, on August 12th, 2009 the domain ''urlte.am'' was registered by [[User:Jscott|SketchCow]].<ref name="whois-urlte.am"/> The website only displayed the logo and the tag line "url shortening was a fucking awful idea".<ref name="archive.org-urlte.am"/><br />
<br />
=== tinyback: 2010/08 - 2011/01 ===<br />
Almost exactly a year after chronomex created his Perl-based scraper, [[User:Soult|soultcer]] decided to write a scraper in Ruby.<ref name="git-tinyback"/> Development was slow and sporadic due to his day job, with the first runnable but buggy version ready on October 26th, 2010. Tinyback originally used MessagePack as its output format and only supported ''tinyurl.com''. Support for ''is.gd'', ''bit.ly'' and ''tr.im'' was added in the following months, along with many other improvements and a change to the same output format as chronomex's fetcher.pl.<br />
<br />
One goal of tinyback was to handle special cases and bugs in many URL shorteners, like ''tinyurl.com''<nowiki>'</nowiki>s [http://tinyurl.com/2zln TinyURL redirects to a TinyURL] error page or ''bit.ly''<nowiki>'</nowiki>s [https://bit.ly/YNUkZB STOP] page. On the one hand this led to more complete backups of a URL shortener, but on the other hand it made adding support for new shorteners more difficult.<br />
<br />
Also on October 26th, 2010, when tinyback development was only getting started, soultcer asked SketchCow about putting the scraped data up for download on the urlte.am website. SketchCow approved, but it would still be a long time until the first release.<br />
<br />
File handling and shortcode assignment were done manually back then, with the resulting output files being copied around using scp before being sorted and merged with some rather slow C tools. Following the creation of tinyback, both chronomex and soultcer did lots of scraping using their respective tools. To get around IP bans, cheap low-end VPSes were used for scraping.<br />
<br />
=== First release: 2011/01 - 2011/06 ===<br />
==== is.gd changes ====<br />
On January 12th, 2011, ''is.gd'' migrated to a new architecture<ref name="isgd-news"/>. Shortcodes changed from sequential to random, and a very strict limit of 60 requests per minute was added. This made scraping more difficult, but luckily chronomex had fetched almost all of the sequential shortcodes before the switch was made.<br />
<br />
==== Release preparation ====<br />
In January 2011 soultcer started an effort to combine the scraped data from chronomex and himself, which he finished in February. On February 9th, 2011 soultcer made the first commit to the urlteam-stuff repository, which holds the urlte.am website.<br />
<br />
Also in 2011 [[User:Jeroenz0r|Jeroenz0r]] and [[User:Underscor|underscor]] joined the urlteam and helped with scraping and various other stuff. In March 2011 swebb also discovered the IRC channel and uploaded his scraped data for soultcer to merge. His scrape of tr.im was especially useful, but more on that later.<br />
<br />
==== 301works cooperation ====<br />
In March 2011 SketchCow arranged for Jeff Kaplan from [http://301works.org 301Works.org] to give soultcer an upload slot to the [http://archive.org/details/301works 301works collection on archive.org]. From March 15th to July 6th soultcer uploaded all data he had scraped from ''bit.ly'' so far, in a CSV-based format, to the [http://archive.org/details/301utm 301utm collection]. The list of files and the codes they contain is also [https://github.com/ArchiveTeam/urlteam-stuff/blob/master/301works stored] in the urlteam-stuff git repository. Unfortunately no further updates to the data have been made since then.<br />
<br />
==== Release ====<br />
In April 2011 soultcer stated on IRC that he wanted to create the release torrent, but he had trouble merging swebb's data from ''tr.im'' with his own scrapes. As it turned out, ''tr.im'' was very broken and returned some bad data, which is why the URLTeam settled on only putting parts of the ''tr.im'' scrape in the torrent. After the contents were finalized in May 2011, underscor provided a server to upload the 40 GB of compressed data files that had been collected. The upload finished on May 31st, and underscor created a torrent from it, marking our very first release. It took another couple of days to update the homepage, but after over 2 years of scraping, the first results were finally available for download.<br />
<br />
=== Second release: 2011/06 - 2012/01 ===<br />
After the first release, most people were rather busy with other stuff, so when the self-imposed deadline of December 2011 approached, the only new files were a couple gigabytes of scraped data from ''tinyurl.com'' and the merged data from ''tr.im'' (see below). The release was created on the last day of 2011.<br />
<br />
==== tr.im (a short digression) ====<br />
In August 2009, when the popular shortener ''bit.ly'' became the default shortener for Twitter, Eric Woodward, owner of the not-quite-so-popular shortener ''tr.im'', was rather butthurt and decided to shut down ''tr.im'' out of spite.<ref name="trim-shutdown"/> This caused a massive uproar, because it made people realize that once ''tr.im'' shut down, millions of URLs would just stop working, or even worse, redirect to some spam site. This not only affected ''tr.im'', but undermined the legitimacy of every other URL shortener as well. ''bit.ly'' offered to continue hosting ''tr.im'', but Eric Woodward was having none of it.<ref name="trim-offer"/><ref name="trim-refused-offer"/> In the end he reopened ''tr.im'' and a few days later announced it would live on as an open source project.<ref name="trim-reopen"/><ref name="trim-opensource"/> While he did release the source, nothing ever came of his "community-owned" URL shortener idea.<ref name="trim-source"/> Shortening of new URLs was disabled and redirecting barely worked, breaking when too many requests (more than about 5 per minute) were made. In March 2011 soultcer removed the Trim class from tinyback, effectively ceasing any further backup efforts.<ref name="git-tinyback"/><br />
<br />
Since the first release did not include the full scraped data from ''tr.im'', soultcer decided to do it right for the second release. In May 2011 he tried to merge the scrapes he and swebb had done, which turned out to be a complicated process. Using the source code released by Woodward, he was able to understand some of the weird quirks that tr.im had: shorturl codes could either be autogenerated or custom. Autogenerated codes were case-sensitive, custom ones were not. If a new autogenerated code was the same as a custom code, it might overwrite the custom code. Also, URLs would be randomly truncated for no understandable reason.<ref name="trim-source"/> With that (half-)knowledge, he pieced together a final backup of the ''tr.im'' shortener, which was included in the second release.<br />
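The case-sensitivity quirk is the tricky part of such a merge. A hypothetical sketch of the rule described above (function and field names are illustrative, not from the actual tinyarchive tools): custom codes are folded to lowercase because tr.im treated them case-insensitively, while autogenerated codes keep their exact case, and later records overwrite earlier ones.<br />
<br />
```python
def merge_trim_records(records):
    """Merge (code, url, is_custom) tuples into one code -> url mapping.

    Custom codes are case-insensitive (folded to lowercase); autogenerated
    codes are case-sensitive and kept as-is. Later records win, mirroring
    the overwrite behaviour tr.im itself exhibited.
    """
    merged = {}
    for code, url, is_custom in records:
        key = code.lower() if is_custom else code
        merged[key] = url
    return merged
```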
<br />
=== A new approach: 2012/01 - 2013/01 ===<br />
After the second release, work on the URLTeam slowed down once more. Soultcer became unhappy with his Ruby-based scraper and rewrote it in Python 3, which was not very widely used at the time and made character encoding handling more difficult. In August 2012 he rewrote tinyback again, based on his Python 3 version, but this time for Python 2. To distinguish the new version from the old Ruby tinyback, it was called tinyback v2.<br />
<br />
==== Tinyarchive database and tracker ====<br />
The process of creating a release was rather cumbersome and error-prone: URL shorteners and shortcode ranges were assigned to scraping hosts manually, using a text file for coordination. The results were tracked with git-annex and then merged using unintuitive command-line tools. With the release of tinyback v2, soultcer also created a small tracker. It was written in September 2012, also in Python 2, with sqlite3 as the database backend. The tracker was responsible for handing out tasks to tinyback instances via a simple HTTP API, making sure that only one task per URL shortener was handed out per IP address to avoid IP blocks. The results were then uploaded back to the tracker.<br />
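Reduced to its logic, the worker side of that interaction is a short loop. In the real system the fetch and upload steps were HTTP calls against tracker.tinyarchive.org/v1/; the task fields used here (id, service, codes) are assumptions, since the exact API shape is not documented on this page.<br />
<br />
```python
def run_once(fetch_task, scrape, upload):
    """One worker iteration: claim a task, scrape its codes, report back.

    fetch_task, scrape and upload are injected callables; in tinyback v2
    they would wrap HTTP requests to the tracker and the shortener itself.
    """
    task = fetch_task()
    if task is None:
        return 0  # tracker has no task for this IP right now
    results = scrape(task["service"], task["codes"])
    upload(task["id"], results)
    return len(results)
```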
<br />
Previously all data was stored in sorted and unsorted text files, often compressed to save space. Under the name tinyarchive, soultcer created some tools to manage all scraped URLs in a database instead, using Python 2 and BerkeleyDB. The resulting database was from then on used as the canonical store for URL shortener backups, with releases being generated directly from the database.<br />
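The gist of the database-backed approach can be sketched with Python's stdlib dbm as a stand-in for the BerkeleyDB backend tinyarchive actually used; the key layout ("service|shortcode" mapping to the long URL) is purely illustrative.<br />
<br />
```python
import dbm

def store(db_path, service, pairs):
    """Insert (shortcode, long_url) pairs under a 'service|code' key."""
    with dbm.open(db_path, "c") as db:
        for code, url in pairs:
            db[f"{service}|{code}".encode()] = url.encode()

def export_release(db_path):
    """Yield 'code|url' release lines for everything in the database."""
    with dbm.open(db_path, "r") as db:
        for key in db.keys():
            _service, code = key.decode().split("|", 1)
            yield f"{code}|{db[key].decode()}"
```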
<br />
==== Third release ====<br />
The planned release cycle of 6 months put the next release in June 2012, but work on the new tinyback and tinyarchive only started in August 2012. Almost no scraping had been done beforehand, so we needed some time to scrape new data for the third release. The new tools sped up scraping, especially since the barrier to entry was lowered by automatic task assignment. The tinyback project was also added to the [[ArchiveTeam Warrior]].<br />
<br />
Preparation for the next release started mid-December and the release was created on January 1st, 2013.<br />
<br />
=== Fourth release: 2013/01 - now ===<br />
The new tinyback code made scraping easier, and as a result more people started doing so, even outside the Warrior. The code also received patches by [[User:Alard|alard]], [[User:Chfoo|chfoo]], [[User:Ersi|ersi]] and [[User:Joepie91|joepie91]] that fixed bugs, improved log output, and added support for new URL shorteners.<br />
<br />
==== tr.im relaunches ====<br />
In December 2012 soultcer discovered a posting on a job board looking for a programmer to work on a relaunch of ''tr.im''. The domain name had been bought by a domain name "investment company" (= domain squatter), but it was unclear whether they had also acquired the database. On January 30th, 2013, ''tr.im'' was back online, and it turned out that the database had not been part of the deal. Shortcode generation was sequential, but with weird gaps in between, probably in an effort to leave many codes unused. Unused codes would automatically redirect to advertisements after a couple of seconds. To keep at least some of the original ''tr.im'' links alive, soultcer started submitting the data he had from the old ''tr.im'' into the new ''tr.im''. It was not a perfect solution, but at least some of the links were preserved that way.<br />
<br />
==== Fourth Release ====<br />
On May 16th, 2013 soultcer announced that he would be stepping down from his role in the URLTeam after the following release. Chronomex and ersi volunteered to take over the duty of running the tracker, the tinyarchive database and creating the releases. The fourth release is planned for the beginning of July 2013.<br />
<br />
== Sources ==<br />
<references><br />
<ref name="trim-shutdown">http://mashable.com/2009/08/09/trim-shuts-down/</ref><br />
<ref name="trim-offer">http://mashable.com/2009/08/09/shorturl-savior/</ref><br />
<ref name="trim-refused-offer">http://techzinglive.com/page/101/techzing-13-tr-im-bit-ly-and-twitter-the-real-story</ref><br />
<ref name="trim-reopen">http://mashable.com/2009/08/11/trim-reopened/</ref><br />
<ref name="trim-opensource">http://mashable.com/2009/08/17/tr-im-community-owned/</ref><br />
<ref name="trim-source">While not available on Github anymore, soultcer still has a copy of the open sourced tr.im code</ref><br />
<ref name="isgd-news">http://is.gd/news.php</ref><br />
<ref name="archive.org-urlte.am">http://web.archive.org/web/20091227074236/http://urlte.am/</ref><br />
<ref name="whois-urlte.am">http://whois.domaintools.com/urlte.am</ref><br />
<ref name="wiki-urlteam">http://www.archiveteam.org/index.php?title=URLTeam&limit=500&action=history</ref><br />
<ref name="git-urlteam">https://github.com/chronomex/urlteam/commits/master</ref><br />
<ref name="git-tinyback">https://github.com/ArchiveTeam/tinyback/commits/master</ref><br />
</references></div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=16869URLTeam2013-06-01T09:42:24Z<p>Soult: owly</p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam-logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; the small URL is visited by a consumer and their web browser is redirected to the long URL.<br />
<br />
Such services are a ticking timebomb. If they go away, get hacked, or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works acts as an escrow for URL shortener databases, but it relies on the shorteners to actually hand over their databases. Even 301Works founding member ''bit.ly'' does not actually share its database, and most other big shorteners don't share theirs either.<br />
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. They unfortunately do not want to make the files downloadable, but the same data is in our torrents too, just in a different format (we use pipe-delimited, xz-compressed files while 301Works uses comma-delimited uncompressed files).<br />
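The two layouts differ only in delimiter and compression, so converting between them is mechanical. A sketch, assuming a plain shortcode|longurl line format (actual release files may carry additional fields):<br />
<br />
```python
import lzma

def read_release(path):
    """Yield (shortcode, url) pairs from a pipe-delimited, xz-compressed file."""
    with lzma.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            code, url = line.rstrip("\n").split("|", 1)
            yield code, url

def to_301works(pairs):
    """Re-emit the same pairs comma-delimited, as 301Works stores them."""
    return "\n".join(f"{code},{url}" for code, url in pairs)
```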
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/ArchiveTeam/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/ArchiveTeam/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com/ Tinyurl.com]<br />
| 10,000,000,000<br />
| [[Warrior]]<br />
| scraping: sequential, done up to azzzzz<br />
| new shorturls: non-sequential, 7 characters<br />
|-<br />
| [http://bit.ly/ Bit.ly]<br />
| 50,000,000,000<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://goo.gl Goo.gl]<br />
| ?<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 934,134,706 (2013-05-20)<br />
| [[Warrior]]<br />
| scraping: sequential, done up to kZZZZ<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096 (2010-04-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| Old tr.im<br />
| 1990425<br />
| -<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| [http://tr.im/ New tr.im]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 42pzz<br />
| new shorturls: sequential<br />
|-<br />
| visibli (hex)<br />
| 16777216<br />
| [[User:Chfoo]]<br />
| Cake at 28%. [https://dl.dropboxusercontent.com/u/672132/urlteam/visiblihex_incomplete_20130530.xz Incomplete ~4.3mil 91MB ]<br />
| Using links.sharedby.co/links/ as URL prefix.<br />
|-<br />
| [http://ur1.ca ur1.ca]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to dzzzz<br />
| new shorturls: sequential<br />
|-<br />
| [http://ow.ly ow.ly]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to lzZZZ<br />
| new shorturls: sequential<br />
|-<br />
| [http://snipurl.com/ snipurl.com]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to <nowiki>271~~~~</nowiki><br />
| new shorturls: sequential, starting from 20wa5rt<br />
|-<br />
| [http://post.ly post.ly] (Posterous)<br />
| ?<br />
| [[Warrior]]/EC2<br />
| done<br />
| dead<br />
|-<br />
| [http://vbly.us vbly.us] (formerly vb.ly)<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 2hba<br />
| new shorturls: sequential<br />
|-<br />
| [http://arseh.at arseh.at]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 4fv3<br />
| new shorturls: sequential<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
<br />
=== Alive ===<br />
<br />
Last verified 2013-04-17. Original list last updated 2009-08-14 <ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref>.<br />
<br />
* adf.ly<br />
* adjix.com<br />
* ask.fm - ask.fm/a/40k05kgp<br />
* awe.sm<br />
* biglnk.com<br />
* budurl.com - Appears non-incremental<br />
* buff.ly - Buffer App<br />
* burl.se<br />
* cli.gs - Appears non-incremental<br />
* cl.ly - CloudApp<br />
* decenturl.com - Not at all easy to scrape.<br />
* dld.bz - "private URL shortening service"<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* flip.it - Flipboard<br />
* fnd.us (See official shorteners)<br />
* fwdurl.net<br />
* go.to<br />
* ilix.in - HTML redirect<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* korta.nu<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H /http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* ph.ly - Related to the pond called Philadelphia, where links are born and raised<br />
* po.st<br />
* r.ebay.com<br />
* ri.ms<br />
* rod.gs<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX<br />
* shar.es (See official shorteners)<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinkurl.us - Always says the URL is malformed<br />
* shrd.by - see sharedby.co<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* soa.li - Gigya inc.<br />
* surl.co.uk - Many shortening options.<br />
* ta.gd<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* tinyarrows.com<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitthis.com<br />
* u.mavrev.com - Not accepting new urls.<br />
* urlcut.com<br />
* vimeo.com<br />
* xrl.us - see metamark.net<br />
* yatuc.com - Not accepting new urls.<br />
* yep.it<br />
* ➡.ws<br />
* ➨.ws<br />
* ➯.ws<br />
* ➔.ws<br />
* ➞.ws<br />
* ➽.ws<br />
* ➹.ws<br />
* ✩.ws<br />
* ✿.ws<br />
* ❥.ws<br />
* ›.ws<br />
* ⌘.ws<br />
* ‽.ws<br />
* ☁.ws<br />
<br />
<br />
==== "Official" shorteners ====<br />
<br />
* bull.hn - Bullhorn Reach (format: bull.hn/l/19JQE/)<br />
* CokeURL.com - Coca-Cola<br />
* db.tt - DropBox<br />
* di.sn - Disney<br />
* fb.me - Facebook<br />
* flic.kr - Flickr<br />
* fnd.us - [http://fundrazr.com Fundrazr.com]<br />
* goo.gl - Google<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
* gu.com - The Guardian (weird format - https://gu.com/p/3f7ca )<br />
* hub.me - HubPages<br />
* igg.me - Indiegogo<br />
* lnkd.in - LinkedIn<br />
* post.ly - Posterous<br />
* shar.es - [http://sharethis.com ShareThis] - 404 on homepage, otherwise ok<br />
* skfb.ly - Sketchfab<br />
* spoti.fi - [http://spotify.com Spotify]<br />
* stanford.io - Stanford University<br />
* su.pr - StumbleUpon<br />
* t.co - Twitter<br />
* tmblr.co - Tumblr<br />
* wapo.st - Washington Post<br />
* wp.me - Wordpress.com<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
<br />
===== bit.ly aliases =====<br />
<br />
* 1.usa.gov - USA Government<br />
* 4sq.com - Foursquare<br />
* aje.me - Aljazeera<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* chzb.gr - Cheezburger<br />
* cnet.co - CNET<br />
* cnnmon.ie - CNN Money<br />
* conta.cc - Constant Contact Inc.<br />
* dennysd.in - Denny's Restaurants<br />
* dtoid.it - Destructoid<br />
* econ.st - The Economist<br />
* es.pn - ESPN<br />
* gaw.kr - Gawker<br />
* grd.to - The Grid TO<br />
* huff.to - Huffington Post<br />
* j.mp - bit.ly<ref>http://blog.bitly.com/post/179664996/go-ahead-and-j-mp</ref><br />
* jrnl.to - thejournal.ie<br />
* kck.st - Kickstarter<br />
* marsdd.it - MaRS Discovery District<br />
* nyti.ms - New York Times<br />
* onforb.es - Forbes<br />
* onion.com - The Onion<br />
* read.bi - Business Insider<br />
* rseo.co - realseo<br />
* sbn.to - sbnation<br />
* slackers.co - slackers.com<br />
* s.shr.lc - shareaholic - Naive, redirects any shortcode to bit.ly<br />
* stjo.es - St. Joseph Media<br />
* squid.us - Laughing Squid<br />
* tcrn.ch - Techcrunch<br />
* theatln.tc - The Atlantic<br />
* usat.ly - USA Today Newspaper<br />
* vrge.co - The Verge<br />
* s831.us - whatever that is<br />
<br />
=== Dead or Broken ===<br />
* 1link.in - Website dead<br />
* 6url.com - HTML redirect, Error 500<br />
* ad.vu - mirror of adjix.com, application not found<br />
* canurl.com - Website dead<br />
* chod.sk - Appears non-incremental, not resolving<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* gonext.org - not resolving<br />
* imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve. <br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* jump.to - dead as of February 1, 2013<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kl.am - "kl.am Closes its Shell"<br />
* kurl.us - Parked.<br />
* lnkurl.com - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..."<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.<br />
* shortlinks.co.uk - Working again. Maybe not.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.<br />
* traceurl.com - DNS fails to resolve.<br />
* tr.im (1st generation) - "Be back soon!"<br />
* twitpwr.com - Domain parked.<br />
* u.nu - "The shortest URLs. period." Website dead since at least October 1st, 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."<br />
* urlborg.com - 404 Not Found.<br />
* urlcover.com - Domain parked.<br />
* urlhawk.com - Domain parked.<br />
* url-press.com - Suspended by web host.<br />
* urlsmash.com - DNS not resolving.<br />
* urltea.com - Dreamhost's coming soon page.<br />
* urlvi.be - Domain parked.<br />
* urlx.org - Owner has agreed to share his database<br />
* vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string. See sharedby.co.<br />
* w3t.org - 403 Forbidden.<br />
* wlink.us - Domain parked.<br />
* xaddr.com - Domain parked.<br />
* xil.in - Under construction.<br />
* x.se - Cannot resolve, but www.x.se works.<br />
* xym.kr - Gibberish (?) Korean text blog.<br />
* yweb.com - Suspicious iframe with a long URL and a fake loading GIF image.<br />
* zi.ma - DNS not resolving.<br />
<br />
==== Discontinued ====<br />
<br />
* urlbrief.com - co-operates with 301Works.org<br />
<br />
=== Hueg list ===<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=16856URLTeam2013-05-31T08:28:56Z<p>Soult: /* New table */ Update scraping status</p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam-logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; when a consumer visits the small URL, their web browser is redirected to the long URL.<br />
<br />
Such services are a ticking timebomb. If they go away, get hacked, or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but it relies on the URL shorteners to actually hand over their databases. Even 301Works founding member ''bit.ly'' does not actually share its database, and most other big shorteners don't share theirs either.<br />
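Resolving a shorturl is a single HTTP round trip: request the shortcode and read the redirect target off the 301/302 response. The following sketch is hypothetical (it is not part of any URLTeam tool) and only parses an already-captured raw response, so it does no network I/O:<br />

```python
def long_url_from_response(raw_response):
    """Extract the redirect target from a raw HTTP response, if any."""
    head, _, _ = raw_response.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # The status line looks like "HTTP/1.1 301 Moved Permanently"
    status = lines[0].split(" ", 2)[1]
    if status not in ("301", "302"):
        return None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return value.strip()
    return None

example = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: http://example.com/a-much-longer-url\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)
print(long_url_from_response(example))  # http://example.com/a-much-longer-url
```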
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. They unfortunately do not want to make the files downloadable, but the same data is in our torrents too, just in a different format (we use pipe-delimited, xz-compressed files while 301Works uses comma-delimited uncompressed files).<br />
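For illustration, converting between the two formats is a small transformation. This sketch assumes simple two-field <code>shortcode|url</code> records; the real files' exact layout may differ:<br />

```python
import csv
import io
import lzma

def xz_pipe_to_csv(xz_bytes):
    """Decompress pipe-delimited shortcode|url records, re-emit as CSV."""
    text = lzma.decompress(xz_bytes).decode("utf-8")
    out = io.StringIO()
    writer = csv.writer(out)
    for line in text.splitlines():
        if line:
            # Split on the first pipe only; URLs may contain "|" themselves.
            code, _, url = line.partition("|")
            writer.writerow([code, url])
    return out.getvalue()

sample = lzma.compress(b"aa|http://example.com/\nab|http://example.org/\n")
print(xz_pipe_to_csv(sample))
```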
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/ArchiveTeam/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/ArchiveTeam/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com/ Tinyurl.com]<br />
| 10,000,000,000<br />
| [[Warrior]]<br />
| scraping: sequential, done up to azzzzz<br />
| new shorturls: non-sequential, 7 characters<br />
|-<br />
| [http://bit.ly/ Bit.ly]<br />
| 50,000,000,000<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://goo.gl Goo.gl]<br />
| ?<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 934,134,706 (2013-05-20)<br />
| [[Warrior]]<br />
| scraping: sequential, done up to kZZZZ<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096 (2010-04-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| Old tr.im<br />
| 1990425<br />
| -<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| [http://tr.im/ New tr.im]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 42pzz<br />
| new shorturls: sequential<br />
|-<br />
| visibli (hex)<br />
| 16777216<br />
| [[User:Chfoo]]<br />
| Cake at 28%. [https://dl.dropboxusercontent.com/u/672132/urlteam/visiblihex_incomplete_20130530.xz Incomplete ~4.3mil 91MB ]<br />
| Using links.sharedby.co/links/ as URL prefix.<br />
|-<br />
| [http://ur1.ca ur1.ca]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to dzzzz<br />
| new shorturls: sequential<br />
|-<br />
| [http://ow.ly ow.ly]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to lyZZZ<br />
| new shorturls: sequential<br />
|-<br />
| [http://snipurl.com/ snipurl.com]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to <nowiki>271~~~~</nowiki><br />
| new shorturls: sequential, starting from 20wa5rt<br />
|-<br />
| [http://post.ly post.ly] (Posterous)<br />
| ?<br />
| [[Warrior]]/EC2<br />
| done<br />
| dead<br />
|-<br />
| [http://vbly.us vbly.us] (formerly vb.ly)<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 2hba<br />
| new shorturls: sequential<br />
|-<br />
| [http://arseh.at arseh.at]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 4fv3<br />
| new shorturls: sequential<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
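Many of the "sequential" entries in the table above can be scraped exhaustively because each shortcode is just a number in a fixed alphabet. A sketch of that enumeration, assuming a lowercase 0-9a-z alphabet (real services differ, and some are case-sensitive):<br />

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"

def code_to_int(code):
    """Interpret a shortcode as a base-36 number."""
    n = 0
    for ch in code:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n

def int_to_code(n, width=1):
    """Inverse of code_to_int, zero-padded to at least `width` digits."""
    digits = []
    while n or len(digits) < width:
        n, r = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

# Enumerating a range is then a simple counter:
start = code_to_int("aa")
print([int_to_code(i, 2) for i in range(start, start + 3)])  # ['aa', 'ab', 'ac']
```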
<br />
=== Alive ===<br />
<br />
Last verified 2013-04-17. Original list last updated 2009-08-14 <ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref>.<br />
<br />
* adf.ly<br />
* adjix.com<br />
* ask.fm - ask.fm/a/40k05kgp<br />
* awe.sm<br />
* biglnk.com<br />
* budurl.com - Appears non-incremental<br />
* buff.ly - Buffer App<br />
* burl.se<br />
* cli.gs - Appears non-incremental<br />
* cl.ly - CloudApp<br />
* decenturl.com - Not at all easy to scrape.<br />
* dld.bz - "private URL shortening service"<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* flip.it - Flipboard<br />
* fnd.us (See official shorteners)<br />
* fwdurl.net<br />
* go.to<br />
* ilix.in - HTML redirect<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* korta.nu<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H /http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* ph.ly Related to the pond called Philadelphia, where links are born and raised<br />
* po.st<br />
* r.ebay.com<br />
* ri.ms<br />
* rod.gs<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX<br />
* shar.es (See official shorteners)<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinkurl.us - Always reports that the URL is malformed<br />
* shrd.by - see sharedby.co<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* soa.li - Gigya inc.<br />
* surl.co.uk - Many shortening options.<br />
* ta.gd<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* tinyarrows.com<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitthis.com<br />
* u.mavrev.com - Not accepting new urls.<br />
* urlcut.com<br />
* vimeo.com<br />
* xrl.us - see metamark.net<br />
* yatuc.com - Not accepting new urls.<br />
* yep.it<br />
* ➡.ws<br />
* ➨.ws<br />
* ➯.ws<br />
* ➔.ws<br />
* ➞.ws<br />
* ➽.ws<br />
* ➹.ws<br />
* ✩.ws<br />
* ✿.ws<br />
* ❥.ws<br />
* ›.ws<br />
* ⌘.ws<br />
* ‽.ws<br />
* ☁.ws<br />
<br />
<br />
==== "Official" shorteners ====<br />
<br />
* bull.hn - Bullhorn Reach (format: bull.hn/l/19JQE/)<br />
* CokeURL.com - Coca-Cola<br />
* db.tt - DropBox<br />
* di.sn - Disney<br />
* fb.me - Facebook<br />
* flic.kr - Flickr<br />
* fnd.us - [http://fundrazr.com Fundrazr.com]<br />
* goo.gl - Google<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
* gu.com - The Guardian (weird format - https://gu.com/p/3f7ca )<br />
* hub.me - HubPages<br />
* igg.me - Indiegogo<br />
* lnkd.in - LinkedIn<br />
* post.ly - Posterous<br />
* shar.es - [http://sharethis.com ShareThis] - 404 on homepage, otherwise ok<br />
* skfb.ly - Sketchfab<br />
* spoti.fi - [http://spotify.com Spotify]<br />
* stanford.io - Stanford University<br />
* su.pr - StumbleUpon<br />
* t.co - Twitter<br />
* tmblr.co - Tumblr<br />
* wapo.st - Washington Post<br />
* wp.me - Wordpress.com<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
<br />
===== bit.ly aliases =====<br />
<br />
* 1.usa.gov - USA Government<br />
* 4sq.com - Foursquare<br />
* aje.me - Aljazeera<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* chzb.gr - Cheezeburger<br />
* cnet.co - CNET<br />
* cnnmon.ie - CNN Money<br />
* conta.cc - Constant Contact Inc.<br />
* dennysd.in - Denny's Restaurants<br />
* dtoid.it - Destructoid<br />
* econ.st - The Economist<br />
* es.pn - ESPN<br />
* gaw.kr - Gawker<br />
* grd.to - The Grid TO<br />
* huff.to - Huffington Post<br />
* j.mp - bit.ly<ref>http://blog.bitly.com/post/179664996/go-ahead-and-j-mp</ref><br />
* jrnl.to - thejournal.ie<br />
* kck.st - Kickstarter<br />
* marsdd.it - MaRS Discovery District<br />
* nyti.ms - New York Times<br />
* onforb.es - Forbes<br />
* onion.com - The Onion<br />
* read.bi - Business Insider<br />
* rseo.co - realseo<br />
* slackers.co - slackers.com<br />
* s.shr.lc - shareaholic - Naive, redirects any shortcode to bit.ly<br />
* stjo.es - St. Joseph Media<br />
* squid.us - Laughing Squid<br />
* tcrn.ch - Techcrunch<br />
* theatln.tc - The Atlantic<br />
* usat.ly - USA Today Newspaper<br />
* vrge.co - The Verge<br />
* s831.us - whatever that is<br />
<br />
=== Dead or Broken ===<br />
* 1link.in - Website dead<br />
* 6url.com - HTML redirect, Error 500<br />
* ad.vu - mirror of adjix.com, application not found<br />
* canurl.com - Website dead<br />
* chod.sk - Appears non-incremental, not resolving<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* gonext.org - not resolving<br />
* imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve. <br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* jump.to - dead as of February 1, 2013<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kl.am - "kl.am Closes its Shell"<br />
* kurl.us - Parked.<br />
* lnkurl.com - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..."<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.<br />
* shortlinks.co.uk - Working again. Maybe not.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.<br />
* traceurl.com - DNS fails to resolve.<br />
* tr.im (1st generation) - "Be back soon!"<br />
* twitpwr.com - Domain parked.<br />
* u.nu - "The shortest URLs. period." Website dead since at least October 1st, 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."<br />
* urlborg.com - 404 Not Found.<br />
* urlcover.com - Domain parked.<br />
* urlhawk.com - Domain parked.<br />
* url-press.com - Suspended by web host.<br />
* urlsmash.com - DNS not resolving.<br />
* urltea.com - Dreamhost's coming soon page.<br />
* urlvi.be - Domain parked.<br />
* urlx.org - Owner has agreed to share his database<br />
* vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string. See sharedby.co.<br />
* w3t.org - 403 Forbidden.<br />
* wlink.us - Domain parked.<br />
* xaddr.com - Domain parked.<br />
* xil.in - Under construction.<br />
* x.se - Cannot resolve, but www.x.se works.<br />
* xym.kr - Gibberish (?) Korean text blog.<br />
* yweb.com - Suspicious iframe with long url and fake loading gif image.<br />
* zi.ma - DNS not resolving.<br />
<br />
==== Discontinued ====<br />
<br />
* urlbrief.com - co-operates with 301Works.org<br />
<br />
=== Hueg list ===<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam/History&diff=16836URLTeam/History2013-05-27T20:28:26Z<p>Soult: Created page with "This is a history of the URLTeam, written using wiki history, IRC chatlogs, git commit logs and news articles. == History == === Beginning: 2009/01 - 2009/08 === On Janua..."</p>
<hr />
<div>This is a history of the [[URLTeam]], written using wiki history, IRC chatlogs, git commit logs and news articles.<br />
<br />
== History ==<br />
=== Beginning: 2009/01 - 2009/08 ===<br />
On January 21st, 2009 the URLTeam wiki page was created by [[User:Scumola|swebb]].<ref name="wiki-urlteam"/> He started crawling ''tinyurl.com'' and ''ff.im'' (Friendfeed) and at the end of April had already backed up a couple million URLs.<br />
<br />
=== fetcher.pl: 2009/08 - 2010/08 ===<br />
A second scraper, written in Perl, was created by [[User:Chronomex|chronomex]] in August 2009.<ref name="git-urlteam"/> The output format pioneered by his script is still used by the URLTeam for releases. He used the scraper to save various smaller shorteners such as ''4url.cc'' and ''surl.ws'', but also bigger shorteners like ''is.gd''. Back then ''is.gd'' was still using sequential shortcodes and had no rate limiting.<br />
<br />
At the same time, on August 12th, 2009 the domain ''urlte.am'' was registered by [[User:Jscott|SketchCow]].<ref name="whois-urlte.am"/> The website only displayed the logo and the tag line "url shortening was a fucking awful idea".<ref name="archive.org-urlte.am"/><br />
<br />
=== tinyback: 2010/08 - 2011/01 ===<br />
Almost exactly a year after chronomex created his Perl-based scraper, [[User:Soult|soultcer]] decided to write a scraper in Ruby.<ref name="git-tinyback"/> Development was slow and sporadic due to his day job, with the first runnable but buggy version ready on October 26th, 2010. Tinyback originally used MessagePack as output format and only supported ''tinyurl.com''. Support for ''is.gd'', ''bit.ly'' and ''tr.im'' was added in the following months, along with many other improvements and a change to the same output format as chronomex's fetcher.pl.<br />
<br />
One goal of tinyback was to handle special cases and bugs in many URL shorteners, like ''tinyurl.com''<nowiki>'</nowiki>s [http://tinyurl.com/2zln TinyURL redirects to a TinyURL] error page or ''bit.ly''<nowiki>'</nowiki>s [https://bit.ly/YNUkZB STOP] page. On one hand this led to more complete backups of a URL shortener, but on the other hand it made adding support for new shorteners more difficult.<br />
<br />
Also on October 26th, 2010, when tinyback development was only getting started, soultcer asked SketchCow about putting up the scraped data for download on the urlte.am website. SketchCow approved, but it would still be a long time until the first release.<br />
<br />
File handling and shortcode assignment was done manually back then, with the resulting output files being copied around using scp before being sorted and merged with some rather slow C tools. Following the creation of tinyback, both chronomex and soultcer did lots of scraping using their respective tools. To get around IP bans, cheap low-end VPSes were used for scraping.<br />
<br />
=== First release: 2011/01 - 2011/06 ===<br />
==== is.gd changes ====<br />
On January 12th, 2011 ''is.gd'' migrated to a new architecture.<ref name="isgd-news"/> Shortcodes changed from sequential to random, and a very strict limit of 60 requests per minute was added. This made scraping more difficult, but luckily chronomex had fetched almost all the sequential shortcodes before the switch was made.<br />
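A scraper working under such a per-minute limit has to pace its requests. This is a minimal, hypothetical pacing sketch (not the actual tinyback code) with an injectable clock and sleep function, so the logic can be checked without real waiting:<br />

```python
class Pacer:
    """Spaces calls so that at most `per_minute` of them happen per minute."""

    def __init__(self, per_minute, clock, sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock          # returns the current time in seconds
        self.sleep = sleep          # sleeps for the given duration
        self.next_slot = clock()

    def wait_turn(self):
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
        self.next_slot = max(now, self.next_slot) + self.interval

# Simulated clock: the first request goes through immediately; after
# that, each request waits one second (60 per minute).
t = [0.0]
slept = []

def fake_clock():
    return t[0]

def fake_sleep(duration):
    slept.append(duration)
    t[0] += duration

pacer = Pacer(60, fake_clock, fake_sleep)
for _ in range(3):
    pacer.wait_turn()
print(slept)  # [1.0, 1.0]
```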
<br />
==== Release preparation ====<br />
In January 2011 soultcer started an effort to combine the scraped data from chronomex and himself, which he finished in February. On February 9th, 2011 soultcer made the first commit to the urlteam-stuff repository, which holds the urlte.am website.<br />
<br />
Also in 2011, [[User:Jeroenz0r|Jeroenz0r]] and [[User:Underscor|underscor]] joined the URLTeam and helped with scraping and various other tasks. In March 2011 swebb also discovered the IRC channel and uploaded his scraped data for soultcer to merge. His scrape of tr.im was especially useful, but more on that later.<br />
<br />
==== 301works cooperation ====<br />
In March 2011 Sketchcow arranged for Jeff Kaplan from [http://301works.org 301Works.org] to give soultcer an upload slot to the [http://archive.org/details/301works 301works collection on archive.org]. From March 15th to July 6th soultcer uploaded all data he had scraped from ''bit.ly'' so far in a csv-based format to the [http://archive.org/details/301utm 301utm collection]. The list of files and the codes they contain is also [https://github.com/ArchiveTeam/urlteam-stuff/blob/master/301works stored] in the urlteam-stuff git repository. Unfortunately no further updates to the data have been made after that point.<br />
<br />
==== Release ====<br />
In April 2011 soultcer stated on IRC that he wanted to create the release torrent, but he had trouble merging swebb's data from ''tr.im'' with his own scrapes. As it turned out, ''tr.im'' was very broken and returned some bad data, which is why the URLTeam settled on only putting parts of the ''tr.im'' scrape in the torrent. After the contents were finalized in May 2011, underscor provided a server to upload the 40 GB of compressed data files that had been collected. The upload finished on May 31st, and underscor created a torrent from it, marking our very first release. It took another couple of days to update the homepage, but after over 2 years of scraping, the first results were finally available for download.<br />
<br />
=== Second release: 2011/06 - 2012/01 ===<br />
After the first release, most people were rather busy with other stuff, so when the self-imposed deadline of December 2011 approached, the only new files were a couple gigabytes of scraped data from ''tinyurl.com'' and the merged data from ''tr.im'' (see below). The release was created on the last day of 2011.<br />
<br />
==== tr.im (a short digression) ====<br />
In August 2009, when popular shortener ''bit.ly'' became the default shortener for Twitter, Eric Woodward, owner of the not quite so popular shortener ''tr.im'', was rather butthurt and decided to shut down ''tr.im'' out of spite.<ref name="trim-shutdown"/> This caused a massive uproar, because it made people realize that once ''tr.im'' shut down, millions of URLs would just stop working, or even worse, redirect to some spam site. This not only affected ''tr.im'', but undermined the legitimacy of every other URL shortener as well. ''bit.ly'' offered to continue hosting ''tr.im'', but Eric Woodward was having none of it.<ref name="trim-offer"/><ref name="trim-refused-offer"/> In the end he reopened ''tr.im'' and a few days later announced it would live on as an open source project.<ref name="trim-reopen"/><ref name="trim-opensource"/> While he did release the source, nothing ever came of his "community-owned" URL shortener idea.<ref name="trim-source"/> Shortening of new URLs was disabled and redirecting barely worked, breaking when too many requests (more than about 5 per minute) were made. In March 2011 soultcer removed the Trim class from tinyback, effectively ceasing any further backup efforts.<ref name="git-tinyback"/><br />
<br />
Since the first release did not include the full scrape of tr.im, soultcer decided to do it right for the second release. In May 2011 he tried to merge the scrapes swebb and he had done, which turned out to be a complicated process. Using the source code released by Woodward, he was able to understand some of the weird quirks that tr.im had: Shorturl codes could either be autogenerated or custom codes. Autogenerated ones were case-sensitive, custom ones were not. If a new autogenerated code was the same as a custom code, it might overwrite the custom code. Also, URLs would be randomly truncated for no understandable reason.<ref name="trim-source"/> With that (half-)knowledge, he pieced together a final backup of the ''tr.im'' shortener, which was included in the second release.<br />
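As an illustration of why the merge was tricky, the quirks can be modelled like this (the function and record layout are invented for this example, not taken from the real merge tools):<br />

```python
def merge_trim(records):
    """Merge (code, url, is_custom) records under tr.im's quirky rules."""
    merged = {}
    for code, url, is_custom in records:
        # Custom codes are case-insensitive, autogenerated ones are not.
        key = code.lower() if is_custom else code
        if is_custom and key in merged and not merged[key][1]:
            # An autogenerated code already claimed this key; it wins.
            continue
        merged[key] = (url, is_custom)
    return merged

result = merge_trim([
    ("MyLink", "http://example.com/custom", True),   # stored under "mylink"
    ("mylink", "http://example.org/auto", False),    # overwrites the custom code
])
print(result)
```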
<br />
=== A new approach: 2012/01 - 2013/01 ===<br />
After the second release, work on the URLTeam slowed down once more. Soultcer became unhappy with his Ruby-based scraper and rewrote it in Python 3, which was not yet widely used at the time and made character encoding handling more difficult. In August 2012 he rewrote tinyback again, based on his Python 3 version, but this time for Python 2. To distinguish the new version from the old Ruby tinyback, it was called tinyback v2.<br />
<br />
==== Tinyarchive database and tracker ====<br />
The process of creating a release was rather cumbersome and error-prone: URL shorteners and ranges were assigned manually to scraping hosts, using a text file for coordination. The results were tracked with git-annex, and then merged using unintuitive command-line tools. With the release of tinyback v2, soultcer also created a small tracker. It was written in September 2012, also in Python 2, with sqlite3 as the database backend. The tracker was responsible for handing out tasks to tinyback instances using a simple HTTP API, making sure that only one task for each URL shortener was handed out per IP address, to avoid IP blocks. The results were then uploaded back to the tracker.<br />
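The one-task-per-shortener-per-IP handout rule can be sketched with sqlite3. The schema and function below are invented for illustration; the real tracker's code lives in the urlteam-stuff repository:<br />

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    shortener TEXT,
    claimed_by_ip TEXT)""")
db.executemany("INSERT INTO task (shortener, claimed_by_ip) VALUES (?, NULL)",
               [("tinyurl",), ("tinyurl",), ("isgd",)])

def claim_task(ip, shortener):
    """Hand out one unclaimed task, unless this IP already holds one."""
    cur = db.execute(
        "SELECT id FROM task WHERE shortener = ? AND claimed_by_ip = ?",
        (shortener, ip))
    if cur.fetchone():
        return None  # this IP already has a task for this shortener
    cur = db.execute(
        "SELECT id FROM task WHERE shortener = ? AND claimed_by_ip IS NULL",
        (shortener,))
    row = cur.fetchone()
    if row is None:
        return None  # nothing left to hand out
    db.execute("UPDATE task SET claimed_by_ip = ? WHERE id = ?", (ip, row[0]))
    return row[0]

first = claim_task("198.51.100.7", "tinyurl")
second = claim_task("198.51.100.7", "tinyurl")  # refused: same IP
print(first, second)  # 1 None
```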
<br />
Previously all data was stored in sorted and unsorted text files, often compressed to save space. Under the name tinyarchive, soultcer created some tools to manage all scraped URLs in a database instead, using Python 2 and BerkeleyDB. The resulting database was from then on used as the canonical method for storing URL shortener backups, with releases being generated directly from the database.<br />
<br />
==== Third release ====<br />
The planned release cycle of 6 months put the next release in June 2012, but work on the new tinyback and tinyarchive only started in August 2012. Almost no scraping was done beforehand, so we needed some time to scrape new data for the third release. The new tools sped up scraping, especially since the barrier to entry was lowered by the automatic task assignment. The tinyback project was also added to the [[ArchiveTeam Warrior]].<br />
<br />
Preparation for the next release started mid-December and the release was created on January 1st, 2013.<br />
<br />
=== Fourth release: 2013/01 - now ===<br />
The new tinyback code made scraping easier, and as a result more people started doing so, even outside the Warrior. The code also received patches by [[User:Alard|alard]], [[User:Chfoo|chfoo]], [[User:Ersi|ersi]] and [[User:Joepie91|joepie91]] that fixed bugs, improved log output, and added support for new URL shorteners.<br />
<br />
==== tr.im relaunches ====<br />
In December 2012 soultcer discovered a posting on a job board looking for a programmer to work on a relaunch of ''tr.im''. The domain name had been bought by a domain name "investment company" (i.e. a domain squatter), but it was unclear if they had also acquired the database. On January 30th, 2013, ''tr.im'' was back online, and it turned out that the database had not been part of the deal. Shortcode generation was sequential, but with weird gaps in between, probably in an effort to leave many codes unused. Unused codes would automatically redirect to advertisements after a couple of seconds. To keep at least some of the original ''tr.im'' links alive, soultcer started submitting the data he had from the old ''tr.im'' into the new ''tr.im''. It was not a perfect solution, but at least a part of the links were preserved that way.<br />
<br />
==== Fourth Release ====<br />
On May 16th, 2013 soultcer announced that he would be stepping down from his role in the URLTeam after the following release. Chronomex and ersi volunteered to take over the duty of running the tracker, the tinyarchive database and creating the releases. The fourth release is planned for the beginning of July 2013.<br />
<br />
== Sources ==<br />
<references><br />
<ref name="trim-shutdown">http://mashable.com/2009/08/09/trim-shuts-down/</ref><br />
<ref name="trim-offer">http://mashable.com/2009/08/09/shorturl-savior/</ref><br />
<ref name="trim-refused-offer">http://techzinglive.com/page/101/techzing-13-tr-im-bit-ly-and-twitter-the-real-story</ref><br />
<ref name="trim-reopen">http://mashable.com/2009/08/11/trim-reopened/</ref><br />
<ref name="trim-opensource">http://mashable.com/2009/08/17/tr-im-community-owned/</ref><br />
<ref name="trim-source">While not available on Github anymore, soultcer still has a copy of the open sourced tr.im code</ref><br />
<ref name="isgd-news">http://is.gd/news.php</ref><br />
<ref name="archive.org-urlte.am">http://web.archive.org/web/20091227074236/http://urlte.am/</ref><br />
<ref name="whois-urlte.am">http://whois.domaintools.com/urlte.am</ref><br />
<ref name="wiki-urlteam">http://www.archiveteam.org/index.php?title=URLTeam&limit=500&action=history</ref><br />
<ref name="git-urlteam">https://github.com/chronomex/urlteam/commits/master</ref><br />
<ref name="git-tinyback">https://github.com/ArchiveTeam/tinyback/commits/master</ref><br />
</references></div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=16722URLTeam2013-05-20T20:25:19Z<p>Soult: </p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam-logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; when a consumer visits the small URL, their web browser is redirected to the long URL.<br />
<br />
Such services are a ticking timebomb. If they go away, get hacked, or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but it relies on the URL shorteners to actually hand over their databases. Even 301Works founding member ''bit.ly'' does not actually share its database, and most other big shorteners don't share theirs either.<br />
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. They unfortunately do not want to make the files downloadable, but the same data is in our torrents too, just in a different format (we use pipe-delimited, xz-compressed files while 301Works uses comma-delimited uncompressed files).<br />
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/ArchiveTeam/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/ArchiveTeam/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com/ Tinyurl.com]<br />
| 10,000,000,000<br />
| [[Warrior]]<br />
| scraping: sequential, done up to azzzzz<br />
| new shorturls: non-sequential, 7 characters<br />
|-<br />
| [http://bit.ly/ Bit.ly]<br />
| 50,000,000,000<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://goo.gl Goo.gl]<br />
| ?<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 934,134,706 (2013-05-20)<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096 (2010-04-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| Old tr.im<br />
| 1990425<br />
| [[User:Soult]]<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| [http://tr.im/ New tr.im]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 42pzz<br />
| new shorturls: sequential<br />
|-<br />
| visibli (hex)<br />
| 16777216<br />
| [[User:Chfoo]]<br />
| Cake at 19%. [https://dl.dropboxusercontent.com/u/672132/urlteam/visiblihex_incomplete_20130515.xz Incomplete ~2.7mil 59MB ]<br />
| Using links.sharedby.co/links/ as URL prefix.<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
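The "sequential" status entries in the table above (e.g. "done up to azzzzz") describe walking the shortcode space as a base-36 counter. A sketch of the conversion, assuming the common 0-9a-z alphabet (the exact alphabet and code length vary per shortener):

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # assumed 0-9a-z, base 36

def code_to_int(code):
    """Interpret a shortcode as a base-36 number."""
    n = 0
    for ch in code:
        n = n * 36 + ALPHABET.index(ch)
    return n

def int_to_code(n, width):
    """Render a number as a fixed-width base-36 shortcode."""
    chars = []
    for _ in range(width):
        n, r = divmod(n, 36)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

# Sequential scraping just walks the integers: the code after azzzzz
print(int_to_code(code_to_int("azzzzz") + 1, 6))  # -> b00000
```

Non-sequential services (bit.ly, is.gd) hand out codes from the same kind of space but in no scannable order, which is why their status rows list only the code length.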
<br />
=== Alive ===<br />
<br />
Last verified 2013-04-17. Original list last updated 2009-08-14 <ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref>.<br />
<br />
* adf.ly<br />
* adjix.com<br />
* ask.fm - ask.fm/a/40k05kgp<br />
* awe.sm<br />
* biglnk.com<br />
* budurl.com - Appears non-incremental<br />
* buff.ly - Buffer App<br />
* burl.se<br />
* cli.gs - Appears non-incremental<br />
* cl.ly - CloudApp<br />
* decenturl.com - Not at all easy to scrape.<br />
* dld.bz - "private URL shortening service"<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* flip.it - Flipboard<br />
* fnd.us (See official shorteners)<br />
* go.to<br />
* ilix.in - HTML redirect<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* korta.nu<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* ph.ly Related to the pond called Philadelphia, where links are born and raised<br />
* po.st<br />
* r.ebay.com<br />
* rod.gs<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX<br />
* shar.es (See official shorteners)<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinkurl.us - Always reports that the URL is malformed<br />
* shrd.by - see sharedby.co<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* surl.co.uk - Many shortening options.<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitthis.com<br />
* u.mavrev.com - Not accepting new URLs.<br />
* urlcut.com<br />
* vimeo.com<br />
* xrl.us - see metamark.net<br />
* yatuc.com - Not accepting new URLs.<br />
* yep.it<br />
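Several services in the list above (nutshellurl.com, sharedby.co) answer with a chain of redirects rather than a single 301, so a scraper has to follow Location headers hop by hop with a loop guard. A sketch with a pluggable fetch function; the stub mapping below uses made-up URLs:

```python
def resolve_chain(fetch, url, max_hops=10):
    """Follow redirects until a non-redirect response or the hop limit."""
    seen = []
    for _ in range(max_hops):
        seen.append(url)
        nxt = fetch(url)  # returns the Location header, or None if final
        if nxt is None:
            return url, seen
        url = nxt
    raise RuntimeError("redirect loop or chain too long")

# Stub simulating a nutshellurl-style double redirect (made-up URLs):
hops = {
    "http://nutshellurl.com/abc": "http://nutshellurl.com/redirect?id=abc",
    "http://nutshellurl.com/redirect?id=abc": "http://example.com/final",
}
final, path = resolve_chain(hops.get, "http://nutshellurl.com/abc")
print(final)  # http://example.com/final
```

In a real scraper `fetch` would issue an HTTP request without auto-following redirects and return the Location header; the hop limit protects against redirect loops.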
<br />
==== "Official" shorteners ====<br />
<br />
* bull.hn - Bullhorn Reach (format: bull.hn/l/19JQE/)<br />
* CokeURL.com - Coca-Cola<br />
* db.tt - DropBox<br />
* di.sn - Disney<br />
* fb.me - Facebook<br />
* flic.kr - Flickr<br />
* fnd.us - [http://fundrazr.com Fundrazr.com]<br />
* goo.gl - Google<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
* gu.com - The Guardian (weird format - https://gu.com/p/3f7ca )<br />
* hub.me - HubPages<br />
* igg.me - Indiegogo<br />
* lnkd.in - LinkedIn<br />
* post.ly - Posterous<br />
* shar.es - [http://sharethis.com ShareThis] - 404 on homepage, otherwise ok<br />
* skfb.ly - Sketchfab<br />
* spoti.fi - [http://spotify.com Spotify]<br />
* stanford.io - Stanford University<br />
* su.pr - StumbleUpon<br />
* t.co - Twitter<br />
* tmblr.co - Tumblr<br />
* wapo.st - Washington Post<br />
* wp.me - Wordpress.com<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
<br />
===== bit.ly aliases =====<br />
<br />
* 1.usa.gov - USA Government<br />
* 4sq.com - Foursquare<br />
* aje.me - Aljazeera<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* chzb.gr - Cheezburger<br />
* conta.cc - Constant Contact Inc.<br />
* dennysd.in - Denny's Restaurants<br />
* dtoid.it - Destructoid<br />
* econ.st - The Economist<br />
* es.pn - ESPN<br />
* gaw.kr - Gawker<br />
* grd.to - The Grid TO<br />
* huff.to - Huffington Post<br />
* j.mp - bit.ly<ref>http://blog.bitly.com/post/179664996/go-ahead-and-j-mp</ref><br />
* jrnl.to - thejournal.ie<br />
* kck.st - Kickstarter<br />
* marsdd.it - MaRS Discovery District<br />
* nyti.ms - New York Times<br />
* onforb.es - Forbes<br />
* read.bi - Business Insider<br />
* rseo.co - realseo<br />
* slackers.co - slackers.com<br />
* s.shr.lc - shareaholic - Naive, redirects any shortcode to bit.ly<br />
* stjo.es - St. Joseph Media<br />
* squid.us - Laughing Squid<br />
* tcrn.ch - TechCrunch<br />
* theatln.tc - The Atlantic<br />
* usat.ly - USA Today Newspaper<br />
* vrge.co - The Verge<br />
<br />
=== Dead or Broken ===<br />
* 1link.in - Website dead<br />
* 6url.com - HTML redirect, Error 500<br />
* ad.vu - mirror of adjix.com, application not found<br />
* canurl.com - Website dead<br />
* chod.sk - Appears non-incremental, not resolving<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* gonext.org - not resolving<br />
* imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve. <br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* jump.to - dead as of February 1, 2013<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kl.am - "kl.am Closes its Shell"<br />
* kurl.us - Parked.<br />
* lnkurl.com - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..." ("We're working on it...")<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.<br />
* shortlinks.co.uk - Working again. Maybe not.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.<br />
* traceurl.com - DNS fails to resolve.<br />
* tr.im (1st generation) - "Be back soon!"<br />
* twitpwr.com - Domain parked.<br />
* u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."<br />
* urlborg.com - 404 Not Found.<br />
* urlcover.com - Domain parked.<br />
* urlhawk.com - Domain parked.<br />
* url-press.com - Suspended by web host.<br />
* urlsmash.com - DNS not resolving.<br />
* urltea.com - Dreamhost's coming soon page.<br />
* urlvi.be - Domain parked.<br />
* urlx.org - Owner has agreed to share his database<br />
* vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string. See sharedby.co.<br />
* w3t.org - 403 Forbidden.<br />
* wlink.us - Domain parked.<br />
* xaddr.com - Domain parked.<br />
* xil.in - Under construction.<br />
* x.se - Cannot resolve, but www.x.se works.<br />
* xym.kr - Gibberish (?) Korean text blog.<br />
* yweb.com - Suspicious iframe with long url and fake loading gif image.<br />
* zi.ma - DNS not resolving.<br />
<br />
==== Discontinued ====<br />
<br />
* urlbrief.com - co-operates with 301Works.org<br />
<br />
=== Hueg list ===<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=16718URLTeam2013-05-20T00:41:40Z<p>Soult: Update table</p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam-logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; the small URL is visited by a consumer and their web browser is redirected to the long URL.<br />
<br />
Such services are a ticking time bomb. If they go away, get hacked, or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but they rely on URL shorteners to actually give them their databases. Even 301Works founding member ''bit.ly'' does not actually share their database, and most other big shorteners don't share theirs either.<br />
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. They unfortunately do not want to make them downloadable, but the same data is in our torrents too, just in a different format (we use tab-delimited, xz-compressed files while 301works uses comma-delimited uncompressed files).<br />
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/ArchiveTeam/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/ArchiveTeam/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com/ Tinyurl.com]<br />
| 10,000,000,000<br />
| [[Warrior]]<br />
| scraping: sequential, done up to azzzzz<br />
| new shorturls: non-sequential, 7 characters<br />
|-<br />
| [http://bit.ly/ Bit.ly]<br />
| 50,000,000,000<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://goo.gl Goo.gl]<br />
| ?<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 934,134,706 (2013-05-20)<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096 (2010-04-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| Old tr.im<br />
| 1990425<br />
| [[User:Soult]]<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| [http://tr.im/ New tr.im]<br />
| ?<br />
| [[Warrior]]<br />
| scraping: sequential, done up to 42pzz<br />
| new shorturls: sequential<br />
|-<br />
| visibli (hex)<br />
| 16777216<br />
| [[User:Chfoo]]<br />
| Cake at 19%. [https://dl.dropboxusercontent.com/u/672132/urlteam/visiblihex_incomplete_20130515.xz Incomplete ~2.7mil 59MB ]<br />
| Using links.sharedby.co/links/ as URL prefix.<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
<br />
=== Alive ===<br />
<br />
Last verified 2013-04-17. Original list last updated 2009-08-14 <ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref>.<br />
<br />
* adf.ly<br />
* adjix.com<br />
* ask.fm - ask.fm/a/40k05kgp<br />
* awe.sm<br />
* biglnk.com<br />
* budurl.com - Appears non-incremental<br />
* buff.ly - Buffer App<br />
* burl.se<br />
* cli.gs - Appears non-incremental<br />
* cl.ly - CloudApp<br />
* decenturl.com - Not at all easy to scrape.<br />
* dld.bz - "private URL shortening service"<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* flip.it - Flipboard<br />
* fnd.us (See official shorteners)<br />
* go.to<br />
* ilix.in - HTML redirect<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* korta.nu<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* ph.ly Related to the pond called Philadelphia, where links are born and raised<br />
* po.st<br />
* r.ebay.com<br />
* rod.gs<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX<br />
* shar.es (See official shorteners)<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinkurl.us - Always reports that the URL is malformed<br />
* shrd.by - see sharedby.co<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* surl.co.uk - Many shortening options.<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitthis.com<br />
* u.mavrev.com - Not accepting new URLs.<br />
* urlcut.com<br />
* vimeo.com<br />
* xrl.us - see metamark.net<br />
* yatuc.com - Not accepting new URLs.<br />
* yep.it<br />
<br />
==== "Official" shorteners ====<br />
<br />
* bull.hn - Bullhorn Reach (format: bull.hn/l/19JQE/)<br />
* CokeURL.com - Coca-Cola<br />
* db.tt - DropBox<br />
* fb.me - Facebook<br />
* flic.kr - Flickr<br />
* fnd.us - [http://fundrazr.com Fundrazr.com]<br />
* goo.gl - Google<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
* gu.com - The Guardian (weird format - https://gu.com/p/3f7ca )<br />
* hub.me - HubPages<br />
* igg.me - Indiegogo<br />
* lnkd.in - LinkedIn<br />
* post.ly - Posterous<br />
* shar.es - [http://sharethis.com ShareThis] - 404 on homepage, otherwise ok<br />
* skfb.ly - Sketchfab<br />
* spoti.fi - [http://spotify.com Spotify]<br />
* stanford.io - Stanford University<br />
* su.pr - StumbleUpon<br />
* t.co - Twitter<br />
* tmblr.co - Tumblr<br />
* wapo.st - Washington Post<br />
* wp.me - Wordpress.com<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
<br />
===== bit.ly aliases =====<br />
<br />
* 1.usa.gov - USA Government<br />
* 4sq.com - Foursquare<br />
* aje.me - Aljazeera<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* chzb.gr - Cheezburger<br />
* conta.cc - Constant Contact Inc.<br />
* dennysd.in - Denny's Restaurants<br />
* dtoid.it - Destructoid<br />
* econ.st - The Economist<br />
* es.pn - ESPN<br />
* gaw.kr - Gawker<br />
* grd.to - The Grid TO<br />
* huff.to - Huffington Post<br />
* j.mp - bit.ly<ref>http://blog.bitly.com/post/179664996/go-ahead-and-j-mp</ref><br />
* jrnl.to - thejournal.ie<br />
* kck.st - Kickstarter<br />
* marsdd.it - MaRS Discovery District<br />
* nyti.ms - New York Times<br />
* onforb.es - Forbes<br />
* read.bi - Business Insider<br />
* rseo.co - realseo<br />
* slackers.co - slackers.com<br />
* s.shr.lc - shareaholic - Naive, redirects any shortcode to bit.ly<br />
* stjo.es - St. Joseph Media<br />
* squid.us - Laughing Squid<br />
* tcrn.ch - TechCrunch<br />
* theatln.tc - The Atlantic<br />
* usat.ly - USA Today Newspaper<br />
* vrge.co - The Verge<br />
<br />
=== Dead or Broken ===<br />
* 1link.in - Website dead<br />
* 6url.com - HTML redirect, Error 500<br />
* ad.vu - mirror of adjix.com, application not found<br />
* canurl.com - Website dead<br />
* chod.sk - Appears non-incremental, not resolving<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* gonext.org - not resolving<br />
* imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve. <br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* jump.to - dead as of February 1, 2013<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kl.am - "kl.am Closes its Shell"<br />
* kurl.us - Parked.<br />
* lnkurl.com - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..." ("We're working on it...")<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.<br />
* shortlinks.co.uk - Working again. Maybe not.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.<br />
* traceurl.com - DNS fails to resolve.<br />
* tr.im (1st generation) - "Be back soon!"<br />
* twitpwr.com - Domain parked.<br />
* u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."<br />
* urlborg.com - 404 Not Found.<br />
* urlcover.com - Domain parked.<br />
* urlhawk.com - Domain parked.<br />
* url-press.com - Suspended by web host.<br />
* urlsmash.com - DNS not resolving.<br />
* urltea.com - Dreamhost's coming soon page.<br />
* urlvi.be - Domain parked.<br />
* urlx.org - Owner has agreed to share his database<br />
* vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string. See sharedby.co.<br />
* w3t.org - 403 Forbidden.<br />
* wlink.us - Domain parked.<br />
* xaddr.com - Domain parked.<br />
* xil.in - Under construction.<br />
* x.se - Cannot resolve, but www.x.se works.<br />
* xym.kr - Gibberish (?) Korean text blog.<br />
* yweb.com - Suspicious iframe with long url and fake loading gif image.<br />
* zi.ma - DNS not resolving.<br />
<br />
==== Discontinued ====<br />
<br />
* urlbrief.com - co-operates with 301Works.org<br />
<br />
=== Hueg list ===<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=16134URLTeam2013-04-17T09:56:12Z<p>Soult: Repository move</p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam-logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; the small URL is visited by a consumer and their web browser is redirected to the long URL.<br />
<br />
Such services are a ticking time bomb. If they go away, get hacked, or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but they rely on URL shorteners to actually give them their databases. Even 301Works founding member ''bit.ly'' does not actually share their database, and most other big shorteners don't share theirs either.<br />
<br />
== Who did this? ==<br />
You can join us in our IRC channel: [irc://irc.efnet.org/urlteam #urlteam] on [http://www.efnet.org/ EFNet]<br />
* [[User:Scumola]] started this wiki page<br />
* [[User:Chronomex]] started the Urlteam scraping effort<br />
* [[User:Soult]] Helps with scraping<br />
* [[User:Jeroenz0r]] Helps with scraping (and stalking Soult)<br />
* ... many ArchiveTeam people who run the scrapers<br />
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. They unfortunately do not want to make them downloadable, but the same data is in our torrents too, just in a different format (we use tab-delimited, xz-compressed files while 301works uses comma-delimited uncompressed files).<br />
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/ArchiveTeam/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/ArchiveTeam/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com/ Tinyurl.com]<br />
| 1,000,000,000<br />
| [[Warrior]]<br />
| scraping: sequential, <= 6 characters<br />
| new shorturls: non-sequential, 7 characters<br />
|-<br />
| [http://bit.ly/ Bit.ly]<br />
| 4,000,000,000<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://goo.gl Goo.gl]<br />
| ?<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 810,264,745 (2013-01-30)<br />
| [[Warrior]]<br />
| scraping: sequential, <= 5 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096 (2010-04-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| [http://tr.im tr.im]<br />
| 1990425<br />
| [[User:Soult]]<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| adjix.com<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Already done: 00-zz, 000-zzz, 0000-izzz.<br />
| case-insensitive, incremental<br />
|-<br />
| rod.gs<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Done: 00-ZZ, 000-2Qc<br />
| case-sensitive, incremental, server can't keep up with all the requests.<br />
|-<br />
| biglnk.com<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Done: 0-Z, 00-ZZ, 000-ZZZ<br />
| case-sensitive, incremental<br />
|-<br />
| go.to<br />
| 60000<br />
| [[User:Asiekierka]]<br />
| Done: ~45000 (go.to network links only: [http://64pixels.org/goto_dump.zip goto_dump.zip])<br />
| no codes, only names; google-fu only gives the first 1000 results for each, but thankfully most domains have fewer<br />
|-<br />
| visibli (hex)<br />
| 16777216<br />
| [[User:Chfoo]]<br />
| In progress. Estimated completion date: 2013-07-19.<br />
| Using links.sharedby.co/links/ as URL prefix.<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
<br />
=== Alive ===<br />
<br />
Last verified 2013-02-13. Original list last updated 2009-08-14 <ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref>.<br />
<br />
* adf.ly<br />
* awe.sm<br />
* budurl.com - Appears non-incremental<br />
* buff.ly - Buffer App<br />
* cli.gs - Appears non-incremental<br />
* cl.ly - CloudApp<br />
* decenturl.com - Not at all easy to scrape.<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* ilix.in - HTML redirect<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* po.st<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinkurl.us - Always reports that the URL is malformed<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* surl.co.uk - Many shortening options.<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* tr.im (2nd generation)<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitthis.com<br />
* u.mavrev.com - Not accepting new URLs.<br />
* ur1.ca - Database is downloadable from website directly.<br />
* urlcut.com<br />
* vimeo.com<br />
* xrl.us - see metamark.net<br />
* yatuc.com - Not accepting new URLs.<br />
* yep.it<br />
* korta.nu<br />
* burl.se<br />
* fnd.us (See official shorteners)<br />
* shar.es (See official shorteners)<br />
<br />
==== "Official" shorteners ====<br />
* db.tt - DropBox<br />
* fb.me - Facebook<br />
* flic.kr - Flickr<br />
* fnd.us - [http://fundrazr.com Fundrazr.com]<br />
* goo.gl - Google<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
* lnkd.in - LinkedIn<br />
* post.ly - Posterous<br />
* shar.es - [http://sharethis.com ShareThis]<br />
* spoti.fi - [http://spotify.com Spotify]<br />
* stanford.io - Stanford University<br />
* su.pr - StumbleUpon<br />
* t.co - Twitter<br />
* wp.me - Wordpress.com<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
<br />
===== bit.ly aliases =====<br />
<br />
* 1.usa.gov - USA Government<br />
* theatln.tc - The Atlantic<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* gaw.kr - Gawker<br />
* nyti.ms - New York Times<br />
* tcrn.ch - Techcrunch<br />
* usat.ly - USA Today Newspaper<br />
* vrge.co - The Verge<br />
<br />
=== Dead or Broken ===<br />
* 1link.in - Website dead<br />
* 6url.com - HTML redirect, Error 500<br />
* ad.vu - mirror of adjix.com, application not found<br />
* canurl.com - Website dead<br />
* chod.sk - Appears non-incremental, not resolving<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* gonext.org - not resolving<br />
* imfy.us - Requires a reCAPTCHA to reach the linked site, and Avast flags it. DNS fails to resolve.<br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* jump.to - dead as of February 1, 2013<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kl.am - "kl.am Closes its Shell"<br />
* kurl.us - Parked.<br />
* lnkurl.com - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..." ("We are working on it...")<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.<br />
* shortlinks.co.uk - Working again. Maybe not.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.<br />
* traceurl.com - DNS fails to resolve.<br />
* tr.im (1st generation) - "Be back soon!"<br />
* twitpwr.com - Domain parked.<br />
* u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."<br />
* urlborg.com - 404 Not Found.<br />
* urlcover.com - Domain parked.<br />
* urlhawk.com - Domain parked.<br />
* url-press.com - Suspended by web host.<br />
* urlsmash.com - DNS not resolving.<br />
* urltea.com - Dreamhost's coming soon page.<br />
* urlvi.be - Domain parked.<br />
* urlx.org - Owner has agreed to share his database<br />
* vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string. See sharedby.co.<br />
* w3t.org - 403 Forbidden.<br />
* wlink.us - Domain parked.<br />
* xaddr.com - Domain parked.<br />
* xil.in - Under construction.<br />
* x.se - Cannot resolve, but www.x.se works.<br />
* xym.kr - Gibberish (?) Korean text blog.<br />
* yweb.com - Suspicious iframe with long url and fake loading gif image.<br />
* zi.ma - DNS not resolving.<br />
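The vsb.li / links.visibli.com entry above describes codes that are truncated MD5 hex strings. A minimal sketch of what such a scheme looks like — the hash input and the truncation length here are guesses for illustration, not the service's actual parameters:

```python
import hashlib

def truncated_md5_code(url, length=8):
    """Hypothetical visibli-style code: first `length` hex digits of MD5(url)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()[:length]
```

Unlike a sequential keyspace, codes like these cannot be walked in order; they can only be collected from the wild or brute-forced over the full hex space.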
<br />
==== Discontinued ====<br />
<br />
* urlbrief.com - co-operates with 301Works.org<br />
<br />
=== Hueg list ===<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:Sbot&diff=14416User:Sbot2013-04-14T01:51:09Z<p>Soult: Created page with "Sbot is soult's helper bot. It is mainly used to delete spam. It has the bot flag so it doesn't clutter the recent changes page."</p>
<hr />
<div>Sbot is [[User:Soult|soult]]'s helper bot. It is mainly used to delete spam. It has the bot flag so it doesn't clutter the recent changes page.</div>Soulthttps://wiki.archiveteam.org/index.php?title=MediaWiki:Spam-blacklist&diff=12027MediaWiki:Spam-blacklist2013-04-07T13:48:53Z<p>Soult: </p>
<hr />
<div> # External URLs matching this list will be blocked when added to a page.<br />
# This list affects only this wiki; refer also to the global blacklist.<br />
# For documentation see http://www.mediawiki.org/wiki/Extension:SpamBlacklist<br />
#<!-- leave this line exactly as it is --> <pre><br />
#<br />
# Syntax is as follows:<br />
# * Everything from a "#" character to the end of the line is a comment<br />
# * Every non-blank line is a regex fragment which will only match hosts inside URLs<br />
# * ^.* and .*$ make it so that only domains are matched, not full URLs<br />
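# Example (illustration only, not a blocked entry): a fragment such as<br />
# "best-?deal" matches any host containing "bestdeal" or "best-deal",<br />
# so a URL like http://www.best-deal.example/page would be rejected<br />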
<br />
# Spam terms<br />
best-?deal<br />
attsystems<br />
(car|health|life)-?insurance<br />
christian-?louboutin<br />
discount<br />
electronic-?cigarette<br />
fast-?money<br />
hotels?-?(booking|discount)<br />
jersey<br />
jordan<br />
ketonic<br />
lingerie<br />
loan<br />
lottery<br />
money<br />
outlet<br />
penis-?enlargement<br />
sex-?chat<br />
weight-?(loss|gain)<br />
<br />
# Blogging/community sites which are bad at filtering spam<br />
aolanswers\.com<br />
\.beeplog\.com<br />
\.blog\.ca<br />
blogspace\.fr<br />
diigo\.com<br />
doomby\.com<br />
foodbuzz\.com<br />
gameinformer\.com<br />
insanejournal\.com<br />
jeteye\.com<br />
livelogcity\.com<br />
retrogamer\.net<br />
sitesays\.com<br />
statigr\.am<br />
tagged\.com<br />
\.tumblr\.com # Requires account to report spam blogs. No thank you.<br />
\.xanga\.com<br />
<br />
# Hacked websites where the abuse department does not care<br />
asu\.edu<br />
ncsu\.edu<br />
scu\.edu # Has no abuse contact at all<br />
<br />
# Spam Domains<br />
123finances\.eu<br />
1remodelingchicago\.com<br />
247digitallearning\.com<br />
3825\.co.uk<br />
5x5workouts\.net<br />
ableton-serato\.com<br />
abubbleshooter\.info<br />
accountingdegree101.com<br />
adidasjeremyscottwings\.com<br />
adral\.eu<br />
air-conditioner-reviews\.info<br />
akerpub\.com<br />
alonerank\.com<br />
ameritrustshield\.com<br />
angelweddingdress\.com<br />
antivirusfirewallsoftwaresite\.org<br />
aremyhair\.com<br />
asiaone\.com<br />
askmehelpdesk\.com<br />
australias\.com\.au<br />
autostoreplus\.com<br />
babygearland\.com<br />
backup-4\.com<br />
beats-bydre\.net<br />
beijingsensualmassage\.com<br />
bestbuylouisvuitton\.com<br />
bestdatinglink\.com<br />
bestonlinebuys\.net<br />
bestseoagency\.net<br />
bestwebsitedesigncompanies\.net<br />
bioactives-morinda\.com<br />
bizeso\.com<br />
bizscribes\.com<br />
bodyjewelleryshop\.com<br />
bubbleshooteronline\.info<br />
buybeatsbydre\.com<br />
buycarhartt\.net<br />
cabbagesoupdiett\.com<br />
calculette-pret-immobilier\.fr<br />
casinoinfo\.pl<br />
cedarmulch\.net<br />
classvogue\.com<br />
clickbank\.net<br />
colorstrokespainters\.com\.au<br />
conservatoryprices0ob\.net<br />
creepers\.ch<br />
czarymary\.pl<br />
debtsadvice\.net<br />
demo-download\.org<br />
dentists-atlanta\.net<br />
doubleglazeddoorslj9j\.org<br />
download-yahoo-messenger\.net<br />
dress-sense\.sg<br />
dressup24h\.com<br />
easypret\.fr<br />
ebutcherblockcountertops\.com<br />
efusiontech\.com<br />
emergencypreparednesshelp\.net<br />
empowerbpo\.com<br />
e-najlepsza-lokata\.com<br />
everylight\.co\.uk<br />
eyelashcurlers\.org<br />
ezdi\.us<br />
fabricacurtain\.com<br />
felicitysglutenfreehandbook\.com<br />
femmefatalehats\.com<br />
finanziellen-freiraum\.de<br />
findingnola\.com<br />
for-htc\.com<br />
free-drug-rehab\.com<br />
furnitureforbathroom\.co\.uk<br />
gadgetinthebox\.com<br />
garminnuviportablegps\.com<br />
gazeta\.pl<br />
geek-lamp\.com<br />
get-free-diapers\.com<br />
getgplusvotes\.com<br />
glutenfreerecipebox\.com<br />
goodreads\.com<br />
googlelocalranking\.com<br />
gpgyjr\.com\.cn<br />
guanacaste\.net<br />
hairagainreviews\.org<br />
hatena\.com<br />
hdrolx\.com<br />
hermesfair\.com<br />
hervelegerdressonline2012\.com<br />
hi5mediagroup\.com<br />
hostingreviewsandcoupons\.com<br />
house-maintain\.blogspot\.com<br />
i-am-adopted\.com<br />
iii\.org\.tw<br />
iinfobase\.com<br />
inboxbuddy\.com<br />
instantcashloanforme\.com<br />
ipadaccessoriesale\.uk\.com<br />
itsakon\.com<br />
iusetbellum\.blogspot\.com<br />
jasa-seo\.org<br />
juegosgratis1\.info<br />
jukeboxalive\.com<br />
jumbobookmarks\.com<br />
keepandshare\.com<br />
kinderbadeshop\.de<br />
kingswaylifecare\.com<br />
kokosowy\.pl<br />
lagbook\.com<br />
lawn-edging\.com<br />
legalsoundz\.com<br />
leonisawesome\.com<br />
letusreckon\.com<br />
lolliboys\.com<br />
louisvuittonbagsroom\.com<br />
louisvuittonhandbagsstore\.co\.uk<br />
macipadvideo\.us\.com<br />
macobserver\.com<br />
marketingmassachusetts\.net<br />
marvelousessays\.com<br />
masscouponsubmitter\.com<br />
matelasbonheur\.ca<br />
mbk-center\.com<br />
mebletarnów\.com.pl<br />
mediscribes\.com<br />
meridianstars\.com<br />
merritts\.uk\.com<br />
michaeljackson-songs\.org<br />
miedziaki\.eu<br />
mmichalekellya\.xanga\.com<br />
moslemunity\.com<br />
mp3in1\.com<br />
mulchinglawn\.com<br />
my-landimmo\.de<br />
n2acards\.com<br />
naturalinfertilitytreatments\.wordpress.com<br />
needrapidcashnow\.com<br />
net-promotion\.pl<br />
newerahats2012\.com<br />
new-rap-songs\.com<br />
newyorkgiantsnikejerseysstore\.com<br />
nibtv\.com<br />
nieruchomościtarnów\.com.pl<br />
onestopbookmarks\.com<br />
onlineprnews\.com<br />
ovidiusilaghi\.ro<br />
pandatarot\.com<br />
patiodoorsda5\.com<br />
pisaniecv\.info<br />
pixelperfectsoftworks\.com<br />
popculturedivas\.com<br />
power-leveling\.us<br />
practutor\.com<br />
prnewswire\.com<br />
product-samples\.net<br />
professays\.com<br />
prosdi\.com<br />
purevolume\.com<br />
qiel\.com<br />
r220\.cc<br />
rajpromotions\.com<br />
rapidfbfans\.com<br />
recoverytoolbox\.com<br />
redditmarketing\.com<br />
repairtoolbox\.com<br />
retiringincostarica\.org<br />
rulettstrategiak\.com<br />
sarasotacriminalattorneys\.com<br />
sbwire\.com<br />
scheidungohneanwalt\.com<br />
schoolgrantsguides\.blogspot.com<br />
scrapebrokers\.com<br />
self-defensesupply\.com<br />
seogooglemaps\.net<br />
seo-methods\.com<br />
seopackagepricing\.com<br />
showingoncam\.com<br />
smartphonewebcreator\.com<br />
sovcal\.com<br />
spittingandvomitting\.com<br />
sports-camping\.com<br />
squidoo\.com<br />
superiorpapers\.com<br />
supremeessays\.com<br />
surfbrands\.net<br />
symbian-kreatif\.co\.cc<br />
szybki-kredyt-bez-bik\.com<br />
tabletpcwarehouse\.net<br />
tani-kredyt-mieszkaniowy\.org<br />
taoholycity\.com<br />
televisionspain\.net<br />
theprivatenetwork\.net<br />
thesecretworldhack\.com<br />
thespainforum\.com<br />
theuniquehoodiasite\.com<br />
ticketforeverything\.com<br />
tjindustrial\.com<br />
tn-?requin-?paschers\.(biz|eu)<br />
trustbanq\.com<br />
urbanspycam\.com<br />
usalouisvuittonshopping\.com<br />
vacationscostarica\.com<br />
wallmountingatv\.com<br />
watchnflgamesonlinehd\.com<br />
webdesignsanluisobispo\.wordpress.com<br />
webousb\.com<br />
wedding-cake-decorations\.net<br />
wedding-cake-stands\.net<br />
wheretogetengaged\.com<br />
wholesaledefenseonline\.com<br />
whyimhotter\.com<br />
whyonlinebackup\.com<br />
wordpressseoexpert\.com<br />
worldselectshop\.com<br />
wylinka\.com<br />
xaby\.com<br />
xg4ken\.com<br />
yaseminler\.com<br />
zay\.pl<br />
ziel-motivation\.com<br />
zlewozmywak24\.pl<br />
<br />
#</pre> <!-- leave this line exactly as it is --></div>Soulthttps://wiki.archiveteam.org/index.php?title=MediaWiki:Deletereason-dropdown&diff=10345MediaWiki:Deletereason-dropdown2013-03-27T11:22:47Z<p>Soult: Created page with "*Common delete reasons ** Author request ** Copyright violation ** Vandalism ** Spam"</p>
<hr />
<div>*Common delete reasons<br />
** Author request<br />
** Copyright violation<br />
** Vandalism<br />
** Spam</div>Soulthttps://wiki.archiveteam.org/index.php?title=Early_projects&diff=10009Early projects2013-03-22T17:29:29Z<p>Soult: Update URLTeam release</p>
<hr />
<div>[[File:Archiveteamlogo.png|right|link=http://www.archive.org/details/archiveteam|Look at Archive Team Collection at Internet Archive too]]<br />
Some '''archives''' available for downloading, by [[Archive Team]] or by other volunteers or groups. Sorted by size.<br />
<br />
Look at [http://www.archive.org/details/archiveteam Archive Team Collection] at Internet Archive too.<br />
<br />
If you have archived any site, you can add a link to the table [http://archiveteam.org/index.php?title={{PAGENAMEE}}&action=edit editing this page] (or just drop a line in [http://chat.efnet.org:9090/?nick=&channels=%23archiveteam&Login=Login our IRC channel] and we will add it).<br />
__NOTOC__<br />
== Available for download ==<br />
<center><br />
{| width=1000px class="wikitable" style="text-align: center;"<br />
|-<br />
! width=300px | Title/Download link<br />
! Description<br />
! width=80px | Size<br />
|-<br />
| [http://thepiratebay.org/torrent/6353395/Geocities_-_The_PATCHED_Torrent Geocities - The PATCHED Torrent] ([http://www.archive.org/search.php?query=archive%20team%20geocities%20snapshot IA]) || The [[Geocities|popular web hosting]] service founded in 1994. It was closed by Yahoo! in 2009 || 641.4 GB<br />
|-<br />
| [http://urlte.am/releases/2013-01-02/urlteam.torrent URL Shortener Backup Torrent v3] || [[URLTeam]] compressed backups of various URL shorteners ([http://urlte.am/releases/2013-01-02/README.txt README]) || 50 GB<br />
|-<br />
| [http://urlte.am/releases/2011-12-31/urlteam.torrent URL Shortener Backup Torrent v2] '''outdated, use v3''' || [[URLTeam]] compressed backups of various URL shorteners ([http://urlte.am/releases/2011-12-31/README.txt README]) || 48 GB<br />
|-<br />
| [http://urlte.am/releases/2011-05-31/urlteam.torrent URL Shortener Backup Torrent v1] '''outdated, use v3''' || [[URLTeam]] compressed backups of various URL shorteners ([http://urlte.am/releases/2011-05-31/README.txt README]) || 41.1 GB<br />
|-<br />
| [http://thepiratebay.org/torrent/6554331/Papers_from_Philosophical_Transactions_of_the_Royal_Society__fro Papers from Philosophical Transactions of the Royal Society] || This archive contains 18,592 scientific publications totaling 33GiB, all from Philosophical Transactions of the Royal Society and which should be available to everyone at no cost, but most have previously only been made available at high prices through paywall gatekeepers like JSTOR. || 32.48 GB<br />
|-<br />
| [http://www.archive.org/details/2011-05-calufa-twitter-sql The May 2011 Calufa Twitter Scrape] || 90+ million [[tweets]] from more than 6 million users || 14.9 GB<br />
|-<br />
| [http://torrent.ibiblio.org/doc/181 Internet Gopher Archive 2007] ([http://www.archive.org/details/2007-gopher-mirror IA]) || Archive of [[gopher]] sites || 14.8 GB<br />
|-<br />
| [http://www.archive.org/details/2010-01-encyclopedia-dramatica Encyclopedia Dramatica January 2010 Mirror] || [[lulz]] || 11.7 GB<br />
|-<br />
| [http://www.archive.org/details/textfiles-dot-com-2011 The TEXTFILES.COM Time Capsule] || This collection comprises all the major text-based sets of the [[TEXTFILES.COM]] site || 11 GB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-tabletalk-panic Salon Table Talk] || Threads of this talk site || +6.0 GB<br />
|-<br />
| [http://www.archive.org/details/utzoo-wiseman-usenet-archive Usenet Archive of UTZOO Tapes] || Collection of .TGZ files of very early USENET posted data || 2.0 GB<br />
|-<br />
| [http://torrent.ibiblio.org/doc/182 Quux.org Gopher Mirror Collection 2006] ([http://www.archive.org/details/quux-gopher-mirror IA]) || This is a collection of mirrors maintained by gopher.quux.org. These mirrors were taken offline in 2006 due to bandwidth constraints || 1.5 GB<br />
|-<br />
| [http://burnbit.com/torrent/174605/full_history_linux_git_tar full-history-linux.git.tar] || GIT repository of Linux Kernel from 1991 to 2010 ([http://lwn.net/Articles/285366/ details]) || 594 MB<br />
|-<br />
| [http://www.archive.org/details/twitter_cikm_2010 Cheng-Caverlee-Lee September 2009 - January 2010 Twitter Scrape] || Almost 10 million [[tweets]] || 425 MB<br />
|-<br />
| [http://www.archive.org/details/2010-reddit-research The 2010 Reddit Research Project] || Dataset on affinities of 60,000+ [[Reddit]] users, recorded in 2010 || ~360 MB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-starwars-yahoo Archive Team Starwars.Yahoo.Com Panic Download] || This is a panic download of the [[starwars.yahoo.com]] forums and profiles, done before the closure of same by Yahoo on December 15, 2009. This includes as many messages, profiles, and pages related to the site as could be easily brought in. || ~250 MB<br />
|-<br />
| [http://www.archive.org/details/oxford-2005-facebook-matrix Social Structure of Facebook Networks Facebook Data Scrape] || [[Facebook]] data scrape related to paper "The Social Structure of Facebook Networks", by Amanda L. Traud, Peter J. Mucha, Mason A. Porter || 197 MB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-etherpad-timecapsule Archive Team's Etherpad Time Capsule] || This archive contains roughly 6,400 [[Etherpad]]s, in their final state || 125 MB<br />
|-<br />
| [http://code.google.com/p/wikiteam/downloads/list WikiTeam archives] || Archives about [[wikis]]. See [[WikiTeam]] || +100 MB<br />
|-<br />
| [http://www.archive.org/details/ArchiveTeamsiteRip Archive Team] || Archive Team.org Site Rip from August 03, 2011 || 75 MB<br />
|-<br />
| [http://www.archive.org/details/boingboing-2000-2005 Boing Boing Posts Archive (2000-2011)] || Two collections of [[Boing Boing]] postings provided by the cultural website boingboing.net on its 5th and 11th anniversaries || 42 MB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-quotes-archive-2011-04 Archive Team Quotes Database Backup] || Amusing snatches of conversation from [[IRC]] and other online gathering places || 5 MB<br />
|- <br />
| [https://sites.google.com/site/archiveofstuff/home/localroger.com.7z Mirror of Revelation Passage Series Website] || wget of a small author's website. || ~500 KB<br />
|-<br />
| [http://www.archive.org/details/archiveteam-powerblogs-2010-11-snapshot Archive Team Powerblogs Shutdown Snapshot] || This is a 108-blog snapshot of the final month of [[Powerblogs]], before their shutdown || ? <br />
|-<br />
| [http://www.archive.org/details/bbc-panic-closing-archives BBC Closing Panic Archives] || Some [[BBC]] sites || ? <br />
|-<br />
| [http://archive.org/details/stillflying.net-20120905-mirror stillflying.net] || A Firefly fan-fiction site that produced PDF scripts for the rest of season 1 and a season 2, imagining what the show might have been had it not been canceled. || 408.1 MB <br />
|-<br />
| colspan=2 | '''Total size''' || ~692 GB<br />
|}<br />
</center><br />
<br />
== Archived but not available ==<br />
* [[Google Video]]<br />
* [[Yahoo! Videos]]<br />
<br />
== See also ==<br />
* [[Projects]]<br />
** [[:Category:Rescued Sites]]<br />
<br />
== External links ==<br />
* http://www.archive.org/details/archiveteam<br />
* http://thepiratebay.org/user/archiveteam<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Archive Team]]<br />
[[Category:Rescued Sites]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=Posterous&diff=9456Posterous2013-02-26T11:28:36Z<p>Soult: typo</p>
<hr />
<div>{{Infobox project<br />
| title = Posterous<br />
| image = Posterous_home.png<br />
| description = <br />
| URL = http://posterous.com<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| irc = preposterus<br />
| tracker = [http://tracker.archiveteam.org/posterous/ here]<br />
}}<br />
<br />
Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012 and will shut down April 30, 2013. [http://blog.posterous.com/thanks-from-posterous Announcement]<br />
<br />
== Warrior ==<br />
You can help by installing and running the [[ArchiveTeam Warrior]] and selecting the "posterous" project. Warning: Posterous will most likely ban you after a couple of hours, which means you won't be able to visit Posterous unless you can change your IP address.<br />
<br />
== Seesaw script (for advanced users)==<br />
<br />
'''Download:'''<br />
<br />
git clone https://github.com/ArchiveTeam/posterous-grab.git<br />
<br />
Follow instructions to install seesaw and edit script for IP address.<br />
<br />
For wget: run ./get-wget-lua.sh<br />
<br />
Running too many concurrently will get you banned at :50 past the hour.<br />
<br />
'''Commands:'''<br />
<br />
Make sure you place an IP address after --bind-address= on line 175. Example: "--bind-address=192.168.1.1",<br />
git clone http://github.com/ArchiveTeam/posterous-grab.git<br />
cd posterous-grab<br />
git clone http://github.com/ArchiveTeam/seesaw-kit<br />
cd seesaw-kit<br />
sudo pip install -r requirements.txt<br />
sudo pip install seesaw<br />
cd ../<br />
chmod +x get-wget-lua.sh && ./get-wget-lua.sh<br />
run-pipeline --concurrent 1 --address <your_ip_address> pipeline.py <your_username><br />
<br />
== Site List Grab ==<br />
<br />
We have assembled a list of Posterous sites that need grabbing. Total found: 9898986<br />
<br />
http://archive.org/details/2013-02-22-posterous-hostname-list<br />
<br />
Tools: [https://github.com/ArchiveTeam/smeg git]</div>Soulthttps://wiki.archiveteam.org/index.php?title=Posterous&diff=9455Posterous2013-02-26T11:27:53Z<p>Soult: </p>
<hr />
<div>{{Infobox project<br />
| title = Posterous<br />
| image = Posterous_home.png<br />
| description = <br />
| URL = http://posterous.com<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| irc = preposterus<br />
| tracker = [http://tracker.archiveteam.org/posterous/ here]<br />
}}<br />
<br />
Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012 and will shut down April 30, 2013. [http://blog.posterous.com/thanks-from-posterous Announcement]<br />
<br />
== Warrior ==<br />
You can help by installing and running the [[Archiveteam Warrior]] and selecting the "posterous" project. Warning: Posterous will most likely ban you after a couple of hours, which means you won't be able to visit Posterous unless you can change your IP address.<br />
<br />
== Seesaw script (for advanced users)==<br />
<br />
'''Download:'''<br />
<br />
git clone https://github.com/ArchiveTeam/posterous-grab.git<br />
<br />
Follow instructions to install seesaw and edit script for IP address.<br />
<br />
For wget: run ./get-wget-lua.sh<br />
<br />
Running too many concurrently will get you banned at :50 past the hour.<br />
<br />
'''Commands:'''<br />
<br />
Make sure you place an IP address after --bind-address= on line 175. Example: "--bind-address=192.168.1.1",<br />
git clone http://github.com/ArchiveTeam/posterous-grab.git<br />
cd posterous-grab<br />
git clone http://github.com/ArchiveTeam/seesaw-kit<br />
cd seesaw-kit<br />
sudo pip install -r requirements.txt<br />
sudo pip install seesaw<br />
cd ../<br />
chmod +x get-wget-lua.sh && ./get-wget-lua.sh<br />
run-pipeline --concurrent 1 --address <your_ip_address> pipeline.py <your_username><br />
<br />
== Site List Grab ==<br />
<br />
We have assembled a list of Posterous sites that need grabbing. Total found: 9898986<br />
<br />
http://archive.org/details/2013-02-22-posterous-hostname-list<br />
<br />
Tools: [https://github.com/ArchiveTeam/smeg git]</div>Soulthttps://wiki.archiveteam.org/index.php?title=Posterous&diff=9372Posterous2013-02-18T12:01:10Z<p>Soult: finished 5000000-5999999</p>
<hr />
<div>{{Infobox project<br />
| title = Posterous<br />
| image = Posterous_home.png<br />
| description = <br />
| URL = http://posterous.com<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| irc = preposterus<br />
}}<br />
<br />
Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012 and will shut down April 30, 2013. [http://blog.posterous.com/thanks-from-posterous Announcement]<br />
<br />
== Site List Grab ==<br />
<br />
We are currently assembling a list of Posterous sites that need grabbing. Development is seat-of-the-pants-y right now, and the following instructions will get your IP banned fairly quickly. Join us in #preposterus on efnet for state-of-the-art chitchat.<br />
<br />
=== Instructions ===<br />
Download the latest script: [https://github.com/ArchiveTeam/smeg git] <br />
<br />
Claim a number range in the table below<br />
<br />
Run 100 smegs concurrently. The following example will run the 1-2 million range:<br />
<br />
for chunk in $(seq 100 199); do ./smeg $chunk & done<br />
<br />
Running this with the Python variant at a high scale WILL cause database lock collisions.<br />
<br />
To see hostnames as they're found:<br />
<br />
tail -q -n 0 -f *.hostnames<br />
<br />
No output means you're IP banned.<br />
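Chunk numbers map onto site-ID ranges in blocks of 10,000 — consistent with the table below, where chunks 100-199 cover IDs 1,000,000-1,999,999. A small helper for turning a claimed range into chunk numbers (a sketch inferred from that table, so double-check against it before claiming):

```python
CHUNK_SIZE = 10_000  # IDs per chunk, inferred from the range table

def chunks_for_range(first_id, last_id):
    """Chunk numbers whose 10,000-ID blocks cover [first_id, last_id]."""
    return list(range(first_id // CHUNK_SIZE, last_id // CHUNK_SIZE + 1))
```

For instance, `chunks_for_range(1_000_000, 1_999_999)` yields 100 through 199, matching `seq 100 199` in the example above.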
<br />
=== Range Claim ===<br />
{| border = 1<br />
| '''Range'''<br />
| '''Chunk(s)'''<br />
| '''User'''<br />
| '''Status'''<br />
| '''Uploaded Hostnames'''<br />
|-<br />
| 1 - 999,999<br />
| 1-99<br />
| closure<br />
| Done (742846)<br />
| archived<br />
|-<br />
| 1,000,000 - 1,999,999<br />
| 100-199<br />
| db48x / closure<br />
| Done (994303)<br />
| archived<br />
|-<br />
| 2,000,000 - 2,009,999<br />
| 200<br />
| aggroskater<br />
| Done (8907)<br />
| [https://dl.dropbox.com/u/67912136/2000000.hostnames.gz 2000000.hostnames.gz] archived<br />
|-<br />
| 2,010,000 - 2,019,999<br />
| 201<br />
| aggroskater<br />
| Done (8094)<br />
| [https://dl.dropbox.com/u/67912136/2010000.hostnames.gz 2010000.hostnames.gz] archived<br />
|-<br />
| 2,020,000 - 2,999,999<br />
| 202-299<br />
| dcmorton<br />
| Downloading<br />
|-<br />
| 3,000,000 - 3,999,999<br />
| 300-399<br />
| closure<br />
| Done (928023)<br />
| archived<br />
|-<br />
| 4,000,000 - 4,999,999<br />
| 400-499<br />
| chazchaz101<br />
| Downloading<br />
|<br />
|-<br />
| 5,000,000 - 5,999,999<br />
| 500-599<br />
| Smiley / Soult<br />
| Done (984360)<br />
| [http://helo.nodes.soultcer.com/posterous/5000000-5999999.hostnames.gz 5000000.hostnames.gz], [http://helo.nodes.soultcer.com/posterous/5000000-5999999.sqlite.gz 5000000.sqlite.gz] <br />
|-<br />
| 6,000,000 - 6,999,999<br />
| 600-699<br />
| dcmorton<br />
| Downloading<br />
|<br />
|-<br />
| 7,000,000 - 7,999,999<br />
| 700-799<br />
| balrog<br />
| Partial (39462)<br />
| archived<br />
|-<br />
| 7,905,000 - 7,909,999<br />
| 790<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,915,000 - 7,919,999<br />
| 791<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,925,000 - 7,929,999<br />
| 792<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,935,000 - 7,939,999<br />
| 793<br />
| yipdw<br />
| Downloading<br />
|<br />
|-<br />
| 8,000,000 - 8,999,999<br />
| 800-899<br />
| beardicus/Soult<br />
| Done (984258)<br />
| [http://helo.nodes.soultcer.com/posterous/8000000-8999999.hostnames.gz 8000000.hostnames.gz], [http://helo.nodes.soultcer.com/posterous/8000000-8999999.sqlite.gz 8000000.sqlite.gz] archived<br />
|-<br />
| 9,000,000 - 9,999,999<br />
| 900-999<br />
| GLaDOS<br />
| Downloading<br />
|<br />
|-<br />
| 10,000,000 - 10,019,999<br />
| 1000-1001<br />
| <span style="color:#c0c0c0">(Your name here!)</span><br />
| Partial<br />
| [http://posterous.archivingyoursh.it/10000000.hostnames.gz 10000000.hostnames.gz]<br />
|-<br />
| 10,020,000 - 10,069,999<br />
| 1002-1006<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,070,000 - 10,209,999<br />
| 1007-1020<br />
| flaushy<br />
| Downloading<br />
|<br />
|-<br />
| 10,210,000 - 10,309,999<br />
| 1021-1030<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,310,000 - 10,409,999<br />
| 1031-1040<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,410,000 - 10,509,999<br />
| 1041-1050<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,510,000 - 10,609,999<br />
| 1051-1060<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,610,000 - 10,709,999<br />
| 1061-1070<br />
| siliconvalleypark<br />
| Downloading<br />
|<br />
|-<br />
| 10,710,000 - 11,009,999<br />
| 1071-1100<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|}</div>Soulthttps://wiki.archiveteam.org/index.php?title=Posterous&diff=9369Posterous2013-02-18T04:33:32Z<p>Soult: </p>
<hr />
<div>{{Infobox project<br />
| title = Posterous<br />
| image = Posterous_home.png<br />
| description = <br />
| URL = http://posterous.com<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| irc = preposterus<br />
}}<br />
<br />
Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012 and will shut down April 30, 2013. [http://blog.posterous.com/thanks-from-posterous Announcement]<br />
<br />
== Site List Grab ==<br />
<br />
We are currently assembling a list of Posterous sites that need grabbing. Development is seat-of-the-pants-y right now, and the following instructions will get your IP banned fairly quickly. Join us in #preposterus on efnet for state-of-the-art chitchat.<br />
<br />
=== Instructions ===<br />
Download the latest script: [https://github.com/ArchiveTeam/smeg git] <br />
<br />
Claim a number range in the table below<br />
<br />
Run 100 smegs concurrently. The following example will run the 1-2 million range:<br />
<br />
for chunk in $(seq 100 199); do ./smeg $chunk & done<br />
<br />
Running this with the Python variant at a high scale WILL cause database lock collisions.<br />
<br />
To see hostnames as they're found:<br />
<br />
tail -q -n 0 -f *.hostnames<br />
<br />
No output means you're IP banned.<br />
<br />
=== Range Claim ===<br />
{| border = 1<br />
| '''Range'''<br />
| '''Chunk(s)'''<br />
| '''User'''<br />
| '''Status'''<br />
| '''Uploaded Hostnames'''<br />
|-<br />
| 1 - 999,999<br />
| 1-99<br />
| closure<br />
| Done (742846)<br />
| archived<br />
|-<br />
| 1,000,000 - 1,999,999<br />
| 100-199<br />
| db48x / closure<br />
| Downloading<br />
|<br />
|-<br />
| 2,000,000 - 2,009,999<br />
| 200<br />
| aggroskater<br />
| Done (8907)<br />
| [https://dl.dropbox.com/u/67912136/2000000.hostnames.gz 2000000.hostnames.gz] archived<br />
|-<br />
| 2,010,000 - 2,019,999<br />
| 201<br />
| aggroskater<br />
| Done (8094)<br />
| [https://dl.dropbox.com/u/67912136/2010000.hostnames.gz 2010000.hostnames.gz] archived<br />
|-<br />
| 2,020,000 - 2,999,999<br />
| 202-299<br />
| dcmorton<br />
| Downloading<br />
|-<br />
| 3,000,000 - 3,999,999<br />
| 300-399<br />
| closure<br />
| Done (928023)<br />
| archived<br />
|-<br />
| 4,000,000 - 4,999,999<br />
| 400-499<br />
| chazchaz101<br />
| Downloading<br />
|<br />
|-<br />
| 5,000,000 - 5,999,999<br />
| 500-599<br />
| Smiley / Soult<br />
| incomplete (73874)<br />
| Downloading<br />
|-<br />
| 6,000,000 - 6,999,999<br />
| 600-699<br />
| dcmorton<br />
| Downloading<br />
|<br />
|-<br />
| 7,000,000 - 7,999,999<br />
| 700-799<br />
| balrog<br />
| Partial (39462)<br />
| archived<br />
|-<br />
| 7,905,000 - 7,909,999<br />
| 790<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,915,000 - 7,919,999<br />
| 791<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,925,000 - 7,929,999<br />
| 792<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,935,000 - 7,939,999<br />
| 793<br />
| yipdw<br />
| Downloading<br />
|<br />
|-<br />
| 8,000,000 - 8,999,999<br />
| 800-899<br />
| beardicus/Soult<br />
| Done (984258)<br />
| [http://helo.nodes.soultcer.com/posterous/8000000-8999999.hostnames.gz 8000000.hostnames.gz], [http://helo.nodes.soultcer.com/posterous/8000000-8999999.sqlite.gz 8000000.sqlite.gz] archived<br />
|-<br />
| 9,000,000 - 9,999,999<br />
| 900-999<br />
| GLaDOS<br />
| Downloading<br />
|<br />
|-<br />
| 10,000,000 - 10,019,999<br />
| 1000-1001<br />
| <span style="color:#c0c0c0">(Your name here!)</span><br />
| Partial<br />
| [http://posterous.archivingyoursh.it/10000000.hostnames.gz 10000000.hostnames.gz]<br />
|-<br />
| 10,020,000 - 10,069,999<br />
| 1002-1006<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,070,000 - 10,209,999<br />
| 1007-1020<br />
| flaushy<br />
| Downloading<br />
|<br />
|-<br />
| 10,210,000 - 10,309,999<br />
| 1021-1030<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,310,000 - 10,409,999<br />
| 1031-1040<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,410,000 - 10,509,999<br />
| 1041-1050<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,510,000 - 10,609,999<br />
| 1051-1060<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,610,000 - 10,709,999<br />
| 1061-1070<br />
| siliconvalleypark<br />
| Downloading<br />
|<br />
|-<br />
| 10,710,000 - 11,009,999<br />
| 1071-1100<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|}</div>Soulthttps://wiki.archiveteam.org/index.php?title=Posterous&diff=9368Posterous2013-02-18T04:27:14Z<p>Soult: </p>
<hr />
<div>{{Infobox project<br />
| title = Posterous<br />
| image = Posterous_home.png<br />
| description = <br />
| URL = http://posterous.com<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| irc = preposterus<br />
}}<br />
<br />
Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012, and will shut down on April 30, 2013. [http://blog.posterous.com/thanks-from-posterous Announcement]<br />
<br />
== Site List Grab ==<br />
<br />
We are currently assembling a list of Posterous sites that need grabbing. Development is seat-of-the-pants-y right now, and the following instructions will get your IP banned fairly quickly. Join us in #preposterus on efnet for state-of-the-art chitchat.<br />
<br />
=== Instructions ===<br />
Download the latest script: [https://github.com/ArchiveTeam/smeg git] <br />
<br />
Claim a number range in the table below.<br />
<br />
Run 100 smegs concurrently. The following example will run the 1-2 million range:<br />
<br />
for chunk in $(seq 100 199); do ./smeg $chunk & done<br />
<br />
Running this with the python variant at a high scale WILL cause database lock collisions.<br />
<br />
To see hostnames as they're found:<br />
<br />
tail -q -n 0 -f *.hostnames<br />
<br />
No output means you're IP banned.<br />
<br />
=== Range Claim ===<br />
{| border = 1<br />
| '''Range'''<br />
| '''Chunk(s)'''<br />
| '''User'''<br />
| '''Status'''<br />
| '''Uploaded Hostnames'''<br />
|-<br />
| 1 - 999,999<br />
| 1-99<br />
| closure<br />
| Done (742846)<br />
| archived<br />
|-<br />
| 1,000,000 - 1,999,999<br />
| 100-199<br />
| db48x / closure<br />
| Downloading<br />
|<br />
|-<br />
| 2,000,000 - 2,009,999<br />
| 200<br />
| aggroskater<br />
| Done (8907)<br />
| [https://dl.dropbox.com/u/67912136/2000000.hostnames.gz 2000000.hostnames.gz] archived<br />
|-<br />
| 2,010,000 - 2,019,999<br />
| 201<br />
| aggroskater<br />
| Done (8094)<br />
| [https://dl.dropbox.com/u/67912136/2010000.hostnames.gz 2010000.hostnames.gz] archived<br />
|-<br />
| 2,020,000 - 2,999,999<br />
| 202-299<br />
| dcmorton<br />
| Downloading<br />
|<br />
|-<br />
| 3,000,000 - 3,999,999<br />
| 300-399<br />
| closure<br />
| Done (928023)<br />
| archived<br />
|-<br />
| 4,000,000 - 4,999,999<br />
| 400-499<br />
| chazchaz101<br />
| Downloading<br />
|<br />
|-<br />
| 5,000,000 - 5,999,999<br />
| 500-599<br />
| Smiley / Soult<br />
| incomplete (73874)<br />
| archived<br />
|-<br />
| 6,000,000 - 6,999,999<br />
| 600-699<br />
| dcmorton<br />
| Downloading<br />
|<br />
|-<br />
| 7,000,000 - 7,999,999<br />
| 700-799<br />
| balrog<br />
| Partial (39462)<br />
| archived<br />
|-<br />
| 7,905,000 - 7,909,999<br />
| 790<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,915,000 - 7,919,999<br />
| 791<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,925,000 - 7,929,999<br />
| 792<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,935,000 - 7,939,999<br />
| 793<br />
| yipdw<br />
| Downloading<br />
|<br />
|-<br />
| 8,000,000 - 8,999,999<br />
| 800-899<br />
| beardicus/Soult<br />
| Done (984258)<br />
| [http://helo.nodes.soultcer.com/posterous/8000000-8999999.hostnames.gz 8000000.hostnames.gz], [http://helo.nodes.soultcer.com/posterous/8000000-8999999.sqlite.gz 8000000.sqlite.gz] archived<br />
|-<br />
| 9,000,000 - 9,999,999<br />
| 900-999<br />
| GLaDOS<br />
| Downloading<br />
|<br />
|-<br />
| 10,000,000 - 10,019,999<br />
| 1000-1001<br />
| <span style="color:#c0c0c0">(Your name here!)</span><br />
| Partial<br />
| [http://posterous.archivingyoursh.it/10000000.hostnames.gz 10000000.hostnames.gz]<br />
|-<br />
| 10,020,000 - 10,069,999<br />
| 1002-1006<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,070,000 - 10,209,999<br />
| 1007-1020<br />
| flaushy<br />
| Downloading<br />
|<br />
|-<br />
| 10,210,000 - 10,309,999<br />
| 1021-1030<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,310,000 - 10,409,999<br />
| 1031-1040<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,410,000 - 10,509,999<br />
| 1041-1050<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,510,000 - 10,609,999<br />
| 1051-1060<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|-<br />
| 10,610,000 - 10,709,999<br />
| 1061-1070<br />
| siliconvalleypark<br />
| Downloading<br />
|<br />
|-<br />
| 10,710,000 - 11,009,999<br />
| 1071-1100<br />
| S[h]O[r]T<br />
| Downloading<br />
|<br />
|}</div>Soulthttps://wiki.archiveteam.org/index.php?title=Posterous&diff=9355Posterous2013-02-17T14:09:44Z<p>Soult: </p>
<hr />
<div>{{Infobox project<br />
| title = Posterous<br />
| image = Posterous_home.png<br />
| description = <br />
| URL = http://posterous.com<br />
| project_status = {{closing}}<br />
| archiving_status = {{inprogress}}<br />
| irc = preposterus<br />
}}<br />
<br />
Posterous is a blogging platform started in May 2008. It was acquired by Twitter on March 12, 2012, and will shut down on April 30, 2013. [http://blog.posterous.com/thanks-from-posterous Announcement]<br />
<br />
== Site List Grab ==<br />
<br />
We are currently assembling a list of Posterous sites that need grabbing. Development is seat-of-the-pants-y right now, and the following instructions will get your IP banned fairly quickly. Join us in #preposterus on efnet for state-of-the-art chitchat.<br />
<br />
=== Instructions ===<br />
Download the latest script: [https://github.com/ArchiveTeam/smeg git] <br />
<br />
Claim a number range in the table below.<br />
<br />
Run 100 smegs concurrently. The following example will run the 1-2 million range:<br />
<br />
for chunk in $(seq 100 199); do ./smeg $chunk & done<br />
<br />
Running this with the python variant at a high scale WILL cause database lock collisions.<br />
<br />
To see hostnames as they're found:<br />
<br />
tail -q -n 0 -f *.hostnames<br />
<br />
No output means you're IP banned.<br />
<br />
=== Range Claim ===<br />
{| border = 1<br />
| '''Range'''<br />
| '''Chunk(s)'''<br />
| '''User'''<br />
| '''Status'''<br />
| '''Uploaded Hostnames'''<br />
|-<br />
| 1 - 999,999<br />
| 1-99<br />
| closure<br />
| Incomplete (470714)<br />
|<br />
|-<br />
| 1,000,000 - 1,999,999<br />
| 100-199<br />
| db48x<br />
| Downloading<br />
|<br />
|-<br />
| 2,000,000 - 2,009,999<br />
| 200<br />
| aggroskater<br />
| Done (8907)<br />
| [https://dl.dropbox.com/u/67912136/2000000.hostnames.gz 2000000.hostnames.gz] archived<br />
|-<br />
| 2,010,000 - 2,019,999<br />
| 201<br />
| aggroskater<br />
| Done (8094)<br />
| [https://dl.dropbox.com/u/67912136/2010000.hostnames.gz 2010000.hostnames.gz] archived<br />
|-<br />
| 2,020,000 - 2,999,999<br />
| 202-299<br />
| dcmorton<br />
| Downloading<br />
|<br />
|-<br />
| 3,000,000 - 3,999,999<br />
| 300-399<br />
| closure<br />
| Done (928023)<br />
| archived<br />
|-<br />
| 4,000,000 - 4,999,999<br />
| 400-499<br />
| chazchaz101<br />
| Downloading<br />
|<br />
|-<br />
| 5,000,000 - 5,999,999<br />
| 500-599<br />
| Smiley<br />
| incomplete (73874)<br />
| archived<br />
|-<br />
| 6,000,000 - 6,999,999<br />
| 600-699<br />
| dcmorton<br />
| Downloading<br />
|<br />
|-<br />
| 7,000,000 - 7,999,999<br />
| 700-799<br />
| balrog<br />
| Partial (39462)<br />
| archived<br />
|-<br />
| 7,905,000 - 7,909,999<br />
| 790<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,915,000 - 7,919,999<br />
| 791<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,925,000 - 7,929,999<br />
| 792<br />
| yipdw<br />
| Done<br />
|<br />
|-<br />
| 7,935,000 - 7,939,999<br />
| 793<br />
| yipdw<br />
| Downloading<br />
|<br />
|-<br />
| 8,000,000 - 8,999,999<br />
| 800-899<br />
| beardicus/Soult<br />
| Done (984173)<br />
| [http://helo.nodes.soultcer.com/posterous/8000000-8999999.hostnames.gz 8000000.hostnames.gz], [http://helo.nodes.soultcer.com/posterous/8000000-8999999.sqlite.gz 8000000.sqlite.gz]<br />
|-<br />
| 9,000,000 - 9,999,999<br />
| 900-999<br />
| GLaDOS<br />
| Downloading<br />
|<br />
|-<br />
| 10,000,000 - 10,999,999<br />
| 1000-1099<br />
| <span style="color:#c0c0c0">(Your name here!)</span><br />
| Partial<br />
| [http://posterous.archivingyoursh.it/10000000.hostnames.gz 10000000.hostnames.gz] partially archived<br />
|}</div>Soulthttps://wiki.archiveteam.org/index.php?title=Main_Page&diff=9342Main Page2013-02-16T22:25:39Z<p>Soult: active projects</p>
<hr />
<div>__NOTOC__<br />
<center><br />
<!-- [[Image:Jasonappeal.png|center|link=Introduction]]--><br />
<table style="width:100%;border-spacing:8px;margin:12px 0px 0px 0px"><br />
<tr><td style="width:60%;border:1px solid #FFB9B9;background-color:#FFFFF0;vertical-align:top;color:#000"><br />
<table class="thumb" width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#FFFFF0;"><br />
<tr><td><br />
<td style="color:#000;text-align:left;vertical-align:top"><br />
<br />
=== HISTORY IS OUR FUTURE ===<br />
[[Image:Archiveteam.jpg|right|200px]]<br />
''And we've been trashing our history''<br />
<br />
Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: '''IT DOESN'T HAVE TO BE THIS WAY'''.<br />
<br />
This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.<br />
<!-- featured article ends --><br />
<br />
===Currently Active Projects (Get Involved Here!) ===<br />
<br />
<tr><td style="color:#000" colspan=2><br />
<!-- active starts --><br />
* '''[[Posterous]]''' - [http://www.posterous.com Posterous], a blogging service acquired by [[Twitter]], is shutting down on April 30th, 2013. <br />
<br />
Feel free to join us on the [[IRC_Channel|IRC channel]]! We're on the EFnet network in a channel called '''#archiveteam''', where we say truly awful things. <br />
<br />
<!-- active ends --><br />
<br />
===Archive Team News===<br />
<tr><th colspan=2><br />
</th></tr><br />
<tr><td style="color:#000" colspan=2><br />
<!-- news starts --><br />
* '''January, 2013''': That took long enough! We've turned on new user account creation, with an "are you human" edit checker added.<br />
* '''August, 2012''': It's August Cleanup time! We're shutting off new user accounts while we clean out spam and generally shore up the ol' barge.<br />
* '''May, 2012''': Tabblo.com announces its closure scheduled for May 30th, giving its userbase just ten days of warning. Archive Team is on the case. <br />
* '''May, 2012''': ArchiveTeam's save of [http://web.archive.org/web/20080607211809/http://crave.cnet.co.uk/0,39029477,49296926-10,00.htm Stage6], a defunct video sharing site run by DivX, Inc. is permanently preserved at the [http://archive.org/details/stage6 Internet Archive].<br />
* '''April, 2012''': 20 million Friendster accounts spanning 14 terabytes are successfully rescued for permanent storage by Archive Team. <br />
* '''March, 2012''': [http://www.dereferer.org/?http%3A%2F%2Ffortunecity.com FortuneCity] announced the end of its free-hosting model, threatening around one million user-generated websites.<br />
* '''March, 2012''': We've switched servers to one of those new-fangled hosting companies that aren't hacked. We're going to sell you a lot less in the way of medical supplies now.<br />
* '''December, 2011''': POE News says it will soon be nevermore: [http://www.poe-news.com/forums/sp.php?pi=1002546492 Announcement]<br />
* '''December, 2011''': GamePro magazine halts publication and their website goes dark.<br />
* '''November, 2011''': Archiveteam rescues over 1.3 million users' data from [[Splinder]]'s closure.<br />
* '''July 2011''': Archiveteam teaches you how to [[Rescuing_Floppy_Disks|rescue data from Floppy Disks]].<br />
* '''May, 2011''': [[Friendster]] is deleting everything at the end of the month.<br />
* '''May, 2011''': Archiveteam keeps it classy at [[poetry.com]].<br />
* '''April, 2011''': How about some [[Google Video]]?<br />
* '''March, 2011''': The [http://www.archive.org/details/personalarchiveconf 2011 Personal Digital Archiving Conference] talks are available.<br />
* '''February, 2011''': Let's watch some [[Yahoo! Video]]<br />
* '''December, 2010''': Archiveteam is Delicious!<br />
* '''October, 2010''': Archiveteam offers Geocities as a torrent.<br />
* '''December 23, 2009''': Yahoo shut down [[starwars.yahoo.com]]. We got a copy.<br />
* '''October, 2009''': [[Geocities]] closing is definitely the top of the charts.<br />
<!-- news ends --><br />
</td></tr><br />
</table><br />
<td style="width:40%;border:1px solid #cedff2;background-color:#f5faff;vertical-align:top"><br />
<table width="100%" cellpadding="2" cellspacing="5" style="vertical-align:top;background-color:#f5faff"><br />
<tr><td><br />
[[Image:Archivetime.png]]<br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">What is What</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<!-- links starts --><br />
* [[Who We Are]] and how you can join our cause!<br />
<br />
* [[Deathwatch]] is where we keep track of sites that are sickly, dying or dead.<br />
<br />
* [[Fire Drill]] is where we keep track of sites that seem fine but a lot depends on them.<br />
<br />
* [[Projects]] is a comprehensive list of AT endeavors.<br />
<br />
* [[Philosophy]] describes the ideas underpinning our work.<br />
<br />
* [[Archives]] <br />
<!-- links ends --><br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Some Starting Points</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<br />
* [[Introduction|The Introduction]] is an overview of basic archiving methods.<br />
<br />
* [[Why Back Up?]] Because they don't care about you.<br />
<br />
* [[Facebook|Back Up your Facebook Data]] Learn how to liberate your personal data from Facebook.<br />
<br />
* [[Software]] will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. <br />
<br />
* [[Formats]] will familiarise you with the various data formats, and how to ensure your files will be readable in the future.<br />
<br />
* [[Storage Media]] is about where to get it, what to get, and how to use it.<br />
<br />
* [[Recommended Reading]] links to other sites for further information.<br />
<br />
* [[Frequently Asked Questions]] is where we answer common questions.<br />
</td></tr><br />
<tr><th><br />
<h2 style="margin:0;background:#cedff2;font-size:120%;font-weight:bold;border:1px solid #a3b0bf;text-align:left;color:#000;padding:0.2em 0.4em;">Quote of the Moment</h2><br />
</th></tr><br />
<tr><td style="color:#000"><br />
<tr><td style="margin:20;background-color:#000000;font-size:200%;font-weight:bold;border:1px solid #a3b0bf;text-align:center;color:#fff;" ><br />
"[Yahoo!] found the way to destroy <br />
the most massive amount of history<br />
in the shortest amount of time <br />
with absolutely no recourse"<br />
</td></tr><br />
<tr><td style="text-align:right"><br />
[http://www.time.com/time/business/article/0,8599,1936645,00.html Internet Atrocity! GeoCities' Demise Erases Web History] <br />
<br>By Dan Fletcher, TIME Magazine, Monday, Nov. 09, 2009<br />
</td></tr><br />
<tr><td><br />
<br />
=== Ended Projects ===<br />
* '''[[MobileMe]]''' - Apple's file storage and sharing service, which hosted over 200 terabytes of data, shut down on June 30, 2012.<br />
** Link to Mobile me search here<br />
* '''[[Tabblo]]''' - A site where users told stories with pictures. Closed May 30, 2012.<br />
** Link to search here<br />
* '''[[FanFiction.Net]]''' - Around 7 million fan-fiction stories hosted on what may be the largest site of its kind in the world. They're not shutting down, but Archiveteam has a copy "just in case".<br />
** Link to archive?<br />
* '''[[Geocities]]''' - We archived most of geocities mother fuckers!<br />
** Link to archive....<br />
* '''[[Fortune City]]''' - It may be gone, but we've still got it.<br />
** As always the link...<br />
<br />
[[:Category:Rescued Sites| More]]<br />
</td></tr><br />
</table><br />
</td></tr><br />
</table><br />
'''Archive Team is in no way affiliated with the fine folks at [http://www.archive.org ARCHIVE.ORG]'''<br />
'''Archive Team can always be reached at [mailto:archiveteam@archiveteam.org archiveteam@archiveteam.org]'''</div>Soulthttps://wiki.archiveteam.org/index.php?title=MediaWiki:Spam-blacklist&diff=9341MediaWiki:Spam-blacklist2013-02-16T22:25:33Z<p>Soult: Remove posterous</p>
<hr />
<div> # External URLs matching this list will be blocked when added to a page.<br />
# This list affects only this wiki; refer also to the global blacklist.<br />
# For documentation see http://www.mediawiki.org/wiki/Extension:SpamBlacklist<br />
#<!-- leave this line exactly as it is --> <pre><br />
#<br />
# Syntax is as follows:<br />
# * Everything from a "#" character to the end of the line is a comment<br />
# * Every non-blank line is a regex fragment which will only match hosts inside URLs<br />
# * ^.* and .*$ make it so that only domains are matched, not full URLs<br />
<br />
# Spam terms<br />
best-?deal<br />
attsystems<br />
(car|health|life)-?insurance<br />
christian-?louboutin<br />
electronic-?cigarette<br />
hotels?-?(booking|discount)<br />
jersey<br />
jordan<br />
lingerie<br />
loan<br />
lottery<br />
money<br />
outlet<br />
penis-?enlargement<br />
sex-?chat<br />
weight-?(loss|gain)<br />
<br />
# Blogging/community sites which are bad at filtering spam<br />
aolanswers\.com<br />
\.beeplog\.com<br />
\.blog\.ca<br />
blogspace\.fr<br />
diigo\.com<br />
doomby\.com<br />
foodbuzz\.com<br />
gameinformer\.com<br />
insanejournal\.com<br />
jeteye\.com<br />
livelogcity\.com<br />
retrogamer\.net<br />
sitesays\.com<br />
statigr\.am<br />
tagged\.com<br />
\.tumblr\.com # Requires account to report spam blogs. No thank you.<br />
\.xanga\.com<br />
<br />
# Hacked websites where the abuse department does not care<br />
asu\.edu<br />
ncsu\.edu<br />
scu\.edu # Has no abuse contact at all<br />
<br />
# Spam Domains<br />
123finances\.eu<br />
1remodelingchicago\.com<br />
247digitallearning\.com<br />
3825\.co.uk<br />
5x5workouts\.net<br />
ableton-serato\.com<br />
abubbleshooter\.info<br />
accountingdegree101.com<br />
adidasjeremyscottwings\.com<br />
adral\.eu<br />
air-conditioner-reviews\.info<br />
akerpub\.com<br />
alonerank\.com<br />
ameritrustshield\.com<br />
angelweddingdress\.com<br />
antivirusfirewallsoftwaresite\.org<br />
aremyhair\.com<br />
asiaone\.com<br />
askmehelpdesk\.com<br />
australias\.com\.au<br />
autostoreplus\.com<br />
babygearland\.com<br />
backup-4\.com<br />
beats-bydre\.net<br />
beijingsensualmassage\.com<br />
bestbuylouisvuitton\.com<br />
bestdatinglink\.com<br />
bestonlinebuys\.net<br />
bestseoagency\.net<br />
bestwebsitedesigncompanies\.net<br />
bioactives-morinda\.com<br />
bizeso\.com<br />
bizscribes\.com<br />
bodyjewelleryshop\.com<br />
bubbleshooteronline\.info<br />
buybeatsbydre\.com<br />
buycarhartt\.net<br />
cabbagesoupdiett\.com<br />
calculette-pret-immobilier\.fr<br />
casinoinfo\.pl<br />
cedarmulch\.net<br />
classvogue\.com<br />
clickbank\.net<br />
colorstrokespainters\.com\.au<br />
conservatoryprices0ob\.net<br />
creepers\.ch<br />
czarymary\.pl<br />
debtsadvice\.net<br />
demo-download\.org<br />
dentists-atlanta\.net<br />
doubleglazeddoorslj9j\.org<br />
download-yahoo-messenger\.net<br />
dress-sense\.sg<br />
dressup24h\.com<br />
easypret\.fr<br />
ebutcherblockcountertops\.com<br />
efusiontech\.com<br />
emergencypreparednesshelp\.net<br />
empowerbpo\.com<br />
e-najlepsza-lokata\.com<br />
everylight\.co\.uk<br />
eyelashcurlers\.org<br />
ezdi\.us<br />
fabricacurtain\.com<br />
felicitysglutenfreehandbook\.com<br />
femmefatalehats\.com<br />
finanziellen-freiraum\.de<br />
findingnola\.com<br />
for-htc\.com<br />
free-drug-rehab\.com<br />
furnitureforbathroom\.co\.uk<br />
gadgetinthebox\.com<br />
garminnuviportablegps\.com<br />
gazeta\.pl<br />
geek-lamp\.com<br />
get-free-diapers\.com<br />
getgplusvotes\.com<br />
glutenfreerecipebox\.com<br />
goodreads\.com<br />
googlelocalranking\.com<br />
gpgyjr\.com\.cn<br />
guanacaste\.net<br />
hairagainreviews\.org<br />
hatena\.com<br />
hdrolx\.com<br />
hermesfair\.com<br />
hervelegerdressonline2012\.com<br />
hi5mediagroup\.com<br />
hostingreviewsandcoupons\.com<br />
house-maintain\.blogspot\.com<br />
i-am-adopted\.com<br />
iii\.org\.tw<br />
iinfobase\.com<br />
inboxbuddy\.com<br />
instantcashloanforme\.com<br />
ipadaccessoriesale\.uk\.com<br />
itsakon\.com<br />
iusetbellum\.blogspot\.com<br />
jasa-seo\.org<br />
juegosgratis1\.info<br />
jukeboxalive\.com<br />
jumbobookmarks\.com<br />
keepandshare\.com<br />
kinderbadeshop\.de<br />
kingswaylifecare\.com<br />
kokosowy\.pl<br />
lagbook\.com<br />
lawn-edging\.com<br />
legalsoundz\.com<br />
leonisawesome\.com<br />
letusreckon\.com<br />
lolliboys\.com<br />
louisvuittonbagsroom\.com<br />
louisvuittonhandbagsstore\.co\.uk<br />
macipadvideo\.us\.com<br />
macobserver\.com<br />
marketingmassachusetts\.net<br />
marvelousessays\.com<br />
masscouponsubmitter\.com<br />
matelasbonheur\.ca<br />
mbk-center\.com<br />
mebletarnów\.com.pl<br />
mediscribes\.com<br />
meridianstars\.com<br />
merritts\.uk\.com<br />
michaeljackson-songs\.org<br />
miedziaki\.eu<br />
mmichalekellya\.xanga\.com<br />
moslemunity\.com<br />
mp3in1\.com<br />
mulchinglawn\.com<br />
my-landimmo\.de<br />
n2acards\.com<br />
naturalinfertilitytreatments\.wordpress.com<br />
needrapidcashnow\.com<br />
net-promotion\.pl<br />
newerahats2012\.com<br />
new-rap-songs\.com<br />
newyorkgiantsnikejerseysstore\.com<br />
nibtv\.com<br />
nieruchomościtarnów\.com.pl<br />
onestopbookmarks\.com<br />
onlineprnews\.com<br />
ovidiusilaghi\.ro<br />
pandatarot\.com<br />
patiodoorsda5\.com<br />
pisaniecv\.info<br />
pixelperfectsoftworks\.com<br />
popculturedivas\.com<br />
power-leveling\.us<br />
practutor\.com<br />
prnewswire\.com<br />
product-samples\.net<br />
professays\.com<br />
prosdi\.com<br />
purevolume\.com<br />
qiel\.com<br />
r220\.cc<br />
rajpromotions\.com<br />
rapidfbfans\.com<br />
recoverytoolbox\.com<br />
redditmarketing\.com<br />
repairtoolbox\.com<br />
retiringincostarica\.org<br />
rulettstrategiak\.com<br />
sarasotacriminalattorneys\.com<br />
sbwire\.com<br />
scheidungohneanwalt\.com<br />
schoolgrantsguides\.blogspot.com<br />
scrapebrokers\.com<br />
self-defensesupply\.com<br />
seogooglemaps\.net<br />
seo-methods\.com<br />
seopackagepricing\.com<br />
showingoncam\.com<br />
smartphonewebcreator\.com<br />
sovcal\.com<br />
spittingandvomitting\.com<br />
sports-camping\.com<br />
squidoo\.com<br />
superiorpapers\.com<br />
supremeessays\.com<br />
surfbrands\.net<br />
symbian-kreatif\.co\.cc<br />
szybki-kredyt-bez-bik\.com<br />
tabletpcwarehouse\.net<br />
tani-kredyt-mieszkaniowy\.org<br />
taoholycity\.com<br />
televisionspain\.net<br />
theprivatenetwork\.net<br />
thesecretworldhack\.com<br />
thespainforum\.com<br />
theuniquehoodiasite\.com<br />
ticketforeverything\.com<br />
tjindustrial\.com<br />
tn-?requin-?paschers\.(biz|eu)<br />
trustbanq\.com<br />
urbanspycam\.com<br />
usalouisvuittonshopping\.com<br />
vacationscostarica\.com<br />
wallmountingatv\.com<br />
watchnflgamesonlinehd\.com<br />
webdesignsanluisobispo\.wordpress.com<br />
webousb\.com<br />
wedding-cake-decorations\.net<br />
wedding-cake-stands\.net<br />
wheretogetengaged\.com<br />
wholesaledefenseonline\.com<br />
whyimhotter\.com<br />
whyonlinebackup\.com<br />
wordpressseoexpert\.com<br />
worldselectshop\.com<br />
wylinka\.com<br />
xaby\.com<br />
xg4ken\.com<br />
yaseminler\.com<br />
zay\.pl<br />
ziel-motivation\.com<br />
zlewozmywak24\.pl<br />
<br />
#</pre> <!-- leave this line exactly as it is --></div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=9243URLTeam2013-01-30T21:54:46Z<p>Soult: /* New table */ Update table</p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam-logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; the small URL is visited by a consumer and their web browser is redirected to the long URL.<br />
<br />
Such services are a ticking time bomb. If they go away, get hacked, or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but they rely on URL shorteners to actually give them their databases. Even 301Works founding member ''bit.ly'' does not actually share its database, and most other big shorteners don't share theirs either.<br />
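At its core, backing up a shortener means following each short code's HTTP redirect and recording the Location header. A rough sketch using canned headers (no live request is made, and the URL is invented for illustration):<br />

```shell
# Extract the long URL from a shortener's 301 response. The response
# text is canned here; a real scraper would issue the HTTP request.
response='HTTP/1.1 301 Moved Permanently
Location: http://example.com/very/long/url
Content-Length: 0'
long_url=$(printf '%s\n' "$response" | sed -n 's/^Location: //p')
echo "$long_url"   # prints http://example.com/very/long/url
```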
<br />
== Who did this? ==<br />
You can join us in our IRC channel: [irc://irc.efnet.org/urlteam #urlteam] on [http://www.efnet.org/ EFNet]<br />
* [[User:Scumola]] started this wiki page<br />
* [[User:Chronomex]] started the Urlteam scraping effort<br />
* [[User:Soult]] Helps with scraping<br />
* [[User:Jeroenz0r]] Helps with scraping (and stalking Soult)<br />
* ... many ArchiveTeam people who run the scrapers<br />
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. They unfortunately do not want to make them downloadable, but the same data is in our torrents too, just in a different format (we use tab-delimited, xz-compressed files while 301works uses comma-delimited uncompressed files).<br />
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/soult/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/soult/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com/ Tinyurl.com]<br />
| 1,000,000,000<br />
| [[Warrior]]<br />
| scraping: sequential, <= 6 characters<br />
| new shorturls: non-sequential, 7 characters<br />
|-<br />
| [http://bit.ly/ Bit.ly]<br />
| 4,000,000,000<br />
| [[Warrior]]<br />
| scraping: non-sequential, 6 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://goo.gl Goo.gl]<br />
| ?<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 810,264,745 (2013-01-30)<br />
| [[Warrior]]<br />
| scraping: sequential, <= 5 characters<br />
| new shorturls: non-sequential, 6 characters<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096 (2010-04-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
|<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| [http://tr.im tr.im]<br />
| 1990425<br />
| [[User:Soult]]<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| adjix.com<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Already done: 00-zz, 000-zzz, 0000-izzz.<br />
| case-insensitive, incremental<br />
|-<br />
| rod.gs<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Done: 00-ZZ, 000-2Qc<br />
| case-sensitive, incremental, server can't keep up with all the requests.<br />
|-<br />
| biglnk.com<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Done: 0-Z, 00-ZZ, 000-ZZZ<br />
| case-sensitive, incremental<br />
|-<br />
| go.to<br />
| 60000<br />
| [[User:Asiekierka]]<br />
| Done: ~45000 (go.to network links only: [http://64pixels.org/goto_dump.zip goto_dump.zip])<br />
| no codes, only names; google-fu only gives the first 1000 results for each, but thankfully most domains have fewer<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
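Status entries like adjix.com's "Already done: 00-zz, 000-zzz" mean every code of a given length was enumerated in sequence. A minimal sketch of such an enumerator, assuming a case-insensitive 0-9a-z alphabet (case-sensitive shorteners such as rod.gs would also need A-Z):<br />

```python
import itertools
import string

# Assumed alphabet: digits plus lowercase letters, as for a
# case-insensitive shortener like adjix.com.
ALPHABET = string.digits + string.ascii_lowercase

def code_range(length):
    """Yield every shortcode of the given length in sequential order."""
    for combo in itertools.product(ALPHABET, repeat=length):
        yield "".join(combo)

# "00"-"zz" is every 2-character code: 36^2 = 1296 codes.
two_char = list(code_range(2))
```
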
<br />
=== Alive ===<br />
<br />
Last verified 2012-12-29. Original list last updated 2009-08-14 <ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref>.<br />
<br />
* awe.sm<br />
* budurl.com - Appears non-incremental<br />
* cli.gs - Appears non-incremental<br />
* decenturl.com - Not at all easy to scrape.<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* ilix.in - HTML redirect<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinkurl.us - Always says the URL is malformed<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* surl.co.uk - Many shortening options.<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitthis.com<br />
* u.mavrev.com - Not accepting new urls.<br />
* ur1.ca - Database is downloadable from website directly.<br />
* urlcut.com<br />
* vimeo.com<br />
* vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string.<br />
* xrl.us - see metamark.net<br />
* x.se - Cannot resolve, but www.x.se works.<br />
* yatuc.com - Not accepting new urls.<br />
* yep.it<br />
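Most of the "appears incremental" / "doesn't appear guessable" notes above come from requesting a few codes and inspecting the raw HTTP response without following it. A rough probe along those lines (host and code are placeholders; HTML or JavaScript redirectors such as ilix.in or qurlyq.com will not expose a Location header this way):<br />

```python
import http.client

REDIRECT_STATUSES = {301, 302, 303, 307, 308}

def probe(host, code, timeout=10):
    """HEAD /<code> without following redirects; return the Location
    target for an HTTP redirect, or None for unused codes and for
    shorteners that redirect via HTML/JavaScript instead."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/" + code)
        resp = conn.getresponse()
        if resp.status in REDIRECT_STATUSES:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()

# Example (not run here): probe("tighturl.com", "30xu")
```

Note that aggressive probing is exactly what gets scrapers banned (see the TinyURL status in the table), so rate-limit accordingly.<br />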
<br />
==== "Official" shorteners ====<br />
* goo.gl - Google<br />
* fb.me - Facebook<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
* t.co? - Twitter<br />
* post.ly - Posterous<br />
* wp.me - Wordpress.com<br />
* flic.kr - Flickr<br />
* lnkd.in - LinkedIn<br />
* su.pr - StumbleUpon<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
<br />
===== bit.ly aliases =====<br />
<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* 1.usa.gov - USA Government<br />
* tcrn.ch - Techcrunch<br />
<br />
=== Dead or Broken ===<br />
<br />
* 1link.in - Website dead<br />
* 6url.com - HTML redirect, Error 500<br />
* ad.vu - mirror of adjix.com, application not found<br />
* canurl.com - Website dead<br />
* chod.sk - Appears non-incremental, not resolving<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* gonext.org - not resolving<br />
* imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve. <br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kurl.us - Parked.<br />
* lnkurl.com - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..."<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.<br />
* shortlinks.co.uk - Working again. Maybe not.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.<br />
* traceurl.com - DNS fails to resolve.<br />
* tr.im - "Be back soon!"<br />
* twitpwr.com - Domain parked.<br />
* u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."<br />
* urlborg.com - 404 Not Found.<br />
* urlcover.com - Domain parked.<br />
* urlhawk.com - Domain parked.<br />
* url-press.com - Suspended by web host.<br />
* urlsmash.com - DNS not resolving.<br />
* urltea.com - Dreamhost's coming soon page.<br />
* urlvi.be - Domain parked.<br />
* urlx.org - Owner has agreed to share his database<br />
* w3t.org - 403 Forbidden.<br />
* wlink.us - Domain parked.<br />
* xaddr.com - Domain parked.<br />
* xil.in - Under construction.<br />
* xym.kr - Gibberish (?) Korean text blog.<br />
* yweb.com - Suspicious iframe with long url and fake loading gif image.<br />
* zi.ma - DNS not resolving.<br />
<br />
==== Discontinued ====<br />
<br />
* urlbrief.com - co-operates with 301Works.org<br />
<br />
=== Hueg list ===<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=9040URLTeam2012-12-05T22:20:42Z<p>Soult: Use optipng-compressed logo</p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam-logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; the small URL is visited by a consumer and their web browser is redirected to the long URL.<br />
<br />
Such services are a ticking time bomb. If they go away, get hacked or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but it relies on URL shorteners to actually hand over their databases. Even 301Works founding member ''bit.ly'' does not actually share its database, and most other big shorteners don't share theirs either.<br />
<br />
== Who did this? ==<br />
You can join us in our IRC channel: [irc://irc.efnet.org/urlteam #urlteam] on [http://www.efnet.org/ EFNet]<br />
* [[User:Scumola]] started this wiki page<br />
* [[User:Chronomex]] started the Urlteam scraping effort<br />
* [[User:Soult]] Helps with scraping<br />
* [[User:Jeroenz0r]] Helps with scraping (and stalking Soult)<br />
* ... many ArchiveTeam people who run the scrapers<br />
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. Unfortunately they do not want to make the uploads downloadable, but the same data is in our torrents too, just in a different format (we use tab-delimited, xz-compressed files while 301Works uses comma-delimited uncompressed files).<br />
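A sketch of reading one of our tab-delimited, xz-compressed release files (the two-column code/URL layout is assumed from the format description above; check the README shipped with a release before relying on it):<br />

```python
import lzma
import os
import tempfile

def read_release(path):
    """Iterate (shortcode, long_url) pairs from a tab-delimited,
    xz-compressed release file."""
    with lzma.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            code, sep, url = line.rstrip("\n").partition("\t")
            if sep:  # skip lines without a tab separator
                yield code, url

# Tiny self-contained demo with a fabricated two-line release file:
demo = os.path.join(tempfile.mkdtemp(), "demo.txt.xz")
with lzma.open(demo, "wt", encoding="utf-8") as f:
    f.write("abc\thttp://example.com/long\n")
    f.write("abd\thttp://example.org/other\n")
pairs = list(read_release(demo))
```
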
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/soult/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/soult/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com TinyURL]<br />
| 1000000000<br />
| [[User:Soult]]<br />
| 5-letter codes done, on hold due to being banned (2010-12-20)<br />
| non-sequential, bans IP for requesting too many non-existing shorturls<br />
|-<br />
| [http://bit.ly bit.ly]<br />
| 4000000000<br />
| [[User:Soult]]<br />
| lots and lots of scraping needed (2011-03-25)<br />
| non-sequential<br />
|-<br />
| [http://goo.gl goo.gl]<br />
| ??<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 534183259<br />
| [[User:Chronomex]]/[[User:Soult]]<br />
| probably got about 95% before the switch to non-sequential<br />
| now non-sequential, new software version added crappy rate limiting<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done (2009-08-14)<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| [http://tr.im tr.im]<br />
| 1990425<br />
| [[User:Soult]]<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| adjix.com<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Already done: 00-zz, 000-zzz, 0000-izzz.<br />
| case-insensitive, incremental<br />
|-<br />
| rod.gs<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Done: 00-ZZ, 000-2Qc<br />
| case-sensitive, incremental, server can't keep up with all the requests.<br />
|-<br />
| biglnk.com<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Done: 0-Z, 00-ZZ, 000-ZZZ<br />
| case-sensitive, incremental<br />
|-<br />
| go.to<br />
| 60000<br />
| [[User:Asiekierka]]<br />
| Done: ~45000 (go.to network links only: [http://64pixels.org/goto_dump.zip goto_dump.zip])<br />
| no codes, only names; google-fu only gives the first 1000 results for each, but thankfully most domains have fewer<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
<br />
=== Old list<ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref> ===<br />
List last updated 2009-08-14.<br />
* 6url.com - HTML redirect<br />
* ad.vu - mirror of adjix.com<br />
* awe.sm<br />
* budurl.com - Appears non-incremental<br />
* cli.gs - Appears non-incremental<br />
* decenturl.com - Not at all easy to scrape.<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* ilix.in - HTML redirect<br />
* imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts.<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* ow.ly - I can't get it to work.<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab<br />
* shortlinks.co.uk - Working again.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp<br />
* shrinkurl.us - Always says the URL is malformed<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* snipurl.com - See above ^<br />
* snurl.com - See above above ^^<br />
* surl.co.uk - Many shortening options.<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* traceurl.com<br />
* tr.im<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitpwr.com<br />
* twitthis.com<br />
* twurl.nl<br />
* u.mavrev.com<br />
* ur1.ca - Database is downloadable from website directly.<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant.<br />
* urlborg.com<br />
* urlbrief.com<br />
* urlcover.com<br />
* urlcut.com<br />
* urlhawk.com<br />
* url-press.com<br />
* urlsmash.com<br />
* urltea.com<br />
* urlvi.be<br />
* urlx.org - Owner has agreed to share his database<br />
* vimeo.com<br />
* wlink.us<br />
* xaddr.com<br />
* xil.in<br />
* xrl.us - see metamark.net<br />
* xym.kr<br />
* x.se<br />
* yatuc.com<br />
* yep.it<br />
* yweb.com<br />
* zi.ma<br />
* w3t.org<br />
<br />
==== "Official" shorteners ====<br />
* goo.gl - Google<br />
* fb.me - Facebook<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
* t.co? - Twitter<br />
* post.ly - Posterous<br />
* wp.me - Wordpress.com<br />
* flic.kr - Flickr<br />
* lnkd.in - LinkedIn<br />
* su.pr - StumbleUpon<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
<br />
===== bit.ly aliases =====<br />
<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* 1.usa.gov - USA Government<br />
* tcrn.ch - Techcrunch<br />
<br />
==== Dead or Broken Shorteners ====<br />
* chod.sk - Appears non-incremental, not resolving<br />
* gonext.org - not resolving<br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kurl.us - Parked.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..."<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* 1link.in - Website dead<br />
* canurl.com - Website dead<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* lnkurl.com - Website dead<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
<br />
==== Hueg list ====<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=TinyBack&diff=8988TinyBack2012-10-20T17:42:42Z<p>Soult: Redirected page to URLTeam#TinyBack</p>
<hr />
<div>#REDIRECT [[URLTeam#TinyBack]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=URLTeam&diff=8987URLTeam2012-10-20T17:41:20Z<p>Soult: Update all but the shortener list</p>
<hr />
<div>{{Infobox project<br />
| title = Urlteam<br />
| image = Urlteam logo.png<br />
| description = url shortening was a fucking awful idea<br />
| URL = http://urlte.am<br />
| project_status = {{online}}<br />
| archiving_status = {{in progress}}<br />
| source = https://github.com/ArchiveTeam/urlteam-stuff<br />
| tracker = http://tracker.tinyarchive.org/<br />
| irc = urlteam<br />
}}<br />
<br />
'''TinyURL''', '''bit.ly''' and other similar services allow long URLs to be converted to smaller ones on their specific service; the small URL is visited by a consumer and their web browser is redirected to the long URL.<br />
<br />
Such services are a ticking time bomb. If they go away, get hacked or sell out, millions of links will be lost (see [http://en.wikipedia.org/wiki/Link_rot Wikipedia: Link Rot]). [http://www.archive.org/details/301works Archive.org]/301Works is acting as an escrow for URL shortener databases, but it relies on URL shorteners to actually hand over their databases. Even 301Works founding member ''bit.ly'' does not actually share its database, and most other big shorteners don't share theirs either.<br />
<br />
== Who did this? ==<br />
You can join us in our IRC channel: [irc://irc.efnet.org/urlteam #urlteam] on [http://www.efnet.org/ EFNet]<br />
* [[User:Scumola]] started this wiki page<br />
* [[User:Chronomex]] started the Urlteam scraping effort<br />
* [[User:Soult]] Helps with scraping<br />
* [[User:Jeroenz0r]] Helps with scraping (and stalking Soult)<br />
* ... many ArchiveTeam people who run the scrapers<br />
<br />
== 301Works cooperation ==<br />
[[Image:301works logo.jpg|thumb]]<br />
The fine folks at archive.org have provided us with upload permissions to the 301Works archive: [http://www.archive.org/details/301utm http://www.archive.org/details/301utm]. Unfortunately they do not want to make the uploads downloadable, but the same data is in our torrents too, just in a different format (we use tab-delimited, xz-compressed files while 301Works uses comma-delimited uncompressed files).<br />
<br />
== Tools ==<br />
* [https://github.com/chronomex/urlteam fetcher.pl]: Perl-based scraper by [[User:Chronomex]]<br />
* [https://github.com/soult/tinyback TinyBack]: Python 2.x-based, distributed scraper (also works with the [[Warrior]])<br />
<br />
=== TinyBack ===<br />
The easiest way to help with scraping is to run the Warrior and select the ''URLTeam'' project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:<br />
<br />
git clone https://github.com/soult/tinyback<br />
cd tinyback<br />
# Use ./run.py --help for more information on command-line options<br />
./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180<br />
<br />
== URL shorteners ==<br />
=== New table ===<br />
The new table includes shorteners we have already started to scrape.<br />
{| class="sortable wikitable" style="width: auto; text-align: center"<br />
! Name<br />
! Est. number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|-<br />
| [http://tinyurl.com TinyURL]<br />
| 1000000000<br />
| [[User:Soult]]<br />
| 5-letter codes done, on hold due to being banned (2010-12-20)<br />
| non-sequential, bans IP for requesting too many non-existing shorturls<br />
|-<br />
| [http://bit.ly bit.ly]<br />
| 4000000000<br />
| [[User:Soult]]<br />
| lots and lots of scraping needed (2011-03-25)<br />
| non-sequential<br />
|-<br />
| [http://goo.gl goo.gl]<br />
| ??<br />
| [[User:Scumola]]<br />
| started (2011-03-04)<br />
| goo.gl throttles pulls<br />
|-<br />
| [http://is.gd is.gd]<br />
| 534183259<br />
| [[User:Chronomex]]/[[User:Soult]]<br />
| probably got about 95% before the switch to non-sequential<br />
| now non-sequential, new software version added crappy rate limiting<br />
|-<br />
| [http://ff.im ff.im]<br />
| ?<br />
| [[User:Chronomex]]<br />
|<br />
| only used by FriendFeed, no interface to shorten new URLs<br />
|-<br />
| [http://4url.cc/ 4url.cc]<br />
| 1279 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done (2009-08-14)<br />
| dead (2011-02-15)<br />
|-<br />
| litturl.com<br />
| 17096<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| xs.md<br />
| 3084 (2009-08-15)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| url.0daymeme.com<br />
| 14867 (2009-08-14)<ref>http://github.com/chronomex/urlteam</ref><br />
| [[User:Chronomex]]<br />
| done<br />
| dead (2010-11-18)<br />
|-<br />
| [http://tr.im tr.im]<br />
| 1990425<br />
| [[User:Soult]]<br />
| got what we could<br />
| dead (2011-12-31)<br />
|-<br />
| adjix.com<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Already done: 00-zz, 000-zzz, 0000-izzz.<br />
| case-insensitive, incremental<br />
|-<br />
| rod.gs<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Done: 00-ZZ, 000-2Qc<br />
| case-sensitive, incremental, server can't keep up with all the requests.<br />
|-<br />
| biglnk.com<br />
| ?<br />
| [[User:Jeroenz0r]]<br />
| Done: 0-Z, 00-ZZ, 000-ZZZ<br />
| case-sensitive, incremental<br />
|-<br />
| go.to<br />
| 60000<br />
| [[User:Asiekierka]]<br />
| Done: ~45000 (go.to network links only: [http://64pixels.org/goto_dump.zip goto_dump.zip])<br />
| no codes, only names; google-fu only gives the first 1000 results for each, but thankfully most domains have fewer<br />
|- class="sortbottom"<br />
! Name<br />
! Number of shorturls<br />
! Scraping done by<br />
! Status<br />
! Comments<br />
|}<br />
<br />
=== Old list<ref>http://blog.go2.me/2009/01/exhausting-review-of-link-shorteners.html</ref> ===<br />
List last updated 2009-08-14.<br />
* 6url.com - HTML redirect<br />
* ad.vu - mirror of adjix.com<br />
* awe.sm<br />
* budurl.com - Appears non-incremental<br />
* cli.gs - Appears non-incremental<br />
* decenturl.com - Not at all easy to scrape.<br />
* dlvr.it<br />
* doiop.com - Appears non-incremental<br />
* easyurl.net - Appears non-incremental: http://easyurl.net/afd2f<br />
* ilix.in - HTML redirect<br />
* imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts.<br />
* jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388<br />
* metamark.net / xrl.us - ? http://xrl.us/bfabog<br />
* myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh<br />
* notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/<br />
* nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.<br />
* ow.ly - I can't get it to work.<br />
* pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc<br />
* qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf<br />
* redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok<br />
* s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab<br />
* shortlinks.co.uk - Working again.<br />
* short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp<br />
* shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok<br />
* shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp<br />
* shrinkurl.us - Always says the URL is malformed<br />
* shrt.st - Appears incremental: http://shrt.st/vpz<br />
* simurl.com - Doesn't appear guessable: http://simurl.com/panpes<br />
* shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu<br />
* smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.<br />
* snipr.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt<br />
* snipurl.com - See above ^<br />
* snurl.com - See above above ^^<br />
* surl.co.uk - Many shortening options.<br />
* tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv<br />
* tiny.cc - Appears non-incremental<br />
* traceurl.com<br />
* tr.im<br />
* tweetburner.com / twurl.nl - Appears incremental<br />
* twitpwr.com<br />
* twitthis.com<br />
* twurl.nl<br />
* u.mavrev.com<br />
* ur1.ca - Database is downloadable from website directly.<br />
* url9.com - Sequential, alphanumeric. Leading 0s are significant.<br />
* urlborg.com<br />
* urlbrief.com<br />
* urlcover.com<br />
* urlcut.com<br />
* urlhawk.com<br />
* url-press.com<br />
* urlsmash.com<br />
* urltea.com<br />
* urlvi.be<br />
* urlx.org - Owner has agreed to share his database<br />
* vimeo.com<br />
* wlink.us<br />
* xaddr.com<br />
* xil.in<br />
* xrl.us - see metamark.net<br />
* xym.kr<br />
* x.se<br />
* yatuc.com<br />
* yep.it<br />
* yweb.com<br />
* zi.ma<br />
* w3t.org<br />
<br />
==== "Official" shorteners ====<br />
* goo.gl - Google<br />
* fb.me - Facebook<br />
* y.ahoo.it - Yahoo<br />
* youtu.be - YouTube<br />
* t.co? - Twitter<br />
* post.ly - Posterous<br />
* wp.me - Wordpress.com<br />
* flic.kr - Flickr<br />
* lnkd.in - LinkedIn<br />
* su.pr - StumbleUpon<br />
* go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)<br />
<br />
===== bit.ly aliases =====<br />
<br />
* amzn.to - Amazon <br />
* binged.it - Bing (bonus points for being longer than bing.com)<br />
* 1.usa.gov - USA Government<br />
* tcrn.ch - Techcrunch<br />
<br />
==== Dead or Broken Shorteners ====<br />
* chod.sk - Appears non-incremental, not resolving<br />
* gonext.org - not resolving<br />
* ix.it - Not resolving<br />
* jijr.com - Doesn't appear to be a shortener, now parked<br />
* kissa.be - "Kissa.be url shortener service is shutdown"<br />
* kurl.us - Parked.<br />
* miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..."<br />
* minurl.org - Presently in ERROR 404<br />
* muhlink.com - Not resolving<br />
* myurl.us - cpanel frontend<br />
* 1link.in - Website dead<br />
* canurl.com - Website dead<br />
* dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041<br />
* easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3<br />
* go2cut.com - Website dead<br />
* lnkurl.com - Website dead<br />
* minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead<br />
* memurl.com - Pronounceable. Broken.<br />
* nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters<br />
* digg.com - discontinued - [http://about.digg.com/blog/update-diggs-short-url-service]<br />
* u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)<br />
<br />
==== Hueg list ====<br />
[http://code.google.com/p/shortenurl/wiki/URLShorteningServices]<br />
<br />
== References ==<br />
<references /><br />
<br />
== Weblinks ==<br />
* [http://urlte.am urlte.am]<br />
* [http://301works.org 301works.org]<br />
<br />
[[Category: URL Shortening]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=Warrior&diff=8986Warrior2012-10-20T17:40:54Z<p>Soult: Redirected page to ArchiveTeam Warrior</p>
<hr />
<div>#REDIRECT [[ArchiveTeam Warrior]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=Webshots&diff=8956Webshots2012-10-09T15:08:57Z<p>Soult: Infobox update</p>
<hr />
<div>{{Infobox project<br />
| title = Webshots<br />
| logo = Webshots-logo.png<br />
| image = Webshots-penguin-screenshot.png<br />
| URL = {{url|1=http://www.webshots.com/}}<br />
| project_status = {{closing}}<br />
| source = https://github.com/ArchiveTeam/webshots-grab<br />
| tracker = http://tracker.archiveteam.org/webshots<br />
| archiving_status = {{inprogress}}<br />
| irc = webshots<br />
}}<br />
<br />
"Big News! Webshots is now Smile by Webshots"<br />
<br />
= Warrior Archiving =<br />
<br />
Follow the instructions on [[ArchiveTeam Warrior]] to join in the fun using the warrior tool.<br />
<br />
= Manual Archiving =<br />
<br />
For manual archiving there is a script designed for Debian 6 and higher, but it should work on most distributions that use apt (such as Ubuntu). Simply run the following as root:<br />
<br />
<pre><br />
wget http://cryto.net/projects/webshots/webshots_debian.sh<br />
chmod +x webshots_debian.sh<br />
./webshots_debian.sh</pre><br />
<br />
in a terminal and enjoy (you may want to create a directory for this script first).<br />
<br />
= Discussion =<br />
<br />
Want to talk about the Warrior, or maybe the manual script isn't quite working right? Why not come check us out on [[IRC]]?<br />
<br />
= Other Information =<br />
<br />
== Personal archiving ==<br />
<br />
Are you a Webshots user? You can request a zip file of all your photos at http://community.webshots.com/features/downloads<br />
<br />
== Archive effort ==<br />
<br />
Joepie91 is making a list of users with this [http://git.cryto.net/repo/projects/joepie91/webshots/ script]<br />
<br />
== Site structure ==<br />
<br />
There are separate sections for Professional Photos and Member Photos.<br />
<br />
=== Member Photos ===<br />
<br />
Pictures are displayed in Flash, with http://p.webshots.com/flash/simpleImageLoader.swf or http://p.webshots.com/flash/fullsizeimageloader/FullSizeImageLoader_v1.swf<br />
<br />
Picture pages have urls of the form <nowiki>http://<channel>.webshots.com/photo/<photo id></nowiki>, with channel one of entertainment, family, home-and-garden, news, outdoors, pets, rides, sports, travel etc. ([http://community.webshots.com/ full list])<br />
<br />
On a picture page, e.g.,<br/><br />
http://travel.webshots.com/photo/2248078140105543869vCJpvs,<br/><br />
the image url is stored in a params tag:<br/><br />
<code><nowiki>&lt;param name="flashvars" value="src=http://image85.webshots.com/175/0/78/14/2248078140105543869vCJpvs_ph.jpg"&gt;</nowiki></code><br/><br />
but there is also a normal img tag:<br/><br />
<code><nowiki>&lt;img width="584" height="389" alt="" title="" src="http://image85.webshots.com/175/0/78/14/2248078140105543869vCJpvs_ph.jpg"/&gt;</nowiki></code><br />
<br />
The fullsize link from the image page leads to<br/><br />
http://community.webshots.com/photo/fullsize/2248078140105543869vCJpvs (always community),<br/><br />
with another Flash object. This time the image path is in JavaScript, not in a params tag, and there is no fallback img tag.<br />
<br />
The url of a fullsize image can be derived from the smaller url: replace _ph with _fs, e.g., <br/><br />
http://image85.webshots.com/175/0/78/14/2248078140105543869vCJpvs_ph.jpg<br/> has a fullsize version at<br/> http://image85.webshots.com/175/0/78/14/2248078140105543869vCJpvs_fs.jpg<br />
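The substitution above is mechanical, so it is easy to script. A minimal sketch (the helper name is ours, not anything Webshots provides):<br />

```python
# Sketch: derive the fullsize (_fs) Webshots image URL from the
# standard (_ph) URL, per the pattern described above.

def fullsize_url(ph_url: str) -> str:
    """Turn a *_ph.jpg image URL into its *_fs.jpg twin."""
    if not ph_url.endswith("_ph.jpg"):
        raise ValueError("expected a *_ph.jpg Webshots image URL")
    return ph_url[: -len("_ph.jpg")] + "_fs.jpg"

print(fullsize_url(
    "http://image85.webshots.com/175/0/78/14/2248078140105543869vCJpvs_ph.jpg"
))
# -> http://image85.webshots.com/175/0/78/14/2248078140105543869vCJpvs_fs.jpg
```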
<br />
Photo pages show an "other sizes" link:<br />
http://community.webshots.com/inlinePhoto?photoId=2411911460010333512&src=c&referPage=http%3A%2F%2Fgood-times.webshots.com%2Fphoto%2F2411911460010333512RyETli</div>Soulthttps://wiki.archiveteam.org/index.php?title=Facebook&diff=8821Facebook2012-08-24T12:07:23Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Dnova</p>
<hr />
<div>{{Infobox project<br />
| title = Facebook<br />
| image = Facebooklogo.png<br />
| description = Facebook Logo<br />
| URL = http://facebook.com<br />
| project_status = {{online}}<br />
| archiving_status = {{nosavedyet}}<br />
}}<br />
<br />
'''Facebook''' is a social networking site whose popularity has exploded in recent years. As of February 2012, there are more than ''845 million'' active users of the site. Facebook hosts untold billions of users' photos, videos, thoughts, conversations, and other content. <br />
<br />
The judicious user will have a well-designed backup plan for all that content that they retain full control over, but it is a reasonable assumption that the majority of users rely totally on Facebook to safeguard their data. '''This is a mistake.''' <br />
<br />
It might seem completely unthinkable that a site as massive and as popular as Facebook could ever disappear, taking your data with it. The reality is that websites, even hugely popular ones, can decline in popularity over time and eventually go away with little or no warning. We've seen it happen.<br />
<br />
While Facebook may not be in any immediate danger, you should consider that the data you put on Facebook may be immensely important to you in 10 or 20 years, similar to your family's photo albums. Facebook could be long dead by then. Start planning for this eventuality right now. <br />
<br />
== Download Your Data From Facebook ==<br />
<br />
Facebook has created a tool to download an entire archive of your Facebook account. This includes all of your own photos and videos, chat conversations, messages, status updates and wall posts. It does NOT include photos and videos belonging to other people even if you are tagged in them, so do keep that in mind. <br />
<br />
To create your archive, click the little down arrow next to your name in the upper right area of the page and go to "account settings". You should then see a screen like the one below: <br />
[[File:Fbdownload.png | center]]<br />
<br />
The next screen will explain what's going on. Press "Start my Archive" and you will be presented with a popup telling you that this will take some time - around one hour is not unheard of. Press Start again and Facebook will generate the file for you. This may indeed take several minutes. In the meantime you can continue using Facebook as usual. They will email you when the archive is ready for download. <br />
<br />
Your email will contain a link to download your archive. Follow that link and enter your Facebook password to continue. The next page presents you with a download button and an estimate of the archive's size. Download that somewhere convenient for you. '''This file contains highly personal and potentially sensitive information''' so keep it safe! You may wish to encrypt it with a password with a free tool like [http://www.axantum.com/axcrypt/ Axcrypt]. The easiest way to browse the information is to extract the contents of the zip file, and then open the index.html file with your browser of choice. From there you can look at your profile, your wall posts, photos and videos, and private messages. <br />
<br />
Note that as of April, 2012, this download tool seems to have some bugs -- in my tests it failed to completely back up all of my conversations, for example. It's better than nothing but for now at least I don't trust that it is perfect. <br />
<br />
=== Former unofficial Backup tools ===<br />
<br />
* [http://on10.net/Blogs/larry/export-facebook-to-excel-with-friendcsv/ FriendCSV] exports your contacts to CSV files.<br />
<br />
* [http://www.vincentcheung.ca/facedown/ Facedown] downloads photo albums from Facebook. <br />
<br />
This leaves wall, profile information and the plethora of Facebook apps out in the cold. Perhaps a backup app could be written for Facebook from within Facebook using the applications framework.<br />
<br />
== Vital Signs ==<br />
<br />
Currently stable.<br />
<br />
== External links ==<br />
* http://facebook.com<br />
<br />
{{Navigation box}}</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:Ross&diff=8820User:Ross2012-08-24T12:07:22Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Quickview</p>
<hr />
<div>[[Image:RossRadio.jpg|400px|center|GREETINGS MY CHILDREN]]<br />
<br />
Editing DIY techniques, general wiki pages and producing propaganda. Ahem. Informative essays and media.<br />
<br />
* [http://data-archaea.blogspot.com/ Data Archaea]<br />
* [http://machinebook.org/ Machinebook]<br />
* [http://www.myspace.com/truewomanhood True Womanhood]<br />
* [http://betrayedzine.blogspot.com Betrayed!]<br />
<br />
<br />
Could you please delete my old Geocities pages?<br />
Thank you --[[User:Quickview|Quickview]] 16:33, 1 August 2010 (UTC)<br />
<br />
<br />
Could you please delete my old Geocities pages?<br />
Thank you --[[User:Quickview|Quickview]] 16:26, 23 August 2010 (UTC)</div>Soulthttps://wiki.archiveteam.org/index.php?title=Livejournal&diff=8819Livejournal2012-08-24T12:07:22Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Micahtredding</p>
<hr />
<div>{{Infobox project<br />
| title = Livejournal<br />
| image = <br />
| description = <br />
| URL = http://livejournal.com<br />
| project_status = {{unknown}}<br />
| archiving_status = {{unknown}}<br />
}}<br />
<br />
'''LiveJournal''' is a blog community started by Brad Fitzpatrick back in 1999. It's changed hands a few times since then and the (huge) userbase has been pretty upset about how the new owners in Russia, SUP, are running the show. All the previous owners have had a potted history of banning people for fairly innocuous things.<br />
<br />
== Backup Tools ==<br />
<br />
* [http://www.livejournal.com/export.bml LiveJournal's own export journal] page can do a month at a time.<br />
<br />
* [http://antennapedia.livejournal.com/266462.html Antennapedia] ''(Mac OS X out-of-the-box support, needs Python where missing)'' - For migrating journal entries from any LJ-style server to any other LJ-style server.<br />
<br />
* [http://fawx.com/software/ljarchive/ ljArchive] ''(Windows only)'' - A nice interface grabs the info from the servers and presents it in its own customizable templates within the program. Exports to HTML and XML. It's very easy to use and is currently being developed on Sourceforge.<br />
<br />
* [http://www.offtopia.net/ljsm/index_en.html ljsm]<br />
<br />
* [http://heinous.org/wiki/LiveJournal_XML_Export_Script Livejournal Export Script] - Pull Livejournal into a database (GDBM), allowing export into HTML or XML, and further import into [http://www.wordpress.org Wordpress] or other blog software.<br />
<br />
* [http://www.ljbook.com/frontpage.php LJbook] ''(Currently overloaded)'' - Web interface exports LJ to a PDF suitable for printing on [http://www.lulu.com Lulu] or just backing up, with images and other options. Limited use per month for unpaid users.<br />
<br />
* [http://hewgill.com/ljdump/ ljdump] ''(Python)'' slurps everything down into a pile of XML files; it can also dump to HTML and produce the format expected by the Wordpress LJ import plugin.<br />
<br />
* [http://wordpress.com/ Wordpress.com] can import entire LiveJournals, including comments. Not sure if it's also available in the standalone Wordpress software, or only the hosted service.<br />
<br />
* [http://sourceforge.net/projects/ljkit XJournal] ''(Mac OS X only)'' can download all entries.<br />
<br />
* [http://www.kelpheavyweaponry.com/trac/ljmigrate LJMigrate] ''(Python)'' can archive the entire journal, and optionally migrate to another LJ-based site like InsaneJournal or Dreamwidth.<br />
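LiveJournal's own export page (first in the list above) only works a month at a time, so a scripted backup just walks the calendar. A minimal sketch of the month iteration — the URL and form-field names here are assumptions taken from the export form, not a documented API, and a logged-in session cookie would still be needed to actually fetch anything:<br />

```python
# Sketch: enumerate month-by-month export requests for a LiveJournal
# backup.  URL and field names are ASSUMPTIONS based on the export
# form, not a documented API; authentication is deliberately left out.
import urllib.parse

def export_requests(start, end):
    """Yield (url, urlencoded POST body) for each (year, month), inclusive."""
    year, month = start
    while (year, month) <= end:
        body = urllib.parse.urlencode({
            "what": "journal",
            "year": year,
            "month": "%02d" % month,
            "format": "xml",
        })
        yield ("http://www.livejournal.com/export_do.bml", body)
        month += 1
        if month > 12:
            year, month = year + 1, 1

reqs = list(export_requests((2008, 11), (2009, 2)))
print(len(reqs))  # one request per month, November through February
```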
<br />
== Vital Signs ==<br />
* Many core pages (blog posts, etc) are returning 505 errors.<br />
<br />
* [http://news.livejournal.com/127507.html Notifications, Purged Accounts, Stats, TxtLJ, Gulf_Aid_Now (Jul. 14, 2010 news:update] ''"One of the benefits of the work we've done to purge suspended accounts is that we will now be able to purge inactive journals and communities too--something you've been requesting for years! <strike>A journal is defined as inactive if it has not been logged into for 24 consecutive months. A community is defined as inactive if has not been updated for 24 consecutive months.</strike> A journal is defined as inactive if it has not been logged into for 24 consecutive months and has only one post (i.e., the welcome post). A community is defined as inactive if has not been updated for 24 consecutive months and has only one entry and no comments. Once an account is eligible to be purged for inactivity, the owner will be sent an email to alert them of the inactive status. The owner will then have two weeks to log into the journal or post to their community to prevent it from being deleted. If the owner does not log in or post, the account will be delete..."''<br />
<br />
* [http://news.livejournal.com/112503.html Changes at LJ HQ (Jan. 8, 2009 news:update)] ''"The restructuring is done with an eye to the future to ensure the long-term viability of LiveJournal as a business. As a team, we know that LJ has a great future as it prepares for its second decade." '''-''' "We recently invested a considerable amount on all-new server equipment and a facility in Montana to house it all as part of our commitment to the longevity of LJ." '''-''' "We will be around for years to come and we're committed to ensuring that your journals, friends pages, and communities will be, too."''<br />
<br />
* [http://www.alleyinsider.com/2009/1/livejournal-implodes-20-of-27-staff-let-go Livejournal Implodes: Staff Let Go]<br />
<br />
* [http://community.livejournal.com/no_lj_ads/83519.html LJ in 2009 -- The Grim Purge]<br />
<br />
* [http://valleywag.gawker.com/5124184/the-russian-bear-slashes-a-social-network The Russian Bear Slashes a Social Network]<br />
<br />
== External links ==<br />
<br />
* http://livejournal.com - still running as of January 2011<br />
<br />
{{Navigation box}}</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:Morbus_Iff&diff=8818User:Morbus Iff2012-08-24T12:07:21Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Morbus Iff</p>
<hr />
<div>http://www.disobey.com/<br />
<br />mailto:morbus@disobey.com</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:Name&diff=8817User:Name2012-08-24T12:07:21Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Name</p>
<hr />
<div>== What's in a :Name (?) ==<br />
<br />
(''[http://whohou.livejournal.com/ whohou]'' and ''[http://xkcd.com/327/ Little Bobby DropTables]'' are nice names, too.)</div>Soulthttps://wiki.archiveteam.org/index.php?title=Introduction&diff=8816Introduction2012-08-24T12:07:20Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Gruverja</p>
<hr />
<div>You're here because you have data that you want to save. If this is data you helped create, you are an '''end-user'''. If you have data that you want to save that you did ''not'' create, then you are an '''archivist'''. Archivists should go to the [[Software]] page. Otherwise, read on for information, tips and links.<br />
<br />
[[Image:Backupyourdata.gif|center]]<br />
<br />
== The Most Important Data You Have To Back Up ==<br />
<br />
* The '''most important''' data to back up and the '''least likely to be backed up''' is non-web-browsable information you created yourself, information which nobody else has access to. This is '''your primary''' and '''most precious data''': ''documents'' you've written, ''photos'' you've taken digitally, ''business and personal files'' that exist in '''only a single location on a single computer'''. Currently, this data is located on a metal platter spinning thousands of times a minute, days or weeks at a time, dependent on a wide variety of factors to not be spontaneously lost. You should rectify this situation ''immediately''.<br />
<br />
* Once you realize this, however, you are likely going to freeze up because ''this is some scary crap to hear about''. We don't blame you, but there's no need to panic; if the data has been fine up to this point, spending an hour or two to come up with a good backup strategy is time well spent. Play this simple mind game to visualize a priority path: what files, if you lost them, would represent '''the most pain to get back?''' For most people, it'll be '''financial data''' (spreadsheets, receipts, Quicken files) and '''photos'''. After that, it's likely going to be '''writings''' (essays, school reports, resume). And after ''that'', it's going to likely be '''media''' (movies, music, porn (yes, that is important)).<br />
<br />
* Unless you're doing so professionally or as a major hobby, the non-media files in this list will likely not end up being too large. If your computer is a modern machine (less than ten years old), you will have a USB port. Go out to the store (drug stores and supermarkets count) and find a '''USB flash drive'''. Try to have it be multiple gigabytes. It will likely cost you less than $20. Bring it back, plug it into a USB port, copy over your financials, writings, and photos onto it.<br />
<br />
* Once these files are copied over, unplug the USB key and store it away from the room the computer is in. '''You are by no means done, but you've now decreased your potential for pain by an incredible degree.'''<br />
<br />
== See Also ==<br />
* [[Backup Tips]]<br />
* [[Warning Signs]]<br />
<br />
== External links ==<br />
<br />
* [http://www.drivesavers.com/company-info/recovery-tips/ Drivesavers Recovery Tips]<br />
<br />
[[Category:Archive Team]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=File:Backupyourdata.gif&diff=8815File:Backupyourdata.gif2012-08-24T12:07:20Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Emijrp</p>
<hr />
<div>Really annoying heading text<br />
[[Category:Files]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:LesOrchard&diff=8814User:LesOrchard2012-08-24T12:07:19Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by LesOrchard</p>
<hr />
<div>Les Orchard, aka l.m.orchard, aka Leslie Michael Orchard, aka lmorchard.<br />
<br />
Author (ISBN 0470452021, ISBN 047038459X, ISBN 0470037857, ISBN 0764597582), blogger ([http://decafbad.com 0xDECAFBAD]), and {web,mad,computer} scientist working for the Mozilla Corporation and living near Detroit, MI, USA.<br />
<br />
Yes, I did copy and paste this from my wikipedia page.</div>Soulthttps://wiki.archiveteam.org/index.php?title=Tools&diff=8813Tools2012-08-24T12:07:19Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Emijrp</p>
<hr />
<div>#redirect [[Software]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=Wget&diff=8812Wget2012-08-24T12:07:18Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Bbot</p>
<hr />
<div>[http://www.gnu.org/software/wget/ GNU Wget] is a free utility for non-interactive download of files from the Web. Using Wget, it is possible to grab a large chunk of data, or mirror an entire website with its complete directory tree using a single command. In the tool belt of the renegade archivist, Wget tends to get an awful lot of use. (Note: Some people prefer to use [http://curl.haxx.se/ cURL]. If it can back up data, it's useful).<br />
<br />
This guide will not attempt to explain all possible uses of Wget; rather, this is intended to be a concise intro to using Wget, specifically geared towards using the tool to archive data such as podcasts, PDF documents, or entire websites. Issues such as using Wget to circumvent user-agent checks, or robots.txt restrictions, will be outlined as well.<br />
<br />
== Mirroring a website ==<br />
<br />
When you run something like this:<br />
<pre><br />
wget http://icanhascheezburger.com/<br />
</pre><br />
...Wget will just grab the first page it hits, usually something like index.html. If you give it the -m flag:<br />
<pre><br />
wget -m http://icanhascheezburger.com/<br />
</pre><br />
...then Wget will happily slurp down anything within reach of its greedy claws, putting files in a complete directory structure. Go make a sandwich or something.<br />
<br />
You'll probably want to pair -m with -c (which tells Wget to continue partially-complete downloads) and -b (which tells wget to fork to the background, logging to wget-log).<br />
<br />
If you want to grab everything in a specific directory - say, the SICP directory on the mitpress web site - use the -np flag:<br />
<pre><br />
wget -mbc -np http://mitpress.mit.edu/sicp<br />
</pre><br />
<br />
This will tell Wget to not go up the directory tree, only downwards.<br />
<br />
== User-agents and robots.txt ==<br />
<br />
By default, Wget plays nicely with a website's robots.txt. This can lead to situations where Wget won't grab anything, since the robots.txt disallows Wget.<br />
<br />
To avoid this: first, you should try using the --user-agent option:<br />
<pre><br />
wget -mbc --user-agent="" http://website.com/<br />
</pre><br />
This instructs Wget to not send any user agent string at all. Another option for this is:<br />
<pre><br />
wget -mbc -e robots=off http://website.com/<br />
</pre><br />
...which tells Wget to ignore robots.txt directives altogether.<br />
<br />
You can add --wait 1 to pause between requests, to be nice to the server.<br />
<br />
== Compression ==<br />
<br />
Wget doesn't use compression by default! This can make a big difference when you're downloading easily compressible data, like human-language HTML text, but doesn't help at all when downloading material that is already compressed, like JPEG or PNG files. To enable compression, use:<br />
<pre><br />
wget --header="accept-encoding: gzip"<br />
</pre><br />
This will produce a file (if the remote server supports gzip compression) that uses the .html extension, but is actually gzip-encoded, which can be confusing.<br />
<br />
Any vaguely modern server can sustain thousands of simultaneous text downloads, with video or large images being the big ticket items. But sites using outdated hardware, or run by habitual whiners, will complain when a site scraping uses 200 megabytes of transfer when it could have used 100.<br />
<br />
== Tricks and Traps ==<br />
<br />
* A standard methodology to prevent scraping of websites is to block access via user agent string. Wget is a good web citizen and identifies itself. Renegade archivists are not good web citizens in this sense. The '''--user-agent''' option will allow you to act like something else.<br />
* Some websites are actually aggregates of multiple machines and subdomains, working together. (For example, a site called ''dyingwebsite.com'' will have additional machines like ''download.dyingwebsite.com'' or ''mp3.dyingwebsite.com'') To account for this, add the following options: '''-H -Ddomain.com'''<br />
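Putting the pieces together, the options above can be combined into a single invocation. A sketch that assembles the argument list programmatically (dyingwebsite.com is the hypothetical target from the example above):<br />

```python
# Sketch: build the argv for a wget mirror run combining the tricks
# above -- blank user agent, robots.txt off, a 1-second delay, and
# host spanning.  dyingwebsite.com is a hypothetical target.
import shlex

def mirror_command(url, domains):
    """Return the wget argument list for a polite, host-spanning mirror."""
    return ["wget", "-m", "-c", "-b",
            "--user-agent=", "-e", "robots=off",
            "--wait", "1",
            "-H", "-D" + ",".join(domains),
            url]

cmd = mirror_command("http://dyingwebsite.com/",
                     ["dyingwebsite.com", "mp3.dyingwebsite.com"])
print(shlex.join(cmd))
```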
<br />
== Wget for Windows ==<br />
Windows users can download [http://gnuwin32.sourceforge.net/packages/wget.htm Wget for Windows], part of the [http://gnuwin32.sourceforge.net/ GNUWin32 project]. After installation, you will probably want to add it to your Path so that you can run it directly from the command prompt instead of specifying its absolute file path (i.e. "wget" instead of "C:\Program Files\GNUWin32\bin\wget.exe").<br />
<br />
These are the instructions for Windows 7 users. Prior versions should be relatively similar.<br />
#Install Wget<br />
#Right-click My Computer and select Properties<br />
#Select Advanced System Settings from the left<br />
#Click the Environment Variables button in the bottom-right corner<br />
#Under System Variables, find the Path variable and click Edit<br />
#Carefully insert the path to Wget's bin folder followed by a semi-colon. Getting this wrong could cause some nasty system problems<br />
#*Your Wget path should be inserted like this: C:\Program Files\GnuWin32\bin;<br />
#When done, click OK through all the dialog boxes you opened<br />
#The changes should apply immediately under Windows 7. Older versions may require a reboot<br />
#To test the settings, open a command prompt and enter "wget"<br />
<br />
== Parallel downloading ==<br />
http://keramida.wordpress.com/2010/01/19/parallel-downloads-with-python-and-gnu-wget/<br />
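The linked post pairs Python with GNU wget. The same idea in miniature, with the fetch function injected so it can be swapped for urllib or a wget subprocess call (the fetcher shown here is only a stand-in):<br />

```python
# Sketch: fetch many URLs in parallel with a thread pool.  `fetch` is
# injected so this stays testable; for real use, pass a function that
# shells out to wget or uses urllib.request.
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, workers=4):
    """Apply fetch() to every URL with up to `workers` parallel threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))

# Stand-in fetcher that just measures each URL:
results = download_all(["http://a.example/", "http://b.example/"],
                       fetch=len)
print(results)
```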
<br />
== Essays and Reading on the Use of WGET ==<br />
<br />
* [http://lifehacker.com/software/top/geek-to-live--mastering-wget-161202.php Mastering WGET] by Gina Trapani<br />
* [http://psung.blogspot.com/2008/06/using-wget-or-curl-to-download-web.html Using Wget or curl to download web sites for archival] by Phil Sung<br />
* [http://linux.about.com/od/commands/l/blcmdl1_wget.htm about.com Wget] list of commands<br />
* [http://www.delorie.com/gnu/docs/wget/wget.html#SEC_Top GNU Wget manual]<br />
<br />
[[Category:Tools]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:Cassilda&diff=8811User:Cassilda2012-08-24T12:07:18Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Cassilda</p>
<hr />
<div>He's this guy, you know?<br />
<br />
I'm on #archiveteam as Cassilda.</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:Famicoman&diff=8810User:Famicoman2012-08-24T12:07:18Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Famicoman</p>
<hr />
<div>Famicoman is an active member in many hacker/phreaker communities. He has had involvement one way or another in such iptv shows as Hak5, Hackcom, The Unforgiving, LocalhostTv, and TechCentric. Throughout his stay on the internet, Famicoman has sysop’d BBSes, edited ezines, moderated forums, administrated wikis, oper'd irc networks and led community-based projects. You can currently find him on several irc networks, most notably Thinstack and MintIRC, irc://irc.thinstack.net/thinstack and irc://irc.mintirc.net/mutepoint respectively. <br />
<br />
[http://famicoman.com Famicoman.com]</div>Soulthttps://wiki.archiveteam.org/index.php?title=Deathwatch&diff=8809Deathwatch2012-08-24T12:07:16Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by JudgeDeadd</p>
<hr />
<div>The '''Deathwatch''' is meant to be a central indicator of websites and networks that are shutting down, or to serve as an indicator of what happened to particular sites that shut down quickly. New sites should be added in chronological order, newest death date first. Forward-looking death dates should be added to the first list only. Sites large enough to warrant additional information will receive a dedicated page, linked from here.<br />
<br />
== Watchlist ==<br />
<br />
=== Pining for the Fjords (Dying) ===<br />
<br />
* '''Ponibooru''', a famous My Little Pony-related imageboard, is [http://ponibooru.413chan.net/endofanera.html shutting down] by August 17. All of the images themselves (but not the comments) are available to download via torrents. <br />
<br />
* [[Parodius Networking]], which hosts numerous web sites related to classic video game platforms, [http://www.parodius.com/#post_20120421T0205 will die in August 2012].<br />
<br />
* The [[Insurgency Wiki]] is a wiki with a community that created multiple guides and raids for Anonymous, in a similar manner to [[Encyclopedia Dramatica]]. Its status has always been unclear, with many mirrors coming and going. But as of Feb. 22, 2012, the last mirror, Partyvan.info, looks like it has some damning database error. Just in case, the Bibliotheca Anonoma has made a full backup, including all available images.<br />
<br />
* "[[convore.com]]" will [http://blog.convore.com/post/17951919109/convore-shutting-down-april-1st shut down in April 2012]. The site hosts IRC conversations, and involves a lot of javascript.<br />
<br />
* The '''Centralstation Community''' [http://community.thisiscentralstation.com/_Central-Station-v2-Q38As/blog/5449967/126249.html has closed]. The site is a UK-based social network for artists and creatives that provides hosting for content and portfolio. Users are being advised to back up their work as the new version of their platform will rely on existing media hosting sites like Flickr, Vimeo, and Soundcloud.<br />
<br />
* '''Apple''' '''[[MobileMe]]''', '''[[MobileMe#iDisk | iDisk]]''', '''[[MobileMe#web.me.com / iWeb | iWeb]]''', and included services. This major website and these services will shut down in [http://support.apple.com/kb/HT4597 2012], simply because web hosting is boring and they want to focus on the exciting "iCloud".[http://www.apple.com/mobileme/transition.html][http://support.apple.com/kb/HT4597]<br />
<br />
* '''[[BBC websites]]''': According to [http://adactio.com/journal/4336/ "Erase and rewind"] (Jeremy Keith) and [http://www.guardian.co.uk/media/2011/jan/24/bbc-online-website-closures "BBC online website closures"] (Guardian UK), the BBC is planning on deleting 170 of its in-depth programming-related websites. This isn't 170 pages, but an indeterminate number, possibly approaching thousands of pages. Over at [http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_%28idea_lab%29&diff=prev&oldid=413634746#BBC_imminent_deletion_of_170_of_its_websites_-_Wikipedia_plan.3F Wikipedia], editors worked to identify all affected articles, resulting in [http://en.wikipedia.org/wiki/Wikipedia:List_of_doomed_but_actively_referenced_BBC_links a list of doomed links] and [http://en.wikipedia.org/wiki/Wikipedia:List_of_articles_with_doomed_BBC_links a list of articles with doomed links]. There is a free torrent listed at the end of [http://bengoldacre.posterous.com/nerd-saves-entire-bbc-archive-for-399-you-can Ben Goldacre's post], but it's not set up for either local use or installing on a server, and seems to be missing a large amount of material.<br />
<br />
* Yahoo is planning to close '''[http://www.foxytunes.com/ FoxyTunes]''' "soon" in favour of Yahoo! Music. Thankfully, this doesn't mean much, as FT always had a small following and no significant data is stored there, but it does mark yet another closure of a Yahoo product for seemingly no reason.<br />
<br />
* [http://hosting.ign.com/faq.php IGN has announced] that it will shut down all '''IGN hosting''' on August 31, 2009. This includes fan sites hosted on GameSpy and ClassicGaming.<br />
<br />
* '''Google Buzz''' will be shutting down soon. Luckily Google released a tool to download your content from it called [[Google Takeout]]: https://www.google.com/takeout/ <br />
<br />
* Besides Buzz, Google will be shutting down many of its other services, such as Aardvark, Sidewiki, and others: http://googleblog.blogspot.com/2011/09/fall-spring-clean.html<br />
<br />
* '''[http://kasabi.com Kasabi]''', a data publishing platform created by [http://talis.com Talis] was [http://blog.kasabi.com/2012/07/09/shutting-down-kasabi/ announced] to be closing on July 30, 2012. While the service has only been around for ~2 years it represents a unique look at services for Linked Data, and contains a variety of datasets. Kasabi has a [http://blog.kasabi.com/2012/07/16/archive-of-datasets/ blog post] that announces the availability of datasets contained in Kasabi to ease archiving.<br />
<br />
<br />
=== Pre-emptive Alarmbells (Likely To Die) ===<br />
<br />
* The Polish social network '''Grono.net''' has disappeared, replaced by a file hosting service '''grono.net.pl''' on July 1, 2012. Most content from the old site was supposed to be migrated, but, according to a message on the main page, technical difficulties have delayed the migration by one or two weeks. It's getting increasingly late...<br />
<br />
* '''[http://gamecorner.pl Gamecorner.pl]''', a Polish video game news portal, has been closed. It is unknown how long its content will remain online. In any case, it seems that ''all'' the user blogs hosted there are gone without warning. <br />
<br />
* '''[[dl.tv]]''' [http://dl.tv] There has been no new tech podcast here for over a year. It would be a good idea to start backing up all podcasts on this site. Same for Crankygeeks. [http://www.crankygeeks.com/]<br />
<br />
* '''[[Google Video]]''' threatened to remove all hosted videos with two weeks' notice in April 2011. It backed down after criticism and an archive effort by the Archive Team.<br />
<br />
* '''[[Citizendium]]''''s finances are {{url|1=http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-11-08/News_and_notes|2=running low}}. Projecting from its current funding, it looks like it [http://en.citizendium.org/wiki/CZ:Donate may not survive 2011].<br />
<br />
* '''[[cyberpunkreview.com]]''': 80s science fiction fansite and community {{url|1=http://cyberpunkreview.com/}} hasn't seen much staff activity in a long time, although the forums are going strong. UPDATE: Looking active again. [[User:Aggroskater|Aggroskater]] 08:26, 19 March 2012 (EDT)<br />
<br />
* Archive Team is declaring '''[[Yahoo!]]''' no longer a trustable entity. Prove us different, Yahooligans. Or... don't.<br />
<br />
* AOL plans to either sell or close '''[[Bebo]]''' by June. {{url|1=http://www.clickz.com/3640014}}, {{url|1=http://www.hypebot.com/hypebot/2010/04/aol-says-bye-bye-bebo.html}}<br />
<br />
* '''[[WikiLeaks]]''' ({{url|1=http://wikileaks.org/}}) has an uncertain financial situation, and the site was inaccessible for some time in 2010.<br />
<br />
* '''[[FriendFeed]]''' ({{url|1=http://friendfeed.com/}}) has been purchased by [[Facebook]], leaving FriendFeed users uncertain as to its future.<br />
<br />
* Going to call this one before it even starts, friends: '''[https://www.legacylocker.com/ Legacy Locker]''' promises lifetime control of your data and return of your data to loved ones for just $300 for "lifetime", or $30/year. [http://www.washingtonpost.com/wp-dyn/content/article/2009/03/10/AR2009031001211.html] Archive Team says to just say No.<br />
<br />
* '''[[The Pirate Bay]]''' ({{url|1=http://www.thepiratebay.org/}}) is still having persistent legal problems. The tracker went down in November, but the site still serves torrents and magnet links. If a torrent's tracker is lost, it becomes impossible to connect to other computers distributing the shared files. Considering that there are links to TPB on '''THIS VERY PAGE''', this is pretty dang important. Thankfully, magnet links and entire siterips have now been made, though keeping them updated is sure to be a pain.<br />
<br />
* '''[http://www.ning.com/ Ning]''' has laid off 40% of its staff and seems to be running out of money [http://techcrunch.com/2010/04/15/nings-bubble-bursts-no-more-free-networks-cuts-40-of-staff/]. There are certainly some networks worth archiving among the 2 million networks[http://blog.ning.com/2010/01/2-million-ning-networks.html] they host. Grouply[http://blog.grouply.com/grouply-welcomes-ning-networks/] and Posterous[http://blog.posterous.com/posterous-commits-to-building-a-ning-blog-imp] say they are going to offer migration tools.<br />
<br />
* '''[http://ompldr.org/ Omploader]''', an anonymous file upload site, has announced that they are about $2500 in the hole on hosting costs, and that there is a possibility of their shutting down if donations do not improve. It stands to reason that there are some files in their database that are worth saving. An attempt was made to contact the administrator for more information and to get a dump of the site; he responded saying he'd be happy to rsync a copy of the data after some legal issues have been settled.<br />
<br />
* '''[[ShoutWiki]]''', a wiki-farm which had been offline for part of 2011, became unresponsive at the beginning of March 2012; their main wiki and a select few others came back online in May 2012 but most wikis on the site still return "database not found". The forums on their main site do indicate a server crash and data loss, severity unknown. ArchiveTeam has [http://www.archive.org/details/shoutwiki.com 2.5GB of backups of ShoutWiki] content from 2011, should worse come to worst.<br />
<br />
* '''[http://go.to/ go.to]''', a URL shortener, has all of its domains on sale on Sedo. No official word just yet, though.<br />
<br />
=== Other Endangered Species ===<br />
<br />
* '''MUDs (Multi User Dungeons)''' are [http://www.offworld.com/2009/01/mud-history-going-down-wikis-m.html losing their history due to the notability guidelines of Wikipedia]. If you want to write wiki-style articles about MUDs, you might want to consider contributing them to one of the existing MUD wikis ([http://www.mudpedia.org Mudpedia], [http://mud.wikia.com MUD Wiki], etc.) instead. <br />
* '''[http://www.astronautix.com Encyclopedia Astronautica]''' is the most comprehensive collection of the history of space travel. '''Period.''' Seriously, the official NASA history folks will refer you to this website if they can't answer your questions. However, Mark Wade (the sole creator/maintainer) abandoned his blog at the end of 2007, and the Encyclopedia has not been updated since May 2008, despite much happening in the space exploration world since then.<br />
* '''All of the 1UP Network''' and related properties were bought by UGO recently, and should be watched carefully. [http://multiplayerblog.mtv.com/2009/01/06/egm-closed-ziff-lays-off-30/]<br />
* [http://www.bbc.co.uk/h2g2/ h2g2] - "H2G2 is a constantly expanding, user-generated guide to life, the universe and everything. The site was founded in 1999 by Hitchhiker's Guide to the Galaxy author Douglas Adams." There are plans to buy h2g2 from the BBC (http://www.bbc.co.uk/dna/h2g2/brunel/A80173361).<br />
<br />
=== Just When You Least Expect It ===<br />
<br />
* Archive Team keeps a list of [[Fire_Drill|Healthy Sites]] that could be fine today and not so hot tomorrow. We focus on ways to back your personal data off these sites so you don't put yourself at unnecessary risk.<br />
<br />
=== Eleventh Hour Reprieves and Reanimations ===<br />
<br />
* '''[[Berlios.de]]''' will [http://www.berlios.de/ shut down at the end of 2011]. The site hosts thousands of open source software projects (git, svn, bzr, mailing lists, bug tracking, etc.). [http://developer.berlios.de/docman/display_doc.php?docid=2056&group_id=2 Instructions for exporting a project.] BerliOS is still open, and they are now [http://joinup.ec.europa.eu/news/german-open-source-development-site-berlios-joins-sourceforge partnered with SourceForge] to keep things running.<br />
<br />
* '''[[Delicious]]'''[http://www.delicious.com] will be [http://daringfireball.net/linked/2010/12/16/delicious.php shutting down soon]. The whole team was let go yesterday - 15 December 2010. [http://tech.slashdot.org/story/10/12/16/2220225/Yahoo-To-Close-Delicious Slashdot link]. Delicious was acquired from Yahoo! in early 2011 by AVOS; however, all the prior content is gone.<br />
* '''Cli.gs''', another URL shortening service, announced closure: "On Sunday, 25 Oct 2009 at 12:00:00 GMT, the service will stop accepting new short URLs and will stop logging analytics."[http://blog.cli.gs/news/cligs-shutting-down] In December 2009, it was announced that the "social bookmarking" site Mister Wong has acquired cli.gs and are keeping it running.[http://blog.cli.gs/news/mister-wong-acquires-cligs] All aboard the [[TinyURL]] project. <br />
* '''tr.im''' became the largest URL-shortening service to announce closure[http://blog.tr.im/post/159369789/tr-im-r-i-p] - then hastily recanted[http://blog.tr.im/post/160697842/tr-im-resurrected] and came up with a new solution[http://blog.tr.im/post/165049236/tr-im-to-be-community-owned]. All aboard the [[TinyURL]] project, and [http://www.301works.org/post/207681240/301works 301works.org].<br />
* '''Home of the Underdogs''' went under on Feb 9th[http://flashofsteel.com/index.php/2009/02/13/rip-hotu/]. There have been some passed-along words from the site's owner, now working at an NGO, that an attempt to bring it back may happen. (She definitely has backups of the site.) A community-driven effort to revive the site is currently underway [http://www.hotud.org]. Backups were restored, and the remaining files (1,000+) were collected from the community. As of Jan 4th 2010, HOTU reports that the files are back online [http://www.hotud.org/component/content/article/25133-files-online-and-import-done]<br />
* '''[[JPG Magazine]]''' announced it would shut down on January 5, 2009 [http://jpgmag.com/blog/2009/01/jpg_magazine_says_goodbye.html], but the site [http://jpgmag.com/blog/2009/02/an_exciting_future_for_jpg.html lives on under new ownership]. Feel free to download the [http://thepiratebay.org/torrent/4624703/ torrent].<br />
*'''Filefront.com''' is closing up shop [http://farewell.filefront.com/]. The site will be suspended on March 30, 2009. 1.5 million files and 48+ TB of space gone just like that. '''UPDATE''': As of April 2, 2009, it looks like there may have been an 11th-hour reprieve for Filefront. According to a message reportedly from the original founders of the service [http://welcome.filefront.com/], the site has been re-acquired by them in order to prevent its proposed shuttering.<br />
* '''[[Word Count Journal]]''' ({{url|1=http://www.wordcountjournal.com/about}}) is shutting down on June 11, 2011. '''UPDATE''': The site is fully up and running. (checked on October 21, 2011) '''UPDATE2''': Non-functional, but the website is up with this notice: "Word Count Journal is no longer being supported." (checked on January 26th, 2012)<br />
<br />
== Dead as a Doornail ==<br />
<br />
===2012===<br />
* The popular file hosting service '''Megaupload''' was shut down in January 2012; with it, '''Megavideo''' is gone too. It was mainly used for copyright infringement, but lots of perfectly legitimate files were hosted on it.<br />
<br />
===2011===<br />
* '''[[Google Labs]]''' (http://labs.google.com) closed sometime around August/September 2011 http://www.pcmag.com/article2/0,2817,2388881,00.asp#fbid=7kZ39-1XQUH and many great, experimental, one-of-a-kind tools vanished. Among many others: "Google Sets", which had been around for a long time; "City Tours", some of it including user-generated content; and the exciting "Google Squared" http://4.bp.blogspot.com/_ZaGO7GjCqAI/SibWbewOy5I/AAAAAAAAQBM/8lb7UA6AWPY/s640/google-squared-species.png, an approach to passing more artificial intelligence on to the user than conventional search engines do (comparable to Wolfram Alpha), but seemingly based on the same vast pool of data as standard Google search results. Since there is hardly any obvious rationale for closing down Google Labs, it pictures Google as being either less supportive of (or even hostile to) new inventions, less responsible with user-generated content, or more secretive about its ongoing projects than one might have thought before. [[User:Whatsgoingonwithgoogle|Whatsgoingonwithgoogle]] 18:06, 17 October 2011 (UTC)<br />
<br />
* The wiki hosting site '''wik.is''', hosted by MindTouch, shut down in the first week of January 2011; the explanation being that "in order to continue to support the growing needs of our MindTouch Express users, we are offering MindTouch Cloud", which "opens up additional features and functionality that are not available in Wik.is.". The only way you'd know all that is if you received a warning e-mail from MindTouch. They offer to keep your site running by "upgrading to our paid Cloud version by [http://campaigns.mindtouch.com/Wik.isDecomissioningMigrationInterest.html filling out this short form.]"<br />
<br />
* '''[[ProHosting]]''' (http://free.prohosting.com) closed hosted sites on 1 Jan 2011<br />
<br />
* '''The Sims Carnival''': January 17th, 2011[http://www.simscarnival.com/games/CarnivalMonkey/35068/The-Sims-Carnival-Says-Goodbye]<br />
<br />
* '''[http://team.gaia.com/blog/2010/3/important-gaia-announcement Gaia Community]''' shut down at the end of March. <br />
<br />
* '''[http://ghost.cc/ Ghost Cloud Computing]''' became a ghost of itself[http://ghost.cc/home/SignUp.jsp].<br />
<br />
* '''Microsoft''' closed '''Windows Live Spaces''' on March 16, 2011. Spaces owners had the option to migrate their blogs to '''WordPress''' or to make copies. As of January 4, 2011, they could no longer edit their existing Spaces.[http://explore.live.com/windows-live-spaces-help-center]<br />
<br />
* '''[[Yahoo! Video]]''' shut down on March 31st, 2011 and was reborn as a video portal.<br />
<br />
* '''[[Encyclopedia Dramatica]]''' shut down on 16 April 2011 without warning. Reconstruction efforts are ongoing. A lot of images and articles are probably lost. (The replacement, OhInternet, is a very heavily sanitized version of ED.) <s>ED is claiming that they are in danger of shutting down. Despite the controversial nature of many articles hosted on the wiki, this would be a big loss of historical records.</s><br /><font color="red">A lot of the images and pages are still missing. Help appreciated.</font><br />
<br />
* [[Yahoo]]! has {{url|1=http://techcrunch.com/2011/02/24/yahoo-to-shut-down-mybloglog-on-may-24/|2=announced}} that '''[[MyBlogLog]]''' will be closed on 24 May 2011. '''UPDATE:''' Yup.<br />
<br />
* '''[[Prodigy Pages]]''' shut down on June 1, 2011.<br />
<br />
* '''[[Forums.starwars.com]]''': {{url|1=http://www.starwars.com|2=StarWars.Com}} {{url|1=http://forums.starwars.com/ann.jspa?annID=3|2=announced}} the closure of their {{url|1=http://forums.starwars.com|2=forums}} on June 3, 2011. (Forum will lock on 29 April 2011) {{url|1=http://theforce.net/latestnews/story/StarWarscom_Forums_Shutting_Down_In_June_137497.asp|2=tf.n report}}<br />
<br />
===2010===<br />
* '''[http://machinima.com Machinima.com]''' was reworked in December 2010, and by "reworked" we mean massacred. Most notably, the forums were deleted, as well as tons of older articles. <br />
* The '''[http://www.symbian.org/ Symbian Foundation]''' will shut down its websites, Twitter account, Facebook page, bug trackers and remove access to its source code on 17 Dec 2010[http://www.engadget.com/2010/11/27/symbian-foundation-axing-websites-on-december-17th-source-repos/][http://developer.symbian.org/wiki/Symbian_Foundation_web_sites_to_shut_down].<br />
* '''[http://itdied.com/ It Died]''' by Glenn Fleishman, a site dedicated to tracking sites that have died, has itself died. (Keep the [http://itdied.com/atom.xml RSS feed] around in case that changes, though.)<br />
* '''isweb lite''', the Japanese Geocities, shut down on October 31. Thousands of personal homepages of artists and illustrators were deleted forever. A tiny sample of the pages deleted: [http://togetter.com/li/64058] '''isweb''' itself (paid hosting!) will shut down in May 2012. [http://portal.faq.rakuten.co.jp/app/answers/detail/a_id/15387/]<br />
* '''[http://closing.vox.com/ Vox]''' shut down at the end of September 2010.<br />
* '''[http://storytlr.com/ Storytlr]''', a lifestreaming site, stopped hosting March 1st 2010.<br />
* '''[http://platinum.ac Platinum]''', once a popular Finnish web site associated with electronic dance music, clubbing/raving, and other related things, was closed in March after running for years. All the content posted to the site's forums was, however, obtained and made available by [http://klubitus.org Klubitus], another related portal popular in Finland.<br />
* '''[http://www.kidradd.com Kid Radd]''' was a notable and quite popular webcomic which vanished when AT&T discontinued their Worldnet service. Thankfully, an archive is available, e.g. [http://tangent128.name/depot/kid_radd.zip here].<br />
* '''[http://www.brightfuse.net BrightFuse]''' was a small social network started as a side venture by CareerBuilder.com in August 2009. It was quietly shut down in November 2010 without much fanfare. At its height it had 100k users.<br />
<br />
===2009===<br />
<br />
* Google acquired '''[http://etherpad.com Etherpad]''' on 4th December, 2009 and immediately [http://etherpad.com/ep/blog/posts/google-acquires-appjet announced] a March 2010 content deletion date. After community pressure, Google decided to [http://etherpad.com/ep/blog/posts/etherpad-back-online-until-open-sourced open source the Etherpad codebase], keeping the service alive until then. The site closed down shortly after. Fortunately, there are now [http://www.google.com/search?q=etherpad+alternatives numerous] [http://www.google.com/search?q=etherpad+clone alternatives].<br />
<br />
* '''favrd''', a website that aggregated favorite tweets from Twitter, abruptly shut down on '''December 6, 2009''' with absolutely no warning, killing off thousands of highlighted entries added by group consensus over many months. As a reward for their efforts, founder Dean Allen wrote this helpful message: ''"Alas, stars on Twitter have become mere take-out menus hung on the doors of other restaurants. There are still lots of clever and funny things to read every day, but finding these is no longer a challenge — you already follow your sources. Sites like this one now serve mainly as fuel for emotional up-fuckedness in the guise of a game. Just an idea: next time you see something you like, write the person who made it a note telling them so. Even better, explain why. Take care!"'' Advice to people who want to work with Dean Allen's projects in the future: don't.<br />
* '''here.is''' seems to be permanently offline. It ceased to redirect email some time ago, and as of 11-23-09 it no longer redirects even URLs.<br />
[[Image:Encarta.jpg|right|300px|Discontinuedpedia]] <br />
* '''Microsoft Encarta''', the online encyclopedia with a 15+ year history, is being shut down. The US version will shut down on October 31, 2009 and the Japanese version on December 31, 2009. [http://www.reuters.com/article/CMPTRS/idUSLV28230720090331]<br />
* '''[[GeoCities]]''': Shock! Repeat Offender '''[[Yahoo]]''' announced that it would close GeoCities "later this year...We'll send you more details this summer." [http://help.yahoo.com/l/us/yahoo/geocities/geocities-05.html]. The plug was pulled on October 26th 2009. See the [[Geocities]] project page for more details.<br />
* '''Microsoft's SoapBox''' has announced it is getting off said soapbox on August 31, 2009. [http://arstechnica.com/microsoft/news/2009/07/soapbox-microsofts-youtube-dies-on-august-31-2009.ars]. <br />
* '''ArchNacho's & TortillaGodzilla's Quality ROMs''', a site that hosted ROMs for NES, SNES, and Genesis games, which announced its effective death back in January 2006, is now finally completely inaccessible, both on its original domain (http://www.qualityroms.com) and on the site that the domain masked (http://home.no.net/qualrom/). Archive.org has [http://web.archive.org/web/*/http://qualityroms.com mirrors] of the site up through August 30, 2007, which is after all updates to the site ceased. All ROMs hosted on QualityRoms are included in the mirror and can be downloaded from there.<br />
* '''Microsoft's Popfly''' [http://popflyteam.spaces.live.com/blog/cns!51018025071FD37F!336.entry] pops off into nowhere on August 24, 2009.<br />
* '''Yahoo! 360''' announces [http://blog.360.yahoo.com/blog-1qCkw2Ehaak.hdNZkEAzDrpa4Q--?cq=1] that they are closing up shop on July 13, 2009. Of course, you can still register an account but that's the first thing you're told.<br />
* '''Imeem''', a site for sharing music and convincing yourself that what you're hearing is good, [http://blog.imeem.com/2009/06/25/simplifying-imeem/ announced] on June 25, 2009 that they were "simplifying" things and deleting all photos and videos uploaded by users. They gave everyone '''five days''' to get their photos off, then extended it to ''twenty days'' after the ensuing hue and cry. There was no way to extract the uploaded videos.<br />
* '''Rejaw''', a microblogging platform, has announced that it will be shutting down on May 31 2009 [http://rejaw.com/rejaw/shout/OOfs2wUaLql]. It's gone.<br />
* '''[http://www.jumpcut.com Jumpcut.com]''' became the latest example of Yahoo!'s awesome respect for history and data, announcing the closure of the video hosting and editing site, for June 15, 2009. A software utility has been released to allow you to download the movies from Jumpcut. Otherwise, you are not in great shape - Yahoo says you can move your videos to Flickr, but Flickr cuts off at 90 seconds. A lot of homemade video is going to disappear.<br />
* '''MSN QnA Beta''' closed on May 21 [http://liveqna.spaces.live.com/blog/cns!2933A3E375F68349!2244.entry]<br />
* '''[http://www.coghead.com Coghead]''', " a web-based service for building and hosting custom online database applications and a software as a platform 'utility computing' company", announced it had closed up on February 20, 2009, and that the site would go down permanently on April 20, 2009. [http://blogs.zdnet.com/collaboration/?p=349]. It did.<br />
* '''[http://furl.net/ Furl]''' was a social bookmarking service that had been around since 2004. It was acquired by [http://diigo.com/ Diigo] (announced on March 9), allowed people to opt into transferring their bookmarks to Diigo, and shut down on April 17. [http://blog.diigo.com/2009/03/16/welcome-furl-users/ Diigo blog post]; [http://www.techcrunch.com/2009/03/09/diigo-buys-web-page-clipping-service-furl-away-from-looksmart/ Techcrunch post].<br />
* '''[http://www.spiralfrog.com Spiralfrog]''', "a FREE service that lets you download over 3 million songs and videos, legally and safely", pulled up stakes in the night and completely shut down on March 20, 2009. [http://arstechnica.com/web/news/2009/03/ad-based-music-service-spiralfrog-croaks.ars] Things looked so promising in 2006: [http://arstechnica.com/old/content/2006/08/7611.ars] Oh, and sadly, all your music you downloaded from them will stop working within 30 days or less. [http://arstechnica.com/old/content/2007/09/spiralfrog-debuts-with-free-ad-supported-music-downloads.ars]<br />
[[Image:HP upline goes offline.jpg|right|300px|Did we say upline? We meant offline.]]<br />
* It doesn't get more ironic than this: '''[https://www.upline.com/ Upline]''', a HP-owned online backup service, is being shut down.[http://news.cnet.com/8301-17939_109-10173136-2.html?part=rss&subj=news&tag=2547-1_3-0-5] ''They almost immediately turned off the backup process,'' and then announced all your restorable data would go offline on March 31, roughly 30 days after announcement. Surprise!<br />
* '''[[Yahoo_Briefcase|Yahoo Briefcase]]''', a positively ancient site run by Yahoo that provided you with 25 free megabytes of storage space for your junk, sent a mail to what were likely years-old contact addresses to tell them they had a little more than a month to get their files out, March 30, 2009. After that, the files would be deleted. What, Yahoo doesn't have a spare memory stick to store what must be the amount of files in this service for the next year?<br />
* '''Yahoo! Farechase''', an airline fare aggregation and search site, was shut down on March 25, 2009. It had previously been its own company, founded in 1999, and was purchased by Yahoo! in 2004. [http://news.cnet.com/Yahoo-buys-travel-company/2100-1032_3-5300561.html]<br />
* '''[http://seattlepi.nwsource.com/ The Seattle Post-Intelligencer]''' was [http://seattlepi.nwsource.com/business/395463_newspapersale10.html put up for sale], but found no buyer, and the print edition stopped on March 17th 2009 after 146 years. [http://www.thenewstribune.com/news/columnists/zeeck/story/591181.html] Initially, reports indicated it would shut down the website as well as the paper, but a plan was apparently in place to run a "skeleton crew" on an internet-only site, which continues to operate.<br />
* '''[http://www.videosift.com Videosift]''' had a combination database and backup failure, losing: "All votes, ever. All member usernames who registered later than around 12 months ago. All member rankings. Your member profile info (e.g., bio, favorite sift, etc.), if any. All activity that happened on the site yesterday, March 11." This is unlikely to kill the site, but an awful lot of data was lost.<br />
* '''[http://www.scoopt.com/ Scoopt]''', a "citizen journalism" site run by Getty images to allow the uploading of images by citizen journalists and the chance to be licensed to news organizations, announced they would no longer take any new imagery after February 6, 2009, and will shut down completely on March 6, 2009. Some content uploaders "may" be contacted about being absorbed into the main Getty site.<br />
[[Image:20090227.jpg|right|300px]]<br />
* '''The [http://www.rockymountainnews.com/ Rocky Mountain News]''' shut down as of February 27, 2009. [http://www.rockymountainnews.com/news/2009/feb/26/rocky-mountain-news-closes-friday-final-edition/] We're watching to see what happens with the website (and the material, and the newspaper itself). With a 150-year history, there's a lot of backstory; and as this chronicler of history ends up, so too will many others. There is an excellent documentary about the last days of the Rocky Mountain News [http://www.vimeo.com/3390739 here].<br />
*'''Electronic Gaming Monthly''' has recently shut its doors. [http://multiplayerblog.mtv.com/2009/01/06/egm-closed-ziff-lays-off-30/]<br />
*'''[http://culture11.com/home Culture11]''' ran out of money.[http://www.patrolmag.com/scanner/1263/culture11-is-over]<br />
* '''[[Lycos Europe]]''' shut down their '''Tripod''' hosting service on February 28, 2009. [http://www.washingtonpost.com/wp-dyn/content/article/2009/01/18/AR2009011800224.html] [http://www.paidcontent.co.uk/entry/419-lycos-europe-killing-tripod-customers-warned-to-back-up/] Note that Lycos Europe are distinct from Lycos.com. '''[[Lycos Europe]]''' is also shuttering the social networking site '''Jubii''' as of February 15, 2009. [http://www.techcrunch.com/2009/01/18/lycos-kills-jubii-while-theyre-at-it/] A Danish version of the site will remain open for the time being.<br />
* '''Windows Live''' shut down the '''MSN Groups''' on February 23. They extended their original date from February 21st to give Group owners the weekend to prepare. [http://windowslivewire.spaces.live.com/Blog/cns!2F7EB29B42641D59!34861.entry?sa=503427140]<br />
* '''[http://ma.gnolia.com/ ma.gnolia.com]''' had a catastrophic disk corruption/failure on January 31, 2009. From the message on the main site: ''"As I evaluate recovery options, I can't provide a certain timeline or prognosis as to when or to what degree Ma.gnolia or your bookmarks will return; only that this process will take days, not hours."'' Ma.gnolia had an excellent export feature... hope you used it and did the backups they didn't!<br />
* '''[http://dominomag.com/ Domino Magazine]''', a style/interior design magazine, announced that they were shutting down on January 28, 2009. [http://mydecofile.dominomag.com/ My Deco File], one of the site's heavily used social bookmarking features (somewhat like delicious for images) will remain up for a few weeks to allow users to save their stuff.<br />
* '''Yahoo Pets''' was shut down and redirected with absolutely no notice around January 27, 2009. [http://blog.dogster.com/2009/01/28/yahoo-quietly-shutters-yahoo-pets-grin/]<br />
* '''[[totse]].com''' [http://www.totse.com/ closed its doors] on January 17, 2009. As of Jan 20th, a mirror [http://totse.danladds.com/ exists], alongside a [http://totse.danladds.com/text/ repository of the totse text files].<br />
* '''[[Ficlets]].com''' (owned by AOL) has announced they are closing on January 15, 2009. [http://www.peopleconnectionblog.com/2008/12/02/ficlets-will-be-shut-down-permanently/]<br />
* '''[[Circavie]].com''' (owned by AOL) has announced they are closing on January 15, 2009. [http://www.peopleconnectionblog.com/2008/12/03/circavie-will-be-shut-down-permanently/]<br />
* '''Several Google services''' have shut down. [http://www.readwriteweb.com/archives/google_giveth_and_it_taketh_away.php] Most importantly, Google Video stopped accepting new uploads (to avoid competition with Google-owned YouTube), and Google Catalog Search was erased.<br />
* '''[[Co.mments]].com''' closed down on January 11, 2009.<br />
* '''[[AOL_Pictures|AOL Pictures]]''' said so long on January 9, 2009. To their credit, you can still yank your stuff into other photo services until June of 2009. (At least, according to their goodbye letter.)<br />
<br />
===2008===<br />
<br />
* [http://blogs.zdnet.com/BTL/?p=11227 Overview of 2008 Technology News]<br />
<br />
''Biggest Botched Shutdowns of 2008''<br />
* '''[http://www.peopleconnectionblog.com/2008/11/06/hometown-has-been-shutdown AOL Hometown]''' (owned by AOL) was officially killed on October 31, 2008. [http://ascii.textfiles.com/archives/1617 Jason wrote about it.]<br />
[[Image:Stayclassyaol.png|thumb|right|470px|The full extent of warning AOL gave about shutting down Hometown.]]<br />
* '''Digitalrailroad.net''', a photo hosting site, gave their users a 24-hour eviction notice on October 27, 2008. They shut down 10 hours after the 24-hour notice. [http://news.cnet.com/8301-17939_109-10078042-2.html]<br />
<br />
''Other deaths of 2008''<br />
<br />
* '''[http://www.lively.com/goodbye.html Lively]''', a 3D Avatar space experiment, was killed in a really crappy way by Google on December 31, 2008.<br />
* '''[http://pingmag.jp/ Pingmag]''', the magazine from Tokyo about "Designing and Making things," simultaneously rang in the new year and checked out of existence on December 31, 2008.<br />
* '''[http://blog.mixwit.com/ Mixwit]''' said goodbye on December 27, 2008. [http://news.cnet.com/8301-17939_109-10126057-2.html]<br />
* '''[http://www.castlecops.com/ Castle Cops]''' put away their badges on December 23, 2008. [http://www.idf50.co.uk/clubhouse/computer-room/15996-castle-cops-closed-down.html]<br />
* '''[[Google Research Datasets]]''', shut down on December 19(?), 2008. [http://blog.wired.com/wiredscience/2008/12/googlescienceda.html]<br />
[[Image:Final image 01.png|400px|right|thumb|The last person at Yahoo! Kickstart turning off the lights.]]<br />
* '''Yahoo! Kickstart''', a social network for college students revealed in 2007 [http://mashable.com/2007/08/30/yahoo-kickstart/] got expelled on about December 18, 2008. [http://www.techpluto.com/yahoo-kickstart-shutdown/]<br />
* '''Flip.com''', a social network for teenage girls, shut down on December 16, 2008. Users were advised to print out their digital scrapbooks as backups. [http://news.cnet.com/8301-1023_3-10112021-93.html]<br />
* '''[http://pownce.com/ Pownce]''' was closed on December 15, 2008.<br />
* '''[http://getsatisfaction.com/iwantsandy/topics/a_fork_in_the_road_an_important_announcement_about_i_want_sandy I Want Sandy]''' [http://www.webcitation.org/5eFA58kqN (WEBCITE)] was shut down on December 8, 2008. A lot of people complained about this one, while others thanked the site for shutting down and wished the founder well! <br />
* '''[http://live.yahoo.com/ Yahoo Live!]''' died on December 3, 2008. [http://news.cnet.com/8301-13515_3-10081486-26.html]<br />
* '''[http://ourworld.cs.com/sfrederick2/index.htm?f=fs Compuserve OurWorld]''' slipped into history on October 31, 2008.<br />
* '''[http://blogrush.com BlogRush.com]''' failed to provide bloggers with the traffic they so desperately desired, and the creator admitted on October 29, 2008 that his 4AM idea may not have been so brilliant. [http://mashable.com/2008/10/29/blogrush-shutdown/]<br />
* '''[http://wallop.com/ Wallop]''', Microsoft's attempt at starting a social network, died on September 18, 2008. All that remains is a few Facebook apps. [http://news.cnet.com/8301-13577_3-10041856-36.html] [http://www.techcrunch.com/2008/09/15/wallop-takes-a-leap-into-the-deadpool/]<br />
* '''Yahoo! Mash''', a social networking site, became mush on September 28, 2008, after 30 days' warning. [http://mashable.com/2008/08/28/yahoo-mash-has-been-quashed/] <br />
* '''ScribbleWiki''' wikis went offline.<br />
* '''Virtual Magic Kingdom''' [http://www.intercot.com/discussion/showthread.php?t=130548 closed its gates] on May 21, 2008. [http://www.virtualworldsnews.com/2008/04/disneys-virtual.html] The amount of broken hearts and anguish over this move was amazing, and a warning sign to any family-oriented site that encourages families to join up.<br />
* '''[http://en.wikipedia.org/wiki/Think_Secret Think Secret]''' was killed by Apple and shut down on February 14, 2008. [http://blog.wired.com/business/2007/12/apple-and-think.html]<br />
* '''Uber.com''' was a social blog site that died. [http://news.cnet.com/8301-13577_3-10052301-36.html]<br />
* '''Social.fm''' couldn't stand up to Last.fm, and died. [http://news.cnet.com/8301-13577_3-10005554-36.html]<br />
* '''Brijit.com''', a news aggregation site, closed on May 15, 2008. It might be closed for good. [http://news.cnet.com/8301-13577_3-9945059-36.html]<br />
* '''Yahoo! Design''', a showcase of designing and information aesthetics related to the Yahoo! properties, got revised into oblivion in February, 2008 as part of a 1,000 employee layoff. [http://infosthetics.com/archives/2008/02/rip_yahoo_design_closed_down.html]<br />
<br />
===2007===<br />
<br />
* '''Yahoo! Podcasts''', a Podcast searching site founded in October 2005 [http://www.ysearchblog.com/2005/10/09/listen-to-the-internet-with-yahoo-podcasts/], was closed with no explanation on October 31, 2007. [http://searchengineland.com/yahoo-podcasts-to-close-the-sorry-state-of-podcast-search-12288]<br />
* '''[http://oink.cd/ OiNK's Pink Palace]''', a music BitTorrent tracker with a huge user community that cared greatly about digital content and music. It would have been a great resource for the industry to research. Shut down October 23, 2007. [http://www.wired.com/entertainment/music/news/2007/10/oink]<br />
* '''[http://jam.bbc.co.uk/ BBC Jam]''' was [http://news.bbc.co.uk/2/hi/uk_news/education/6449619.stm suspended] March 20, 2007 and [http://www.guardian.co.uk/media/2008/feb/28/bbc.digitalmedia will not be coming back].<br />
* '''Yahoo! Photos''', a photo sharing service by Yahoo!. Tools: [http://smart-techie.com/yahoo/ Download Hi Resolution Yahoo! Photos] by [http://smart-techie.com/web/ Rohit Sud], [http://kentbrewster.com/download-yahoo-photos/ Download Yahoo! Photos] by [[Kent Brewster]], and [http://yandao.com/yahoograb/ Yahoo! Photos Grabber] by [http://yandao.com Yandao.com]<br />
<br />
===2006===<br />
<br />
===2005===<br />
<br />
* http://IUMA.COM (Internet Underground Music Archive), of Santa Cruz, California, the first website to offer free hosting to bands, including MP3 files of the music they offered, was mostly archived by John Gilmore before going down. At least one IUMA founder now has a copy of that archive. This ~800 GB collection has been uploaded to an Archive Team staging server.<br />
<br />
===2004===<br />
<br />
===2003===<br />
<br />
* http://mp3.com went down. Much of it was archived by John Gilmore.<br />
<br />
===2002===<br />
<br />
===2001===<br />
* '''SixDegrees.com''', a social network service website that lasted from 1997 to 2001<br />
* '''The Useless Pages''' (at [http://replay.web.archive.org/20000612123540/http://www.go2net.com/useless/index.html IA])<br />
<br />
== Links ==<br />
<br />
=== Other Sites Remember the Dead ===<br />
<br />
* [http://www.disobey.com/ghostsites/ Ghost Sites of the Web] by Steve Baldwin. [http://www.disobey.com/ghostsites/atom.xml RSS Feed]<br />
* [http://www.techcrunch.com/tag/deadpool/ TechCrunch's Deadpool] is an excellent archive of stories about site closings.<br />
* [http://deletionpedia.dbatley.com/w/index.php?title=Main_Page Deletionpedia] saved the articles deleted from Wikipedia in 2008, and [http://wikidumper.blogspot.com/ Wikidumper] preserves a selection of them.<br />
<br />
=== Tragic ===<br />
<br />
* [http://news.cnet.com/8301-13578_3-10029798-38.html "Russia Web site owner killed after arrest" - article at CNET News]<br />
<br />
=== Humorous ===<br />
<br />
* [http://www.nzherald.co.nz/lifestyle/news/article.cfm?c_id=6&objectid=10448650 "Dating website's miscalculated publicity attempt" - article at New Zealand Herald]<br />
<br />
{{Navigation box}}<br />
<br />
[[Category:Archive Team]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=Philosophy&diff=8808Philosophy2012-08-24T12:07:15Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by BlueMaxima</p>
<hr />
<div>== Philosophy of the Archive Team ==<br />
<br />
=== Statement of Philosophy of the Team Archive Wiki ===<br />
<br />
This is a quick set of statements regarding why this Wiki exists, the intentions behind maintaining it, and the general philosophies of the owner. It's intended to prevent misunderstandings in the future, and to enable people to focus on adding to the entries without needlessly inefficient battles and in-fighting. Everything's up for discussion, but know who you should be discussing it with. <br />
<br />
==== There's a Lead Guy ====<br />
<br />
As it is, I perceive myself ([[User:jscott|Jason Scott]]) in the role of "Editor in Chief", that is, the person overseeing the direction and motivations of this Wiki<ref>This means it's my fault.</ref>. While I am perfectly happy with intra-user discussions about procedure and policy and approach, if things become intractable, parties should all feel they can come to me and make the buck stop somewhere.<br />
<br />
The site has no anonymous edits. Come on here and pseudo-nym yourself to your heart's content, but IP-based random hit-and-run leads to a lot of wasted time which we don't have in this time-critical situation. Registering is nearly instantaneous. Do it.<br />
<br />
==== Information First, Then Action ====<br />
<br />
The best situation is raising awareness of options when taking your data somewhere. To that end, we will maintain pages that are not really related to us archiving anything, but that give users information on tools they can use to archive things. There is a lot of information out there, a lot of good work being done, and even a lot of good work being done to categorize this information. Archive Team is just going to be yet another checklist to help you if you come calling.<br />
<br />
Similarly, we are likely to end up being a [[deadpool]] of sorts, announcing or tracking sites that are failing, on their last legs, or suddenly gone. <br />
<br />
==== What to Save ====<br />
<br />
Our priority should be sites where user content was solicited and then provided. If we can, we should try to mirror the whole site.<br />
Next should be sites that are beloved collections of material, or which contain seemingly unique information.<br />
After that, anything goes. It's good to have a backup.<br />
<br />
==== Opportunities to make a Difference ====<br />
<br />
Wikis can be pretty dreary looking, so if you have skills in the areas of graphic design, mediawiki layout, or just like to keep track of links, please hop on and be a part of it.<br />
<br />
This document will be improved.<br />
<br />
- [[USER:jscott|Jason Scott]]<br />
<br />
<references/><br />
<br />
[[Category:Archive Team]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:Souterrain&diff=8807User:Souterrain2012-08-24T12:07:15Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Souterrain</p>
<hr />
<div> 17:53 < SketchCow> Oh, I see, you're one of those people who are dead inside.<br />
17:53 < SketchCow> If you need me, I'll be over here, getting shit fixed, deady</div>Soulthttps://wiki.archiveteam.org/index.php?title=Software&diff=8806Software2012-08-24T12:07:14Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Nemo bis</p>
<hr />
<div>__NOTOC__<br />
== General Tools ==<br />
<br />
* [[Wget|GNU WGET]]<br />
** Backing up a Wordpress site: "wget --no-parent --no-clobber --html-extension --recursive --convert-links --page-requisites --user=<username> --password=<password> <path>"<br />
* [[Wget with WARC output]]<br />
* [http://curl.haxx.se/ cURL]<br />
* [http://www.httrack.com/ HTTrack] - [[HTTrack options]]<br />
* [http://crawler.archive.org/ Heritrix] -- what archive.org uses<br />
* [http://pavuk.sourceforge.net/ Pavuk] -- a bit flaky, but very flexible<br />
* [http://warrick.cs.odu.edu/warrick.html Warrick]<br />
* [http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup] - Python library for web scraping<br />
* [http://scrapy.org/ Scrapy] - Fast python library for web scraping<br />
* [http://splinter.cobrateam.info/ Splinter] - Web app acceptance testing library for Python -- could be used along with a scraping lib to extract data from hard-to-reach places<br />
* [http://sourceforge.net/projects/wilise/ WiLiSe] '''Wi'''ki'''Li'''nk '''Se'''arch - Python script to get links to specific pages of a site by searching a wiki ([[wikipedia:MediaWiki|MediaWiki]]-type) that has [http://www.mediawiki.org/wiki/Api.php api.php] accessible or the [http://www.mediawiki.org/wiki/Extension:LinkSearch LinkSearch extension] enabled (the project is still very immature and at the moment the code is only available in [http://sourceforge.net/p/wilise/code/1/tree/code/trunk/ this SVN repository]).<br />
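The scraping libraries above all revolve around the same basic task: fetching a page and pulling links or data out of its HTML. As a minimal sketch of that task, using only the Python standard library rather than any specific tool listed here:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets from anchor tags, as a scraper would."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

# Example: a tiny page with two links
page = '<html><body><a href="/about">About</a> <a href="http://example.com">Ext</a></body></html>'
print(extract_links(page))  # ['/about', 'http://example.com']
```

For real crawls, tools such as Beautiful Soup or Scrapy handle malformed markup, character encodings, and relative-link resolution far more robustly than this sketch.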
<br />
== Hosted tools ==<br />
[http://www.pinboard.in Pinboard] is a convenient social bookmarking service that will [http://pinboard.in/blog/153/ archive copies of all your bookmarks] for online viewing. The catch is that it costs $9.25 just to join, plus $25/year for the archival feature, and you can only download archives of your 25 most recent bookmarks in a particular category. This may pose problems if you ever need to get your data out in a hurry.<br />
<br />
== Site-Specific ==<br />
<br />
* [[Google]]<br />
* [[Livejournal]]<br />
* [[Twitter]]<br />
* [http://code.google.com/p/somaseek/ SomaFM]<br />
<br />
== Format Specific ==<br />
<br />
* [http://www.shlock.co.uk/Utils/OmniFlop/OmniFlop.htm OmniFlop]<br />
<br />
[[Category:Tools| ]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=User:Jscott&diff=8805User:Jscott2012-08-24T12:07:11Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Jscott</p>
<hr />
<div>[[Image:Jscott_at_defcon.jpg|500px|center|Photo of Jason at DEFCON 19 by Matt Southworth http://www.flickr.com/photos/ssmcintyre/6027454101/in/photostream]]<br />
<br />
Jason Scott is the official mascot of Archive Team, resident loudmouth, and probably the one who goes to Archive Jail when the inevitable crackdown happens. Until that sad day when he's frogwalked into a waiting van for his one-way trip to Gitmo, he works on various Archive Team projects, occasionally coordinating and often cheerleading. He has been voted Most Likely To Be On a Show About Archiving Screaming About Something for several years running. <br />
<br />
=== RECORDINGS, AUDIO AND VIDEO ===<br />
<br />
* [http://ascii.textfiles.com/archives/3029 The Splendiferous Story of Archive Team] (Audio and Transcription, PDA 11, Internet Archive, SF, CA, US) <br />
* [http://recordkeepingroundtable.org/2011/06/25/where-do-old-websites-go-to-die-with-jason-scott-of-archive-team-podcast/ Where do Old Websites go to Die?] (Recordkeeping Roundtable, Sydney, AU)<br />
* [http://misener.org/archives/748 Archiving Geocities: Full Interview with Jason Scott] (CBC Radio)<br />
* [http://www.publicknowledge.org/blog/pk-know-podcast-10 Archiving Geocities] (Public Knowledge Podcast)<br />
<br />
=== WRITING, ESSAYS AND EVOCATIONS ===<br />
<br />
I've written quite a bit about various archiving, preservation and histrionic rants going in no direction whatsoever. <br />
<br />
* [http://ascii.textfiles.com/archives/798 The Disks of Mister Keegan] about saving Apple II disks.<br />
* [http://ascii.textfiles.com/archives/1237 Love in the Time of Two Terabytes] announces some disks I bought.<br />
* [http://ascii.textfiles.com/archives/1311 Nice Try, Archiver Hater] got me a new special friend.<br />
* [http://ascii.textfiles.com/archives/1357 Scanning Infocom] is about saving your old work stuff.<br />
* [http://ascii.textfiles.com/archives/1406 How It Goes] which has some ideas on sorting.<br />
* [http://ascii.textfiles.com/archives/1407 Escaping the Escapist] where a site ruined itself and I grabbed it.<br />
* [http://ascii.textfiles.com/archives/1468 Pretty and Pathetic] or why massive clumping beats pretty.<br />
* [http://ascii.textfiles.com/archives/1490 The Wall] or how I sort my office for projects.<br />
* [http://ascii.textfiles.com/archives/1558 RADIOSHACKCATALOGS] or If You're Going to Save, Do it Right.<br />
* [http://ascii.textfiles.com/archives/1617 Eviction, or the Coming Datapocalypse] got some attention.<br />
* [http://ascii.textfiles.com/archives/1649 Datapocalypso!] in which I deal with some flak.<br />
* [http://ascii.textfiles.com/archives/1664 Team Archive is Go] announcing this Wiki.<br />
<br />
=== CRITICAL INFRASTRUCTURE AND DOCUMENTS ===<br />
<br />
* [http://www.vimeo.com/603058 Possessed] is a short documentary on hoarding worth checking out.<br />
* After a short time with Jason, people [http://radio.notacon.org/2011/shows/Fuck%20Jason%20Scott.mp3 are never the same.]<br />
* I'm sorry to report this logo has been rejected for Archive Team:<br />
<br />
<br />
[[Image:rejectedatlogo.jpg|center]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=File:IMG_9204.JPG&diff=8804File:IMG 9204.JPG2012-08-24T12:07:08Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Emijrp</p>
<hr />
<div>JASON SCOTT<br />
[[Category:Files]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=File:Archiveteam.jpg&diff=8803File:Archiveteam.jpg2012-08-24T12:06:53Z<p>Soult: Reverted edits by GeorgeHoward (talk) to last revision by Emijrp</p>
<hr />
<div>ARCHIVE TEAM LOGO<br />
(Until something better comes along.)<br />
[[Category:Files]]</div>Soulthttps://wiki.archiveteam.org/index.php?title=MediaWiki:Spam-blacklist&diff=8701MediaWiki:Spam-blacklist2012-08-06T16:37:26Z<p>Soult: </p>
<hr />
<div> # External URLs matching this list will be blocked when added to a page.<br />
# This list affects only this wiki; refer also to the global blacklist.<br />
# For documentation see http://www.mediawiki.org/wiki/Extension:SpamBlacklist<br />
#<!-- leave this line exactly as it is --> <pre><br />
#<br />
# Syntax is as follows:<br />
# * Everything from a "#" character to the end of the line is a comment<br />
# * Every non-blank line is a regex fragment which will only match hosts inside URLs<br />
# * ^.* and .*$ make it so that only domains are matched, not full URLs<br />
<br />
# Spam terms<br />
best-?deal<br />
attsystems<br />
(car|health|life)-?insurance<br />
christian-?louboutin<br />
electronic-?cigarette<br />
hotels?-?(booking|discount)<br />
jersey<br />
jordan<br />
lingerie<br />
loan<br />
lottery<br />
money<br />
outlet<br />
penis-?enlargement<br />
sex-?chat<br />
weight-?(loss|gain)<br />
<br />
# Blogging/community sites which are bad at filtering spam<br />
aolanswers\.com<br />
\.beeplog\.com<br />
\.blog\.ca<br />
blogspace\.fr<br />
diigo\.com<br />
doomby\.com<br />
foodbuzz\.com<br />
gameinformer\.com<br />
insanejournal\.com<br />
jeteye\.com<br />
livelogcity\.com<br />
\.posterous\.com<br />
retrogamer\.net<br />
sitesays\.com<br />
statigr\.am<br />
tagged\.com<br />
\.tumblr\.com # Requires account to report spam blogs. No thank you.<br />
\.xanga\.com<br />
<br />
# Hacked websites where the abuse department does not care<br />
asu\.edu<br />
ncsu\.edu<br />
scu\.edu # Has no abuse contact at all<br />
<br />
# Spam Domains<br />
123finances\.eu<br />
1remodelingchicago\.com<br />
247digitallearning\.com<br />
3825\.co.uk<br />
5x5workouts\.net<br />
ableton-serato\.com<br />
abubbleshooter\.info<br />
accountingdegree101.com<br />
adidasjeremyscottwings\.com<br />
adral\.eu<br />
air-conditioner-reviews\.info<br />
akerpub\.com<br />
alonerank\.com<br />
ameritrustshield\.com<br />
angelweddingdress\.com<br />
antivirusfirewallsoftwaresite\.org<br />
aremyhair\.com<br />
asiaone\.com<br />
askmehelpdesk\.com<br />
australias\.com\.au<br />
autostoreplus\.com<br />
babygearland\.com<br />
backup-4\.com<br />
beats-bydre\.net<br />
beijingsensualmassage\.com<br />
bestbuylouisvuitton\.com<br />
bestdatinglink\.com<br />
bestonlinebuys\.net<br />
bestseoagency\.net<br />
bestwebsitedesigncompanies\.net<br />
bioactives-morinda\.com<br />
bizeso\.com<br />
bizscribes\.com<br />
bodyjewelleryshop\.com<br />
bubbleshooteronline\.info<br />
buybeatsbydre\.com<br />
buycarhartt\.net<br />
cabbagesoupdiett\.com<br />
calculette-pret-immobilier\.fr<br />
casinoinfo\.pl<br />
cedarmulch\.net<br />
classvogue\.com<br />
clickbank\.net<br />
colorstrokespainters\.com\.au<br />
conservatoryprices0ob\.net<br />
creepers\.ch<br />
czarymary\.pl<br />
debtsadvice\.net<br />
demo-download\.org<br />
dentists-atlanta\.net<br />
doubleglazeddoorslj9j\.org<br />
download-yahoo-messenger\.net<br />
dress-sense\.sg<br />
dressup24h\.com<br />
easypret\.fr<br />
ebutcherblockcountertops\.com<br />
efusiontech\.com<br />
emergencypreparednesshelp\.net<br />
empowerbpo\.com<br />
e-najlepsza-lokata\.com<br />
everylight\.co\.uk<br />
eyelashcurlers\.org<br />
ezdi\.us<br />
fabricacurtain\.com<br />
felicitysglutenfreehandbook\.com<br />
femmefatalehats\.com<br />
finanziellen-freiraum\.de<br />
findingnola\.com<br />
for-htc\.com<br />
free-drug-rehab\.com<br />
furnitureforbathroom\.co\.uk<br />
gadgetinthebox\.com<br />
garminnuviportablegps\.com<br />
gazeta\.pl<br />
geek-lamp\.com<br />
get-free-diapers\.com<br />
getgplusvotes\.com<br />
glutenfreerecipebox\.com<br />
goodreads\.com<br />
googlelocalranking\.com<br />
gpgyjr\.com\.cn<br />
guanacaste\.net<br />
hairagainreviews\.org<br />
hatena\.com<br />
hdrolx\.com<br />
hermesfair\.com<br />
hervelegerdressonline2012\.com<br />
hi5mediagroup\.com<br />
hostingreviewsandcoupons\.com<br />
house-maintain\.blogspot\.com<br />
i-am-adopted\.com<br />
iii\.org\.tw<br />
iinfobase\.com<br />
inboxbuddy\.com<br />
instantcashloanforme\.com<br />
ipadaccessoriesale\.uk\.com<br />
itsakon\.com<br />
iusetbellum\.blogspot\.com<br />
jasa-seo\.org<br />
juegosgratis1\.info<br />
jukeboxalive\.com<br />
jumbobookmarks\.com<br />
keepandshare\.com<br />
kinderbadeshop\.de<br />
kingswaylifecare\.com<br />
kokosowy\.pl<br />
lagbook\.com<br />
lawn-edging\.com<br />
legalsoundz\.com<br />
leonisawesome\.com<br />
letusreckon\.com<br />
lolliboys\.com<br />
louisvuittonbagsroom\.com<br />
louisvuittonhandbagsstore\.co\.uk<br />
macipadvideo\.us\.com<br />
macobserver\.com<br />
marketingmassachusetts\.net<br />
marvelousessays\.com<br />
masscouponsubmitter\.com<br />
matelasbonheur\.ca<br />
mbk-center\.com<br />
mebletarnów\.com.pl<br />
mediscribes\.com<br />
meridianstars\.com<br />
merritts\.uk\.com<br />
michaeljackson-songs\.org<br />
miedziaki\.eu<br />
mmichalekellya\.xanga\.com<br />
moslemunity\.com<br />
mp3in1\.com<br />
mulchinglawn\.com<br />
my-landimmo\.de<br />
n2acards\.com<br />
naturalinfertilitytreatments\.wordpress.com<br />
needrapidcashnow\.com<br />
net-promotion\.pl<br />
newerahats2012\.com<br />
new-rap-songs\.com<br />
newyorkgiantsnikejerseysstore\.com<br />
nibtv\.com<br />
nieruchomościtarnów\.com.pl<br />
onestopbookmarks\.com<br />
onlineprnews\.com<br />
ovidiusilaghi\.ro<br />
pandatarot\.com<br />
patiodoorsda5\.com<br />
pisaniecv\.info<br />
pixelperfectsoftworks\.com<br />
popculturedivas\.com<br />
power-leveling\.us<br />
practutor\.com<br />
prnewswire\.com<br />
product-samples\.net<br />
professays\.com<br />
prosdi\.com<br />
purevolume\.com<br />
qiel\.com<br />
r220\.cc<br />
rajpromotions\.com<br />
rapidfbfans\.com<br />
recoverytoolbox\.com<br />
redditmarketing\.com<br />
repairtoolbox\.com<br />
retiringincostarica\.org<br />
rulettstrategiak\.com<br />
sarasotacriminalattorneys\.com<br />
sbwire\.com<br />
scheidungohneanwalt\.com<br />
schoolgrantsguides\.blogspot.com<br />
scrapebrokers\.com<br />
self-defensesupply\.com<br />
seogooglemaps\.net<br />
seo-methods\.com<br />
seopackagepricing\.com<br />
showingoncam\.com<br />
smartphonewebcreator\.com<br />
sovcal\.com<br />
spittingandvomitting\.com<br />
sports-camping\.com<br />
squidoo\.com<br />
superiorpapers\.com<br />
supremeessays\.com<br />
surfbrands\.net<br />
symbian-kreatif\.co\.cc<br />
szybki-kredyt-bez-bik\.com<br />
tabletpcwarehouse\.net<br />
tani-kredyt-mieszkaniowy\.org<br />
taoholycity\.com<br />
televisionspain\.net<br />
theprivatenetwork\.net<br />
thesecretworldhack\.com<br />
thespainforum\.com<br />
theuniquehoodiasite\.com<br />
ticketforeverything\.com<br />
tjindustrial\.com<br />
tn-?requin-?paschers\.(biz|eu)<br />
trustbanq\.com<br />
urbanspycam\.com<br />
usalouisvuittonshopping\.com<br />
vacationscostarica\.com<br />
wallmountingatv\.com<br />
watchnflgamesonlinehd\.com<br />
webdesignsanluisobispo\.wordpress.com<br />
webousb\.com<br />
wedding-cake-decorations\.net<br />
wedding-cake-stands\.net<br />
wheretogetengaged\.com<br />
wholesaledefenseonline\.com<br />
whyimhotter\.com<br />
whyonlinebackup\.com<br />
wordpressseoexpert\.com<br />
worldselectshop\.com<br />
wylinka\.com<br />
xaby\.com<br />
xg4ken\.com<br />
yaseminler\.com<br />
zay\.pl<br />
ziel-motivation\.com<br />
zlewozmywak24\.pl<br />
<br />
#</pre> <!-- leave this line exactly as it is --></div>Soulthttps://wiki.archiveteam.org/index.php?title=MediaWiki:Spam-blacklist&diff=8693MediaWiki:Spam-blacklist2012-08-05T13:51:05Z<p>Soult: </p>
<hr />
<div> # External URLs matching this list will be blocked when added to a page.<br />
# This list affects only this wiki; refer also to the global blacklist.<br />
# For documentation see http://www.mediawiki.org/wiki/Extension:SpamBlacklist<br />
#<!-- leave this line exactly as it is --> <pre><br />
#<br />
# Syntax is as follows:<br />
# * Everything from a "#" character to the end of the line is a comment<br />
# * Every non-blank line is a regex fragment which will only match hosts inside URLs<br />
# * ^.* and .*$ make it so that only domains are matched, not full URLs<br />
<br />
# Spam terms<br />
best-?deal<br />
attsystems<br />
(car|health|life)-?insurance<br />
christian-?louboutin<br />
electronic-?cigarette<br />
hotels?-?(booking|discount)<br />
jersey<br />
jordan<br />
lingerie<br />
loan<br />
lottery<br />
money<br />
outlet<br />
penis-?enlargement<br />
sex-?chat<br />
weight-?(loss|gain)<br />
<br />
# Blogging/community sites which are bad at filtering spam<br />
aolanswers\.com<br />
\.beeplog\.com<br />
\.blog\.ca<br />
blogspace\.fr<br />
doomby\.com<br />
foodbuzz\.com<br />
gameinformer\.com<br />
insanejournal\.com<br />
jeteye\.com<br />
livelogcity\.com<br />
\.posterous\.com<br />
retrogamer\.net<br />
sitesays\.com<br />
statigr\.am<br />
tagged\.com<br />
\.tumblr\.com # Requires account to report spam blogs. No thank you.<br />
\.xanga\.com<br />
<br />
# Hacked websites where the abuse department does not care<br />
asu\.edu<br />
ncsu\.edu<br />
scu\.edu # Has no abuse contact at all<br />
<br />
# Spam Domains<br />
123finances\.eu<br />
1remodelingchicago\.com<br />
247digitallearning\.com<br />
3825\.co.uk<br />
5x5workouts\.net<br />
ableton-serato\.com<br />
abubbleshooter\.info<br />
accountingdegree101.com<br />
adidasjeremyscottwings\.com<br />
adral\.eu<br />
air-conditioner-reviews\.info<br />
akerpub\.com<br />
alonerank\.com<br />
ameritrustshield\.com<br />
angelweddingdress\.com<br />
antivirusfirewallsoftwaresite\.org<br />
aremyhair\.com<br />
asiaone\.com<br />
askmehelpdesk\.com<br />
australias\.com\.au<br />
autostoreplus\.com<br />
babygearland\.com<br />
backup-4\.com<br />
beats-bydre\.net<br />
beijingsensualmassage\.com<br />
bestbuylouisvuitton\.com<br />
bestdatinglink\.com<br />
bestonlinebuys\.net<br />
bestseoagency\.net<br />
bestwebsitedesigncompanies\.net<br />
bioactives-morinda\.com<br />
bizeso\.com<br />
bizscribes\.com<br />
bodyjewelleryshop\.com<br />
bubbleshooteronline\.info<br />
buybeatsbydre\.com<br />
buycarhartt\.net<br />
cabbagesoupdiett\.com<br />
calculette-pret-immobilier\.fr<br />
casinoinfo\.pl<br />
cedarmulch\.net<br />
classvogue\.com<br />
clickbank\.net<br />
colorstrokespainters\.com\.au<br />
conservatoryprices0ob\.net<br />
creepers\.ch<br />
czarymary\.pl<br />
debtsadvice\.net<br />
demo-download\.org<br />
dentists-atlanta\.net<br />
doubleglazeddoorslj9j\.org<br />
download-yahoo-messenger\.net<br />
dress-sense\.sg<br />
dressup24h\.com<br />
easypret\.fr<br />
ebutcherblockcountertops\.com<br />
efusiontech\.com<br />
emergencypreparednesshelp\.net<br />
empowerbpo\.com<br />
e-najlepsza-lokata\.com<br />
everylight\.co\.uk<br />
eyelashcurlers\.org<br />
ezdi\.us<br />
fabricacurtain\.com<br />
felicitysglutenfreehandbook\.com<br />
femmefatalehats\.com<br />
finanziellen-freiraum\.de<br />
findingnola\.com<br />
for-htc\.com<br />
free-drug-rehab\.com<br />
furnitureforbathroom\.co\.uk<br />
garminnuviportablegps\.com<br />
gazeta\.pl<br />
geek-lamp\.com<br />
get-free-diapers\.com<br />
getgplusvotes\.com<br />
glutenfreerecipebox\.com<br />
goodreads\.com<br />
googlelocalranking\.com<br />
gpgyjr\.com\.cn<br />
guanacaste\.net<br />
hairagainreviews\.org<br />
hatena\.com<br />
hdrolx\.com<br />
hermesfair\.com<br />
hervelegerdressonline2012\.com<br />
hi5mediagroup\.com<br />
hostingreviewsandcoupons\.com<br />
house-maintain\.blogspot\.com<br />
i-am-adopted\.com<br />
iii\.org\.tw<br />
iinfobase\.com<br />
inboxbuddy\.com<br />
instantcashloanforme\.com<br />
itsakon\.com<br />
iusetbellum\.blogspot\.com<br />
jasa-seo\.org<br />
juegosgratis1\.info<br />
jukeboxalive\.com<br />
jumbobookmarks\.com<br />
keepandshare\.com<br />
kinderbadeshop\.de<br />
kingswaylifecare\.com<br />
kokosowy\.pl<br />
lagbook\.com<br />
lawn-edging\.com<br />
legalsoundz\.com<br />
leonisawesome\.com<br />
letusreckon\.com<br />
lolliboys\.com<br />
louisvuittonbagsroom\.com<br />
louisvuittonhandbagsstore\.co\.uk<br />
macobserver\.com<br />
marketingmassachusetts\.net<br />
marvelousessays\.com<br />
masscouponsubmitter\.com<br />
matelasbonheur\.ca<br />
mbk-center\.com<br />
mebletarnów\.com.pl<br />
mediscribes\.com<br />
meridianstars\.com<br />
merritts\.uk\.com<br />
michaeljackson-songs\.org<br />
miedziaki\.eu<br />
mmichalekellya\.xanga\.com<br />
moslemunity\.com<br />
mp3in1\.com<br />
mulchinglawn\.com<br />
my-landimmo\.de<br />
n2acards\.com<br />
naturalinfertilitytreatments\.wordpress.com<br />
needrapidcashnow\.com<br />
net-promotion\.pl<br />
newerahats2012\.com<br />
new-rap-songs\.com<br />
newyorkgiantsnikejerseysstore\.com<br />
nibtv\.com<br />
nieruchomościtarnów\.com.pl<br />
onestopbookmarks\.com<br />
onlineprnews\.com<br />
ovidiusilaghi\.ro<br />
pandatarot\.com<br />
patiodoorsda5\.com<br />
pisaniecv\.info<br />
pixelperfectsoftworks\.com<br />
popculturedivas\.com<br />
power-leveling\.us<br />
practutor\.com<br />
prnewswire\.com<br />
product-samples\.net<br />
professays\.com<br />
prosdi\.com<br />
purevolume\.com<br />
qiel\.com<br />
r220\.cc<br />
rajpromotions\.com<br />
rapidfbfans\.com<br />
recoverytoolbox\.com<br />
redditmarketing\.com<br />
repairtoolbox\.com<br />
retiringincostarica\.org<br />
rulettstrategiak\.com<br />
sarasotacriminalattorneys\.com<br />
sbwire\.com<br />
scheidungohneanwalt\.com<br />
schoolgrantsguides\.blogspot.com<br />
scrapebrokers\.com<br />
self-defensesupply\.com<br />
seogooglemaps\.net<br />
seo-methods\.com<br />
seopackagepricing\.com<br />
showingoncam\.com<br />
smartphonewebcreator\.com<br />
sovcal\.com<br />
spittingandvomitting\.com<br />
sports-camping\.com<br />
squidoo\.com<br />
superiorpapers\.com<br />
supremeessays\.com<br />
surfbrands\.net<br />
symbian-kreatif\.co\.cc<br />
szybki-kredyt-bez-bik\.com<br />
tabletpcwarehouse\.net<br />
tani-kredyt-mieszkaniowy\.org<br />
taoholycity\.com<br />
televisionspain\.net<br />
theprivatenetwork\.net<br />
thesecretworldhack\.com<br />
thespainforum\.com<br />
theuniquehoodiasite\.com<br />
ticketforeverything\.com<br />
tjindustrial\.com<br />
tn-?requin-?paschers\.(biz|eu)<br />
trustbanq\.com<br />
urbanspycam\.com<br />
usalouisvuittonshopping\.com<br />
vacationscostarica\.com<br />
wallmountingatv\.com<br />
watchnflgamesonlinehd\.com<br />
webdesignsanluisobispo\.wordpress.com<br />
webousb\.com<br />
wedding-cake-decorations\.net<br />
wedding-cake-stands\.net<br />
wheretogetengaged\.com<br />
wholesaledefenseonline\.com<br />
whyimhotter\.com<br />
whyonlinebackup\.com<br />
wordpressseoexpert\.com<br />
worldselectshop\.com<br />
wylinka\.com<br />
xaby\.com<br />
xg4ken\.com<br />
yaseminler\.com<br />
zay\.pl<br />
ziel-motivation\.com<br />
zlewozmywak24\.pl<br />
<br />
#</pre> <!-- leave this line exactly as it is --></div>Soulthttps://wiki.archiveteam.org/index.php?title=MediaWiki:Spam-blacklist&diff=8689MediaWiki:Spam-blacklist2012-08-04T20:10:44Z<p>Soult: </p>
<hr />
<div> # External URLs matching this list will be blocked when added to a page.<br />
# This list affects only this wiki; refer also to the global blacklist.<br />
# For documentation see http://www.mediawiki.org/wiki/Extension:SpamBlacklist<br />
#<!-- leave this line exactly as it is --> <pre><br />
#<br />
# Syntax is as follows:<br />
# * Everything from a "#" character to the end of the line is a comment<br />
# * Every non-blank line is a regex fragment which will only match hosts inside URLs<br />
# * ^.* and .*$ make it so that only domains are matched, not full URLs<br />
<br />
# Spam terms<br />
best-?deal<br />
attsystems<br />
(car|health|life)-?insurance<br />
christian-?louboutin<br />
electronic-?cigarette<br />
hotels?-?(booking|discount)<br />
jersey<br />
jordan<br />
lingerie<br />
loan<br />
lottery<br />
money<br />
outlet<br />
penis-?enlargement<br />
sex-?chat<br />
weight-?(loss|gain)<br />
<br />
# Blogging/community sites which are bad at filtering spam<br />
aolanswers\.com<br />
\.beeplog\.com<br />
\.blog\.ca<br />
blogspace\.fr<br />
doomby\.com<br />
foodbuzz\.com<br />
gameinformer\.com<br />
insanejournal\.com<br />
jeteye\.com<br />
livelogcity\.com<br />
\.posterous\.com<br />
retrogamer\.net<br />
sitesays\.com<br />
statigr\.am<br />
tagged\.com<br />
\.tumblr\.com # Requires account to report spam blogs. No thank you.<br />
\.xanga\.com<br />
<br />
# Hacked websites where the abuse department does not care<br />
asu\.edu<br />
ncsu\.edu<br />
scu\.edu # Has no abuse contact at all<br />
<br />
# Spam Domains<br />
123finances\.eu<br />
1remodelingchicago\.com<br />
247digitallearning\.com<br />
3825\.co.uk<br />
5x5workouts\.net<br />
ableton-serato\.com<br />
abubbleshooter\.info<br />
accountingdegree101.com<br />
adidasjeremyscottwings\.com<br />
adral\.eu<br />
air-conditioner-reviews\.info<br />
akerpub\.com<br />
alonerank\.com<br />
ameritrustshield\.com<br />
angelweddingdress\.com<br />
antivirusfirewallsoftwaresite\.org<br />
aremyhair\.com<br />
asiaone\.com<br />
askmehelpdesk\.com<br />
australias\.com\.au<br />
autostoreplus\.com<br />
babygearland\.com<br />
backup-4\.com<br />
beats-bydre\.net<br />
beijingsensualmassage\.com<br />
bestbuylouisvuitton\.com<br />
bestdatinglink\.com<br />
bestonlinebuys\.net<br />
bestseoagency\.net<br />
bestwebsitedesigncompanies\.net<br />
bioactives-morinda\.com<br />
bizeso\.com<br />
bizscribes\.com<br />
bodyjewelleryshop\.com<br />
bubbleshooteronline\.info<br />
buybeatsbydre\.com<br />
buycarhartt\.net<br />
cabbagesoupdiett\.com<br />
calculette-pret-immobilier\.fr<br />
casinoinfo\.pl<br />
cedarmulch\.net<br />
classvogue\.com<br />
clickbank\.net<br />
colorstrokespainters\.com\.au<br />
conservatoryprices0ob\.net<br />
creepers\.ch<br />
czarymary\.pl<br />
debtsadvice\.net<br />
demo-download\.org<br />
dentists-atlanta\.net<br />
doubleglazeddoorslj9j\.org<br />
download-yahoo-messenger\.net<br />
dress-sense\.sg<br />
dressup24h\.com<br />
easypret\.fr<br />
ebutcherblockcountertops\.com<br />
efusiontech\.com<br />
emergencypreparednesshelp\.net<br />
empowerbpo\.com<br />
e-najlepsza-lokata\.com<br />
everylight\.co\.uk<br />
eyelashcurlers\.org<br />
ezdi\.us<br />
fabricacurtain\.com<br />
felicitysglutenfreehandbook\.com<br />
femmefatalehats\.com<br />
finanziellen-freiraum\.de<br />
findingnola\.com<br />
for-htc\.com<br />
free-drug-rehab\.com<br />
furnitureforbathroom\.co\.uk<br />
garminnuviportablegps\.com<br />
gazeta\.pl<br />
geek-lamp\.com<br />
get-free-diapers\.com<br />
getgplusvotes\.com<br />
glutenfreerecipebox\.com<br />
goodreads\.com<br />
googlelocalranking\.com<br />
gpgyjr\.com\.cn<br />
guanacaste\.net<br />
hairagainreviews\.org<br />
hatena\.com<br />
hdrolx\.com<br />
hermesfair\.com<br />
hervelegerdressonline2012\.com<br />
hi5mediagroup\.com<br />
hostingreviewsandcoupons\.com<br />
house-maintain\.blogspot\.com<br />
i-am-adopted\.com<br />
iii\.org\.tw<br />
iinfobase\.com<br />
inboxbuddy\.com<br />
instantcashloanforme\.com<br />
itsakon\.com<br />
iusetbellum\.blogspot\.com<br />
jasa-seo\.org<br />
juegosgratis1\.info<br />
jukeboxalive\.com<br />
jumbobookmarks\.com<br />
keepandshare\.com<br />
kingswaylifecare\.com<br />
kokosowy\.pl<br />
lagbook\.com<br />
lawn-edging\.com<br />
legalsoundz\.com<br />
leonisawesome\.com<br />
letusreckon\.com<br />
lolliboys\.com<br />
louisvuittonbagsroom\.com<br />
louisvuittonhandbagsstore\.co\.uk<br />
macobserver\.com<br />
marketingmassachusetts\.net<br />
marvelousessays\.com<br />
masscouponsubmitter\.com<br />
matelasbonheur\.ca<br />
mbk-center\.com<br />
mebletarnów\.com.pl<br />
mediscribes\.com<br />
meridianstars\.com<br />
merritts\.uk\.com<br />
michaeljackson-songs\.org<br />
miedziaki\.eu<br />
mmichalekellya\.xanga\.com<br />
moslemunity\.com<br />
mp3in1\.com<br />
mulchinglawn\.com<br />
my-landimmo\.de<br />
n2acards\.com<br />
naturalinfertilitytreatments\.wordpress\.com<br />
needrapidcashnow\.com<br />
net-promotion\.pl<br />
newerahats2012\.com<br />
new-rap-songs\.com<br />
newyorkgiantsnikejerseysstore\.com<br />
nibtv\.com<br />
nieruchomościtarnów\.com\.pl<br />
onestopbookmarks\.com<br />
onlineprnews\.com<br />
ovidiusilaghi\.ro<br />
pandatarot\.com<br />
patiodoorsda5\.com<br />
pisaniecv\.info<br />
pixelperfectsoftworks\.com<br />
popculturedivas\.com<br />
power-leveling\.us<br />
practutor\.com<br />
prnewswire\.com<br />
product-samples\.net<br />
professays\.com<br />
prosdi\.com<br />
purevolume\.com<br />
qiel\.com<br />
r220\.cc<br />
rajpromotions\.com<br />
rapidfbfans\.com<br />
recoverytoolbox\.com<br />
redditmarketing\.com<br />
repairtoolbox\.com<br />
retiringincostarica\.org<br />
rulettstrategiak\.com<br />
sarasotacriminalattorneys\.com<br />
sbwire\.com<br />
scheidungohneanwalt\.com<br />
schoolgrantsguides\.blogspot\.com<br />
scrapebrokers\.com<br />
self-defensesupply\.com<br />
seogooglemaps\.net<br />
seo-methods\.com<br />
seopackagepricing\.com<br />
showingoncam\.com<br />
smartphonewebcreator\.com<br />
sovcal\.com<br />
spittingandvomitting\.com<br />
sports-camping\.com<br />
squidoo\.com<br />
superiorpapers\.com<br />
supremeessays\.com<br />
surfbrands\.net<br />
symbian-kreatif\.co\.cc<br />
szybki-kredyt-bez-bik\.com<br />
tabletpcwarehouse\.net<br />
tani-kredyt-mieszkaniowy\.org<br />
taoholycity\.com<br />
televisionspain\.net<br />
theprivatenetwork\.net<br />
thesecretworldhack\.com<br />
thespainforum\.com<br />
theuniquehoodiasite\.com<br />
ticketforeverything\.com<br />
tjindustrial\.com<br />
tn-?requin-?paschers\.(biz|eu)<br />
trustbanq\.com<br />
urbanspycam\.com<br />
usalouisvuittonshopping\.com<br />
vacationscostarica\.com<br />
wallmountingatv\.com<br />
watchnflgamesonlinehd\.com<br />
webdesignsanluisobispo\.wordpress\.com<br />
webousb\.com<br />
wedding-cake-decorations\.net<br />
wedding-cake-stands\.net<br />
wheretogetengaged\.com<br />
wholesaledefenseonline\.com<br />
whyimhotter\.com<br />
whyonlinebackup\.com<br />
wordpressseoexpert\.com<br />
worldselectshop\.com<br />
wylinka\.com<br />
xaby\.com<br />
xg4ken\.com<br />
yaseminler\.com<br />
zay\.pl<br />
ziel-motivation\.com<br />
zlewozmywak24\.pl<br />
<br />
#</pre> <!-- leave this line exactly as it is --></div>Soult