Revision as of 21:54, 30 January 2013
Urlteam | |
---|---|
url shortening was a fucking awful idea | |
URL | http://urlte.am |
Status | Online! |
Archiving status | In progress... |
Archiving type | Unknown |
Project source | https://github.com/ArchiveTeam/urlteam-stuff |
Project tracker | http://tracker.tinyarchive.org/ |
IRC channel | #urlteam (on hackint) |
TinyURL, bit.ly and similar services convert long URLs into short ones on their own domains. When someone visits a short URL, the service redirects their web browser to the original long URL.
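To make the redirect step concrete, here is a minimal Python 3 sketch (not part of TinyBack or any URLTeam tool) that resolves a short URL by requesting it without auto-following redirects and recording each `Location` header. It assumes the shortener answers `HEAD` requests with a standard HTTP 3xx status. Services that use HTML or JavaScript redirects are not detected. The name `resolve_chain` and the `max_hops` limit are illustrative choices, and following several hops covers shorteners that chain two redirects, such as nutshellurl.com.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as "go to the URL in the Location header".
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_chain(short_url, max_hops=5, timeout=10):
    """Return [short_url, hop1, hop2, ...] by following HTTP redirects by hand.

    Stops at the first non-redirect response or after max_hops redirects.
    """
    chain = [short_url]
    url = short_url
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc, timeout=timeout)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        try:
            # HEAD avoids downloading the body; assumes the shortener supports it.
            conn.request("HEAD", path)
            resp = conn.getresponse()
            status, location = resp.status, resp.getheader("Location")
        finally:
            conn.close()
        if status not in REDIRECT_CODES or not location:
            break  # reached the destination (or an error page)
        url = urljoin(url, location)  # Location may be relative
        chain.append(url)
    return chain
```

The last element of the returned list is the final destination. Checking each hop by hand, rather than letting a library follow redirects silently, is what lets a scraper record the short-to-long mapping itself.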
Such services are a ticking time bomb. If they go away, get hacked or sell out, millions of links will be lost (see Wikipedia: Link Rot). Archive.org's 301Works acts as an escrow for URL shortener databases, but it relies on the shorteners to actually hand over their databases. Even bit.ly, a founding member of 301Works, does not actually share its database, and most other big shorteners don't share theirs either.
Who did this?
You can join us in our IRC channel: #urlteam on EFNet
- User:Scumola started this wiki page
- User:Chronomex started the Urlteam scraping effort
- User:Soult helps with scraping
- User:Jeroenz0r helps with scraping (and stalking Soult)
- ... many ArchiveTeam people who run the scrapers
301Works cooperation
The fine folks at archive.org have provided us with upload permissions for the 301Works archive: http://www.archive.org/details/301utm. Unfortunately, they do not want to make the uploads downloadable, but the same data is also in our torrents, just in a different format (we use tab-delimited, xz-compressed files, while 301Works uses comma-delimited, uncompressed files).
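The two formats are easy to convert between. Below is a minimal Python 3 sketch that turns one tab-delimited, xz-compressed file into an uncompressed comma-delimited file. It assumes one record per line with tab-separated fields (for example shortcode and long URL). The exact column layout of the dumps is an assumption, as is whether 301Works quotes fields. `xz_tsv_to_csv` is a hypothetical helper name, not an existing URLTeam tool.

```python
import csv
import lzma


def xz_tsv_to_csv(src_path, dst_path):
    """Rewrite a tab-delimited .xz dump as an uncompressed CSV file.

    Assumes one record per line with fields separated by tabs
    (e.g. shortcode<TAB>long URL). Returns the number of records written.
    """
    rows = 0
    with lzma.open(src_path, "rt", encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)  # quotes fields containing commas, as many URLs do
        for line in src:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            writer.writerow(line.split("\t"))
            rows += 1
    return rows
```

The input is split on tabs by hand rather than read with `csv`, so a URL that happens to contain a quote character is not misparsed. On the output side, `csv.writer` adds quotes only where a field contains a comma or quote.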
Tools
- fetcher.pl: Perl-based scraper by User:Chronomex
- TinyBack: Python 2.x-based, distributed scraper (also works with the Warrior)
TinyBack
The easiest way to help with scraping is to run the Warrior and select the URLTeam project. You can also run TinyBack outside the Warrior, though Python 2.6 or newer is required:
    git clone https://github.com/soult/tinyback
    cd tinyback
    # Use ./run.py --help for more information on command-line options
    ./run.py --tracker=http://tracker.tinyarchive.org/v1/ --num-threads=3 --sleep=180
URL shorteners
New table
The new table includes shorteners we have already started to scrape.
Name | Est. number of shorturls | Scraping done by | Status | Comments |
---|---|---|---|---|
Tinyurl.com | 1,000,000,000 | Warrior | scraping: sequential, <= 6 characters | new shorturls: non-sequential, 7 characters |
Bit.ly | 4,000,000,000 | Warrior | scraping: non-sequential, 6 characters | new shorturls: non-sequential, 6 characters |
Goo.gl | ? | User:Scumola | started (2011-03-04) | goo.gl throttles pulls |
is.gd | 810,264,745 (2013-01-30) | Warrior | scraping: sequential, <= 5 characters | new shorturls: non-sequential, 6 characters |
ff.im | ? | User:Chronomex | only used by FriendFeed, no interface to shorten new URLs | |
4url.cc | 1279 (2009-08-14)[1] | User:Chronomex | dead (2011-02-15) | |
litturl.com | 17096 (2010-04-15)[2] | User:Chronomex | dead (2010-11-18) | |
xs.md | 3084 (2009-08-15)[3] | User:Chronomex | done | dead (2010-11-18) |
url.0daymeme.com | 14867 (2009-08-14)[4] | User:Chronomex | done | dead (2010-11-18) |
tr.im | 1990425 | User:Soult | got what we could | dead (2011-12-31) |
adjix.com | ? | User:Jeroenz0r | Already done: 00-zz, 000-zzz, 0000-izzz. | case-insensitive, incremental |
rod.gs | ? | User:Jeroenz0r | Done: 00-ZZ, 000-2Qc | case-sensitive, incremental, server can't keep up with all the requests. |
biglnk.com | ? | User:Jeroenz0r | Done: 0-Z, 00-ZZ, 000-ZZZ | case-sensitive, incremental |
go.to | 60000 | User:Asiekierka | Done: ~45000 (go.to network links only: goto_dump.zip) | no codes, only names; Google-fu only gives the first 1000 results for each, but thankfully most domains have fewer |
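Statuses such as "sequential, <= 6 characters" and adjix.com's "00-zz, 000-zzz, 0000-izzz" describe walking every code of a given length in order. Below is a minimal sketch of generating such a range. It assumes codes behave like fixed-width numbers over an alphabet, and it is not TinyBack's actual code. `BASE36` (digits, then lowercase letters) is an assumed alphabet for case-insensitive shorteners. Case-sensitive ones such as rod.gs would also need uppercase letters, in whatever order the service actually uses, which is not documented here.

```python
# Assumed alphabet for case-insensitive shorteners: 0-9 then a-z.
BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"


def code_to_int(code, alphabet=BASE36):
    """Read a shortcode as a number in base len(alphabet)."""
    n = 0
    for ch in code:
        n = n * len(alphabet) + alphabet.index(ch)
    return n


def int_to_code(n, length, alphabet=BASE36):
    """Inverse of code_to_int, left-padded to a fixed length."""
    digits = []
    for _ in range(length):
        n, r = divmod(n, len(alphabet))
        digits.append(alphabet[r])
    return "".join(reversed(digits))


def code_range(start, end, alphabet=BASE36):
    """Yield every fixed-length code from start to end inclusive, e.g. '000'..'zzz'."""
    if len(start) != len(end):
        raise ValueError("start and end must have the same length")
    for i in range(code_to_int(start, alphabet), code_to_int(end, alphabet) + 1):
        yield int_to_code(i, len(start), alphabet)
```

For example, `list(code_range("0y", "10"))` gives `['0y', '0z', '10']`, and the range `"00"` to `"zz"` covers 36² = 1,296 codes. Keeping start and end the same length mirrors the per-length ranges in the table. This matters for services like url9.com, where leading zeros are significant and `0a` and `a` are different codes.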
Alive
Last verified 2012-12-29. Original list last updated 2009-08-14 [5].
- awe.sm
- budurl.com - Appears non-incremental
- cli.gs - Appears non-incremental
- decenturl.com - Not at all easy to scrape.
- dlvr.it
- doiop.com - Appears non-incremental
- easyurl.net - Appears non-incremental: http://easyurl.net/afd2f
- ilix.in - HTML redirect
- jdem.cz - Incremental with random (?) last digit: http://jdem.cz/bw388
- metamark.net / xrl.us - ? http://xrl.us/bfabog
- myurl.in - http://myurl.in/xtP5H / http://urlgator.com/xtP5H / http://ug4.me/xtP5H / http://link-ed.in/xtP5H - HTML redirect
- notlong.com - Appears to be alpha-only: http://yeitoo.notlong.com/
- nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.
- pnt.me - Doesn't appear guessable; too big a space to brute-force: http://pnt.me/FzAblc
- redirx.com - Lowercase alpha only, appears sequential or guessable: http://redirx.com/?wyok
- sharedby.co - See vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX
- shorl.com - Doesn't appear guessable: http://shorl.com/tisikestibahu
- shorturl.com - Probably sequential/loweralpha: http://alturl.com/wqok
- shrinkurl.us - Always says the URL is malformed
- shrt.st - Appears incremental: http://shrt.st/vpz
- simurl.com - Doesn't appear guessable: http://simurl.com/panpes
- smarturl.eu / joturl.com / zip.sm - Doesn't appear guessable, HTML redirect.
- snipr.com / snipurl.com / snurl.com - Appears incremental: http://snipr.com/27nvst http://snipr.com/27nvtt
- surl.co.uk - Many shortening options.
- tighturl.com - Appears incremental: http://tighturl.com/30xu http://tighturl.com/30xv
- tiny.cc - Appears non-incremental
- tweetburner.com / twurl.nl - Appears incremental
- twitthis.com
- u.mavrev.com - Not accepting new urls.
- ur1.ca - Database is downloadable from website directly.
- urlcut.com
- vimeo.com
- vsb.li / links.visibli.com/links/ - The latter uses truncated md5 hex string.
- xrl.us - see metamark.net
- x.se - Cannot resolve, but www.x.se works.
- yatuc.com - Not accepting new urls.
- yep.it
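Several entries above are marked "HTML redirect". This means the short URL returns an ordinary page that sends the browser on, rather than an HTTP 3xx status, so an HTTP-only resolver will not see the destination. The list does not say which HTML technique each service uses. Below is a sketch that assumes the common `<meta http-equiv="refresh">` tag, using only Python's standard `html.parser`. It does not cover JavaScript redirects such as qurlyq.com's. `MetaRefreshParser` and `meta_refresh_target` are hypothetical names.

```python
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Remember the URL of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        # html.parser lowercases names; values may be None for bare attributes.
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0; url=http://example.com/"; the delay is ignored.
        _, semi, rest = values.get("content", "").partition(";")
        key, eq, url = rest.strip().partition("=")
        if semi and eq and key.strip().lower() == "url":
            self.target = url.strip().strip("'\"")


def meta_refresh_target(html):
    """Return the meta-refresh destination in an HTML page, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    parser.close()
    return parser.target
```

The parser partitions on the first `;` and the first `=` only, so destination URLs that themselves contain `=` (query strings) come through intact.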
"Official" shorteners
- goo.gl - Google
- fb.me - Facebook
- y.ahoo.it - Yahoo
- youtu.be - YouTube
- t.co? - Twitter
- post.ly - Posterous
- wp.me - Wordpress.com
- flic.kr - Flickr
- lnkd.in - LinkedIn
- su.pr - StumbleUpon
- go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)
bit.ly aliases
- amzn.to - Amazon
- binged.it - Bing (bonus points for being longer than bing.com)
- 1.usa.gov - USA Government
- tcrn.ch - Techcrunch
Dead or Broken
- 1link.in - Website dead
- 6url.com - HTML redirect, Error 500
- ad.vu - mirror of adjix.com, application not found
- canurl.com - Website dead
- chod.sk - Appears non-incremental, not resolving
- digg.com - discontinued - http://about.digg.com/blog/update-diggs-short-url-service
- dwarfurl.com - Website dead. Numeric, appears incremental: http://dwarfurl.com/08041
- easyuri.com - Website dead. Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f, http://easyuri.com/133a3
- go2cut.com - Website dead
- gonext.org - not resolving
- imfy.us - Requires a reCAPTCHA to get to the linked site, and Avast goes nuts. DNS fails to resolve.
- ix.it - Not resolving
- jijr.com - Doesn't appear to be a shortener, now parked
- kissa.be - "Kissa.be url shortener service is shutdown"
- kurl.us - Parked.
- lnkurl.com - Website dead
- memurl.com - Pronounceable. Broken.
- miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - "Vi arbejder på sagen..." (Danish: "We're working on it...")
- minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead
- minurl.org - Currently returns ERROR 404
- muhlink.com - Not resolving
- myurl.us - cpanel frontend
- nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters
- qurlyq.com - Javascript redirect. Appears sequential: http://qurlyq.com/5nf. Domain parked.
- s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab. Domain parked.
- shortlinks.co.uk - Working again. Maybe not.
- short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp
- shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx, http://www.shrinklink.co.uk/npk6xp. Domain parked.
- traceurl.com - DNS fails to resolve.
- tr.im - "Be back soon!"
- twitpwr.com - Domain parked.
- u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)
- url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."
- urlborg.com - 404 Not Found.
- urlcover.com - Domain parked.
- urlhawk.com - Domain parked.
- url-press.com - Suspended by web host.
- urlsmash.com - DNS not resolving.
- urltea.com - DreamHost's "coming soon" page.
- urlvi.be - Domain parked.
- urlx.org - Owner has agreed to share his database
- w3t.org - 403 Forbidden.
- wlink.us - Domain parked.
- xaddr.com - Domain parked.
- xil.in - Under construction.
- xym.kr - Gibberish (?) Korean text blog.
- yweb.com - Suspicious iframe with long url and fake loading gif image.
- zi.ma - DNS not resolving.
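Judgements like easyuri.com's "hex incremental with last digit random/checksum" come from comparing two observed codes. Below is a minimal sketch of that comparison. It assumes codes are case-insensitive base-N numbers, which Python's built-in `int(s, base)` handles for bases up to 36. `code_distance` and `keyspace` are illustrative names. Read as hex, 1339f and 133a3 are 4 apart. Dropping the last character gives 1339 and 133a, which are exactly 1 apart. That fits a counter followed by one random or checksum digit. `keyspace` estimates the brute-force cost behind "doesn't appear guessable". For example, 6 characters over an assumed 62-symbol alphabet (0-9, a-z, A-Z) already gives 62**6 = 56,800,235,584 possible codes.

```python
def code_distance(a, b, base):
    """Distance between two shortcodes read as base-`base` numbers.

    Uses int(), so it only covers case-insensitive alphabets of digits
    followed by letters (base <= 36); case-sensitive codes need a custom
    alphabet.
    """
    return int(b, base) - int(a, base)


def keyspace(alphabet_size, length):
    """Number of distinct codes of exactly `length` characters."""
    return alphabet_size ** length
```

A distance of 1 between codes created close together suggests a plain counter that can be scraped sequentially. Larger or irregular gaps suggest that part of the code is random.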
Discontinued
- urlbrief.com - Cooperates with 301Works.org