URLTeam
(update with new total of scanning of yoolink-to)
Revision as of 03:40, 9 November 2015
url shortening was a fucking awful idea
|Archiving status||In progress...|
|Project source||Old: urlteam-stuff tinyback tinyarchive|
|Project tracker||http://tracker.archiveteam.org:1337/ (HTTPS)|
|IRC channel||(on hackint)|
TinyURL, bit.ly, and similar services convert long URLs into short ones on their own domains. When someone visits a short URL, the service redirects their web browser to the long URL, usually with an HTTP 301 or 302 response.
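As a minimal sketch of what resolving a short URL looks like at the HTTP level, assume the service answers a plain `HEAD` request with a 3xx status and a `Location` header. Some services only redirect on `GET`, and others use HTML redirects, as noted in the lists below. `http.client` does not follow redirects on its own, so the first hop stays visible. The function names `interpret` and `resolve_once` are hypothetical.

```python
import http.client
from typing import Optional
from urllib.parse import urljoin, urlsplit

# Status codes that mean "this short code points somewhere else".
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def interpret(status: int, location: Optional[str]) -> Optional[str]:
    """Return the redirect target, or None if the code is unassigned or unknown."""
    if status in REDIRECT_STATUSES and location:
        return location
    return None


def resolve_once(short_url: str, timeout: float = 10.0) -> Optional[str]:
    """Issue one HEAD request for short_url without following any redirect."""
    parts = urlsplit(short_url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        target = interpret(resp.status, resp.getheader("Location"))
    finally:
        conn.close()
    # A relative Location header is resolved against the short URL itself.
    return urljoin(short_url, target) if target else None
```

For example, `resolve_once("http://tinyurl.com/mxzufis")` makes one network request. A `None` result means only "no redirect seen", not necessarily "unassigned".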
Such services are a ticking time bomb. If they go away, get hacked, or sell out, millions of links will be lost (see Wikipedia: Link Rot). Archive.org/301Works acts as an escrow for URL shortener databases, but it relies on the shorteners themselves to hand over their databases. Even bit.ly, a 301Works founding member, does not actually share its database, and most other big shorteners don't share theirs either.
The fine folks at archive.org have given us upload permissions to the 301Works archive: http://www.archive.org/details/301utm. Unfortunately, they do not want to make those files downloadable, but the same data is in our torrents, just in a different format: we use pipe-delimited, xz-compressed files, while 301Works uses comma-delimited, uncompressed files.
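Converting between the two formats is mechanical. The sketch below assumes each decompressed line has the form `shortcode|long_url` and writes a two-column CSV. This line layout, and the column layout of the 301Works files, are assumptions here, so check them against a real file first. The code splits on the first `|` only, which keeps any `|` inside the long URL intact. The function name `xz_pipe_to_csv` is hypothetical.

```python
import csv
import io
import lzma
import sys
from typing import BinaryIO, TextIO


def xz_pipe_to_csv(src: BinaryIO, dst: TextIO) -> int:
    """Rewrite `shortcode|url` lines from an xz stream as CSV rows; return the row count."""
    writer = csv.writer(dst)
    count = 0
    # lzma.open also accepts an open binary file object; "rt" decodes it as UTF-8 text.
    with lzma.open(src, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            # Split on the first "|" only, so a "|" inside the URL survives.
            code, sep, url = line.rstrip("\n").partition("|")
            if not sep:
                continue  # skip blank or malformed lines
            writer.writerow([code, url])
            count += 1
    return count


if __name__ == "__main__":
    sample = lzma.compress(b"mxzufis|http://example.com/a,b\n")
    xz_pipe_to_csv(io.BytesIO(sample), sys.stdout)
```

The `csv` module quotes URLs that contain commas, so the output stays a valid two-column CSV.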
- fetcher.pl: Perl-based scraper by User:Chronomex
- TinyBack: Python 2.x-based, distributed scraper (formerly used)
- Terror of Tiny Town: currently used by ArchiveTeam
Terror of Tiny Town
The easiest way to help with scraping is to run the Warrior and select the URLTeam 2 project. You can also run ToTT outside the Warrior by following the instructions at https://github.com/ArchiveTeam/terroroftinytown-client-grab.
|Name||Est. number of shorturls||Scraping done by||Status||Comments|
|http://goo.gl||?||User:Scumola||started (2011-03-04)||goo.gl throttles pulls|
|http://ff.im||?||User:Chronomex||only used by FriendFeed, no interface to shorten new URLs|
|http://4url.cc||1279 (2009-08-14)||User:Chronomex||dead (2011-02-15)|
|http://litturl.com||17096 (2010-04-15)||User:Chronomex||dead (2010-11-18)|
|http://xs.md||3084 (2009-08-15)||User:Chronomex||done||dead (2010-11-18)|
|http://url.0daymeme.com||14867 (2009-08-14)||User:Chronomex||done||dead (2010-11-18)|
|http://tr.im (old)||1990425||-||got what we could||dead (2011-12-31)|
|visibli (hex)||16777216||User:Chfoo, Warrior||In progress / Done. 15104865 301MB||Using links.sharedby.co/links/ as URL prefix.|
|http://zapd.co Zapd||326592||User:Chfoo||Done. 144093 1.7M||xxxx.zapd.co. Uploaded to IA|
|http://bre.ad Bre.ad||120932351||User:Chfoo||Incomplete (59771889 examined). 54506 1.2MB||de.ad (2013-11-18). Uploaded to IA. Got what I could without overloading their EC2 instance.|
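Some rows above note that a service throttles requests (goo.gl) or that scraping stopped short to avoid overloading the host (bre.ad). A common way to stay polite is to retry with exponential backoff when the server signals overload. This is a generic sketch, not what TinyBack or Terror of Tiny Town actually do. Treating HTTP 429 and 503 as "slow down" signals is an assumption, and `fetch` is a hypothetical callable that returns a status code.

```python
import time
from typing import Callable, Iterator, Optional

SLOW_DOWN = {429, 503}  # statuses treated as "back off" (assumption)


def backoff_delays(base: float, factor: float, cap: float, attempts: int) -> Iterator[float]:
    """Yield `attempts` delays growing geometrically from `base`, never above `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def polite_fetch(
    fetch: Callable[[str], int],
    url: str,
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call fetch(url), sleeping and retrying while the server asks us to slow down."""
    for delay in backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=attempts):
        status = fetch(url)
        if status not in SLOW_DOWN:
            return status
        sleep(delay)  # also sleeps after the final failed attempt
    return None  # gave up; the caller should requeue the URL for later
```

Injecting `sleep` keeps the function testable without real waiting.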
For the latest TinyTown updates, please see chfoo's spreadsheet.
|Warrior project name||Est. # shorturls||Last scraped date||Initially scraped date||# checked||Example URL||Incr||Comments|
|10,000,000,000||2022-08-16||2014-11-22||2,563,971,400||http://tinyurl.com/mxzufis||N||done: sequential to zzzzzz; current: non-sequential, 7 characters|
|50,000,000,000||2022-08-16||2014-11-22||3,916,718,000||https://bit.ly/1Zmfo8z||N||done: non-sequential 6 characters; current: non-sequential, 6 characters|
|934,134,706 (as of 2013-05-20)||2022-08-16||2014-11-22||2,106,171,900||http://is.gd/mBNPCM||?||done: sequential up to ZZZZZ ; new shorturls: non-sequential, 6 characters|
|?||2014-12-12||2014-12-01||42,813,150||?||?||done: sequential to 42pzz ; dead (2014-JUL-17) ; Appears incremental - Ex: http://tr.im/44tn2 http://tr.im/44tn4|
|?||2022-08-16||2015-01-29||41,887,850||?||?||(Also see http://vsb.li. Double redirects via USERNAME.sharedby.co/share/XXXXXX ) (and http://shrd.by )|
|?||2014-12-20||2014-12-17||10,874,150||?||?||new shorturls: sequential ; FOSS, run by StatusNet; claims to offer a download of their database, but it just contains garbage|
|?||2015-03-06||2015-01-24||181,015,750||?||Y||new shorturls: sequential ; snipr.com / snipurl.com / snurl.com - Appears incremental - Ex: http://snipr.com/27nvst http://snipr.com/27nvtt. snipr.com and snipurl.com work but appear infected with malware.|
|?||2015-04-20||2015-03-06||293,372,600||?||?||see snipurl entry|
|?||2015-01-12||2015-01-11||624,000||?||Y||new shorturls: sequential|
|?||2015-01-15||2015-01-11||1,842,350||?||Y||Appears down; new shorturls: sequential|
|?||2015-05-27||2015-03-06||39,367,900||http://alturl.com/wqok||?||Appears to redirect to http://shorturl.com ; Probably sequential/loweralpha|
|?||2014-12-18||2014-12-17||3,303,100||?||?||Appears down; Argyle Social, main page 404s, existing urls still work|
|?||2015-04-04||2014-12-24||967,591,000||?||?||main page redirects, doesn't allow for new urls to be publicly shortened, existing urls still work|
|1,500||2015-11-08||2014-11-06||3,250||http://burl.se/428||Y||200 re-checked on 2015-11-08|
|420,000,000||2022-08-16||2015-11-08||620,000||?||Y||up to 5 characters, mixed case alphanumeric, currently around rZIfF (as of 02:00, 2 November 2015 (EST))|
|?||2015-03-10||2015-01-20||367,074,900||?||Y||new shorturls: sequential ; (aliases: http://htl.li & http://ht.ly )|
|?||2015-04-04||2014-12-13||989,290,500||?||?||Related to the pond called Philadelphia, where links are born and raised, doesn't allow for new urls to be publicly shortened, existing urls still work|
|?||2014-11-22||2014-11-16||15,067,550||?||?||Now part of Oracle|
|?||2015-04-08||2014-12-13||1,031,784,400||?||?||Still resolves URLs, but the homepage is 404; related to http://sharethis.com|
|?||2014-11-06||2014-11-06||738,600||http://shrt.st/vpz||Y||Appears down; doesn't allow new urls to be shortened, existing urls still work.|
|?||2014-11-06||2014-11-06||54,100||?||?||Still resolves URLs, but the site just shows a blank page|
|?||2014-12-13||2014-12-13||585,800||?||?||Doesn't make any more shorturls|
|?||2014-12-18||2014-11-16||832,351,700||?||?||dead; viddy; partially saved|
|?||2014-11-16||2014-11-16||2,151,400||?||?||Requires free login to create shorturls|
|Y||Appears incremental, but custom ones also exist (up to 10 characters); requires (free) GoDaddy account to create short URLs|
|?||2015-01-10||2014-12-12||161,601,700||?||?||self-saved; Thank you Metamark for the database dump!|
|?||2015-01-14||2015-01-10||28,675,900||?||?||see xrl-us entry|
y-ahoo-it_5: 982,090,300 checked between 2014-11-06 and 2015-02-25
y-ahoo-it_6: 1,670,279,150 checked between 2014-11-06 and 2015-04-03
y-ahoo-it_8: 1,952,022,300 checked between 2014-11-06 and 2015-04-04 . Now dead.
|?||2014-12-13||2014-12-13||597,150||?||?||Not accepting new urls.|
|?||2015-11-08||2014-11-16||333,450||http://yoolink.to/1dwa||Y||Up to 3 characters scraped on 2014-11-16 (a small 2 character segment re-scraped on 2014-11-22); 4 characters (up to 1dwa) scraped starting 2015-11-08|
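The "sequential", "up to N characters", and keyspace figures in the tables are easy to sanity-check. A code of exactly n characters over an alphabet of k symbols has k^n possibilities. Six hex digits give 16^6 = 16,777,216, which matches the visibli (hex) estimate. Five mixed-case alphanumeric characters give 62^5 = 916,132,832. The sketch below computes such sizes and enumerates codes shortest-first. The `ALNUM` ordering is a hypothetical choice, and the order each service actually uses has to be checked against real examples.

```python
import itertools
import string
from typing import Iterator

# Hypothetical alphabet ordering: digits, then lowercase, then uppercase (62 symbols).
ALNUM = string.digits + string.ascii_lowercase + string.ascii_uppercase


def keyspace(alphabet_size: int, max_len: int, min_len: int = 1) -> int:
    """Number of distinct codes whose length is between min_len and max_len inclusive."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


def sequential_codes(alphabet: str, max_len: int) -> Iterator[str]:
    """Yield every code of length 1..max_len, shortest first, in alphabet order."""
    for n in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            yield "".join(chars)
```

For example, `keyspace(16, 6, min_len=6)` returns 16777216, and the first six codes of `sequential_codes("ab", 2)` are `a`, `b`, `aa`, `ab`, `ba`, `bb`.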
Last verified 2014-12-07. Original list last updated 2009-08-14.
- adf.ly - Ex: http://adf.ly/bnpYL
- ask.fm - Ex: ask.fm/a/40k05kgp
- budurl.com - Appears non-incremental
- buff.ly - Buffer App
- cf.ly (CashFly.com)
- cli.gs - Appears non-incremental
- cl.ly - CloudApp
- cmt.com - Country Music Television
- cur.lv (CoinURL.com)
- decenturl.com - Not at all easy to scrape.
- del.ly - sprinklr
- df4.us - daringfireball.net
- dld.bz - "private URL shortening service"
- dlvr.it - Requires a free login and then a connection to another service; URLs are shortened as they are sent through it (as of 01:36, 2 November 2015 (EST))
- doiop.com - Appears non-incremental
- dwurl.hu - Allows public shortening; appears to give 6-character, mixed-case alphabetic (no digits), non-incremental URLs, e.g. http://dwurl.hu/gMEtiA (as of 01:36, 2 November 2015 (EST))
- easyurl.net - Appears non-incremental. Ex: http://easyurl.net/afd2f
- fav.me - Used by DeviantArt. Ex: http://fav.me/d31sfml
- flip.it - Flipboard
- flpbd.it - Flipboard
- fnd.us (see official shorteners)
- fos.hu – incremental alphanumeric, but shares pattern with an image sharing service
- jdem.cz - Incremental with random (?) last digit - Ex: http://jdem.cz/bw388
- kics.it – Restricted access to short URL creation
- ln.is - linkis.com
- mgnet.me - for torrent magnet URIs.
- moourl.com – Random
- my.dot.tk/tweak - Appears non-incremental
- nblo.gs -- no obvious way to create URLs from the home page as of 20:08, 7 November 2015 (EST)
- news.me -- no obvious way to create URLs from the home page as of 20:08, 7 November 2015 (EST)
- nohref.hu – Allows custom shorturl & deletes links after a specified time period (or 1 year without use)
- notlong.com - Appears to be alpha-only - Ex: http://yeitoo.notlong.com/ ; doesn't seem to allow creating new shorturls, as of 20:08, 7 November 2015 (EST)
- nutshellurl.com - Appears incremental. 301s to a redirector script, which then 301s you to the destination.
- p.pw -- sells interstitial ads before showing the full URL; likely to be harder to scrape (as of 20:08, 7 November 2015 (EST))
- pear.ly - Used by pearltrees.com. Ex: http://pear.ly/6J1H
- pnut.co - see nutshellurl.com Ex: http://pnut.co/3a
- po.st -- "social sharing platform"; no obvious way to create URLs from the home page (as of 20:08, 7 November 2015 (EST))
- prsm.tc - getprismatic.com
- rod.gs - up to 3 characters, alphanumeric, creating new ones appears to hang (as of 02:14, 2 November 2015 (EST))
- sdai.ly – Allows custom shorturl
- shorl.com - Doesn't appear guessable - Ex: http://shorl.com/tisikestibahu
- shorte.st - sells interstitial ads before showing the full URL; likely to be harder to scrape
- shrinkurl.us - Still resolves, but does not allow creating new URLs ("The URL you entered was not valid or did not exist.")
- smarturl.eu / joturl.com - Doesn't appear guessable, HTML redirect.
- smarturl.it - smartURL
- soa.li - Gigya inc.
- soc.li - Gigya inc.
- spne.ws - Silicon Prairie News
- spnsr.tw - sponsoredtweets.com
- surl.co.uk - Many shortening options.
- techme.me - Techmeme
- tinyarrows.com / ta.gd / ri.ms / ➡.ws / ➨.ws / ➯.ws / ➔.ws / ➞.ws / ➽.ws / ➹.ws / ✩.ws / ✿.ws / ❥.ws / ›.ws / ⌘.ws / ‽.ws / ☁.ws - Appears non-incremental: uses user-defined words for URLs (e.g. http://➡.ws/URLTEAM)
- tiny.cc - Appears non-incremental
- totesz.hu/x – Allows custom shorturl
- trib.al -- Does not appear to allow public creation of new short-URLs; owned by SocialFlow
- twitthis.com -- requires a Twitter account to create shortURLs (as of 20:08, 7 November 2015 (EST))
- urlcut.com - "We are not currently accepting new redirects at this time." ; existing ones seem to still work, e.g. http://urlcut.com/1xvha (as of 02:09, 2 November 2015 (EST))
- usite.hu/link.php – Numeric incremental, public database
- vk.cc -- no obvious way to create URLs from the home page (as of 20:08, 7 November 2015 (EST))
- y2u.be - meant for YouTube videos
- yep.it -- allows custom shortcodes; validates provided URL; example: http://yep.it/bgnhpu ; seems non-incremental, only lowercase letters; appears to make the whole database available via: http://yep.it/stat.php?page=5719 (as of 20:08, 7 November 2015 (EST))
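Entries such as nutshellurl.com, and sharedby.co in the tracker table, pass through an intermediate redirector before reaching the destination. A scraper that stops at the first `Location` header would record the redirector rather than the target. The sketch below follows a chain with a hop limit and loop detection. `fetch` is a hypothetical callable that returns `(status, location)` for a single request without following redirects, so it could wrap `http.client` or a test stub.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

REDIRECTS = {301, 302, 303, 307, 308}


def follow_chain(
    fetch: Callable[[str], Tuple[int, Optional[str]]],
    url: str,
    max_hops: int = 10,
) -> List[str]:
    """Return every URL visited, starting with `url` and ending at the first non-redirect."""
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECTS or not location:
            return chain
        nxt = urljoin(chain[-1], location)  # Location may be relative
        if nxt in chain:
            raise ValueError(f"redirect loop at {nxt}")
        chain.append(nxt)
    raise ValueError(f"more than {max_hops} redirects from {url}")
```

Keeping the whole chain, rather than only the last URL, also preserves the intermediate redirector, which can be useful when the final host later disappears.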
- bln.gs - Blingee (format: bln.gs/b/28fss0 and bln.gs/b/1)
- bull.hn - Bullhorn Reach (format: bull.hn/l/19JQE/)
- CokeURL.com - Coca-Cola (examples: CokeURL.com/3yuz9 ; CokeURL.com/vs5s ; Cokeurl.com/theaterseat )
- db.tt - Dropbox
- di.sn - Disney
- fb.me - Facebook
- flic.kr - Flickr
- fnd.us - Fundrazr.com
- fxn.ws - Fox News
- g.co - Google (used for Google products and services)
- getpocket.com/s/ - Pocket
- goo.gl - Google
- go.usa.gov - USA Government (and since they control the Internets, it doesn't get much more official than this)
- git.io - GitHub (GitHub URLs only)
- gty.im - Getty Images (format: gty.im/488068439; links by editorial number)
- gu.com - The Guardian (weird format - https://gu.com/p/3f7ca )
- hub.me - HubPages
- ift.tt - IFTTT
- igg.me - Indiegogo
- lnkd.in - LinkedIn
- mfi.re - MediaFire
- msft.it - Microsoft (or maybe something called "Sprinklr"?)
- mysp.ac - Myspace
- nydn.us - New York Daily News
- off365.ms - Office 365
- pocket.co - Pocket
- post.ly - Posterous
- redd.it - Reddit
- reut.rs - Reuters
- rsg.ms - Rockstar Games
- skfb.ly - Sketchfab
- spoti.fi - Spotify
- stanford.io - Stanford University
- su.pr - StumbleUpon
- sx3.se - swedishstartupspace.se
- t.co - Twitter
- ti.me - Time Magazine
- tmblr.co - Tumblr
- tw.appstore.com - Apple App Store
- uoft.me - University of Toronto
- upl.nu - Ung Pirat (Youth Pirate Party, Sweden)
- vstphl.ly - Visit Philly
- wapo.st - Washington Post
- wh.gov - White House (format: wh.gov/i3lXR)
- wp.me - Wordpress.com
- youtu.be - YouTube
- hrts.me - University of Hertfordshire. Codes seem to be 5 characters long, using lowercase letters, uppercase letters, and digits. Mainly used on https://twitter.com/UniofHerts
A bit.ly alias works just like a bit.ly URL. The shortcode is the same, it sets the same bit.ly cookie, and the alias's hostname resolves to the same IP addresses as bit.ly. The homepage may be different, however.
- abcn.ws - ABC News (examples: abcn.ws/1aOoijH ; abcn.ws/okiWbi )
- 1.usa.gov - USA Government
- 4sq.com - Foursquare
- aje.me - Aljazeera
- amzn.to - Amazon
- atfp.co - Foreign Policy
- bbc.in - BBC
- binged.it - Bing (bonus points for being longer than bing.com)
- bnkrpt.am - Bankrupting America
- bzfd.it - Buzzfeed
- carrot.cr - Carrot Creative
- cb.com - Career Builder
- chzb.gr - Cheezeburger
- cmplx.it - Complex Magazine
- cnet.co - CNET
- cnnmon.ie - CNN Money
- conta.cc - Constant Contact Inc.
- corb.is - Corbis Images
- cpurl.net - Current Photographer.com
- curbed.cc - Curbed.com
- dennysd.in - Denny's Restaurants
- dtoid.it - Destructoid
- econ.st - The Economist
- emarketee.rs - Emarketeers
- engri.sh - Engrish.com
- eonli.ne - E! Online
- es.pn - ESPN
- fakes.pn - The Fake ESPN (at lockerdome.com)
- fanpa.ge - Fanpage.it
- feedly.com/k/ - redirect, see below for their own
- gaw.kr - Gawker
- geekiss.im - Geekismo
- grd.to - The Grid TO
- grn.bz - GreenBiz
- gtg.lu - GetGlue
- hoblu.es - House of Blues
- hub.am - HubSpot
- huff.to - Huffington Post
- ift.tt - IFTTT
- j.mp - bit.ly
- jrnl.to - thejournal.ie
- kck.st - Kickstarter
- marsdd.it - MaRS Discovery District
- mbist.ro - MediaBistro
- mojo.ly - Mother Jones
- muo.fm - MakeUseOf
- mwne.ws - MarketWired News
- nie.mn - Neiman Journalism Lab
- nokia.ly - Nokia
- nyti.ms - New York Times
- onforb.es - Forbes
- onion.com - The Onion
- pops.ci - Popular Science
- popu.pe - Pop-Up Pantry
- propub.ca - ProPublica
- read.bi - Business Insider
- rseo.co - realseo
- s831.us - Studio831 - whatever that is
- sbn.to - sbnation
- skygrid.me - SkyGrid
- slackers.co - slackers.com
- squid.us - Laughing Squid
- s.shr.lc - shareaholic - Naive, redirects any shortcode to bit.ly
- stjo.es - St. Joseph Media
- tcrn.ch - Techcrunch
- theatln.tc - The Atlantic
- tnw.co - The Next Web
- tom.hn - Tom Hillenbrand
- toms.sh - TOMS Shoes
- tvt.ag - tvtag.com
- txpr.de - TexasStore
- unr.ly - Unruly media
- usat.ly - USA Today Newspaper
- vrge.co - The Verge
- yhoo.it - Yahoo! (not to be confused with y.ahoo.it, their non-bitly public url shortener)
- zite.to - Zite
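The DNS check used above to spot bit.ly aliases can be scripted with the standard resolver. This sketch compares the sets of addresses two hostnames resolve to. It is only a heuristic: large providers serve many unrelated domains from shared or anycast addresses, so a match suggests an alias but does not prove one. A stronger check is to confirm that a known shortcode redirects identically on both hosts. The helper names are hypothetical.

```python
import socket
from typing import Set


def addresses(host: str, port: int = 80) -> Set[str]:
    """Return the set of IP addresses (IPv4 and IPv6) that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def shares_addresses(host_a: str, host_b: str) -> bool:
    """True if the two hostnames resolve to at least one common address."""
    return bool(addresses(host_a) & addresses(host_b))
```

For example, `shares_addresses("abcn.ws", "bit.ly")` performs live DNS lookups, so its result depends on current records.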
Dead or Broken
- 1link.in - Website dead
- 6url.com - HTML redirect, Error 500
- ad.vu - mirror of adjix.com, application not found
- biglnk.com - dead, replaced with unrelated blog
- bwtm.co - DNS fails to resolve.
- calyp.co - Server error. 403 - Forbidden: Access is denied.
- canurl.com - Website dead
- chod.sk - Appears non-incremental, not resolving
- come.to - Related to various .to shorteners. Started in 1997, killed in 2013 after parent company died.
- da.co - Parked.
- digg.com - discontinued
- dwarfurl.com - Website dead/Numeric, appears incremental: http://dwarfurl.com/08041
- easy.tc - DNS not resolving.
- easyuri.com - Website dead/Appears hex incremental with last digit random/checksum: http://easyuri.com/1339f , http://easyuri.com/133a3
- eqent.me - Improper redirect to bitly.
- feedzil.la - Domain parked.
- go2cut.com - Website dead
- gob.li - Golbin Ridge Limited. Timed out
- gonext.org - not resolving
- go.to - sold its domains on Sedo apparently.
- go2.me - everything 404s
- hashonomy.com - Timed out
- htcdev.net - DNS not resolving.
- iawtp.me - DNS not resolving
- icymi.me - DNS not resolving
- ilix.in - domain parked
- imfy.us - requires a recaptcha to get to the linked site, and avast goes nuts. DNS fails to resolve.
- inspr.in - Inspired Beta. Can't find server
- ix.it - Not resolving
- jijr.com - Doesn't appear to be a shortener, now parked
- joomlagyar.hu/usb - DNS not resolving
- jump.to - dead as of February 1, 2013
- kissa.be - "Kissa.be url shortener service is shutdown"
- kl.am - "kl.am Closes its Shell"
- kuijt.nu - replaced with unrelated site
- kurl.us - Parked.
- lnkurl.com - Website dead
- marv.ly - DNS fails to resolve.
- mash.to - Cannot connect.
- memurl.com - Pronounceable. Broken.
- me.lt - Connection refused.
- mens.hm - Not responding (timeout)
- miklos.dk - Doesn't appear guessable: http://miklos.dk/!z7bA6a - Shows a Danish notice: "We're working on it..."
- minilien.com - Doesn't appear guessable: http://minilien.com/?9nyvwnA0gh - Website dead
- minim.in - Times out
- minurl.org - Presently in ERROR 404
- ms.me - Parked.
- msplinks.com - Used by Myspace
- mtw.tl - everything 403s
- muhlink.com - Not resolving
- mytinyurl.com - redirects to an unrelated image
- myurl.us - cpanel frontend
- myv.bz - Not resolving
- nyturl.com - NY Times (bonus points for being longer than nyt.com, which they own). Taken by squatters
- onvzi.com - DNS fails to resolve.
- otf.me - Empty WordPress site
- ping.fm - Fails to resolve.
- pln.so - Not working.
- plzretwt.me - Fails to resolve.
- pnt.me - Doesn't appear guessable, too big a space to bruteforce: http://pnt.me/FzAblc
- pulsene.ws - Expired. Parked by GoDaddy.
- re.ad - Fails to resolve.
- redirx.com - Lowercase alpha only, appears sequential or guessable - Ex: http://redirx.com/?wyok. Website still online but does not resolve existing URLs nor does it allow creating new ones (responds with the message: blame the spammers)
- see.sc - Fails to resolve.
- s.me - Domain parked.
- say.ly - redirects to unrelated site
- s3nt.com - Probably sequential. http://s3nt.com/aa goes somewhere different from /ab . Domain parked.
- shortlinks.co.uk - Working again. Maybe not.
- short.to - Domain is parked - Probably sequential/loweralpha: http://short.to/msmp
- shrinklink.co.uk - Doesn't appear sequential: http://www.shrinklink.co.uk/45bmx , www.shrinklink.co.uk/npk6xp . Domain parked.
- shrtn.us - myshorturls.appspot.com. 404, does not resolve
- simurl.com - Doesn't appear guessable - Ex: http://simurl.com/panpes. Website is blank; does not resolve URLs ("This SimURL is now inactive")
- smf.is - DNS not resolving.
- sns.mx - SNS Analytics, domain parked
- sq.com - Now redirects to Singapore Airlines.
- tiny.ly - DNS not resolving.
- tm.to - Twtmore has "flown away"
- to.gg - Global Giving, everything 503s
- traceurl.com - DNS fails to resolve.
- tr.im (1st generation) - "Be back soon!"
- tweetburner.com / twurl.nl - Appears incremental, everything 404s
- twixar.com - Shows a Portuguese notice: "We are offline for a while, but we are working to offer the long-URL shortening service again soon!"
- twthpr.co - DNS not resolving.
- twitpwr.com - Domain parked.
- u.mavrev.com - Stopped accepting new urls. Now times out
- u.nu - "The shortest URLs. period." Website dead since at least 1 October 2010 (http://web.archive.org/web/20100104023208/http://u.nu/)
- url9.com - Sequential, alphanumeric. Leading 0s are significant. "The site is working correctly."
- urlborg.com - 404 Not Found.
- urlcover.com - Domain parked.
- urlhawk.com - Domain parked.
- url-press.com - Suspended by web host.
- urlsinn.com - DNS not resolving.
- urlsmash.com - DNS not resolving.
- urltea.com - Dreamhost's coming soon page.
- urlvi.be - Domain parked.
- urlx.org - Owner has agreed to share his database
- uxp.in - Still resolves URLs, but the site just shows a blank page. Domain parked.
- vibemag.co - Vibe Magazine. Times out
- vsb.li / links.visibli.com/links/ - The latter uses a truncated MD5 hex string. See sharedby.co.
- w3t.org - 403 Forbidden.
- wlink.us - Domain parked.
- wl.tl - DNS not resolving.
- xaddr.com - Domain parked.
- xil.in - Under construction.
- x.se - Cannot resolve, but www.x.se works.
- xym.kr - Gibberish (?) Korean text blog.
- y.ahoo.it - Yahoo
- yweb.com - Suspicious iframe with long url and fake loading gif image.
- zi.ma - DNS not resolving.
- zip.sm - was a redirect to joturl.com. Now times out
- adjix.com - Still resolves URLs, but the site does not work: "The requested application was not found on this server." Static host on AWS.
- feedly.com/e/ - Realized that URL shorteners were bad. Non-cooperative.
- metamark.net / xrl.us - No longer allows new URLs to be shortened; existing URLs still work (Ex. http://xrl.us/bfabog). Uploaded a database dump to the Internet Archive.
- urlbrief.com - co-operates with 301Works.org
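Most "Dead or Broken" entries are labelled by how they fail: the name no longer resolves, the server refuses or drops connections, or it answers but no longer redirects. The first two cases can be told apart with a DNS lookup followed by a TCP connect. The rough sketch below returns labels that mirror the list's own wording. Anything past the TCP handshake, such as parked pages, 403s, and 404s, still needs an HTTP request and a human look. The function names are hypothetical.

```python
import socket


def classify_error(exc: BaseException) -> str:
    """Map a socket-level exception to the wording used in the list above."""
    if isinstance(exc, socket.gaierror):
        return "DNS not resolving"
    if isinstance(exc, ConnectionRefusedError):
        return "Connection refused"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "Timed out"
    return f"Other error: {exc!r}"


def probe(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Resolve `host`, then try a TCP connection; return a short status label."""
    try:
        socket.getaddrinfo(host, port)
        with socket.create_connection((host, port), timeout=timeout):
            return "Accepts connections (check HTTP next)"
    except OSError as exc:  # gaierror, refusals, and timeouts all derive from OSError
        return classify_error(exc)
```

For example, `probe("zi.ma")` does a live lookup. A domain-parking page still counts as "Accepts connections", which is why that case needs a manual check.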
Check out Audit2014 and help audit the archives. In particular, the stuff not on Internet Archive needs to be uploaded.
- See the latest torrent release for URLs before Tinytown. A copy is available at URLTeamTorrentRelease2013July
- Tinytown results are uploaded to the Internet Archive. They are incremental, so you will need to download them all to get all URLs.
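Because each Tinytown upload only contains results found since the previous one, building a complete table means reading every file in order. The sketch below merges several xz-compressed, pipe-delimited files into one dictionary. It makes two assumptions: each line is `shortcode|url`, and if a code appears twice, the later file wins. The function name `merge_releases` is hypothetical.

```python
import io
import lzma
from typing import BinaryIO, Dict, Iterable, Union


def merge_releases(files: Iterable[Union[str, BinaryIO]]) -> Dict[str, str]:
    """Merge incremental dumps, oldest first, into one shortcode -> URL mapping."""
    merged: Dict[str, str] = {}
    for f in files:
        # lzma.open accepts either a path or an open binary file object.
        with lzma.open(f, mode="rt", encoding="utf-8") as lines:
            for line in lines:
                code, sep, url = line.rstrip("\r\n").partition("|")
                if sep:
                    # Assumption: a later release overrides an earlier one.
                    merged[code] = url
    return merged


if __name__ == "__main__":
    older = io.BytesIO(lzma.compress(b"abc|http://a.example/\n"))
    newer = io.BytesIO(lzma.compress(b"xyz|http://x.example/\n"))
    print(merge_releases([older, newer]))
```

With real downloads, pass the file paths in release order. Sorting by filename only works if the names sort chronologically, which should be verified.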
Common URL shortening software
Ha-ha! Please don't run a URL shortening service.