Difference between revisions of "Windows Live Spaces"


Revision as of 04:32, 20 March 2011

Windows Live Spaces
Windows Live Spaces logo
Windows Live Spaces homepage (after logging in), as at 18 March 2011
URL http://spaces.live.com
Project status Closing on 2011-03-16[1]
Archiving status In progress...
Project source Unknown
Project tracker Unknown
IRC channel #archiveteam (on EFnet)
Project lead Unknown

Microsoft announced that it would shut down Windows Live Spaces on March 16, 2011. That day has been and gone, and as of March 20, 2011, Windows Live Spaces is still running. NovaKing is currently scraping Bing for more profiles, so we'll need all the help we can get as soon as he can start producing lists.

Since September 2010, Microsoft has been notifying every user with an active Space to migrate it to WordPress using their Windows Live ID, save it to their hard drive, or delete it. Still, many long-abandoned blogs have not been migrated and may not survive. For these reasons I decided to write this tutorial for anyone who wants to save some Spaces.

Corrections are welcome. Please raise any questions or suggestions on this article's discussion page.

Downloading Spaces

We will need a lot of help to download as many Spaces as possible.

  • Swicher is currently using HTTrack to download his list.
  • Auguste and Dr-Spangle are using Wget to download an additional copy of that same list (see Hotlists).
  • NovaKing is currently scraping Bing for more profiles. At the rate he's been going, he could have a list of tens of thousands of profiles ready soon.

Hotlists

This is a list of all Spaces hotlists and their current status. They are generally split into chunks of 1,000 Spaces, and are intended to be downloaded with Perl/Wget, but if you can't use that, use whatever you have available. Microsoft could shut Windows Live Spaces down at any time (it's already past D-Day), so it's better to grab something than nothing.

If you would like to take ownership of one, speak to Auguste on IRC.

If you have taken ownership, please update this table as soon as you are finished, or let Auguste know if you are unable to complete it.

Filename | URL | Owner | Status | Status notes
wls 0001-1000.txt | http://pastebin.com/FMJh3vAa | Auguste, Dr-Spangle | In progress | Auguste has downloaded ~580 Spaces as of 20 March.
wls 1001-2000.txt | http://pastebin.com/xrXfPbL4 | Auguste, Dr-Spangle | In progress | Auguste has downloaded ~580 Spaces as of 20 March.
wls 2001-2202.txt | http://pastebin.com/KAVYAW3c | Auguste, Dr-Spangle | In progress | Auguste has downloaded ~580 Spaces as of 20 March.
wls 2203-3000.txt | http://pastebin.com/pygEEHBr | Hydruh | In progress |
wls 3001-4000.txt | http://pastebin.com/LS8nvgdN | Jeroenz0r | In progress |

Download Statistics

Auguste has collected some potentially useful stats and information:

  • About 120 Spaces can be downloaded in 24 hours
  • On average, 50 Spaces take around 1 GB of storage
  • Each Space will compress down to about 10% of its original size
    • 50 Spaces should therefore compress to about 100 MB
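
To plan a hotlist, the figures above can be turned into a rough estimate. This is only a back-of-the-envelope sketch: the constants are Auguste's averages, and the function name `estimate` is made up for illustration.

```python
# Figures quoted from Auguste's stats above; treat them as rough averages.
SPACES_PER_DAY = 120        # Spaces downloaded in about 24 hours
SPACES_PER_GB = 50          # Spaces per 1 GB of uncompressed storage
COMPRESSED_FRACTION = 0.10  # compressed size as a fraction of the original


def estimate(spaces):
    """Return (days, raw_gb, compressed_gb) for a hotlist of `spaces` Spaces."""
    days = spaces / SPACES_PER_DAY
    raw_gb = spaces / SPACES_PER_GB
    compressed_gb = raw_gb * COMPRESSED_FRACTION
    return days, raw_gb, compressed_gb


if __name__ == "__main__":
    days, raw_gb, compressed_gb = estimate(1000)
    print(f"1,000 Spaces: ~{days:.1f} days, ~{raw_gb:.0f} GB raw, "
          f"~{compressed_gb:.0f} GB compressed")
```

At these rates, a full hotlist of 1,000 Spaces would take one downloader roughly eight days, which is why splitting the lists between several people matters.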

Tools

There are several ways to download Windows Live Spaces, but using Wget/Perl is recommended.

Using Perl/Wget

There are two Perl scripts:

  • SpaceInvader2.pl
    • Downloads the list of Spaces, one Space at a time, using Wget. You probably want this one.
  • SpaceInvaderTurbo.pl
    • Spawns multiple instances of Wget to download everything at once. If you have a hotlist of 1,000 Spaces, this means 1,000 instances of Wget, all downloading simultaneously. You take responsibility for any damage it may or may not do.

Due to insufficient time and planning, these scripts don't download any off-site dependencies.
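
Between "one at a time" and "everything at once" there is a middle ground: a fixed number of downloads running in parallel. The following Python sketch illustrates that idea only. It is not how either SpaceInvader script works, and `run_bounded`, `download` and `max_workers` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(urls, download, max_workers=4):
    """Call download(url) for every URL, with at most max_workers at once.

    `download` is a hypothetical callable supplied by the caller (for
    example, one that launches Wget for a single Space). Results are
    returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(download, urls))


if __name__ == "__main__":
    # Stand-in worker so the sketch runs without touching the network.
    def fake_download(url):
        return f"done: {url}"

    for line in run_bounded(["space-a", "space-b", "space-c"],
                            fake_download, max_workers=2):
        print(line)
```

With a cap like this, a hotlist of 1,000 Spaces never has more than `max_workers` downloads running at the same moment.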

Requirements

You'll need Perl and Wget. GNU/Linux users should have these out of the box, but Windows users will need ActiveState Perl (or similar) and Wget for Windows.

Usage

To use SpaceInvader2.pl, copy it into the directory that you want your Spaces to be downloaded into.

Usage: SpaceInvader2.pl "HOTLIST"
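
For anyone who cannot run the Perl script, the general approach (read a hotlist, then call Wget once per entry) might look roughly like the Python sketch below. It assumes a hotlist is a text file with one Space URL per line. The Wget options shown are a plausible choice, not necessarily the ones SpaceInvader2.pl passes, and the example URL is a placeholder.

```python
import shlex


def read_hotlist(text):
    """Return the non-empty, de-duplicated lines of a hotlist, in order."""
    seen = set()
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def wget_command(url, dest="."):
    """Build an argument list for mirroring one Space with Wget."""
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--page-requisites",   # also fetch images, CSS, etc. needed by pages
        "--convert-links",     # rewrite links for offline browsing
        "--no-parent",         # do not climb above the start URL
        "--directory-prefix", dest,
        url,
    ]


if __name__ == "__main__":
    # Placeholder hotlist; a real one would be read from the .txt file.
    sample = "http://example.spaces.live.com/\n\nhttp://example.spaces.live.com/\n"
    for url in read_hotlist(sample):
        # Print the command instead of running it, so nothing is downloaded.
        print(shlex.join(wget_command(url, dest="spaces")))
```

To actually download, each argument list could be handed to `subprocess.run`, one at a time.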

HTTrack (graphic version)

This section explains how to download one or more Spaces using the graphical version of HTTrack (called WinHTTrack on Windows and WebHTTrack on Linux).

I assume the reader is already familiar with WinHTTrack (or WebHTTrack), so I will only explain what needs to be configured (in the program's Option Panel) to download a Space from Windows Live Spaces.[1]

Add the following lines to the "Scan Rules" section:

+*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
+*.7z
+*.pdf +*.doc +*.mid +*.3gp +*.djvu +*.amr +*.mp4 +*.ogg +*.ogv +*.ogm
+*.mov +*.mpg +*.mpeg +*.avi +*.asf +*.mp3 +*.mp2 +*.rm +*.wav +*.vob +*.qt +*.vid +*.ac3 +*.wma +*.wmv
+*.zip +*.tar +*.tgz +*.gz +*.rar +*.z
+*.arj +*.dar +*.lzh +*.lz +*.lza +*.arc
+*.gif +*.jpg +*.png +*.tif +*.bmp
-*.entry#comment
+*.profile.live.com/Lists/*
+*.byfiles.storage.live.com/*
+*.photos.live.com
+*.spaces.live.com

Lines 1 to 7 specify which file types are downloaded from a Space when the program finds them; these lines can be modified to suit the user. Line 8 is needed because the program otherwise tries to capture the comments of every blog post on Windows Live Spaces, which generates errors and wastes time while exploring a site. Lines 9 and 12 capture the Spaces in the "friends" list of the user whose Space is being captured (these lines are optional). Lines 10 and 11 capture the files and photos[2] that the user may have uploaded.
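
To get a feel for how include (+) and exclude (-) rules interact, here is a small Python approximation. It assumes that rules are checked in order and that the last matching rule wins, and it uses shell-style wildcards from `fnmatch` as a stand-in for HTTrack's own pattern matching. It skips `mime:` rules entirely, and real HTTrack behaviour may differ in the details. The function names and example URLs are made up.

```python
from fnmatch import fnmatchcase


def parse_rules(text):
    """Turn '+pattern' / '-pattern' tokens into (allow, pattern) pairs."""
    rules = []
    for token in text.split():
        sign, pattern = token[0], token[1:]
        if sign not in "+-" or pattern.startswith("mime:"):
            continue  # not a URL rule; ignored in this sketch
        rules.append((sign == "+", pattern.lower()))
    return rules


def is_allowed(url, rules, default=False):
    """Apply the rules in order; the last rule that matches decides."""
    verdict = default
    target = url.lower()
    for allow, pattern in rules:
        if fnmatchcase(target, pattern):
            verdict = allow
    return verdict


if __name__ == "__main__":
    rules = parse_rules("+*.css +*.js -ad.doubleclick.net/* -*.entry#comment")
    for url in ("example.spaces.live.com/theme.css",
                "ad.doubleclick.net/banner.js",
                "example.spaces.live.com/blog/post.entry#comment"):
        print(url, "->", "download" if is_allowed(url, rules) else "skip")
```

Under the last-match-wins assumption, the order of the lines matters: a later exclude rule can override an earlier include rule.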

Finally, in the Browser "Identity" field (in the Browser ID section), enter the following User Agent:

Googlebot/2.1 (+ http://www.googlebot.com/bot.html)

LSSaver

LSSaver[3] is a freeware Windows program that saves a Windows Live Spaces blog to your local disk. It saves useful information such as the blog's titles, content and comments, and it can also save the pictures included in the blog.

LSSaver is simple to use:

  • First, enter a Microsoft Live Space username.
  • Then click the "Get" button to retrieve all blog entries[4]. As each entry is retrieved, its title appears in the tree on the left side of the window. Wait until all titles have been retrieved. You can then browse the titles by folding and unfolding the tree and check the ones you want to save. Once an entry is checked, its content appears on the right side of the window. Check all the entries you want to save and wait until all of them have appeared.
  • To save the selected entries, click the Save button. In the file selection window that opens, choose where the file will be saved, give it a name, and click Save. After a while, all the selected entries are saved. The saved file is an HTML file that you can open with a browser.

The program works as it should, but a few details set it apart from a general web site downloader:

  • As explained above, when the program saves a blog, all the articles (and comments) are crammed into a single HTML file, which could become a problem if the blog has a lot of content.
  • Images are stored under names such as 000001, 000002, etc. This makes it hard to find the originals on the Internet (this applies to images from external sites linked in a blog) or to recognize the file format.
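
If the saved images really have no file extension, their format can usually still be recovered from the first few bytes of each file. The Python sketch below assumes the images sit as extension-less files in a single folder (how LSSaver actually lays out its output has not been checked here). The function names are made up, and only four common formats are recognized.

```python
from pathlib import Path

# Well-known leading bytes ("magic numbers") of common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),
]


def guess_extension(header):
    """Return an extension for the given leading bytes, or None if unknown."""
    for magic, ext in SIGNATURES:
        if header.startswith(magic):
            return ext
    return None


def add_extensions(folder):
    """Rename extension-less files in `folder` whose format is recognized."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix:
            continue  # skip directories and files that already have a suffix
        with path.open("rb") as handle:
            ext = guess_extension(handle.read(16))
        if ext:
            renamed.append(path.rename(path.with_name(path.name + ext)))
    return renamed
```

Files whose format is not recognized are left untouched, so running this on a copy of the saved folder should be low-risk.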

Auguste's unfinished script

Auguste began to write this script to parse a profile's friend list for more profiles, then output them to a textfile and remove duplicates. Unfortunately, it wasn't completed. Anyone is welcome to use it or finish it off, but NovaKing is currently using a superior scraper to grab profiles from Bing.
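
The core of such a script (pull Space addresses out of a page, then drop duplicates) can be sketched in a few lines of Python. This is not Auguste's script. It assumes Space addresses have the form `http://<name>.spaces.live.com`, and the sample HTML is invented for the example.

```python
import re

# Assumed URL shape for a Space: http://<name>.spaces.live.com
SPACE_URL = re.compile(r"https?://([a-z0-9-]+)\.spaces\.live\.com", re.IGNORECASE)


def extract_spaces(html):
    """Return unique Space URLs found in `html`, in order of first appearance."""
    seen = set()
    result = []
    for match in SPACE_URL.finditer(html):
        name = match.group(1).lower()
        if name not in seen:
            seen.add(name)
            result.append(f"http://{name}.spaces.live.com/")
    return result


if __name__ == "__main__":
    sample = (
        '<a href="http://alice.spaces.live.com/">Alice</a> '
        '<a href="http://bob.spaces.live.com/blog/">Bob</a> '
        '<a href="http://ALICE.spaces.live.com/photos/">Alice again</a>'
    )
    # Write one URL per line, the same layout assumed for hotlists.
    print("\n".join(extract_spaces(sample)))
```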

Trivia

  • There were 7,000,000 active sites[5], but as of March 2 only 1,000,000 had been moved to WordPress[6], that is, only about 14% of the total.
  • Here you can see some statistics from user Swicher on their progress to download some Spaces.

Notes

  1. If you do not know how to use this program you can check this tutorial (in English) or this one (in Spanish)
  2. I'm not sure whether the content hosted on *.photos.live.com will continue to exist after March 16, so I take the opportunity to save the "Photos" section of a Space (if the Space has one). Line 11 is therefore also optional.
  3. Some of the descriptions in this section were taken from http://www.softsea.com/review/LSSaver.html
  4. This operation may take several minutes, depending on the number of entries in the blog and on the user's connection.
  5. Over 500,000 Windows Live Spaces blogs migrated to WordPress.com
  6. Over one million new blogs on WordPress.com, but time is running out
