Difference between revisions of "Friendster"

From Archiveteam
Jump to navigation Jump to search
Line 155: Line 155:
 
| 3700001 || 3800000 || Claimed || || yipdw
 
| 3700001 || 3800000 || Claimed || || yipdw
 
|-
 
|-
| 3.8M+1 || 3.9M || Claimed || 3.9GB || [[User:Dnova|dnova]]  
+
| 3.8M+1 || 3.9M || Claimed || || [[User:Dnova|dnova]]  
 
|-
 
|-
 
| || 124138261 || Pool || || — ''Check for an update to the script!''
 
| || 124138261 || Pool || || — ''Check for an update to the script!''

Revision as of 13:54, 7 May 2011

Friendster
Friendster logo
Friendster - Home 1304442914645.png
URL http://www.friendster.com/[IAWcite.todayMemWeb]
Project status Closing
Archiving status In progress...
Project source Unknown
Project tracker Unknown
IRC channel #archiveteam (on EFnet)
Project lead Unknown

Friendster is an early social networking site which announced on April 25th, 2011 that most of the user-generated content on the site would be deleted on May 31st, 2011. It's estimated that Friendster has over 115 million registered users.

IMPORTANT NEWS!

We've discovered an error in recent versions of bff.sh that means many people may have incorrect downloads. Please see Friendster#More on the missing image problem to see if you have been affected!

How to help

Scrape profiles

We're going to break up the user ids into ranges and let individuals claim a range to download. Use this table to mark your territory:

Start End Status Size (Uncompressed) Claimant
1 999 Uploaded 55MB closure
1000 1999 Uploaded 283MB alard
2000 2999 Uploaded 473MB DoubleJ
3000 3999 Done 234MB Teaspoon
4000 4999 Uploaded 183MB Paradoks
5000 5999 Uploaded 202MB robbiet48/Robbie Trencheny (Amsterdam)
6000 9999 Uploaded 1.1gb Sketchcow/Jason Scott
10000 29999 Uploaded 5.1gb Sketchcow/Jason Scott
30000 31999 Uploaded 485mb Sketchcow/Jason Scott
32000 32999 Uploaded 201MB Paradoks
33000 33999 Uploaded 241mb closure
34000 100000 Uploaded unknown (20+ gb?) closure
100000 101000 Done 205.6 MB xlene
101001 102000 Uploaded 232MB robbiet48/Robbie Trencheny (Florida)
102001 103000 Uploaded 241MB robbiet48/Robbie Trencheny (Amsterdam)
103001 104000 Done yipdw
104001 105000 Done 231MB Coderjoe
105001 114999 Uploaded 2.1GB Paradoks
115000 116999 Done yipdw
117000 119999 Done 720MB Coderjoe
120000 130000 Uploaded 2.3GB robbiet48/Robbie Trencheny (Florida)
130000 140000 Claimed robbiet48/Robbie Trencheny (Florida)
140001 160000 Done yipdw
160001 180000 Claimed jch
180001 200000 Done yipdw
200001 220000 Done 8.1GB Coderjoe
220001 230000 Claimed xlene
230001 240000 Uploaded 4.4GB alard
240001 250000 Claimed Teaspoon
250001 260000 Claimed robbiet48/Robbie Trencheny (Newark)
260001 270000 Uploaded 4.0GB robbiet48/Robbie Trencheny (Fremont 1)
270001 280000 Uploaded 3.2GB robbiet48/Robbie Trencheny (Fremont 2)
280001 290000 Claimed DoubleJ
290001 300000 Uploaded 3.9GB dnova
310001 320000 Done 5.1GB Coderjoe
320001 330000 Claimed robbiet48/Robbie Trencheny (Oakland)
330000 340000 Uploaded closure
340000 400000 Claimed Sketchcow/Jason Scott
400001 500000 Claimed DoubleJ
500000 600000 Claimed closure
600001 700000 Claimed no2pencil
700001 800000 Claimed proub/Paul Roub
800001 900000 Claimed proub/Paul Roub
900001 1000000 Claimed Soult
1000001 1100000 Claimed Avram
1100001 1200000 Claimed Paradoks
1200001 1300000 Claimed db48x
1300000 1400000 Claimed closure (penguin)
1400001 1500000 Claimed alard
1500001 1600000 Claimed ksh/omglolbah
1600001 1700000 Claimed ksh/omglolbah
1700001 1800000 Claimed ksh/omglolbah
1800001 1900000 Claimed ksh/omglolbah
1900001 2000000 Claimed ksh/omglolbah
2000001 2100000 Claimed ksh/omglolbah
2100001 2200000 Claimed Teaspoon
2200001 2300000 Claimed Darkstar
2300001 2400000 Claimed Darkstar
2400001 2500000 Claimed underscor (snookie)
2500001 2600000 Claimed Bardicer
2600001 2700000 Claimed Robbie Trencheny (Amsterdam)
2700001 2800000 Claimed Robbie Trencheny (Fremont 2)
2800001 2900000 Claimed Coderjoe (system1)
2900001 3000000 Claimed Coderjoe (system2)
3000001 3100000 Claimed Qwerty0
3100001 3600000 Claimed Jason Scott/Sketchcow
3600001 3700000 Claimed DoubleJ (bff v10)
3700001 3800000 Claimed yipdw
3.8M+1 3.9M Claimed dnova
124138261 Pool Check for an update to the script!

Please try and claim 100,000 id blocks at this time, or more if your system has adequate space.

Known issues

Affected Issue Resolution
User IDs < 340000 Suspect blog content Jason will run a blog-check at the end
Profiles retrieved with bff.sh < v8 Missing blog content Redownload affected profiles or wait for blog-check
Profiles retrieved with bff.sh < v9 Missing images Redownload affected profiles

More on the missing image problem

We've just discovered the versions of bff.sh that we've been using don't grab the right things on some systems. Specifically, we know that older versions of grep (i.e. 2.5.4) don't match some urls as intended. To test whether your files have been downloading correctly, run ./bff.sh 115288. If you end up with one .jpg instead of 8 (here is what you should end up with), you need to upgrade your version of bff.sh before continuing. The current version solves the issue. We're figuring out what to do about the already-downloaded stuff.

Tools

friendster-scrape-profile

Script to download a Friendster profile: download it, or clone the git repository.

You need a Friendster account to use this script. (Note: if you are creating an account, mailinator email addresses are blocked) Add your login details to a file username.txt and a password.txt and save those in the directory of the download script.

Run with a numeric profile id of a Friendster user: ./friendster-scrape-profile PROFILE_ID

Currently downloads:

  • the main profile page (profiles.friendster.com/$PROFILE_ID)
  • the user's profile image from that page
  • the list of public albums (www.friendster.com/viewalbums.php?uid=$PROFILE_ID)
  • each of the album pages (www.friendster.com/viewphotos.php?a=$id&uid=$PROFILE_ID)
  • the original photos from each album
  • the list of friends (www.friendster.com/fans.php?uid=$PROFILE_ID)
  • the shoutoutstream (www.friendster.com/shoutoutstream.php?uid=$PROFILE_ID) and the associated comments
  • the Friendster blog, if any

It does not download any of the widgets.

Downloading one profile takes between 6 to 10 seconds and generates 200-400 kB of data (for normal profiles).

Automating the process

(This is all unix-only; it won't work in Windows.)
1. Create a Friendster account
2. Download the script; name it 'bff.sh'.
3. In the directory that you put the bff.sh, make a username.txt file that has your Friendster e-mail address as the text in it
4. In the directory that you put the bff.sh, make a password.txt file that has your Friendster password as the text in it.
5. Choose your profile range.
6. Edit that section to say what range you'll do.
7. On the command line, type (with your range replacing the '#'s.):

$ for i in {#..#}; do bash bff.sh $i; done

Note: If you get an error like bff.sh: line 26: $'\r': command not found, you will need to convert the script to use UNIX-style line endings:

$ dos2unix bff.sh

or if you somehow find yourself without the dos2unix command, do this:

$ sed "s/\r//" bff.sh > bff-fixed.sh
$ mv bff-fixed.sh bff.sh

If you have vi (the editor) you can also open the file in vi, type <ESC>:set ff=unix followed by <ESC>:wq Then run the command in step 7 again.

Advanced: multiple instances

Now you might notice it's relatively slow. My average is 115 profiles per hour. But the bottleneck is mainly network requests, so running multiple instances can increase your download speed nearly linearly.
BUT we're not sure whether it's safe to use the same cookies.txt file for all the instances (which it will do by default). Luckily you can easily avoid this using an extra optional parameter of bff.sh. Just add the name of the cookie file you want it to create and use right after the profile ID, for instance: "bff.sh 4012089 cookie3.txt". Use a different cookie file for each instance. The full, modified command would then be (replacing the #'s with your range or the cookie number, where applicable):

$ for i in {#..#}; do bash bff.sh $i cookie#.txt; done

A more automated solution is also available from https://github.com/db48x/friendster-scrape. The snook.sh script in this repository takes the start and end of a range and a number of download threads to run and launches that many instances of bff.sh at once. It automatically logs the output to individual log files and creates separate cookies files for them. This script was originally written by underscore; you may have his link to pastebin on the irc channel. I've fixed several bugs, including one very serious one. If you used the version from pastebin, you'll need to start over because it downloaded the wrong profiles (keep what you downloaded, it'll merely overlap with someone else.)

Another option is this perl script which does a similar job. It's not thorougly tested yet, but it's pretty simple. It takes the starting ID, the number of IDs per process, the number of processes, then creates a shell script which launches them. It has the bonus of being able to be stopped by using $ touch STOP, and it logs every finished ID from every instance to one file for monitoring. This script will give a quick summary of that file to monitor the processes' progress. (And with touch STOP and the summary file, that means easy management over SSH! Woo!)

Site Organization

Content on Friendster seems to be primarily organized by the id number of the users, which were sequentially assigned starting at 1. This will make it fairly easy for wget to scrape the site and for us to break it up into convenient work units. The main components we need to scrape are the profile pages, photo albums and blogs, but there may be others. More research is needed

Profiles

Urls of the form 'http://profiles.friendster.com/<userid>'. Many pictures on these pages are hosted on urls that look like 'http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>.jpg', but these folders aren't browsable directly. Profiles will not be easy to scrape with wget.

Photo Albums

A user's photo albums are at urls that look like 'http://www.friendster.com/viewalbums.php?uid=<userid>' with individual albums at 'http://www.friendster.com/viewphotos.php?a=<album id>&uid=<userid>'. It appears that the individual photo pages use javascript to load the images, so they will be very hard to scrape.

On the individual album pages, the photo thumbnails are stored under similar paths as the main images. i.e. if the album thumb is at http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>m.jpg, just drop the final 'm' to get the main photo (or replace it with a 't' to get an even tinier version).

Blogs

Unknown.


v · t · e         Archive Team
Current events

Alive... OR ARE THEY · Deathwatch · Projects

Archiveteam.jpg
Archiving projects

APKMirror · Archive.is · BetaArchive · Government Backup (#datarefuge · ftp-gov· Gmane · Internet Archive · It Died · Megalodon.jp · OldApps.com · OldVersion.com · OSBetaArchive · TEXTFILES.COM · The Dead, the Dying & The Damned · The Mail Archive · UK Web Archive · WebCite · Vaporwave.me

Blogging

Blog.pl · Blogger · Blogster · Blogter.hu · Freeblog.hu · Fuelmyblog · Jux · LiveJournal · My Opera · Nolblog.hu · Open Diary · ownlog.com · Posterous · Powerblogs · Proust · Roon · Splinder · Tumblr · Vox · Weblog.nl · Windows Live Spaces · Wordpress.com · Xanga · Yahoo! Blog · Zapd

Cloud hosting/file sharing

aDrive · AnyHub · Box · Dropbox · Docstoc · Fast.io · Google Drive · Google Groups Files · iCloud · Fileplanet · LayerVault · MediaCrush · MediaFire · Mega · MegaUpload · MobileMe · OneDrive · Pomf.se · RapidShare · Ubuntu One · Yahoo! Briefcase

Corporations

Apple · IBM · Google · Loblaw · Lycos Europe · Microsoft · Yahoo!

Events

Arab Spring · Great Ape-Snake War · Spanish Revolution

Font Repos

DaFont · Google Web Fonts · GNU FreeFont · Fontspace

Forums/Message boards

4chan · Captain Luffy Forums · College Confidential · DSLReports · ESPN Forums · Facepunch Forums · forums.starwars.com · HeavenGames · JamiiForums · Invisionfree · NeoGAF · Textream · The Classic Horror Film Board · Yahoo! Messages · Yahoo! Neighbors · Yuku.com · Zetaboards

Gaming

Atomicgamer · Bazaar.tf · City of Heroes · Club Nintendo · Clutch · Counter-Strike: Global Offensive · CS:GO Lounge · Desura · Dota 2 · Dota 2 Lounge · Emulation Zone · ESEA · GameBanana · GameMaker Sandbox · GameTrailers · Halo · HLTV.org · HQ Trivia · Infinite Crisis · joinDOTA · League of Legends · Liquipedia · Minecraft.net · Player.me · Playfire · Raptr · SingStar · Steam · SteamDB · SteamGridDB · Team Fortress 2 · TF2 Outpost · Warhammer · Xfire

Image hosting

500px · AOL Pictures · Blipfoto · Blingee · Canv.as · Camera+ · Cameroid · DailyBooth · Degree Confluence Project · DeviantART · Demotivalo.net · Flickr · Fotoalbum.hu · Fotolog.com · Fotopedia · Frontback · Geograph Britain and Ireland · Giphy · GTF Képhost · ImageShack · Imgh.us · Imgur · Inkblazers · Instagram · Kepfeltoltes.hu · Kephost.com · Kephost.hu · Kepkezelo.com · Keptarad.hu · Madden GIFERATOR · MLKSHK · Microsoft Clip Art · Microsoft Photosynth · Nokia Memories · noob.hu · Odysee · Panoramio · Photobucket · Picasa · Picplz · Pixiv · Portalgraphics.net · PSharing · Ptch · puu.sh · Rawporter · Relay.im · ScreenshotsDatabase.com · Sketch · Smack Jeeves · Snapjoy · Streetfiles · Tabblo · Tinypic · Trovebox · TwitPic · Wallbase · Wallhaven · Webshots · Wikimedia Commons

Knowledge/Wikis

arXiv · Citizendium · Clipboard.com · Deletionpedia · EditThis · Encyclopedia Dramatica · Etherpad · Everything2 · infoAnarchy · GeoNames · GNUPedia · Google Books (Google Books Ngram· Horror Movie Database · Insurgency Wiki · Knol · Lost Media Wiki · Neoseeker.com · Notepad.cc · Nupedia · OpenCourseWare · OpenStreetMap · Orain · Pastebin · Patch.com · Project Gutenberg · Puella Magi · Referata · Resedagboken · SongMeanings · ShoutWiki · The Internet Movie Database · TropicalWikis · Uncyclopedia · Urban Dictionary · Urban Exploration Resource · Webmonkey · Wikia · Wikidot · WikiHow · Wikkii · WikiLeaks · Wikipedia (Simple English Wikipedia· Wikispaces · Wikispot · Wik.is · Wiki-Site · WikiTravel · Word Count Journal

Magazines/Blogs/News

Cyberpunkreview.com · Game Developer Magazine · Gigaom · Hardware Canucks · Helium · JPG Magazine · Make Magazine · The Escapist · Polygamia.pl · San Fransisco Bay Guardian · Scoop · Regretsy · Yahoo! Voices

Microblogging

Heello · Identi.ca · Jaiku · Mommo.hu · Plurk · Sina Weibo · Tencent Weibo · Twitter · TwitLonger

Music/Audio

8tracks · AOL Music · Audimated.com · Cinch · digCCmixter · Dogmazic.net · Earbits · exfm · Free Music Archive · Gogoyoko · Indaba Music · Instacast · Instaudio · Jamendo · Last.fm · Music Unlimited · MOG · PureVolume · Reverbnation · ShareTheMusic · SoundCloud · Soundpedia · Spotify · This Is My Jam · TuneWiki · Twaud.io · WinAmp

People

Aaron Swartz · Michael S. Hart · Steve Jobs · Mark Pilgrim · Dennis Ritchie · Len Sassaman Project

Protocols/Infrastructure

FTP · Gopher · IRC · Usenet · World Wide Web
BitTorrent DHT

Q&A

Askville · Answerbag · Answers.com · Ask.com · Askalo · Baidu Knows · Blurtit · ChaCha · Experts Exchange · Formspring · GirlsAskGuys · Google Answers · Google Baraza · JustAnswer · MetaFilter · Quora · Retrospring · StackExchange · The AnswerBank · The Internet Oracle · Uclue · WikiAnswers · Yahoo! Answers

Recipes/Food

Allrecipes · Epicurious · Food.com · Foodily · Food Network · Punchfork · ZipList

Social bookmarking

Addinto · Backflip · Balatarin · BibSonomy · Bkmrx · Blinklist · BlogMarks · BookmarkSync · CiteULike · Connotea · Delicious · Designer News · Digg · Diigo · Dir.eccion.es · Evernote · Excite Bookmark · Faves · Favilous · folkd · Freelish · Getboo · GiveALink.org · Gnolia · Google Bookmarks · Hacker News · HeyStaks · IndianPad · Kippt · Knowledge Plaza · Licorize · Linkwad · Menéame · Microsoft Developer Network · myVIP · Mister Wong · My Web · Mylink Vault · Newsvine · Oneview · Pearltrees · Pinboard · Pocket · Propeller.com · Reddit · sabros.us · Scloog · Scuttle · Simpy · SiteBar · Slashdot · Squidoo · StumbleUpon · Twine · Voat · Vizited · Yummymarks · Xmarks · Yahoo! Buzz · Zootool · Zotero

Social networks

Bebo · BlackPlanet · Classmates.com · Cyworld · Dogster · Dopplr · douban · Ello · Facebook · Flixster · FriendFeed · Friendster · Friends Reunited · Gaia Online · Google+ · Habbo · hi5 · Hyves · iWiW · LinkedIn · Miiverse · mixi · MyHeritage · MyLife · Myspace · myVIP · Netlog · Odnoklassniki · Orkut · Plaxo · Qzone · Renren · Skyrock · Sonico.com · Storylane · Tagged · tvtag · Upcoming · Viadeo · Vine · Vkontakte · WeeWorld · Weibo · Wretch · Yahoo! Groups · Yahoo! Stars India · Yahoo! Upcoming · more sites...

Shopping/Retail

Alibaba · AliExpress · Amazon · Apple Store · Barnes & Noble · DirectCanada · eBay · Kmart · NCIX · Printfection · RadioShack · Sears · Sears Canada · Target · The Book Depository · ThinkGeek · Toys "R" Us · Walmart

Software/code hosting

Android Development · Alioth · Assembla · BerliOS · Betavine · Bitbucket · BountySource · Codecademy · CodePlex · Freepository · Free Software Foundation · GNU Savannah · GitHost  · GitHub · GitHub Downloads · Gitorious · Gna! · Google Code · ibiblio · java.net · JavaForge · KnowledgeForge · Launchpad · LuaForge · Maemo · mozdev · OSOR.eu · OW2 Consortium · Openmoko · OpenSolaris · Ourproject.org · Ovi Store · Project Kenai · RubyForge · SEUL.org · SourceForge · Stypi · TestFlight · tigris.org · Transifex · TuxFamily · Yahoo! Downloads

Television/Radio

ABC · Austin City Limits · BBC · CBC · CBS · Computer Chronicles · CTV · Fox · G4 · Global TV · Jeopardy! · NBC · NHK · PBS · Penn & Teller: Bullshit! · The Howard Stern Show · TV News Archive (Understanding 9/11)

Torrenting/Piracy

ExtraTorrent · EZTV · isoHunt · KickassTorrents · The Pirate Bay · Torrentz · Library Genesis

Video hosting

Academic Earth · Bambuser · Blip.tv · Epic · Freshlive · Google Video · Justin.tv · Mixer · Niconico · Nokia Trailers · Oddshot.tv · Periscope · Plays.tv · Qwiki · Skillfeed · Stickam · TED Talks · Ticker.tv · Twitch.tv · Ustream · Videoplayer.hu · Viddler · Viddy · Vidme · Vimeo · Vine · Vstreamers · Yahoo! Video · YouTube · Famous Internet videos (Me at the zoo)

Web hosting

Angelfire · Brace.io · BT Internet · CableAmerica Personal Web Space · Claranet Netherlands Personal Web Pages · Comcast Personal Web Pages · Extra.hu · FortuneCity · Free ProHosting · GeoCities (patch· Google Business Sitebuilder · Google Sites · Internet Centrum · MBinternet · MSN TV · Nifty · Nwnyet · Parodius Networking · Prodigy.net · Saunalahti Iso G · Swipnet · Telenor · Tripod · University of Michigan personal webpages · Verizon Mysite · Verizon Personal Web Space · Webs · Webzdarma · Virgin Media

Web applications

Mailman · MediaWiki · phpBB · Simple Machines Forum · vBulletin

Information

A Million Ways to Die on the Web · Backup Tips · Cheap storage · Collecting items randomly · Data compression algorithms and tools · Dev · Discovery Data · DOS Floppies · Fortress of Solitude · Keywords · Naughty List · Nightmare Projects · Rescuing floppy disks · Rescuing optical media · Site exploration · The WARC Ecosystem · Working with ARCHIVE.ORG

Projects

ArchiveCorps · Audit2014 · Emularity · Faceoff · FlickrFckr · Froogle · INTERNETARCHIVE.BAK (Internet Archive Census· IRC Quotes · JSMESS · JSVLC · Just Solve the Problem · NewsGrabber · Project Newsletter · Valhalla · Web Roasting (ISP Hosting · University Web Hosting· Woohoo

Tools

ArchiveBot · ArchiveTeam Warrior (Tracker· Google Takeout · HTTrack · Video downloaders · Wget (Lua · WARC)

Teams

Bibliotheca Anonoma · LibreTeam · URLTeam · Yahoo Video Warroom · WikiTeam

Other

800notes · AOL · Akoha · Ancestry.com · April Fools' Day · Amplicate · AutoAdmit · Bre.ad · Circavie · Cobook · Co.mments · Countdown · Discourse · Distill · Dmoz · Easel · Eircode · Electronic Frontier Foundation · FanFiction.Net · Feedly · Ficlets · Forrst · FunnyExam.com · FurAffinity · Google Helpouts · Google Moderator · Google Poly · Google Reader · ICQmail · IFTTT · Jajah · JuniorNet · Lulu Poetry · Mobile Phone Applications · Mochi Media · Mozilla Firefox · MyBlogLog · NBII · Newgrounds · Neopets · Quantcast · Quizilla · Salon Table Talk · Shutdownify · Slidecast · Stack Overflow · SOPA blackout pages · starwars.yahoo.com · TechNet · Toshiba Support · USA-Gov · Volán · Widgetbox · Windows Technical Preview · Wunderlist · YTMND · Zoocasa

About Archive Team

Introduction · Philosophy · Who We Are · Our stance on robots.txt · Why Back Up? · Software · Formats · Storage Media · Recommended Reading · Films and documentaries about archiving · Talks · In The Media · FAQ