Friendster

From Archiveteam
Friendster
URL http://www.friendster.com/
Status Closing
Archiving status In progress...
Archiving type Unknown
IRC channel #archiveteam-bs (on hackint)

Friendster is an early social networking site which announced on April 25th, 2011 that most of the user-generated content on the site would be deleted on May 31st, 2011. It's estimated that Friendster has over 115 million registered users.

How to help

Scrape profiles

We're going to break up the user ids into ranges and let individuals claim a range to download. Use this table to mark your territory:

Start End Status Size (Uncompressed) Claimant
1 999 Uploaded 55MB closure
1000 1999 Uploaded 283MB alard
2000 2999 Uploaded 473MB DoubleJ
3000 3999 Done 234MB Teaspoon
4000 4999 Uploaded 183MB Paradoks
5000 5999 Uploaded 202MB robbiet48/Robbie Trencheny (Amsterdam)
6000 9999 Uploaded 1.1GB Sketchcow/Jason Scott
10000 29999 Claimed Sketchcow/Jason Scott
30000 31999 Uploaded 485MB Sketchcow/Jason Scott
32000 32999 Done 201MB Paradoks
33000 33999 Uploaded 241mb closure
34000 100000 Uploaded unknown (20+ gb?) closure
100000 101000 Done 205.6 MB xlene
101001 102000 Uploaded 232MB robbiet48/Robbie Trencheny (Florida)
102001 103000 Uploaded 241MB robbiet48/Robbie Trencheny (Amsterdam)
103001 104000 Done 202MB yipdw
104001 105000 Done 231MB Coderjoe
105001 114999 Done 2.1GB Paradoks
115000 116999 Done 468MB yipdw
117000 119999 Done 720MB Coderjoe
120000 130000 Claimed robbiet48/Robbie Trencheny (Florida)
130000 140000 Claimed robbiet48/Robbie Trencheny (Amsterdam)
140001 160000 Claimed yipdw
160001 180000 Claimed jch
180001 200000 Claimed yipdw
200001 220000 Claimed Coderjoe
220001 230000 Claimed xlene
230001 240000 Done 4.4GB alard
240001 250000 Claimed Teaspoon
250001 260000 Claimed robbiet48/Robbie Trencheny (Newark)
260001 270000 Claimed robbiet48/Robbie Trencheny (Fremont 1)
270001 280000 Claimed robbiet48/Robbie Trencheny (Fremont 2)
280001 290000 Claimed DoubleJ (updated script started at 281783)
290001 300000 Claimed dnova
310001 320000 Claimed Coderjoe
320001 330000 Claimed robbiet48/Robbie Trencheny (Oakland)
330000 340000 Done closure
340000 400000 Claimed Sketchcow/Jason Scott
400001 500000 Claimed DoubleJ
500000 600000 Claimed closure
600001 700000 Claimed no2pencil
700001 800000 Claimed proub/Paul Roub
800001 900000 Claimed proub/Paul Roub
900001 1000000 Claimed Soult
1000001 1100000 Claimed Avram
1100001 1200000 Claimed Paradoks
1200001 1300000 Claimed db48x
1300000 1400000 Claimed closure (penguin)
1400001 1500000 Claimed alard
1500001 1600000 Claimed ksh/omglolbah
1600001 1700000 Claimed ksh/omglolbah
1700001 1800000 Claimed ksh/omglolbah
1800001 1900000 Claimed ksh/omglolbah
1900001 2000000 Claimed ksh/omglolbah
2000001 2100000 Claimed ksh/omglolbah
2100001 2200000 Claimed Teaspoon
2200001 2300000 Claimed Darkstar
2300001 2400000 Claimed Darkstar
2400001 2500000 Claimed underscor (snookie)
124328261 Pool Check for an update to the script!

Please try to claim blocks of 100,000 ids at this time, or more if your system has adequate space.

(Side note: User IDs below 340000 are suspect for blogs. Jason will run a final blog-check at the end.)

Tools

friendster-scrape-profile

Script to download a Friendster profile, available as a direct download or from the GitHub repository.

You need a Friendster account to use this script. (Note: if you are creating an account, mailinator email addresses are blocked.) Put your login details in two files, username.txt and password.txt, and save them in the same directory as the download script.

Run with a numeric profile id of a Friendster user: ./friendster-scrape-profile PROFILE_ID

Currently downloads:

  • the main profile page (profiles.friendster.com/$PROFILE_ID)
  • the user's profile image from that page
  • the list of public albums (www.friendster.com/viewalbums.php?uid=$PROFILE_ID)
  • each of the album pages (www.friendster.com/viewphotos.php?a=$id&uid=$PROFILE_ID)
  • the original photos from each album
  • the list of friends (www.friendster.com/fans.php?uid=$PROFILE_ID)
  • the shoutoutstream (www.friendster.com/shoutoutstream.php?uid=$PROFILE_ID) and the associated comments
  • the Friendster blog, if any

It does not download any of the widgets.

Downloading one profile takes between 6 and 10 seconds and generates 200-400 kB of data (for normal profiles).
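As a rough illustration of the URL patterns listed above (not of how the script itself works), the fixed per-profile URLs can be generated from a numeric id. The function name profile_urls is made up for this sketch, and the http:// scheme is assumed from the URLs shown elsewhere on this page. Album and photo URLs are left out because they depend on album ids found in the album list page.

```shell
#!/bin/bash
# Sketch only: print the fixed per-profile URLs for one numeric profile id.
# profile_urls is an illustrative name, not part of friendster-scrape-profile.
profile_urls() {
  local id=$1
  case $id in
    ''|*[!0-9]*) echo "profile id must be numeric" >&2; return 1 ;;
  esac
  echo "http://profiles.friendster.com/$id"                    # main profile page
  echo "http://www.friendster.com/viewalbums.php?uid=$id"      # list of public albums
  echo "http://www.friendster.com/fans.php?uid=$id"            # list of friends
  echo "http://www.friendster.com/shoutoutstream.php?uid=$id"  # shoutoutstream
}

# Example: profile_urls 12345
```

This can be handy for spot-checking in a browser that a downloaded profile is complete.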

Automating the process

(This is all unix-only; it won't work in Windows.)
1. Create a Friendster account
2. Download the script; name it 'bff.sh'.
3. In the directory that you put the bff.sh, make a username.txt file that has your Friendster e-mail address as the text in it.
4. In the directory that you put the bff.sh, make a password.txt file that has your Friendster password as the text in it.
5. Choose your profile range.
6. Edit the table above to say which range you'll do.
7. On the command line, type (with your range replacing the '#'s.):

for i in {#..#}; do bash bff.sh $i; done

Note: If you get an error like bff.sh: line 26: $'\r': command not found, you will need to convert the script to use UNIX-style line endings:

$ dos2unix bff.sh

or if you somehow find yourself without the dos2unix command, do this:

$ sed "s/\r//" bff.sh > bff-fixed.sh
$ mv bff-fixed.sh bff.sh

If you have vi (the editor), you can also open the file in vi and type <ESC>:set ff=unix followed by <ESC>:wq. Then run the command in step 7 again.
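If the loop from step 7 gets interrupted, it helps to know where it stopped. Here is a minimal sketch of a resumable version of that loop. The state file name last_id.txt, the SCRAPER variable and the run_range function are all made up for this example; bff.sh itself is not known to keep any such state.

```shell
#!/bin/bash
# Sketch only: a resumable variant of the step-7 loop.
# SCRAPER, STATE_FILE and run_range are illustrative names (assumptions).
SCRAPER=${SCRAPER:-bff.sh}              # script run once per profile id
STATE_FILE=${STATE_FILE:-last_id.txt}   # holds the last id that was attempted

# run_range START END: run the scraper for START..END (inclusive), skipping
# ids up to the one recorded in $STATE_FILE when it lies inside the range.
run_range() {
  local start=$1 end=$2 last id
  if [ -f "$STATE_FILE" ]; then
    last=$(cat "$STATE_FILE")
    case $last in
      ''|*[!0-9]*) ;;   # ignore an empty or corrupt state file
      *) if [ "$last" -ge "$start" ] && [ "$last" -le "$end" ]; then
           start=$((last + 1))
         fi ;;
    esac
  fi
  id=$start
  while [ "$id" -le "$end" ]; do
    bash "$SCRAPER" "$id"
    echo "$id" > "$STATE_FILE"   # recorded even if the download failed
    id=$((id + 1))
  done
}

# Example: run_range 2300001 2400000
```

Note that the state file records attempted ids, not verified downloads, so it is still worth checking the output for errors.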

Parallelizing the download

If you, like me, just realized that at a rate of ~10-20 seconds per profile it will take you over 10 days to grab 100000 profiles, you might want to give this quick and dirty script a try:

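#!/bin/bash

START=2300001  ## CHANGE THIS FOR YOUR RANGE!
END=2400000    ## CHANGE THIS ALSO!!!

id=$START

# Launch one background download per id, 4 seconds apart.
# -le (rather than -lt) so that the END id itself is downloaded too.
while test "$id" -le "$END"; do
  ./bff.sh "$id" >/dev/null 2>&1 &
  sleep 4
  id=$((id + 1))
done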

This script launches bff.sh in the background for one ID, waits 4 seconds, and then launches it for the next ID. The background processes pile up, each downloading one profile and then terminating. You will get roughly (average download time per profile) / (delay) parallel processes. Don't set the delay too low or you will saturate your downlink. A delay of 4 seconds works well for me: each profile takes around 10-20 seconds, and I get around 10 parallel processes (more than the expected 3-5, because some of the profiles are bigger than expected).
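To make the rule of thumb concrete, here is the arithmetic as a small sketch. The function names est_parallel and est_hours are made up, and the numbers are just the figures quoted on this page (10-20 seconds per profile, a 4 second delay, 100000 profiles).

```shell
#!/bin/bash
# Sketch only: back-of-the-envelope numbers for choosing a delay.

# est_parallel AVG_SECONDS DELAY: approximate concurrent processes (rounded up).
est_parallel() { echo $(( ($1 + $2 - 1) / $2 )); }

# est_hours COUNT SECONDS_EACH: whole hours needed when one profile is
# started (or finished) every SECONDS_EACH seconds.
est_hours() { echo $(( $1 * $2 / 3600 )); }

est_parallel 10 4      # prints 3
est_parallel 20 4      # prints 5
est_hours 100000 10    # prints 277 (one at a time, about 11.5 days)
est_hours 100000 4     # prints 111 (one launch every 4 s, about 4.6 days)
```

So with a 4 second delay, a 100000-id block should take a bit under five days instead of well over ten, provided your connection keeps up.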

The good thing is that, as soon as you notice that you have too many processes, you can just kill this script and no new processes will be launched. Just wait for those still running to complete, then re-run the script (update START first so that you don't download the same ids again).

Use ps -ax | grep bff.sh | grep -v grep to check your processes (or, as I do, use watch "ps -ax | grep bff.sh | grep -v grep" to see it continuously updated)

Interesting idea(s) to add:

  • Have the script check the number of running processes (ps | grep bff | wc -l or something like that) and increase the sleep delay accordingly if the number exceeds some threshold.
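One way that idea might look is sketched below. It differs from the suggestion in two ways: it pauses launching while too many downloads are running instead of lengthening the sleep delay, and it counts the shell's own background jobs with the bash builtin jobs instead of grepping ps output. SCRAPER, MAX_JOBS, DELAY and launch_range are made-up names, and the default values are only guesses based on the numbers above.

```shell
#!/bin/bash
# Sketch only: a throttled launcher (all names here are assumptions).
SCRAPER=${SCRAPER:-./bff.sh}
MAX_JOBS=${MAX_JOBS:-10}   # do not start new downloads at or above this count
DELAY=${DELAY:-4}          # seconds between launches

# launch_range START END: start one background scraper per id (inclusive
# range), waiting for a free slot whenever MAX_JOBS are already running.
launch_range() {
  local id=$1 end=$2
  while [ "$id" -le "$end" ]; do
    # jobs -rp lists the PIDs of this shell's running background jobs.
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
      sleep 1
    done
    bash "$SCRAPER" "$id" >/dev/null 2>&1 &
    sleep "$DELAY"
    id=$((id + 1))
  done
  wait   # let the remaining downloads finish before returning
}

# Example: launch_range 2300001 2400000
```

Because the loop blocks when the threshold is reached, slow or oversized profiles delay new launches instead of piling up.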

Site Organization

Content on Friendster seems to be primarily organized by user id number; ids were assigned sequentially, starting at 1. This will make it fairly easy for wget to scrape the site and for us to break the work up into convenient units. The main components we need to scrape are the profile pages, photo albums and blogs, but there may be others. More research is needed.

Profiles

Profile URLs have the form 'http://profiles.friendster.com/<userid>'. Many pictures on these pages are hosted on URLs that look like 'http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>.jpg', but these folders aren't browsable directly. Profiles will not be easy to scrape with wget.
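Reading the photo URL pattern literally, nnnnnijkl looks like the user id, with <lk> and <ji> being its last four digits in reverse order. That reading is an assumption and has not been checked against the site. Under that assumption, the folder for a user could be computed as in the sketch below (photo_dir is a made-up name, and ids shorter than four digits are rejected because the pattern doesn't say what happens to them).

```shell
#!/bin/bash
# Sketch only: ASSUMES <lk>/<ji> are the last four digits of the user id,
# reversed and split into two pairs. Not verified against the real site.
photo_dir() {
  local uid=$1
  case $uid in
    ''|*[!0-9]*|?|??|???) return 1 ;;   # need at least four digits
  esac
  local last4=${uid#"${uid%????}"}      # last four digits: i j k l
  local ij=${last4%??} kl=${last4#??}
  local i=${ij%?} j=${ij#?} k=${kl%?} l=${kl#?}
  echo "http://photos-p.friendster.com/photos/$l$k/$j$i/$uid"
}

photo_dir 12345678   # prints http://photos-p.friendster.com/photos/87/65/12345678
```

Since the folders aren't browsable, this is only useful for building or sanity-checking image URLs once the <imageid> is known from a profile or album page.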

Photo Albums

A user's photo albums are at URLs that look like 'http://www.friendster.com/viewalbums.php?uid=<userid>', with individual albums at 'http://www.friendster.com/viewphotos.php?a=<album id>&uid=<userid>'. It appears that the individual photo pages use JavaScript to load the images, so they will be very hard to scrape.

On the individual album pages, the photo thumbnails are stored under paths similar to those of the main images. For example, if the album thumbnail is at http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>m.jpg, just drop the final 'm' to get the main photo (or replace it with a 't' to get an even tinier version).
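The thumbnail rule above is easy to apply with shell parameter expansion. The following is a small sketch; the function names are made up, and the URL in the example uses placeholder path and image id values rather than a real photo.

```shell
#!/bin/bash
# Sketch only: convert an album thumbnail URL (ending in "m.jpg") into the
# main photo URL or the tiny version. Function names are illustrative.
thumb_to_full() {
  case $1 in
    *m.jpg) printf '%s\n' "${1%m.jpg}.jpg" ;;    # drop the final 'm'
    *) return 1 ;;                               # not a thumbnail URL
  esac
}

thumb_to_tiny() {
  case $1 in
    *m.jpg) printf '%s\n' "${1%m.jpg}t.jpg" ;;   # replace 'm' with 't'
    *) return 1 ;;
  esac
}

# Placeholder URL, for illustration only:
thumb_to_full "http://photos-p.friendster.com/photos/87/65/12345678/1_987654321m.jpg"
```

Both functions return a non-zero status for URLs that don't end in m.jpg, so a caller can skip anything that isn't a thumbnail.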

Blogs

Unknown.