Friendster

URL http://www.friendster.com/
Status Offline
Archiving status Saved!
Archiving type Unknown
Project source https://github.com/ArchiveTeam/splinder-grab
IRC channel #archiveteam-bs (on hackint)
(formerly #foreveralone (on EFnet))
Data archive-team-friendster

Friendster was an early social networking site. It's estimated that Friendster had over 115 million registered users. Founded in 2002, Friendster allowed the posting of blogs, photos, shoutouts/comments, and "widgets" of varying quality (not dissimilar to Facebook applications). It is considered one of the earlier social media networks (although it had numerous predecessors dating back years) and distinguished itself by allowing such "rich media" additions to a user's account. After an initially high ranking and rating in the charts, Friendster's slow decline in hotness ensured an ever-growing chance of being deleted, and on April 25th, 2011, Friendster announced that most of the user-generated content on the site would be removed on May 31st, 2011. Literally terabytes of user-generated content were in danger of being wiped out, and Archive Team made it a priority to grab as much of Friendster as possible. A Unix-based script (called BFF, or Best Friends Forever) was created, and Archive Team asked anyone with Unix and 100 GB of disk space to get involved in the project.

Jonathan Abrams, the original co-founder of Friendster, has washed his hands of the whole situation, and is mostly frustrated with Friendster's past. [1]

Because Friendster was based on numeric IDs (as opposed to usernames), it was possible to assign "chunks" to Archive Team volunteers. The tools below were used to save Friendster.

There's a side project downloading a Friendster dataset.

DNS change

On Monday, June 27, Friendster switched DNS servers, pointing at their new site. However, the old site and data remain available on the old servers, if you know where to look.

NOTE: It is strongly recommended that you use a local caching DNS server, such as dnscache, dnsmasq, or bind. This reduces the DNS load on your internet connection, allows DNS lookups to resolve faster, and reduces load on the remote server.

dnscache

If you're using the dnscache server from the djbdns package, you can do the following to forward your Friendster-related requests (assuming your dnscache configuration is in /etc/dnscache):

  1. echo 50.17.127.246 > /etc/dnscache/root/servers/friendster.com
  2. echo 50.17.127.246 > /etc/dnscache/root/servers/friendster.com.cdngc.net
  3. svc -t /etc/dnscache

Do a lookup of friendster.com. You should get 209.11.168.113.

dnsmasq

You can tell dnsmasq to forward requests for the Friendster domains to a different server. dnsmasq will also cache results for a time. The default cache size is 150 names.

  1. Find your dnsmasq configuration.
  2. Add the following options:
    server=/friendster.com/50.17.127.246
    server=/friendster.com.cdngc.net/50.17.127.246
  3. Restart dnsmasq.
  4. Do a lookup for friendster.com. You should get 209.11.168.113.

bind

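If you're using bind for your DNS needs, you can add the following zone statements to your configuration (typically named.conf, at the top level rather than inside the options block) in order to forward your Friendster-related requests to a server that is still serving the old data: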

zone "friendster.com" {
   type forward;
   forwarders { 50.17.127.246; };
};

zone "friendster.com.cdngc.net" {
   type forward;
   forwarders { 50.17.127.246; };
};

Then reload/restart bind. Do a lookup of friendster.com and you should get 209.11.168.113.

hacky simple way

NOTE: This is NOT recommended. It will forward ALL of your DNS lookups to this server, for EVERY request. (Linux does not cache results on the local machine by default.)

Add "nameserver 50.17.127.246" to the top of your /etc/resolv.conf file. This will send all lookups to that server first. This server does do recursive requests as well, so you could use it directly if you wanted (though it would potentially slow down all name lookups). The better way is to do one of the above. (By default, Linux does not cache DNS results on the local machine. You may want to install dnscache, change the root/servers/@ file to list your ISP's DNS servers (or your own local server), and point resolv.conf at 127.0.0.1.)

Be aware that with this hacky method, the change could be overwritten the next time your DHCP lease updates. (You might be able to add the line to a new file named "/etc/resolv.conf.head" to get around this. You might also be able to configure your DHCP client to ignore the servers it got from the DHCP server, or place another server before or after them. Another option, on Linux systems that support it, is to "sudo chattr +i /etc/resolv.conf".)

Do a lookup of friendster.com. You should get 209.11.168.113.
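Each of the methods above ends with the same check. A minimal sketch of that check, assuming dig is installed (host friendster.com would do the same job); the function names are made up for illustration:

```shell
# Address the old Friendster site should resolve to (from the text above).
EXPECTED_IP="209.11.168.113"

# Succeeds if the lookup output passed as $1 contains the expected
# address on a line of its own.
resolves_to_old_site() {
    printf '%s\n' "$1" | grep -Fxq "$EXPECTED_IP"
}

# Look up friendster.com through the configured resolver and report.
check_friendster_dns() {
    answer=$(dig +short friendster.com A 2>/dev/null)
    if resolves_to_old_site "$answer"; then
        echo "OK: friendster.com resolves to $EXPECTED_IP"
    else
        echo "Not pointing at the old site; got: ${answer:-no answer}"
        return 1
    fi
}
```

Run check_friendster_dns after restarting your DNS server; a non-zero exit status means the forwarding is not in effect yet.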

Tools

friendster-scrape-profile

Script to download a Friendster profile: download it, or clone the git repository.

You need a Friendster account to use this script. (Note: if you are creating an account, mailinator email addresses are blocked.) Put your login details in two files, username.txt and password.txt, and save them in the same directory as the download script.

Run with a numeric profile id of a Friendster user: ./friendster-scrape-profile PROFILE_ID

Currently downloads:

  • the main profile page (profiles.friendster.com/$PROFILE_ID)
  • the user's profile image from that page
  • the list of public albums (www.friendster.com/viewalbums.php?uid=$PROFILE_ID)
  • each of the album pages (www.friendster.com/viewphotos.php?a=$id&uid=$PROFILE_ID)
  • the original photos from each album
  • the list of friends (www.friendster.com/fans.php?uid=$PROFILE_ID)
  • the shoutoutstream (www.friendster.com/shoutoutstream.php?uid=$PROFILE_ID) and the associated comments
  • the Friendster blog, if any

It does not download any of the widgets.

Downloading one profile takes between 6 and 10 seconds and generates 200-400 kB of data (for normal profiles).
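To get a feel for what a range will cost, the per-profile figures above can be multiplied out. This is only a back-of-the-envelope sketch for a single instance, using 6-10 seconds and 200-400 kB per profile; estimate_profiles is a made-up helper, and results are rounded down:

```shell
# Rough time and disk estimate for COUNT profiles, using the per-profile
# figures quoted above: 6-10 seconds and 200-400 kB each.
estimate_profiles() {
    count=$1
    min_minutes=$(( count * 6 / 60 ))
    max_minutes=$(( count * 10 / 60 ))
    min_mb=$(( count * 200 / 1000 ))
    max_mb=$(( count * 400 / 1000 ))
    echo "$count profiles: $min_minutes-$max_minutes minutes, $min_mb-$max_mb MB"
}
```

For example, estimate_profiles 1000 prints "1000 profiles: 100-166 minutes, 200-400 MB". Profiles with many photos can be far larger than the "normal" ones these figures describe.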

Automating the process

(This is all unix-only; it won't work in Windows.)
1. Create a Friendster account.
2. Download the script; name it 'bff.sh'.
3. In the directory where you put bff.sh, make a username.txt file containing your Friendster e-mail address.
4. In the same directory, make a password.txt file containing your Friendster password.
5. Choose your profile range.
6. Edit the Range Signup Sheet section to say what range you'll do.
7. On the command line, type (with your range replacing the '#'s):

$ for i in {#..#}; do bash bff.sh $i; done

or even better

$ ./bff-thread.sh # #

which will allow you to stop at any time by touching the STOP file.
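The stop-file idea can be sketched in a few lines. This is not the actual bff-thread.sh, just an illustration of the technique; run_range is a made-up name, and the command to run per profile is passed in so the loop can be tried out with echo first:

```shell
# Run a command once per profile ID from START to END, checking for a
# file named STOP in the current directory before each profile.
# Usage: run_range START END COMMAND [ARGS...]
run_range() {
    start=$1; end=$2; shift 2
    i=$start
    while [ "$i" -le "$end" ]; do
        if [ -e STOP ]; then
            echo "STOP file found, stopping before profile $i"
            return 0
        fi
        "$@" "$i"
        i=$((i + 1))
    done
}
```

For example, run_range 1000 1999 bash bff.sh would call bff.sh for each ID in turn; touch STOP in another terminal lets the current profile finish and then ends the loop. Remove the STOP file before starting again.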

Advanced: multiple instances

Requirements

Now you might notice it's relatively slow. My average is 115 profiles per hour. The bottleneck is mainly network requests, so running multiple instances can increase your download speed nearly linearly. BUT we're not sure whether it's safe to use the same cookies.txt file for all the instances (which it will do by default). Luckily you can easily avoid this using an extra optional parameter of bff.sh. Just add the name of the cookie file you want it to create and use right after the profile ID, for instance: "bff.sh 4012089 cookie3.txt". Use a different cookie file for each instance.

Manually

The full, modified command would then be (replacing the #'s with your range or the cookie number, where applicable):

$ for i in {#..#}; do bash bff.sh $i cookie#.txt; done
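Rather than typing that command once per terminal, the instances can be started from one place. A sketch, assuming only what is described above (bff.sh takes a profile ID followed by a cookie file name); run_instances is a made-up helper that splits the range evenly and gives each background loop its own cookie file:

```shell
# Split START..END into INSTANCES roughly equal sub-ranges and run each
# one in the background with its own cookie file (cookie1.txt, ...).
# Usage: run_instances START END INSTANCES COMMAND [ARGS...]
run_instances() {
    start=$1; end=$2; instances=$3; shift 3
    total=$((end - start + 1))
    per=$(( (total + instances - 1) / instances ))  # round up
    n=1
    lo=$start
    while [ "$lo" -le "$end" ]; do
        hi=$((lo + per - 1))
        [ "$hi" -gt "$end" ] && hi=$end
        (
            i=$lo
            while [ "$i" -le "$hi" ]; do
                "$@" "$i" "cookie$n.txt"
                i=$((i + 1))
            done
        ) &
        n=$((n + 1))
        lo=$((hi + 1))
    done
    wait  # return only when every instance has finished
}
```

For example, run_instances 4000000 4000999 4 bash bff.sh would run four loops of 250 profiles each. A fixed split like this leaves instances idle once their share is done, which is the problem the scripts below address.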

chunky.sh

The latest and most sophisticated way to automate this is to run chunky.sh. It breaks the range up into chunks of a thousand profiles and runs as many of these chunks concurrently as you request. This means that if some chunks contain smaller profiles and therefore download more quickly, you don't end up with fewer concurrent downloads than you wanted.

$ ./chunky.sh <start> <end> <threads>
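The chunking step on its own is simple to picture. The following is an illustration of splitting a range into chunks of a thousand, not the code from chunky.sh itself; list_chunks is a made-up name:

```shell
# Print "FIRST LAST" for each chunk of SIZE profile IDs (default 1000)
# between START and END; the final chunk may be shorter.
# Usage: list_chunks START END [SIZE]
list_chunks() {
    start=$1; end=$2; size=${3:-1000}
    lo=$start
    while [ "$lo" -le "$end" ]; do
        hi=$((lo + size - 1))
        [ "$hi" -gt "$end" ] && hi=$end
        echo "$lo $hi"
        lo=$((hi + 1))
    done
}
```

For example, list_chunks 1 2500 prints the three lines "1 1000", "1001 2000" and "2001 2500". Handing the next unstarted chunk to whichever worker becomes free is what keeps the number of concurrent downloads constant.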

Multiple Instances of chunky.sh

In order to always be downloading at maximum capacity, we're experimenting with an updated chunky.sh that is aware of all BFF download processes on the machine, not just its own. That means that you can start a new range of profiles and the new chunky.sh will patiently wait until it sees an open download slot to take. It hasn't seen a whole lot of testing yet, so use at your own risk and report any problems or possible improvements in #foreveralone. Syntax is the same as the original chunky.sh. View it here or download it here.

snook.sh

The original automated solution was snook.sh. This script takes the start and end of a range and a number of download threads to run and launches that many instances of bff.sh at once. It automatically logs the output to individual log files and creates separate cookies files for them. This script was originally written by underscor; you may have seen his link to pastebin on the IRC channel. I've fixed several bugs, including one very serious one. If you used the version from pastebin, you'll need to start over because it downloaded the wrong profiles (keep what you downloaded; it'll merely overlap with someone else). If you need to stop the downloads cleanly, simply $ touch STOP.

invoker.pl and summary.pl

Another option is this perl script which does a similar job. It's not thoroughly tested yet, but it's pretty simple. It takes the starting ID, the number of IDs per process, the number of processes, then creates a shell script which launches them. It has the bonus of being able to be stopped by using $ touch STOP, and it logs every finished ID from every instance to one file for monitoring. This script will give a quick summary of that file to monitor the processes' progress. (And with touch STOP and the summary file, that means easy management over SSH! Woo!)

XML friend lists

Also on the wiki: a script that uses the Friendster API to download friends lists. This has the advantage that you can get the ids of all friends of a user as one XML file, which is a lot faster than the bff method. See getfriends.sh on Github.


Troubleshooting

If you get an error like bff.sh: line 26: $'\r': command not found, you will need to convert the script to use UNIX-style line endings:

$ dos2unix bff.sh

or if you somehow find yourself without the dos2unix command, do this:

$ sed "s/\r//" bff.sh > bff-fixed.sh
$ mv bff-fixed.sh bff.sh
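Note that "\r" inside a sed expression is a GNU extension, so the sed command above may not work everywhere. A portable alternative, sketched here with a made-up helper name, uses tr and writes the result back into the original file so that its executable bit is kept:

```shell
# Remove every carriage return from FILE in place.
# Writing back with cat (instead of mv) keeps the file's permissions.
strip_cr() {
    tr -d '\r' < "$1" > "$1.tmp" &&
        cat "$1.tmp" > "$1" &&
        rm "$1.tmp"
}
```

Usage: strip_cr bff.sh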

Site Organization

Content on Friendster seems to be primarily organized by the users' id numbers, which were assigned sequentially starting at 1. This will make it fairly easy for wget to scrape the site and for us to break it up into convenient work units. The main components we need to scrape are the profile pages, photo albums and blogs, but there may be others. More research is needed.

Profiles

URLs of the form 'http://profiles.friendster.com/<userid>'. Many pictures on these pages are hosted on URLs that look like 'http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>.jpg', but these folders aren't browsable directly. Profiles will not be easy to scrape with wget.
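One reading of the photo URL pattern is that i, j, k and l stand for the last four digits of the user id, so the two directory levels are those digits in reverse order. Under that assumption (which has not been checked against the live site, and says nothing about ids shorter than four digits), the folder for a user could be worked out like this; photo_dir is a made-up name:

```shell
# Build the photo folder URL for a numeric user id, ASSUMING the
# <lk>/<ji> directories are the id's last four digits reversed.
photo_dir() {
    id=$1
    l=$(( id % 10 ))          # last digit
    k=$(( id / 10 % 10 ))     # second-to-last digit
    j=$(( id / 100 % 10 ))
    i=$(( id / 1000 % 10 ))
    echo "http://photos-p.friendster.com/photos/$l$k/$j$i/$id"
}
```

For example, photo_dir 123456789 prints http://photos-p.friendster.com/photos/98/76/123456789.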

Photo Albums

A user's photo albums are at URLs that look like 'http://www.friendster.com/viewalbums.php?uid=<userid>' with individual albums at 'http://www.friendster.com/viewphotos.php?a=<album id>&uid=<userid>'. It appears that the individual photo pages use JavaScript to load the images, so they will be very hard to scrape.

On the individual album pages, the photo thumbnails are stored under similar paths as the main images. i.e. if the album thumb is at http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>m.jpg, just drop the final 'm' to get the main photo (or replace it with a 't' to get an even tinier version).
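The thumbnail rule above is easy to apply with shell parameter expansion. A small sketch with made-up function names; both functions refuse URLs that do not end in 'm.jpg' rather than guessing:

```shell
# Turn an album thumbnail URL (ending in "m.jpg") into the main photo URL.
main_photo_url() {
    case "$1" in
        *m.jpg) printf '%s\n' "${1%m.jpg}.jpg" ;;
        *) return 1 ;;
    esac
}

# Turn an album thumbnail URL into the even smaller "t" version.
tiny_photo_url() {
    case "$1" in
        *m.jpg) printf '%s\n' "${1%m.jpg}t.jpg" ;;
        *) return 1 ;;
    esac
}
```

Given a thumbnail URL ending in .../12345m.jpg, main_photo_url returns the same URL ending in .../12345.jpg and tiny_photo_url the one ending in .../12345t.jpg.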

Blogs

Blogs are hosted by a WordPress install, typically at (somename).blog.friendster.com for the actual blog pages, with images hosted on (somename).blogs.friendster.com, where the name is the same in both cases and is picked by the user.

Groups

Friendster groups (only visible when logged in) have a profile picture, a list of members, photos, discussions (a forum) and announcements.

The group ids range from 1 to 3253050.

Forums

There are general Friendster forums on http://www.friendster.com/forums/. It is not clear whether they will remain open or disappear as well. (Note: I downloaded them, Alard.)

Range Signup Sheet

We're going to break up the user ids into ranges and let individuals claim a range to download. Use this table to mark your territory:

Start End Status Size (Uncompressed) Claimant
1 999 Uploaded 55MB closure
1,000 1,999 Uploaded 283MB alard
2,000 2,999 Uploaded 473MB DoubleJ
3,000 3,999 Downloaded 234MB Teaspoon
4,000 4,999 Uploaded 183MB Paradoks
5,000 5,999 Uploaded 202MB robbiet48/Robbie Trencheny (Amsterdam)
6,000 9,999 Uploaded 1.1gb Sketchcow/Jason Scott
10,000 29,999 Uploaded 5.1gb Sketchcow/Jason Scott
30,000 31,999 Uploaded 485mb Sketchcow/Jason Scott
32,000 32,999 Uploaded 201MB Paradoks
33,000 33,999 Uploaded 241mb closure
34,000 100,000 Uploaded unknown (20+ gb?) closure
100,000 101,000 Downloaded 205.6 MB xlene
101,001 102,000 Uploaded 232MB robbiet48/Robbie Trencheny (Florida)
102,001 103,000 Uploaded 241MB robbiet48/Robbie Trencheny (Amsterdam)
103,001 104,000 Uploaded yipdw
104,001 105,000 Downloaded 252MB Coderjoe
105,001 114,999 Uploaded 2.1GB Paradoks
115,000 116,999 Uploaded yipdw
117,000 119,999 Downloaded 815MB Coderjoe
120,000 130,000 Uploaded 2.3GB robbiet48/Robbie Trencheny (Florida)
130,000 140,000 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.130000-140000.tar robbiet48/Robbie Trencheny (Florida) (Reclaimed by Underscor 15:24, 19 June 2011 (UTC))
140,001 160,000 Uploaded yipdw
160,001 180,000 Downloaded 2.4GB jch
180,001 200,000 Uploaded yipdw
200,001 220,000 Downloaded 8.4GB Coderjoe
220,001 230,000 Uploaded xlene (Reclaimed by alard, 19 June 2011)
230,001 240,000 Uploaded 4.4GB alard
240,001 250,000 Downloaded Teaspoon
250,001 260,000 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.250001-260000.tar robbiet48/Robbie Trencheny (Newark) (Reclaimed by Underscor 21:35, 19 June 2011 (UTC))
260,001 270,000 Uploaded 4.0GB robbiet48/Robbie Trencheny (Fremont 1)
270,001 280,000 Uploaded 3.2GB robbiet48/Robbie Trencheny (Fremont 2)
280,001 290,000 Uploaded 3.8GB DoubleJ
290,001 300,000 Uploaded 3.9GB dnova
310,001 320,000 Downloaded 5.1GB Coderjoe
320,001 330,000 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.320001-330000.tar robbiet48/Robbie Trencheny (Oakland) (Reclaimed by Underscor 23:20, 19 June 2011 (UTC))
330,000 340,000 Uploaded closure
340,000 400,000 Uploaded 25gb Sketchcow/Jason Scott
400,001 500,000 Uploaded 40 GB DoubleJ
500,000 600,000 Downloaded 37 GB closure (penguin)
600,001 700,000 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.600001-700000.tar no2pencil (Reclaimed by Underscor 12:50, 20 June 2011 (UTC))
700,001 800,000 Uploaded 36GB proub/Paul Roub
800,001 900,000 Uploaded 39GB proub/Paul Roub
900,001 1,000,000 Uploaded (gv7@blindtiger) 36GB Soult
1,000,001 1,100,000 Downloaded by DoubleJ 32 GB Avram (reclaimed by DoubleJ 6/21 3PM EDT)
1,100,001 1,200,000 Uploaded 33GB Paradoks
1,200,001 1,300,000 Uploaded 36 GB db48x
1,300,000 1,400,000 Downloaded 36 GB closure (penguin) (reclaimed by db48x, just in case)
1,400,001 1,500,000 Uploaded alard
1,500,001 1,600,000 Downloaded ksh/omglolbah
1,600,001 1,700,000 Downloaded ksh/omglolbah
1,700,001 1,800,000 Downloaded ksh/omglolbah
1,800,001 1,900,000 Downloaded ksh/omglolbah
1,900,001 2,000,000 Downloaded ksh/omglolbah
2,000,001 2,100,000 Downloaded ksh/omglolbah
2,100,001 2,200,000 Downloaded 65 GB Teaspoon
2,200,001 2,300,000 Uploaded 50gb compressed Darkstar
2,300,001 2,400,000 Uploaded 70gb compressed Darkstar
2,400,001 2,500,000 Downloaded underscor (snookie)
2,500,001 2,600,000 Downloaded by underscor Bardicer (Reclaimed by Underscor 04:02, 22 June 2011 (UTC))
2,600,001 2,700,000 Downloaded by underscor Robbie Trencheny (Amsterdam) (Reclaimed by Underscor 18:44, 24 June 2011 (UTC))
2,700,001 2,800,000 Downloaded by underscor Robbie Trencheny (Fremont 2) (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
2,800,001 2,900,000 Downloaded 139GB Coderjoe (system1)
2,900,001 3,000,000 Downloaded 154GB Coderjoe (system2)
3,000,001 3,100,000 Uploaded 78GB Qwerty0
3,100,001 3,600,000 Claimed Jason Scott/Sketchcow
3,600,001 3,700,000 Downloaded 202 GB DoubleJ
3,700,001 3,800,000 Uploaded yipdw
3,800,001 3,900,000 Uploaded oli
3,900,001 4,000,000 Claimed Jason Scott/Sketchcow
3,985,001 4,000,000 Downloaded 32GB Coderjoe (per Sketchcow's request)
4,000,001 4,100,000 Downloaded by DoubleJ primus102 (reclaimed by DoubleJ 6/22 3:15PM EDT)
4,100,001 4,200,000 Downloaded Zebranky
4,200,001 4,300,000 Claimed Zebranky (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
4,300,001 4,399,999 Uploaded 255GB (196GB compressed) db48x
4,400,000 4,599,999 Downloaded 364GB (480 uncompressed) Jade Falcon
4,600,000 4,799,999 Uploaded (gv7@blindtiger) Soult
4,800,000 4,809,999 Uploaded alard
4,810,000 4,899,999 Uploaded oli
4,900,000 4,999,999 Uploaded 216GB (160GB compressed) db48x
5,000,000 5,099,999 Downloaded by underscor jch (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
5,100,000 5,199,999 Downloading (20%) hydruh
5,200,000 5,299,999 Uploaded chris_k
5,300,000 5,349,000 Uploaded 177~GB ersi
5,349,001 5,359,000 Uploaded http://ia700601.us.archive.org/5/incoming/gv/data_5349001_5359000.tar.gz 13GB Underscor 03:25, 22 May 2011 (UTC)
5,359,001 5,360,000 Uploaded http://ia700601.us.archive.org/5/incoming/gv/data_5359000_5360000.tar.bz2 Underscor 03:25, 22 May 2011 (UTC)
5,360,001 5,370,000 Uploaded http://ia700601.us.archive.org/5/incoming/gv/data_5360001_5370000.tar 11GB Underscor 03:25, 22 May 2011 (UTC)
5,370,001 5,470,000 Downloaded Underscor 03:25, 22 May 2011 (UTC)
5,470,001 5,570,000 Downloaded Underscor 03:25, 22 May 2011 (UTC)
5,570,001 5,670,000 Downloaded Underscor 03:25, 22 May 2011 (UTC)
5,670,001 6,349,999 Downloading jeremydouglass
6,350,000 6,449,999 Uploaded 212~GB Paradoks
6,450,000 6,550,000 Uploaded yipdw
6,550,001 6,700,000 Uploaded oli
6,700,000 6,800,000 Claimed closure (penguin)
6,800,001 6,900,000 Uploaded alard
6,900,001 7,000,000 Uploaded oli
7,000,001 7,100,000 Uploaded seanp2k (likwid/@ip2k on twitter)
7,100,001 7,150,000 Downloaded oli
7,150,001 7,250,001 Downloaded 204 GB (160G compressed) dashcloud
7,250,002 7,299,999 Downloaded db48x
7,300,000 7,399,999 Uploaded 171 GB DoubleJ
7,400,000 7,499,999 Downloaded 138GB compressed dsquared
7,500,000 7,599,999 Uploaded oli
7,600,000 7,699,999 Uploaded oli
7,700,000 7,799,999 Downloaded seanp2k (likwid/@ip2k on twitter)
7,800,000 7,899,999 Uploaded seanp2k (likwid/@ip2k on twitter)
7,900,000 7,999,999 Compressing seanp2k (likwid/@ip2k on twitter)
8,000,000 8,099,999 Downloaded by underscor primus102 (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
8,100,000 8,199,999 Uploaded alard
8,200,000 8,299,999 Downloaded 190GB / 145GB tar.gz jeremydouglass
8,300,000 8,399,999 Downloaded 192GB Beardicus
8,400,000 8,449,999 Compressed 100GB Shadyman (Yes, 50k IDs)
8,450,000 8,599,999 Uploading aristotle
8,600,000 8,699,999 Uploaded 131GB chris_k
8,700,000 8,715,999 Uploaded alard (redownloading vertevero's range)
8,716,000 8,999,999 Uploaded (possible errors, email aggroskater AT gmail DOT com if reup needed) aggroskater
9,000,000 9,035,999 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.9000000-9035999.tar chris_k (Reclaimed by Underscor 04:23, 26 June 2011 (UTC))
9,036,000 9,099,999 Downloaded 95 GB db48x
9,100,000 9,199,999 Downloaded 139 GB DoubleJ
9,200,000 9,299,999 Uploaded chris_k
9,300,000 9,399,999 Downloaded db48x
9,400,000 9,499,999 Downloaded 161 GB db48x
9,500,000 9,529,999 Uploaded alard
9,530,000 9,599,999 Downloaded 115G Coderjoe (realigning)
9,600,000 9,699,999 Uploaded aristotle
9,700,000 9,799,999 Downloaded DoubleJ
9,800,000 9,899,999 Uploaded 144G aristotle
9,900,000 9,999,999 Uploaded 149G aristotle
10,000,000 10,050,000 Uploaded yipdw (50k intentional)
10,050,001 10,100,000 Uploaded 96G dinomite
10,100,001 10,199,999 Uploaded chris_k
10,200,000 10,300,000 Downloaded 204GB Coderjoe (yes, 100k1)
10,300,001 10,399,999 Uploaded 179 G (141 G compressed) dashcloud
10,400,001 10,499,999 Claimed Lambda_Driver
10,500,000 10,599,999 Uploaded 199G (155G compressed) dinomite
10,600,000 10,699,999 Uploaded dinomite
10,700,000 10,799,999 Downloaded 196 GB DoubleJ
10,800,000 10,849,999 Downloaded Shadyman
10,850,000 10,899,999 Downloaded Underscor 18:56, 24 May 2011 (UTC)
10,900,000 10,999,999 Uploaded chris_k
11,000,000 11,049,999 Uploaded alard
11,050,000 11,199,999 Downloading Cameron_D
11,200,000 11,249,999 Downloading Underscor 01:05, 31 May 2011 (UTC)
11,250,000 11,399,999 Uploaded chris_k
11,400,000 11,449,999 Uploaded yipdw
11,450,000 11,499,999 Uploaded chris_k
11,500,000 11,599,999 Uploaded alard
11,600,000 11,699,999 Uploaded chris_k
11,700,000 11,714,999 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.11700000-11714999.tar Underscor 22:12, 17 June 2011 (UTC)
11,715,000 11,799,999 Downloaded 161GB Coderjoe (Realigning)
11,800,000 11,899,999 Uploaded chris_k
11,900,000 11,999,999 Downloaded Beardicus
12,000,000 12,099,999 Uploaded chris_k
12,100,000 12,199,999 Downloaded by dsquared dsquared
12,200,000 12,849,999 Downloaded ~270GB Uncompressed Wyatt
12,850,000 12,899,999 Claimed Perfinion
12,900,000 12,949,999 Downloaded DoubleJ
12,950,000 12,999,999 Downloaded DoubleJ
13,000,000 13,099,999 Uploaded chris_k
13,100,000 13,199,999 Uploaded chris_k
13,200,000 13,299,999 Uploaded 195G Uncompressed, 124G .gz chris_k
13,300,000 13,399,999 Downloading Underscor 00:39, 10 June 2011 (UTC)
13,400,000 13,499,999 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.13400000-13499999.tar 159G Uncompressed chris_k (Accidentally double-done by underscor... oops)
13,500,000 13,599,999 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.13500000-13599999.tar Underscor 19:18, 10 June 2011 (UTC)
13,600,000 13,699,999 Downloaded 173GB Coderjoe (sys-g)
13,700,000 13,799,999 Downloaded 190GB Coderjoe (sys-r)
13,800,000 13,899,999 Downloaded 152GB db48x
13,900,000 13,999,999 Downloading Beardicus
14,000,000 14,099,999 Downloaded 170GB Coderjoe (sys-g)
14,100,000 14,199,999 Uploaded chris_k
14,200,000 14,299,999 Uploaded chris_k
14,300,000 14,399,999 Uploaded ~150GB (108GB compessed) Perfinion
14,400,000 14,499,999 Downloaded 169GB Coderjoe (sys-g)
14,500,000 14,599,999 Uploaded chris_k
14,600,000 14,699,999 Uploaded http://ia700601.us.archive.org/5/incoming/gv/friendster.14600000-14699999.tar Underscor 16:05, 18 June 2011 (UTC)
14,700,000 14,799,999 Uploaded 59G tar.gz chris_k
14,800,000 14,899,999 Downloaded chris_k
14,900,000 14,999,999 Downloaded 50 GB DoubleJ
15,000,000 15,099,999 Downloaded DoubleJ
15,100,000 15,199,999 Downloaded DoubleJ
15,200,000 15,299,999 Downloading chris_k
15,300,000 15,399,999 Claimed db48x
15,400,000 19,899,999 Unclaimed Available pool
19,900,000 19,999,999 Downloaded DoubleJ
20,000,000 20,099,999 Uploaded dinomite
20,100,000 20,199,999 Downloaded Beardicus
20,200,000 24,999,999 Unclaimed Available pool
25,000,000 25,099,999 Downloaded DoubleJ
25,100,000 29,899,999 Unclaimed Available pool
29,900,000 29,999,999 Downloaded DoubleJ
30,000,000 30,099,999 Downloaded dsquared
30,100,000 30,199,999 Downloaded DoubleJ
30,200,000 34,999,999 Unclaimed Available pool
35,000,000 35,099,999 Downloaded 45 GB DoubleJ
35,100,000 39,899,999 Unclaimed Available pool
39,900,000 39,999,999 Downloaded DoubleJ
40,000,000 40,099,999 Downloaded 159 GB db48x
40,100,000 40,199,999 Downloaded 153 GB dashcloud
40,200,000 49,899,999 Unclaimed Available pool
49,900,000 49,999,999 Downloaded DoubleJ
50,000,000 50,099,999 Downloaded 135GB jeremydouglass
50,100,000 50,199,999 Downloaded DoubleJ
50,200,000 59,899,999 Unclaimed Available pool
59,900,000 59,999,999 Downloaded DoubleJ
60,000,000 60,099,999 Uploaded 131GB, 91G .gz chris_k
60,100,000 60,199,999 Downloaded DoubleJ
60,200,000 60,299,999 Downloaded seanp2k (likwid/@ip2k on twitter)
60,300,000 60,399,999 Downloading seanp2k (likwid/@ip2k on twitter)
60,400,000 64,999,999 Unclaimed Available pool
65,000,000 65,099,999 Downloaded 103 GB DoubleJ
65,100,000 69,899,999 Unclaimed Available pool
69,900,000 69,999,999 Downloaded DoubleJ
70,000,000 70,099,999 Downloaded 97GB jeremydouglass
70,100,000 70,199,999 Downloaded DoubleJ
70,200,000 79,999,999 Unclaimed Available pool
80,000,000 80,099,999 Uploaded 94GB chris_k
80,100,000 80,199,999 Downloaded 90 GB DoubleJ
80,200,000 89,999,999 Unclaimed Available pool
90,000,000 90,099,999 Uploaded 90GB, 51G .gz chris_k
90,100,000 94,999,999 Unclaimed Available pool
95,000,000 95,099,999 Downloaded 76G zebedee
95,100,000 99,899,999 Unclaimed Available pool
99,900,000 100,000,000 Uploaded aristotle
100,000,000 100,099,999 Uploaded dinomite
100,100,000 109,999,999 Unclaimed Available pool
110,000,000 110,099,999 Uploaded 72GB est. Paradoks
110,100,000 119,999,999 Unclaimed Available pool
120,000,000 120,099,999 Uploaded chris_k
120,100,000 122,000,000 Downloaded oli
122,000,001 124,099,999 Unclaimed Available pool
124,100,000 124,138,261 Downloaded 9GB jeremydouglass
Special Collection Status Size (Uncompressed) Claimant
320 fan profiles (from search) Uploaded alard

We recommend claiming 100k at a time, because that keeps things neat and tidy, both in this table and on your computer. However, it seems that the number of photographs per profile increased quite a bit during the early years, so the later profiles are much larger than the older ones. Feel free to claim a smaller block if it'll help. 100GB should hold about 50,000 ids and only take a couple of days to download.


Proposal: sampling

It is growing increasingly likely that we won't get it all by the 31st. Given that, perhaps we should be sampling new ranges from across the total index, in order to capture a better picture of what Friendster was like across its history.

Here are eleven proposed ranges to start, ranked by priority:

Start End Priority
20,000,000 20,099,999 5
30,000,000 30,099,999 9
40,000,000 40,099,999 3
50,000,000 50,099,999 6
60,000,000 60,099,999 10
70,000,000 70,099,999 2
80,000,000 80,099,999 7
90,000,000 90,099,999 11
100,000,000 100,099,999 4
110,000,000 110,099,999 8
124,100,000 124,138,261 1

...and here they are sorted by priority. If you want to do one of these ranges, you would still add an entry for it in the main table above.

Start End Priority
124,100,000 124,138,261 1 claimed
70,000,000 70,099,999 2 claimed
40,000,000 40,099,999 3 claimed
100,000,000 100,099,999 4 claimed
20,000,000 20,099,999 5 claimed
50,000,000 50,099,999 6 claimed
80,000,000 80,099,999 7 claimed
110,000,000 110,099,999 8 claimed
30,000,000 30,099,999 9 claimed
60,000,000 60,099,999 10 claimed
90,000,000 90,099,999 11 claimed

Proposal: download some groups

It might be interesting to download at least part of the Friendster groups. The bigger groups often have forums, photos and announcements. This table lists the number of groups with 100 members or more. If you are interested, claim a category.

The lists of group ids can be found in this special Github repository. To download a group, you'll need the bgf.sh script, which can be found in the same git repository as bff.sh (you may already have it!).

Note: it seems that Friendster's group browser is now broken, which means that it can't be used to find more large groups from the larger categories. The incomplete id lists for these categories are in the Github repository.

Check for an update to bgf.sh!

CatID Category ID list file Groups Status / Uncompressed size Claimed by
11 Activities ids-100plus-cat-11-PARTIAL.txt 675 Downloaded, uploaded alard
12 Automotive ids-100plus-cat-12.txt 983 Downloaded, 417M .bz2 Cameron_D
13 Business ids-100plus-cat-13.txt 289 Downloaded 302M, uploaded chris_k
14 Career & Jobs ids-100plus-cat-14.txt 457 Downloaded, 463M, uploaded chris_k
15 Cities & Neighborhoods ids-100plus-cat-15.txt 586 Downloaded, 199M .bz2 Cameron_D
16 Companies ids-100plus-cat-16.txt 627 Downloaded, 426M, uploaded chris_k
17 Computers & Internet ids-100plus-cat-17.txt 661 Downloaded, uploaded alard
18 Countries & Regional ids-100plus-cat-18.txt 645 Downloaded, 2.4GB, uploaded chris_k
19 Cultures & Community ids-100plus-cat-19.txt 1425 Downloaded, 2.1GB, uploaded chris_k
20 Entertainment ids-100plus-cat-20.txt 2718 Downloaded, uploaded alard
21 Family & Home ids-100plus-cat-21.txt 554 Downloaded, 214M .bz2 Cameron_D
22 Fan Clubs ids-100plus-cat-22-PARTIAL.txt 4758 Downloaded, 3.2GB, uploaded chris_k
23 Fashion & Beauty ids-100plus-cat-23.txt 1725 Downloaded, 6.6GB, uploaded Paradoks
24 Film & Television ids-100plus-cat-24.txt 954 Downloaded, uploaded alard
25 Food, Drink, & Wine ids-100plus-cat-25.txt 634 Downloaded, 3GB, uploaded Paradoks
26 Games ids-100plus-cat-26-PARTIAL.txt 1583 Downloaded, 3.9GB, uploaded chris_k
27 Gay, Lesbian & Bi ids-100plus-cat-27.txt 850 Uploaded vertevero/claimed by chris_k as a backup
28 Government & Politics ids-100plus-cat-28.txt 279 Downloaded, uploaded alard
29 Health & Fitness ids-100plus-cat-29.txt 281 Downloaded, 49M .bz2 Cameron_D
30 Hobbies & Crafts ids-100plus-cat-30.txt 753 Downloaded, uploaded alard
31 Literature & Arts ids-100plus-cat-31.txt 422 Downloaded, 189M .bz2 Cameron_D
32 Money & Investing ids-100plus-cat-32.txt 102 Downloaded, 869MB, uploaded Paradoks
33 Movies ids-100plus-cat-33.txt 1265 Downloaded, uploaded alard
34 Music ids-100plus-cat-34-PARTIAL.txt 2027 Downloaded, 2.4G, uploaded chris_k
35 Nightlife & Clubs ids-100plus-cat-35.txt 856 Downloaded, 256M .bz2 Cameron_D
36 Non-Profit & Philanthropic ids-100plus-cat-36.txt 126 Downloaded, 325MB, uploaded Paradoks
37 People ids-100plus-cat-37-PARTIAL.txt 1249 Downloaded, 2.7G, uploaded chris_k
38 Pets & Animals ids-100plus-cat-38.txt 449 Downloaded, 342M, uploaded chris_k
39 Professional Organizations ids-100plus-cat-39.txt 1200 Downloaded, uploaded alard
40 Recreation & Sports ids-100plus-cat-40.txt 1130 Downloaded, 1.6G, uploaded chris_k
41 Religion & Beliefs ids-100plus-cat-41.txt 1281 Downloaded, 2.1G, uploaded chris_k
42 Romance & Relationships ids-100plus-cat-42.txt 1020 Downloaded, 932M, uploaded chris_k
43 Schools & Alumni ids-100plus-cat-43-PARTIAL.txt 1020 Downloaded, 2.2G, uploaded chris_k
44 Science & History ids-100plus-cat-44.txt 181 Downloaded, 341MB, uploaded Paradoks
45 Sorority/Fraternities ids-100plus-cat-45.txt 491 Downloaded, 426MB, uploaded chris_k
46 Television ids-100plus-cat-46.txt 908 Downloaded, uploaded alard
47 Travel ids-100plus-cat-47.txt 241 Downloaded, 113M .bz2 Cameron_D
48 Other ids-100plus-cat-48-PARTIAL.txt 1356 Downloaded, 1.5g, Uploaded chris_k
49 Events ids-100plus-cat-49.txt 330 Downloaded, 969MB, uploaded Paradoks

Known issues

Affected Issue Resolution
User IDs < 340000 Suspect blog content Jason will run a blog-check at the end
Profiles retrieved with bff.sh < v8 Missing blog content Redownload affected profiles or wait for blog-check
Profiles retrieved with bff.sh < v9 Missing images Redownload affected profiles
Profiles with more than one shoutout page, retrieved with bff.sh < v12 Only first page of shoutoutstream Redownload profiles that have a file shoutout_2.html
Groups retrieved with bgf.sh < v4 Missing bulletins Run fix-bgf-bulletins.sh to redownload

Running on Mac OS X

Summary: To run on Mac OS X 10.5 (Leopard), 10.6 (Snow Leopard), or 10.6 Server, you need to install wget, bash 4.0+, and a more recent expr.

Description: In MacPorts, this can be done by installing the packages wget, bash, and coreutils (for gexpr), then changing the top line of every .sh file from the ArchiveTeam-friendster-scrape git package to #!/opt/local/bin/bash, and replacing all instances of 'expr' with 'gexpr'. Then run chunky.sh as normal on a range and declare victory.

Problem Details: All this is done to work around these three problems:

  1. bff.sh requires wget
    • ...which is not installed by default
  2. bff.sh requires a more recent version of expr
  3. chunky.sh requires a more recent version of bash (4.0+)
    • ...to support the shell builtin "declare -A" (associative arrays)

Solution Details:

  1. Install MacPorts (requires Developer Tools). You may also use Homebrew or Fink.
  2. Install new versions of wget, expr, and bash. In MacPorts:
    • sudo port install wget
    • sudo port install coreutils
    • sudo port install bash
  3. In MacPorts, the new bash is installed at
    • /opt/local/bin/bash
    • ...so change the first line of each .sh file from:
    • #!/bin/bash ...to:
    • #!/opt/local/bin/bash
  4. In MacPorts, the new expr is called "gexpr", so search and replace every expr-->gexpr
    • (lines 167 and 231 in bff.sh, in the current scripts)

More:

More on the missing image problem

We've just discovered that the versions of bff.sh we've been using don't grab the right things on some systems. Specifically, we know that older versions of grep (i.e. 2.5.4) don't match some URLs as intended. To test whether your files have been downloading correctly, run ./bff.sh 115288. If you end up with one .jpg instead of 8 (here is what you should end up with), you need to upgrade your version of bff.sh before continuing. The current version solves the issue. We're figuring out what to do about the already-downloaded stuff.
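Counting the images by hand gets tedious, so here is a rough sketch of the check. Both function names are made up for illustration. The directory layout (data/<first digit>/<second digit>/<third digit>/<id>) is an assumption inferred from the output file names listed under "Blogs with bad links" on this page, and may not hold for IDs shorter than three digits.

```shell
# Sketch only: profile_dir and count_jpgs are hypothetical helpers.

# profile_dir ID: print the directory a profile is assumed to be stored in.
# Assumption: data/<d1>/<d2>/<d3>/<id>, e.g. data/5/7/2/5726907.
profile_dir() (
  id=$1
  d1=$(printf '%s' "$id" | cut -c1)
  d2=$(printf '%s' "$id" | cut -c2)
  d3=$(printf '%s' "$id" | cut -c3)
  printf 'data/%s/%s/%s/%s\n' "$d1" "$d2" "$d3" "$id"
)

# count_jpgs DIR: print how many .jpg files exist anywhere under DIR.
count_jpgs() (
  find "$1" -type f -name '*.jpg' | wc -l | tr -d '[:space:]'
)
```

After running ./bff.sh 115288, the command count_jpgs "$(profile_dir 115288)" should print 8 on a working setup; a result of 1 points to the grep problem described above.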

More on the shoutout page problem

There was an error in the section of bff.sh (versions < 12) that downloads the shoutoutstream pages. For profiles with more than one shoutoutstream page, the first page was downloaded several times: shoutout_1.html, shoutout_2.html, etc. all contained the first page of messages. This problem was fixed in version 12 of the script.

This only affects profiles with more than one shoutout page. This is a small percentage of the profiles (7 out of the 50,000 profiles in my collection). They can be found by looking for shoutout_2.html. Remove the profiles that have this file and run the script again.
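Looking for shoutout_2.html can be done with find. A minimal sketch follows; the function name is made up, and it assumes the shoutout_N.html files sit directly inside each profile directory.

```shell
# Sketch only: list_multi_shoutout_profiles is a hypothetical helper.

# list_multi_shoutout_profiles ROOT: print every directory under ROOT that
# directly contains a shoutout_2.html file, one per line, sorted.
# Assumption: shoutout_N.html lives at the top level of a profile directory.
list_multi_shoutout_profiles() (
  find "$1" -type f -name 'shoutout_2.html' | while IFS= read -r f; do
    dirname "$f"
  done | sort
)
```

For example, list_multi_shoutout_profiles data > affected.txt writes the candidates to a file. Only profiles fetched with bff.sh older than version 12 need to be removed and downloaded again, so check the list before deleting anything.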

Blogs with bad links

Some blogs have bad links that expand into an infinite tree. The latest version of bff.sh ameliorates this problem by limiting recursion depth to 20, but in some cases that can still be too much.

These profile IDs are known to have blogs that cause problems:

ID Example of offending URL
319,533 exquisitelle.blog.friendster.com/category/uncategorized/<object width=/<object width=/...
488,742 oxidation-hani.blog.friendster.com/category/uncategorized/page/category/uncategorized/page/...
2,969,345 mercurian.blog.friendster.com/category/life-or-something-like-it/<center> <a href=/...
3,007,822 iz-freedom.blog.friendster.com/category/uncategorized/;/;/...
3,035,078 khelay-angela.blog.friendster.com/category/music/<div style=/<div style=/<div style=/...
3,764,079 luzie53.blog.friendster.com/&tbnh=106&tbnw=142&hl=tl&start=4&prev=/images%3Fq%3Dsad%2Bangel%2Bpictures%26svnum%3D10%26hl%3Dtl%26lr%3D%26sa%3DN/...
3,774,275 msiabeckham.blog.friendster.com/category/uncategorized/\/\/\/\/\/\/\/\/\/page/2/\/page/2...
3,789,069 chrisna.blog.friendster.com/&tbnh=110&tbnw=80&hl=en&start=20&prev=/images?q=friendship&hl=en&lr=&sa=G/images?q=friendship&hl=en&lr=&sa=G/...
5,686,980 output file names look like: 5/6/8/5686980/blog/myztikal-princessa.blog.friendster.com/category/current-affairs/<a href=/<a href=/<a href=/<a href=/<a href=/<a href=/<a href=/<a href=/<a href=/<a href=
5,716,605
5,726,907 output file names look like: data/5/7/2/5726907/blog/zacharyderwoods.blog.friendster.com/category/uncategorized/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/&tbnh=87&tbnw=105&hl=en&start=2&prev=/images?q=Bangkok&svnum=10&hl=en&lr=&sa=G/index.html.orig
6,501,473 irisruby.blog.friendster.com/category/music/<a href=/<a href=/<a href=/<a href=/<a href=/<a href=/<a href=/<object classid=/...
6,803,753 pauloge10.blog.friendster.com/category/uncategorized/object_width=%2F%3Ca+href%3D%2F%3Cobject+width%3D%2F%3Ca+href%3D%2F%3Cobject+width%3D%2F%3Ca+href%3D%2F%3Ca+href%3D%2F%3Ca+href%3D%2F%3Ca+href%3D%2F%3Ca+href%3D%2F%3Ca+href%3D%2F/page/2/
7,124,822 julius-kurimaw.blog.friendster.com/&tbnh=110&tbnw=111&hl=en&start=25&prev=/images?q=+x-japan+&start=20&hl=en&lr=&sa=N/&tbnh=97&tbnw=97&hl=en&start=6&prev=/images?q=first+love+lyrics+by+utada+hikaru&hl=en&lr=&sa=N...
9,537,003 andesmelba.blog.friendster.com/&/page/3/simenon.htm/&/&/page/3/alesol.htm/&/&/page/2/
13,783,346 simplifiamina.blog.friendster.com/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/itachi.htm/itachi.htm/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/itachi.htm/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/itachi.htm/itachi.htm/itachi.htm/&tbnh=71&tbnw=120&hl=en&start=2&prev=/images?q=sasuke+and+sakura&svnum=10&hl=en&lr=&safe=off&sa=N/itachi.htm.html
14,068,868 evelyn01.blog.friendster.com/category/uncategorized/&tbnh=88&tbnw=91&hl=tl&start=2&prev=/images%3Fq%3Dhello%2Bkitty%26hl%3Dtl%26lr%3D%26sa%3DN/index.cfm/index.cfm/index.cfm/index.cfm/&tbnh=88&tbnw=91&hl=tl&start=2&prev=/images%3Fq%3Dhello%2Bkitty%26hl%3Dtl%26lr%3D%26sa%3DN/index.cfm/index.cfm/&tbnh=88&tbnw=91&hl=tl&start=2&prev=/images%3Fq%3Dhello%2Bkitty%26hl%3Dtl%26lr%3D%26sa%3DN/&tbnh=88&tbnw=91&hl=tl&start=2&prev=/images%3Fq%3Dhello%2Bkitty%26hl%3Dtl%26lr%3D%26sa%3DN/
14,482,038 cyfa.blog.friendster.com/category/film/&tbnh=91&tbnw=138&hl=id&start=25&prev=/images?q=Harry+Potter+and+The+Goblet+of+Fire&start=20&svnum=10&hl=id&lr=&sa=N/&tbnh=120&tbnw=150&hl=id&start=52&prev=/images?q=Harry+Potter+and+The+Goblet+of+Fire&start=40&svnum=10&hl=id&lr=&sa=N/&tbnh=120&tbnw=150&hl=id&start=52&prev=/images?q=Harry+Potter+and+The+Goblet+of+Fire&start=40&svnum=10&hl=id&lr=&sa=N/&tbnh=120&tbnw=150&hl=id&start=52&prev=/images?q=Harry+Potter+and+The+Goblet+of+Fire&start=40&svnum=10&hl=id&lr=&sa=N/&tbnh=120&tbnw=150&hl=id&start=52&prev=/images?q=Harry+Potter+and+The+Goblet+of+Fire&start=40&svnum=10&hl=id&lr=&sa=N/&tbnh=120&tbnw=150&hl=id&start=52&prev=/images?q=Harry+Potter+and+The+Goblet+of+Fire&start=40&svnum=10&hl=id&lr=&sa=N/&tbnh=91&tbnw=138&hl=id&start=25&prev=/images?q=Harry+Potter+and+The+Goblet+of+Fire&start=20&svnum=10&hl=id&lr=&sa=N/&tbnh=89&tbnw=135&hl=id&start=27&prev=/images?q=Harry+Potter+and+The+Goblet+of+Fire&start=20&svnum=10&hl=id&lr=&sa=N.html

If you come across another one of these, please add it to this list.
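One way to spot candidates in data you have already downloaded is to look for blog directories that are nested suspiciously deep. This is a sketch under assumptions: the function name is made up, each profile is assumed to keep its blog in a directory called "blog" (as in the file names above), and the default threshold of 15 levels is arbitrary.

```shell
# Sketch only: list_deep_blogs is a hypothetical helper.

# list_deep_blogs ROOT [DEPTH]: print each "blog" directory under ROOT that
# contains something nested at least DEPTH levels below it.
# DEPTH defaults to 15, an arbitrary value picked for illustration.
list_deep_blogs() (
  root=$1
  depth=${2:-15}
  # -prune stops find from descending into a blog directory once it matched,
  # so nested directories that happen to be called "blog" are not reported.
  find "$root" -type d -name blog -prune -print | while IFS= read -r blog; do
    if [ -n "$(find "$blog" -mindepth "$depth" -print | head -n 1)" ]; then
      printf '%s\n' "$blog"
    fi
  done
)
```

Running list_deep_blogs data prints the blog directories worth a closer look. A deep tree is only a hint, so compare the paths with the patterns in the table before reporting a profile.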

Blogs with corrupt images

Some blogs have links to images that just never finish downloading. wget downloads to 99%, then hangs until the server closes the connection.

ID Example of offending URL
1421002 http://diverjun23.blog.friendster.com/files/sany0901.jpg
2848374 http://aspen.blogs.friendster.com/photos/uncategorized/test.gif
7934375 http://photos-p.friendster.com/photos/76/94/7934967/1_256336337.jpg

If you come across another corrupt image, please add it to the list.
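If a hung download leaves a partial file behind (an assumption, this page does not say what ends up on disk), truncated JPEGs can be found after the fact: a complete JPEG normally ends with the two bytes FF D9. The sketch below uses made-up function names and only covers .jpg files. Files with extra data after the end marker will be reported as well, so treat the output as a list of suspects.

```shell
# Sketch only: jpeg_is_complete and list_truncated_jpgs are hypothetical helpers.

# jpeg_is_complete FILE: succeed if FILE ends with the JPEG end-of-image
# marker (bytes FF D9).
jpeg_is_complete() (
  last=$(tail -c 2 "$1" | od -An -tx1 | tr -d '[:space:]')
  [ "$last" = "ffd9" ]
)

# list_truncated_jpgs ROOT: print every .jpg under ROOT that does not end
# with the end-of-image marker.
list_truncated_jpgs() (
  find "$1" -type f -name '*.jpg' | while IFS= read -r f; do
    jpeg_is_complete "$f" || printf '%s\n' "$f"
  done
)
```

For example, list_truncated_jpgs data prints the suspect files with their paths, which lead back to the profile ID.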

CEO's "Friendster re-launching" message

Date: Fri, 1 Jul 2011 06:28:28 -0700 (PDT)

Dear fellow Friendster members,
As many of you may know, Friendster announced that it is re-launching itself as
a social gaming portal and launched a beta version of the new Friendster a
couple of weeks ago. The beta version was well received. I am pleased to
announce that the new Friendster is going live thereby enabling all our users to
login to the new Friendster using your existing Friendster username and
password.
Friendster has touched the lives of many. Since MOL, the company I founded
acquired Friendster in early last year; many people have come up to me to tell
me how Friendster has changed their lives. Many have told me that they have
found their life partners over Friendster. Just last week, a successful Internet
entrepreneur in Singapore told me that her success was triggered by promoting
her business on Friendster. Friendster pioneered social networking and ignited
the social media industry that has created billion dollar companies such as
Facebook and Twitter, companies that may not have existed in their present form
if not for Friendster's early innovation.
Today, Friendster is in a unique position to take advantage on the growth of
social gaming. Through its relationship with MOL, which has a 10 year history in
working with gaming companies, Friendster has both the experience and track
record to make innovations in this space.
Today, as Friendster reinvents itself as a social gaming destination that
enables its users to create multiple avatars, play games and enjoy rewards; I
hope that all of you will wish us luck and continue to support us in our new
reincarnation. The new Friendster is not perfect and we will continue to add new
games and features such as localization and rewards over the next few months.
Our team is working hard on adding these features and welcomes your suggestions
and comments on how we can better serve your needs as a social gaming and
entertainment destination.
I would like to take this opportunity to thank all of you for your support and
hope that all of you will enjoy the new Friendster as Friendster continues to
innovate to serve and entertain you better.
Yours truly,
Ganesh Kumar Bangah
Chief Executive Officer
ceo@friendster.com

New Friendster answer about old data

In the New Friendster's help pages, there is a question about data from the old Friendster. Here is the question and answer:

4.9. Where did the photos, blogs, comments, testimonials and all the other content in my old Friendster profile go?
As part of the reformat of the site, we had to remove some of the content of your profile including the photo albums, blogs and most parts of the profile like the "more about " info, comments and the testimonials.