Friendster

URL	http://www.friendster.com/
Status	Closing
Archiving status	In progress...
Archiving type	Unknown
IRC channel	#archiveteam-bs (on hackint)

Friendster is an early social networking site, estimated to have over 115 million registered users. Founded in 2002, Friendster allowed the posting of blogs, photos, shoutouts/comments, and "widgets" of varying quality (not dissimilar to Facebook applications). It is considered one of the earlier social media networks (although it had numerous predecessors dating back years) and distinguished itself by allowing such "rich media" additions to a user's account.

After an initially high ranking in the charts, Friendster's slow decline in popularity made deletion ever more likely, and on April 25th, 2011, Friendster announced that most of the user-generated content on the site would be removed on May 31st, 2011. Terabytes of user-generated content are in danger of being wiped out, and Archive Team has made it a priority to grab as much of Friendster as possible. A Unix-based script (called BFF, or Best Friends Forever) has been created, and Archive Team is asking anyone with a Unix system and 100 GB of disk space to get involved in the project.

Jonathan Abrams, the original co-founder of Friendster, has washed his hands of the whole situation and is mostly frustrated with Friendster's past. [1]

Because Friendster is based on numeric IDs (as opposed to usernames), it is possible to assign "chunks" to Archive Team volunteers. Please read up about the tools below, and if you have an interest in helping, join us at #foreveralone on EFnet and help us save Friendster.

Tools

friendster-scrape-profile

Script to download a Friendster profile: download it, or clone the git repository.

You need a Friendster account to use this script. (Note: if you are creating an account, mailinator email addresses are blocked.) Put your login e-mail address in a file named username.txt and your password in a file named password.txt, and save both in the same directory as the download script.
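As a minimal sketch, the two credential files could be created as follows. The helper name `write_credentials` and the example values are made up for illustration, and it is an assumption that the script reads a single line from each file.

```shell
# Hypothetical helper: write the Friendster login details into DIR.
# Assumption: the download script reads one line from each file.
write_credentials() {
    dir=$1 email=$2 password=$3
    # Run in a subshell with umask 077 so the files are readable only by you.
    (
        umask 077
        printf '%s\n' "$email" > "$dir/username.txt"
        printf '%s\n' "$password" > "$dir/password.txt"
    )
}

# Example with placeholder values:
# write_credentials . 'you@example.com' 'your-password'
```

Keeping the files private is a sensible precaution, since the password is stored in plain text.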

Run with a numeric profile id of a Friendster user: ./friendster-scrape-profile PROFILE_ID

Currently downloads:

the main profile page (profiles.friendster.com/$PROFILE_ID)
the user's profile image from that page
the list of public albums (www.friendster.com/viewalbums.php?uid=$PROFILE_ID)
each of the album pages (www.friendster.com/viewphotos.php?a=$id&uid=$PROFILE_ID)
the original photos from each album
the list of friends (www.friendster.com/fans.php?uid=$PROFILE_ID)
the shoutoutstream (www.friendster.com/shoutoutstream.php?uid=$PROFILE_ID) and the associated comments
the Friendster blog, if any

It does not download any of the widgets.
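The URLs in the list above that depend only on the profile ID can be generated with a small helper. This is only an illustration of the URL scheme, not part of the actual script; the function name is made up, and the `http://` prefix is assumed for every URL.

```shell
# Print the per-profile URLs from the list above for one numeric ID.
# Album pages, photos and blog URLs depend on page content, so they
# cannot be derived from the ID alone and are left out.
profile_urls() {
    id=$1
    printf 'http://profiles.friendster.com/%s\n' "$id"
    printf 'http://www.friendster.com/viewalbums.php?uid=%s\n' "$id"
    printf 'http://www.friendster.com/fans.php?uid=%s\n' "$id"
    printf 'http://www.friendster.com/shoutoutstream.php?uid=%s\n' "$id"
}

# Example: profile_urls 4012089
```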

Downloading one profile takes between 6 and 10 seconds and generates 200-400 kB of data (for normal profiles).
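Those per-profile figures give a rough idea of what a whole range costs. The sketch below simply multiplies them out, so it is only as accurate as the "normal profile" numbers above; the function name is made up and 1 GB is taken as 1,000,000 kB.

```shell
# Rough bounds for a range of profiles, using 6-10 s and 200-400 kB
# per profile. Integer arithmetic only, so results are rounded down.
estimate_range() {
    count=$(( $2 - $1 + 1 ))
    echo "profiles: $count"
    echo "hours (single instance): $(( count * 6 / 3600 ))-$(( count * 10 / 3600 ))"
    echo "size in GB: $(( count * 200 / 1000000 ))-$(( count * 400 / 1000000 ))"
}

# Example: estimate_range 1 100000
# profiles: 100000
# hours (single instance): 166-277
# size in GB: 20-40
```

Profiles with many photos or large blogs can push the real size well beyond these bounds.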

Automating the process

(This is all unix-only; it won't work in Windows.)
1. Create a Friendster account.
2. Download the script; name it 'bff.sh'.
3. In the directory where you put bff.sh, make a username.txt file containing your Friendster e-mail address.
4. In the same directory, make a password.txt file containing your Friendster password.
5. Choose your profile range.
6. Edit the Range Signup Sheet section to say which range you'll do.
7. On the command line, type the following (with your range replacing the '#'s):

$ for i in {#..#}; do bash bff.sh $i; done

or even better

$ ./bff-thread.sh # #

which will allow you to stop at any time by touching the STOP file.
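For illustration, a loop that can be stopped by touching a STOP file might look like the sketch below. This is not the actual bff-thread.sh, whose internals are not shown here; the function name `run_range` is made up, and the command to run for each ID is passed in so the loop can be tried safely with `echo`.

```shell
# Hypothetical stoppable loop: run a command for each ID in START..END,
# checking for a STOP file in the current directory before every profile.
run_range() {
    start=$1 end=$2
    shift 2
    i=$start
    while [ "$i" -le "$end" ]; do
        if [ -e STOP ]; then
            echo "STOP file found, stopping before ID $i"
            return 0
        fi
        "$@" "$i"
        i=$(( i + 1 ))
    done
}

# Dry run:  run_range 1 3 echo bash bff.sh
# Real use: run_range 1000 1999 bash bff.sh
```

Because the check happens between profiles, the profile being downloaded when STOP is created still finishes cleanly.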

Advanced: multiple instances

Requirements

Now you might notice it's relatively slow. My average is 115 profiles per hour. The bottleneck is mainly network requests, so running multiple instances can increase your download speed nearly linearly. However, we're not sure whether it's safe to share the same cookies.txt file between all the instances (which is what happens by default). Luckily, you can easily avoid this with an optional second parameter to bff.sh: add the name of the cookie file you want it to create and use right after the profile ID, for instance "bff.sh 4012089 cookie3.txt". Use a different cookie file for each instance.

Manually

The full, modified command would then be (replacing the #'s with your range or the cookie number, where applicable):

$ for i in {#..#}; do bash bff.sh $i cookie#.txt; done
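To decide which sub-range and cookie file each instance gets, the arithmetic can be sketched as below. The function name `split_range` is made up for illustration, and it assumes the number of instances is not larger than the number of IDs in the range.

```shell
# Split START..END into N contiguous sub-ranges, one per instance, and
# print "sub_start sub_end cookie_file" for each of them.
split_range() {
    start=$1 end=$2 n=$3
    total=$(( end - start + 1 ))
    base=$(( total / n ))
    extra=$(( total % n ))
    k=1
    s=$start
    while [ "$k" -le "$n" ]; do
        size=$base
        # The first $extra chunks take one additional ID each.
        if [ "$k" -le "$extra" ]; then
            size=$(( size + 1 ))
        fi
        e=$(( s + size - 1 ))
        echo "$s $e cookie$k.txt"
        s=$(( e + 1 ))
        k=$(( k + 1 ))
    done
}

# Example: split_range 1 10 3
# 1 4 cookie1.txt
# 5 7 cookie2.txt
# 8 10 cookie3.txt
```

Each output line supplies the range and cookie file for one copy of the loop above, run in its own terminal or in the background.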

snook.sh

A more automated solution is also available from https://github.com/db48x/friendster-scrape. The snook.sh script in this repository takes the start and end of a range and the number of download threads to run, and launches that many instances of bff.sh at once. It automatically logs the output to individual log files and creates separate cookie files for them. This script was originally written by underscor; you may have seen his link to pastebin on the IRC channel. I've fixed several bugs, including one very serious one. If you used the version from pastebin, you'll need to start over because it downloaded the wrong profiles (keep what you downloaded; it will merely overlap with someone else's range). If you need to stop the downloads cleanly, simply $ touch STOP.

invoker.pl and summary.pl

Another option is this perl script, which does a similar job. It's not thoroughly tested yet, but it's pretty simple. It takes the starting ID, the number of IDs per process, and the number of processes, then creates a shell script that launches them. It has the bonus of being stoppable with $ touch STOP, and it logs every finished ID from every instance to one file for monitoring. This script will give a quick summary of that file to monitor the processes' progress. (And with touch STOP and the summary file, that means easy management over SSH! Woo!)
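The kind of summary described above can be approximated in a few lines of shell. This is not summary.pl; it is a sketch that assumes the log holds one finished profile ID per line, which may not match the real log format, and the function name is made up.

```shell
# Hypothetical progress summary for a log of finished profile IDs.
# Assumption: one numeric ID per line; other lines are ignored.
summarize_log() {
    log=$1 start=$2 end=$3
    total=$(( end - start + 1 ))
    # Count distinct IDs so a profile logged twice is only counted once.
    done_count=$(sort -u "$log" | grep -c '^[0-9][0-9]*$')
    echo "$done_count of $total profiles finished ($(( done_count * 100 / total ))%)"
}

# Example: summarize_log finished.log 1000 1999
```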

Troubleshooting

If you get an error like bff.sh: line 26: $'\r': command not found, you will need to convert the script to use UNIX-style line endings:

$ dos2unix bff.sh

or if you somehow find yourself without the dos2unix command, do this:

$ sed "s/\r//" bff.sh > bff-fixed.sh
$ mv bff-fixed.sh bff.sh
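To check whether a script has DOS line endings before running it, and to strip them with nothing but `tr`, something like the following sketch works on any POSIX system. The function names are made up for illustration.

```shell
# Succeeds if the file contains at least one carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# Remove all carriage returns from the file, replacing it in place.
strip_cr() {
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example:
# if has_crlf bff.sh; then strip_cr bff.sh; fi
```

The rewritten file may lose its executable bit, so run it as `bash bff.sh` as shown earlier.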

Site Organization

Content on Friendster seems to be primarily organized by users' ID numbers, which were assigned sequentially starting at 1. This will make it fairly easy for wget to scrape the site and for us to break it up into convenient work units. The main components we need to scrape are the profile pages, photo albums, and blogs, but there may be others. More research is needed.

Profiles

Profile URLs are of the form 'http://profiles.friendster.com/<userid>'. Many pictures on these pages are hosted at URLs that look like 'http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>.jpg', but these folders aren't directly browsable. Profiles will not be easy to scrape with wget.
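One reading of the photo URL pattern is that i, j, k and l are the last four digits of the user ID, and that the two folder names are those digits in reverse order ("lk", then "ji"). The sketch below builds the folder URL under that assumption; the function name is made up, and the layout for IDs shorter than four digits is unknown, so those are rejected.

```shell
# Build the photo folder URL for a user ID, assuming the last four
# digits of the ID are i, j, k, l and the folders are "lk" and "ji".
photo_folder() {
    id=$1
    case $id in ''|*[!0-9]*) return 1 ;; esac
    [ "${#id}" -ge 4 ] || return 1      # shorter IDs: layout unknown
    last4=${id#"${id%????}"}
    i=${last4%???}
    rest=${last4#?};  j=${rest%??}
    rest=${last4#??}; k=${rest%?}
    l=${last4#???}
    echo "http://photos-p.friendster.com/photos/$l$k/$j$i/$id"
}

# Example: photo_folder 123456789
# http://photos-p.friendster.com/photos/98/76/123456789
```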

Photo Albums

A user's photo albums are at URLs that look like 'http://www.friendster.com/viewalbums.php?uid=<userid>', with individual albums at 'http://www.friendster.com/viewphotos.php?a=<album id>&uid=<userid>'. It appears that the individual photo pages use JavaScript to load the images, so they will be very hard to scrape.

On the individual album pages, the photo thumbnails are stored under paths similar to those of the main images: if the album thumbnail is at http://photos-p.friendster.com/photos/<lk>/<ji>/nnnnnijkl/<imageid>m.jpg, just drop the final 'm' to get the main photo (or replace it with a 't' to get an even tinier version).
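The thumbnail naming rule above translates directly into shell parameter expansion. The function names and the image ID in the example are made up for illustration.

```shell
# Derive the main photo URL from an album thumbnail URL ending in "m.jpg".
main_photo_url() {
    case $1 in
        *m.jpg) echo "${1%m.jpg}.jpg" ;;
        *) return 1 ;;
    esac
}

# Derive the tiny version by replacing the final "m" with "t".
tiny_photo_url() {
    case $1 in
        *m.jpg) echo "${1%m.jpg}t.jpg" ;;
        *) return 1 ;;
    esac
}

# Example (hypothetical image ID):
# main_photo_url 'http://photos-p.friendster.com/photos/98/76/123456789/1_555m.jpg'
```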

Blogs

Unknown.

Range Signup Sheet

We're going to break up the user ids into ranges and let individuals claim a range to download. Use this table to mark your territory:

Start	End	Status	Size (Uncompressed)	Claimant
1	999	Uploaded	55MB	closure
1000	1999	Uploaded	283MB	alard
2000	2999	Uploaded	473MB	DoubleJ
3000	3999	Downloaded	234MB	Teaspoon
4000	4999	Uploaded	183MB	Paradoks
5000	5999	Uploaded	202MB	robbiet48/Robbie Trencheny (Amsterdam)
6000	9999	Uploaded	1.1GB	Sketchcow/Jason Scott
10000	29999	Uploaded	5.1GB	Sketchcow/Jason Scott
30000	31999	Uploaded	485MB	Sketchcow/Jason Scott
32000	32999	Uploaded	201MB	Paradoks
33000	33999	Uploaded	241MB	closure
34000	100000	Uploaded	unknown (20+ GB?)	closure
100000	101000	Downloaded	205.6 MB	xlene
101001	102000	Uploaded	232MB	robbiet48/Robbie Trencheny (Florida)
102001	103000	Uploaded	241MB	robbiet48/Robbie Trencheny (Amsterdam)
103001	104000	Uploaded		yipdw
104001	105000	Downloaded		Coderjoe
105001	114999	Uploaded	2.1GB	Paradoks
115000	116999	Uploaded		yipdw
117000	119999	Downloaded		Coderjoe
120000	130000	Uploaded	2.3GB	robbiet48/Robbie Trencheny (Florida)
130000	140000	Claimed		robbiet48/Robbie Trencheny (Florida)
140001	160000	Uploaded		yipdw
160001	180000	Downloaded	2.4GB	jch
180001	200000	Uploaded		yipdw
200001	220000	Downloaded		Coderjoe
220001	230000	Claimed		xlene
230001	240000	Uploaded	4.4GB	alard
240001	250000	Downloaded		Teaspoon
250001	260000	Claimed		robbiet48/Robbie Trencheny (Newark)
260001	270000	Uploaded	4.0GB	robbiet48/Robbie Trencheny (Fremont 1)
270001	280000	Uploaded	3.2GB	robbiet48/Robbie Trencheny (Fremont 2)
280001	290000	Uploaded	3.8GB	DoubleJ
290001	300000	Uploaded	3.9GB	dnova
310001	320000	Downloaded		Coderjoe
320001	330000	Claimed		robbiet48/Robbie Trencheny (Oakland)
330000	340000	Uploaded		closure
340000	400000	Uploaded	25GB	Sketchcow/Jason Scott
400001	500000	Done		DoubleJ
500000	600000	Downloaded	37 GB	closure (penguin)
600001	700000	Claimed		no2pencil
700001	800000	Uploaded	36GB	proub/Paul Roub
800001	900000	Uploaded	39GB	proub/Paul Roub
900001	1000000	Claimed		Soult
1000001	1100000	Claimed		Avram
1100001	1200000	Uploaded	33GB	Paradoks
1200001	1300000	Downloaded	36 GB	db48x
1300000	1400000	Downloaded	36 GB	closure (penguin)
1400001	1500000	Uploaded		alard
1500001	1600000	Claimed	28.8% done	ksh/omglolbah
1600001	1700000	Claimed	32.4% done	ksh/omglolbah
1700001	1800000	Claimed	31.5% done	ksh/omglolbah
1800001	1900000	Claimed	30.1% done	ksh/omglolbah
1900001	2000000	Claimed	30.8% done	ksh/omglolbah
2000001	2100000	Claimed	25.4% done	ksh/omglolbah
2100001	2200000	Claimed		Teaspoon
2200001	2300000	Claimed	60% done	Darkstar
2300001	2400000	Uploading	70GB compressed	Darkstar
2400001	2500000	Claimed		underscor (snookie)
2500001	2600000	Claimed		Bardicer
2600001	2700000	Claimed		Robbie Trencheny (Amsterdam)
2700001	2800000	Claimed		Robbie Trencheny (Fremont 2)
2800001	2900000	Claimed		Coderjoe (system1)
2900001	3000000	Claimed		Coderjoe (system2)
3000001	3100000	Claimed	78GB uncompressed	Qwerty0
3100001	3600000	Claimed		Jason Scott/Sketchcow
3600001	3700000	Claimed		DoubleJ (bff v10)
3700001	3800000	Uploaded		yipdw
3800001	3900000	Uploaded		oli
3900001	4000000	Claimed		Jason Scott/Sketchcow
4000001	4100000	Claimed		primus102
4100001	4200000	Claimed		Zebranky
4200001	4300000	Claimed		Zebranky
4300001	4399999	Downloaded	255GB (196GB compressed)	db48x
4400000	4599999	Claimed		Jade Falcon
4600000	4799999	Claimed		Soult
4800000	4809999	Uploaded		alard
4810000	4899999	Claimed		oli
4900000	4999999	Claimed		db48x
5000000	5099999	Claimed		jch
5100000	5199999	Claimed		hydruh
5200000	5299999	Claimed		chris_k
5300000	5349000	Claimed		ersi
5349001	6349999	Claimed		Underscor 03:37, 12 May 2011 (UTC)
6350000	6449999	Claimed		Paradoks
6450000	6550000	Claimed		yipdw
6550001	6700000	Claimed		oli
6700000	6800000	Claimed		closure (penguin)
6800001	6900000	Claimed		alard
6900001	7000000	Claimed		oli
7000001	7100000	Claimed		seanp2k (likwid/@ip2k on twitter)
7100001	7150000	Claimed		zchr
7150001	124138261	Pool - unclaimed		— Check for an update to the script!

Please try to claim blocks of 100,000 IDs at this time, or more if your system has adequate space.

Known issues

Affected	Issue	Resolution
User IDs < 340000	Suspect blog content	Jason will run a blog-check at the end
Profiles retrieved with bff.sh < v8	Missing blog content	Redownload affected profiles or wait for blog-check
Profiles retrieved with bff.sh < v9	Missing images	Redownload affected profiles
Profiles with more than one shoutout page, retrieved with bff.sh < v12	Only first page of shoutoutstream	Redownload profiles that have a file `shoutout_2.html`
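To find the profiles that need redownloading because of the shoutout issue, a search for `shoutout_2.html` is enough. The sketch below assumes each downloaded profile is stored in its own directory somewhere under a common root; the actual directory layout produced by bff.sh is not described here, and the function name is made up.

```shell
# List the directories under ROOT that contain a shoutout_2.html file.
# Assumption: each downloaded profile lives in its own directory.
profiles_with_shoutout_pages() {
    find "$1" -type f -name 'shoutout_2.html' | while IFS= read -r f; do
        dirname "$f"
    done | sort
}

# Example: profiles_with_shoutout_pages ./data
```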

Blogs with bad links

Some blogs have bad links that expand into an infinite tree. The latest version of bff.sh ameliorates this problem by limiting recursion depth to 20, but in some cases that can still be too much.

These profile IDs are known to have blogs that cause problems:

ID	Example of offending URL
319533	`exquisitelle.blog.friendster.com/category/uncategorized/<object width=/<object width=/...`
3007822	`iz-freedom.blog.friendster.com/category/uncategorized/;/;/...`
3035078	`khelay-angela.blog.friendster.com/category/music/<div style=/<div style=/<div style=/...`
3764079	`luzie53.blog.friendster.com/&tbnh=106&tbnw=142&hl=tl&start=4&prev=/images%3Fq%3Dsad%2Bangel%2Bpictures%26svnum%3D10%26hl%3Dtl%26lr%3D%26sa%3DN/...`
3774275	`msiabeckham.blog.friendster.com/category/uncategorized/\/\/\/\/\/\/\/\/\/page/2/\/page/2...`
3789069	`chrisna.blog.friendster.com/&tbnh=110&tbnw=80&hl=en&start=20&prev=/images?q=friendship&hl=en&lr=&sa=G/images?q=friendship&hl=en&lr=&sa=G/...`
488742	`oxidation-hani.blog.friendster.com/category/uncategorized/page/category/uncategorized/page/...`

If you come across another one of these, please add it to this list.

Blogs with corrupt images

Some blogs have links to images that just never finish downloading. wget downloads to 99%, then hangs until the server closes the connection.

ID	Example of offending URL
1421002	`http://diverjun23.blog.friendster.com/files/sany0901.jpg`
2848374	`http://aspen.blogs.friendster.com/photos/uncategorized/test.gif`

If you come across another corrupt image, please add it to the list.
