Friendster dataset

From Archiveteam
Revision as of 16:06, 17 January 2017 by Jscott (talk | contribs) (Reverted edits by Megalanya1 (talk) to last revision by Jscott)

It would be nice to collect a dataset of the Friendster social graph, linking friends to friends and users to groups. This could be of interest to social computing scientists and other graph lovers.

A small dataset already exists, but it contains only 100,000 users. We can do better.

Download finished.

Dataset readme

You can find these datasets in the Internet Archive: friends and groups.


Friendster social network data set -- Friends lists (June/July 2011)
====================================================================

Before its relaunch as a gaming website, Friendster was a social networking
website that allowed users to connect with their friends. The central element
of the site was the 'friends list', showing the contacts of the user.

This dataset contains the connections between all Friendster users. It is the
result of an extensive crawl of Friendster.com at the end of June 2011. It
was performed as part of the ArchiveTeam project to archive part of the
Friendster data before the service relaunched.

The data files list, for each user id, the user ids of the users that were
listed on the friends page of that user. These connections have a direction:
if user A lists user B as a friend, that does not imply that the friends list
of user B also includes user A.
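To make the point about direction concrete, here is a minimal sketch on a toy example. The three users and the helper names (directed_edges, mutual_pairs) are made up for illustration and are not part of the dataset or of any tooling that ships with it.

```python
# Toy friends lists: user id -> ids listed on that user's friends page.
# These three users are invented for illustration only.
friends = {
    1: [2, 3],
    2: [1],
    3: [],
}

def directed_edges(friends):
    """Return one (source, target) pair per listed friend."""
    return {(u, v) for u, listed in friends.items() for v in listed}

def mutual_pairs(friends):
    """Return the unordered pairs in which each user lists the other."""
    edges = directed_edges(friends)
    return {frozenset((u, v)) for (u, v) in edges if u != v and (v, u) in edges}

edges = directed_edges(friends)
mutual = mutual_pairs(friends)
```

Here user 1 lists user 3, but user 3 does not list user 1, so only the pair of users 1 and 2 counts as mutual.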

Some of the users had elected to keep their friends list private. It is
possible, however, that these users appear in the friends lists of others.


Statistics
----------

The dataset contains the friends lists of 103,750,348 users. The friends
lists of an additional 14,001,031 users had been marked private. In total,
the dataset contains 2,586,147,869 friend connections.

In graph terms: the graph contains 117,751,379 nodes and 2,586,147,869
directed edges.
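The node count follows from the two user counts above. The short check below redoes that sum and also derives a mean list length, assuming that every edge originates from one of the non-private friends lists; the variable names are illustrative.

```python
# Figures copied from the statistics above.
public_lists = 103_750_348        # users whose friends list was collected
private_lists = 14_001_031        # users whose friends list was private
friend_connections = 2_586_147_869

# Every user with a public or private list is a node in the graph.
nodes = public_lists + private_lists

# Mean number of listed friends per collected list (assumes all edges
# come from the non-private lists).
mean_listed_friends = friend_connections / public_lists
```

The sum gives 117,751,379 nodes, and the mean works out to just under 25 listed friends per collected list.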


Files in this dataset
---------------------

The dataset consists of 125 compressed text files, each containing the data
of 1,000,000 users. The filename indicates the prefix of the user ids in the
file. For example, file  friends-031______.txt  provides the friends lists
of the users with user ids 31,000,000 to 31,999,999.
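As a sketch of the naming scheme, the function below maps a user id to the text file name that would hold it. The function name is hypothetical, and the pattern is inferred from the single example above; the compressed files may carry an additional extension that is not covered here.

```python
USERS_PER_FILE = 1_000_000

def friends_filename(user_id: int) -> str:
    """Return the text file name expected to contain this user id.

    Assumes a three-digit, zero-padded prefix followed by six underscores,
    as in the example friends-031______.txt.
    """
    if user_id < 0:
        raise ValueError("user ids are non-negative")
    prefix = user_id // USERS_PER_FILE
    return f"friends-{prefix:03d}______.txt"
```

For instance, friends_filename(31_999_999) and friends_filename(31_000_000) both return the name used in the example above.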


File format
-----------

Each line of the data files represents the friends list of one user, in the
following format:

  <user id of user> : <comma separated list of user's friends>

e.g.,  1:2,3        indicates that user 1 lists users 2 and 3 as friends.

If the friends list for a user was private, this is indicated as follows:

  <user id of user> : private

e.g.,  4:private

Similarly, user ids that did not map to a valid user are marked with notfound:

  <user id of user> : notfound

e.g.,  5:notfound
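The three line types can be handled by one small parser. The sketch below is not an official reader: the function name and the returned status strings are assumptions, and it further assumes that a user with an empty friends list appears as an id followed by a colon and nothing else.

```python
from typing import List, Optional, Tuple

def parse_friends_line(line: str) -> Tuple[int, str, Optional[List[int]]]:
    """Parse one data line into (user_id, status, friends).

    status is "ok", "private" or "notfound"; friends is a list of user ids
    when status is "ok" and None otherwise.
    """
    user_part, _, rest = line.strip().partition(":")
    user_id = int(user_part)
    rest = rest.strip()
    if rest in ("private", "notfound"):
        return user_id, rest, None
    # Assumption: an empty remainder means an empty (but public) list.
    friends = [int(item) for item in rest.split(",") if item.strip()]
    return user_id, "ok", friends
```

Keeping the status separate from the list lets a caller tell an empty public list apart from a private or missing one.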

The user ids used here correspond with the user ids in the group membership
data file.


Friendster social network data set -- Groups lists (June/July 2011)
===================================================================

Before its relaunch as a gaming website, Friendster was a social networking
website that allowed users to connect with their friends. One of the elements
of the site was the set of groups that members could join.

This dataset contains the group memberships of all Friendster groups. It is
the result of an extensive crawl of Friendster.com at the end of June 2011.
It was performed as part of the ArchiveTeam project to archive part of the
Friendster data before the service relaunched.

The data files list, for each group id, the user ids of the users that were
listed on the member list page of that group.


Statistics
----------

The dataset contains the member lists of 1,449,666 groups. In total, the
dataset contains 38,728,037 group memberships from 9,671,720 unique users.

In graph terms: the graph contains 11,121,386 nodes (1,449,666 groups plus
9,671,720 users) and 38,728,037 edges.


Files in this dataset
---------------------

The dataset consists of one compressed text file.


File format
-----------

Each line of the data file represents the member list of one group, in the
following format:

  <group id> : <comma separated list of user ids>

e.g.,  1:2,3    indicates that group 1 has users 2 and 3 as members.

The user ids in this file correspond with the user ids in the friends list
data files.
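
Because the user ids are shared between the two datasets, the group file can be inverted into a per-user index and then joined against the friends lists. The following is a rough sketch only: the function names and the sample lines are invented, and for the full file a streaming or on-disk approach may be preferable to an in-memory dictionary.

```python
from collections import defaultdict

def parse_group_line(line):
    """Parse '<group id>:<comma separated user ids>' into (group_id, members)."""
    group_part, _, rest = line.strip().partition(":")
    members = [int(item) for item in rest.split(",") if item.strip()]
    return int(group_part), members

def groups_by_user(lines):
    """Invert group member lists into a mapping of user id -> set of group ids."""
    index = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        group_id, members = parse_group_line(line)
        for user_id in members:
            index[user_id].add(group_id)
    return index

# Made-up sample lines in the documented format.
sample = ["1:2,3", "7:3"]
index = groups_by_user(sample)
```

In this sample, user 3 ends up in groups 1 and 7, while user 2 is only in group 1.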