The home page of LiveJournal, as seen on May 4, 2013.
Discovery: livejournal-discovery; grab: livejournal-grab
LiveJournal is a blog community started by Brad Fitzpatrick in 1999. It has changed hands a few times since then, and its (huge) userbase has been fairly upset about how the new owners in Russia, SUP, are running the show. All the previous owners have had a checkered history of banning people for fairly innocuous things.
In March 2016, ArchiveTeam started saving LiveJournal, "because it is very old, widely regarded as in decline, and has a lot of important stuff buried in it".
- Many core pages (blog posts, etc.) are returning 505 errors.
- Notifications, Purged Accounts, Stats, TxtLJ, Gulf_Aid_Now (Jul. 14, 2010 news:update) "One of the benefits of the work we've done to purge suspended accounts is that we will now be able to purge inactive journals and communities too--something you've been requesting for years!
A journal is defined as inactive if it has not been logged into for 24 consecutive months. A community is defined as inactive if has not been updated for 24 consecutive months. A journal is defined as inactive if it has not been logged into for 24 consecutive months and has only one post (i.e., the welcome post). A community is defined as inactive if has not been updated for 24 consecutive months and has only one entry and no comments. Once an account is eligible to be purged for inactivity, the owner will be sent an email to alert them of the inactive status. The owner will then have two weeks to log into the journal or post to their community to prevent it from being deleted. If the owner does not log in or post, the account will be delete..."
- Changes at LJ HQ (Jan. 8, 2009 news:update) "The restructuring is done with an eye to the future to ensure the long-term viability of LiveJournal as a business. As a team, we know that LJ has a great future as it prepares for its second decade." - "We recently invested a considerable amount on all-new server equipment and a facility in Montana to house it all as part of our commitment to the longevity of LJ." - "We will be around for years to come and we're committed to ensuring that your journals, friends pages, and communities will be, too."
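The inactivity rule in the July 2010 announcement quoted above is precise enough to state as code. The sketch below is only an illustration. It assumes the later definition in the quote (the one with the single-post condition), that "logged into" and "updated" can each be represented by one last-activity date, and that deletion happens as soon as the two-week grace period ends. The function names are hypothetical, not LiveJournal's.

```python
from datetime import date, timedelta

# Thresholds taken from the July 2010 announcement; names are hypothetical.
INACTIVE_MONTHS = 24
GRACE_PERIOD = timedelta(weeks=2)


def full_months_between(start: date, end: date) -> int:
    """Count whole calendar months from start to end (0 if end precedes start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not complete yet
    return max(months, 0)


def eligible_for_purge(kind: str, last_activity: date, today: date,
                       entries: int, comments: int = 0) -> bool:
    """Apply the announced inactivity rule (assumed interpretation).

    kind: "journal"   -> last_activity is the last login
          "community" -> last_activity is the last update
    """
    idle = full_months_between(last_activity, today) >= INACTIVE_MONTHS
    if kind == "journal":
        # Only the welcome post exists.
        return idle and entries == 1
    if kind == "community":
        # A single entry and no comments at all.
        return idle and entries == 1 and comments == 0
    raise ValueError(f"unknown account kind: {kind!r}")


def deletion_date(notice_sent: date) -> date:
    """Earliest deletion date if the owner never logs in or posts after the email."""
    return notice_sent + GRACE_PERIOD
```

For example, a journal whose owner last logged in on July 14, 2008 and which holds only its welcome post becomes eligible on July 14, 2010 under this reading.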
Site structure to extract active profiles
The site has a variety of addresses, and each journal is hosted on its own subdomain of the main site, e.g. the7days.livejournal.com. These subdomains can be found by checking each journal's feed at ext-NUMBER.livejournal.com/feed/; this method yields 3.5 million hits. The site also has a profile page for each user at www.livejournal.com/profile?userid=1&t=I; this method yields 77 million profiles.
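The two URL patterns above can be generated mechanically for a range of IDs. This is a minimal sketch that assumes HTTPS (the page gives no scheme) and treats ext-NUMBER values and profile userid values as separate number spaces. `ext_feed_url`, `profile_url` and `url_range` are hypothetical helpers, not part of livejournal-discovery.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

# Assumption: both hosts answer over HTTPS; the wiki page gives no scheme.
SCHEME = "https"


def ext_feed_url(number: int) -> str:
    """Feed address for an ext-NUMBER host, e.g. ext-1.livejournal.com/feed/."""
    return f"{SCHEME}://ext-{number}.livejournal.com/feed/"


def profile_url(userid: int) -> str:
    """Profile page for a numeric user ID, with the t=I parameter from the wiki."""
    query = urlencode({"userid": userid, "t": "I"})
    return f"{SCHEME}://www.livejournal.com/profile?{query}"


def url_range(build: Callable[[int], str], first: int, last: int) -> Iterator[str]:
    """Yield build(n) for every n from first to last, inclusive."""
    for n in range(first, last + 1):
        yield build(n)
```

For example, `list(url_range(profile_url, 1, 3))` gives the profile URLs for user IDs 1 through 3.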
Scraping of the profile pages is ongoing.
Once all the ext-NUMBER hosts have been found, they will be converted to the corresponding subdomains for users who have a journal on the site.
There is a two-part scrape happening:
1. ext-NUMBER feeds
2. profile pages
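The conversion from ext-NUMBER to subdomain is not documented here. One plausible approach, sketched below purely as an assumption, reads the journal's address out of the ext-NUMBER feed and takes the username from the hostname. The sketch assumes the feed is RSS 2.0 with a `<channel><link>` element pointing at the journal. Both function names are hypothetical, and a `None` result means no journal subdomain was found.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlparse

LJ_SUFFIX = ".livejournal.com"


def journal_link_from_rss(feed_xml: str) -> Optional[str]:
    """Return the <channel><link> text of an RSS 2.0 document, if present.

    Assumption: the ext-NUMBER feed is RSS 2.0 and its channel link
    points at the journal's own subdomain.
    """
    root = ET.fromstring(feed_xml)
    link = root.find("./channel/link")
    if link is None or not link.text:
        return None
    return link.text.strip()


def journal_subdomain(url: str) -> Optional[str]:
    """Map https://NAME.livejournal.com/... to NAME; None for www, ext-* or other hosts."""
    host = (urlparse(url).hostname or "").lower()
    if not host.endswith(LJ_SUFFIX):
        return None
    name = host[: -len(LJ_SUFFIX)]
    if not name or "." in name or name == "www" or name.startswith("ext-"):
        return None
    return name
```

Under these assumptions, a feed whose channel link is `https://the7days.livejournal.com/` would map to the subdomain `the7days`.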
- LiveJournal's own export journal page can export one month at a time.
- Antennapedia (Mac OS X out-of-the-box support, needs Python where missing) - For migrating journal entries from any LJ-style server to any other LJ-style server.
- ljArchive (Windows only) - A nice interface grabs the info from the servers and presents it in its own customizable templates within the program. Exports to HTML and XML. It's very easy to use and is currently being developed on SourceForge.
- Livejournal Export Script - Pulls a LiveJournal into a GDBM database, allowing export into HTML or XML and further import into WordPress or other blog software.
- LJbook (Currently overloaded) - Web interface exports LJ to a PDF suitable for printing on Lulu or just backing up, with images and other options. Limited use per month for unpaid users.
- ljdump (Python) slurps everything down into a pile of XML files.
- WordPress.com can import entire LiveJournals, including comments. It is unclear whether this is also available in the standalone WordPress software or only in the hosted service.
- XJournal (Mac OS X only) can download all entries.
- LJMigrate (Python) can archive the entire journal, and optionally migrate it to another LJ-based site like InsaneJournal or Dreamwidth.
- ljdump (Python) dumps to HTML, and can output the format expected by the WordPress LJ import plugin.
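LiveJournal's own export page only covers one month at a time, so backing up a whole journal with it takes one export per month, from the journal's first entry to its latest. The hypothetical helper below only enumerates those months. It does not submit anything to the export page, whose form fields are not described here.

```python
from datetime import date
from typing import Iterator, Tuple


def months_to_export(start: date, end: date) -> Iterator[Tuple[int, int]]:
    """Yield (year, month) for every calendar month from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
```

For a journal with entries from November 1999 to February 2000, this yields four months, crossing the year boundary.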
How can I help?
Running a Warrior
You can start up a Warrior and select LiveJournal Discovery there. (If you don't really care what you are archiving, select ArchiveTeam's Choice instead, as ArchiveTeam may at times prioritize another project.)
Running the script manually
If you use Linux and you're a bit familiar with it, you can try running the script directly.
The instructions can be found at github.com/ArchiveTeam/livejournal-discovery.
For this project, set concurrency to 1, as LiveJournal tends to ban scrapers!
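Concurrency 1 means one request at a time. The sketch below illustrates the same idea for a hand-written scraper by fetching URLs sequentially with a minimum gap between requests. This is not how the livejournal-discovery script paces itself. The class name and the 2-second interval in the example are assumptions. The fetch function, clock and sleep are injected so the pacing can be tested without network access.

```python
import time
from typing import Callable, Iterable, List, Optional


class MinIntervalThrottle:
    """Block so that successive calls to wait() are at least `interval` seconds apart."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        now = self.clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def fetch_sequentially(urls: Iterable[str], fetch: Callable[[str], bytes],
                       throttle: MinIntervalThrottle) -> List[bytes]:
    """Fetch one URL at a time (concurrency 1), pausing between requests."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

With a real fetch function, `MinIntervalThrottle(2.0)` would allow at most one request every two seconds. Keeping the requests strictly sequential mirrors the concurrency-1 advice.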
Some additional information
Don't forget to replace YOURNICKHERE with your nickname.
The number after
If you want to stop the script, please do it gracefully if possible. To do so, create an empty file named STOP in the folder of the script (terminal command:
If you see "Project code is out of date", kill the script, go to its folder (
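A STOP file allows a graceful shutdown because the script can check for it between work items, so the item in progress finishes instead of being cut off. The sketch below shows that pattern under this assumption. `run_until_stop_file` is a hypothetical function, and the real script's exact check timing may differ.

```python
import os
from typing import Callable, Iterable


def run_until_stop_file(items: Iterable[str], process: Callable[[str], None],
                        workdir: str = ".") -> int:
    """Process items one by one; stop before taking the next item once STOP exists.

    The item in progress always finishes, so nothing is left half-done.
    Returns the number of items processed.
    """
    stop_path = os.path.join(workdir, "STOP")
    done = 0
    for item in items:
        if os.path.exists(stop_path):
            break
        process(item)
        done += 1
    return done
```

Creating the file while an item is running therefore only takes effect after that item is complete.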
Donating to the Internet Archive
Content downloaded by ArchiveTeam will be uploaded to the Internet Archive, where it will be stored and, hopefully, remain available forever. However, storing it costs thousands of dollars in the long run. So, if you can afford it, please consider donating to the Internet Archive, so that this piece of history can be kept for us all. http://archive.org/donate
Do you like our cause?
If you want to help in other projects, want to learn more about ArchiveTeam, or even help in development in general, navigate to the Main Page of this wiki; from there you can reach a lot of information. The Team consists of volunteers working on the projects in their free time, so helping hands (and resources) are always welcome.
- https://ljsear.ch/ - Another archiving effort, publishing 2000-2015 LiveJournal posts from a particular search engine cache