Difference between revisions of "My Opera"
Revision by Mithrandir (minor edit)
Revision as of 12:04, 11 August 2014
| My Opera | |
| URL | my.opera.com, files.myopera.com |
| Status | Offline | 
| Archiving status | Saved! | 
| Archiving type | Unknown | 
| Project source | https://github.com/ArchiveTeam/myopera-grab | 
| Project tracker | http://tracker.archiveteam.org/myopera | 
| IRC channel | #fatlady (on hackint) | 
My Opera was a social media website for the Opera browser. It started as just a support forum and later expanded to include blogging, image/file hosting, and email.
On October 31, 2013, Opera announced they would shut My Opera down on March 1, 2014. On February 19, 2014, this was changed to March 3, 2014.
Shutdown notice
... The explosion of these sites and the amount of resources we need to maintain our own service has changed our outlook on My Opera. We had a good run for many years, but we believe your content could have a better home elsewhere, so we have made the decision to shut down My Opera as of March 1, 2014. [1]
The shutdown affected blogs and comments, files, email, and the forums ('The most important existing threads will be moved...').
On February 24, 2014, this message was placed on each forum/topic:
The My Opera forums are being replaced by our new forums. Starting February 26th, the My Opera forums will be in read-only mode. On March 3rd, they will be removed along with the rest of My Opera.
Archives
There are two ways to browse the archives:
- Check if your content has been ingested into the Wayback Machine.
- Pages from an individual profile for some USERNAME can be seen at https://web.archive.org/web/*/http://my.opera.com/USERNAME/*
- Similarly, individual pictures/files can be seen at https://web.archive.org/web/*/http://files.myopera.com/USERNAME/*
- For example: [1] and [2].
 
- If you don't find anything, you can try extracting the files from the WARC files with some WARC tools.
- This method requires power user skills. In essence, scan each CDX index file for the URL you want, then extract the matching record from the appropriate WARC file. Ask us in IRC for help.
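The CDX-then-WARC lookup described above can be sketched roughly as follows. This is a minimal illustration, not the tooling ArchiveTeam used. It assumes the CDX files start with the usual ` CDX ...` header row and include the columns `a` (original URL), `V` (compressed record offset), `S` (compressed record size), and `g` (WARC file name), and that each record in the `.warc.gz` is its own gzip member. The function names `find_in_cdx` and `read_record` are made up for this example.

```python
import gzip


def find_in_cdx(cdx_lines, url):
    """Yield (warc_filename, offset, size) for each CDX row whose original URL is `url`."""
    fields = None
    for line in cdx_lines:
        line = line.rstrip("\n")
        if line.startswith(" CDX "):
            # Header row: one letter per column, e.g. " CDX N b a m s k r M S V g".
            fields = line.split()[1:]
            continue
        if fields is None or not line:
            continue
        row = dict(zip(fields, line.split(" ")))
        if row.get("a") == url:
            yield row["g"], int(row["V"]), int(row["S"])


def read_record(warc_file, offset, size):
    """Return the decompressed bytes of one record from an open .warc.gz file."""
    warc_file.seek(offset)
    return gzip.decompress(warc_file.read(size))
```

To use it, iterate over an opened CDX file with `find_in_cdx`, open the WARC file each hit names in binary mode, and pass the offset and size to `read_record`. The result is the raw WARC record (headers plus HTTP response), which still has to be split to get at the page or file itself.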
 
Links to WARC files:
- Initial grab of files.myopera.com, 6.2 GB compressed.
- Wallpaper grab, 1.7 GB compressed.
- Profile archives, ~6 TB compressed. 16,445,577 profiles were saved but 3988 profiles were not.
- Forum archives, 5.77 GB compressed.
- IDs 1-1769625 and 1800001-1823192 were saved; IDs 1769626-1800000 were not (although they may already be in the Wayback Machine).
- URL format is http://my.opera.com/community/forums/topic.dml?id=ID&abc=&page=NUMBER&skip=OFFSET&show=&perscreen=150
- OFFSET = (Page number - 1) * 150
- (This format makes the "next/prev" buttons on each page work.)
 
- For example: [3]
- Note that the page count ("page=NUMBER") is different than normal because each saved page shows 150 posts at a time ("perscreen=150").
 
- Archivebot crawl of the forums, 4.7 GB compressed.
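The forum archive URL format and OFFSET rule listed above can be written out as a small helper. This is only a sketch: the function and constant names are invented here, and the ID ranges are copied from the list above.

```python
POSTS_PER_SCREEN = 150

# Topic ID ranges reported as saved in the forum archives (inclusive).
SAVED_ID_RANGES = [(1, 1769625), (1800001, 1823192)]


def forum_topic_url(topic_id, page):
    """Build the archived URL for one page (1-based) of a forum topic."""
    offset = (page - 1) * POSTS_PER_SCREEN
    return (
        "http://my.opera.com/community/forums/topic.dml"
        f"?id={topic_id}&abc=&page={page}&skip={offset}"
        f"&show=&perscreen={POSTS_PER_SCREEN}"
    )


def was_saved(topic_id):
    """True if the topic ID falls inside one of the saved ranges."""
    return any(low <= topic_id <= high for low, high in SAVED_ID_RANGES)
```

For example, page 3 of a topic gets `skip=300`, because the two earlier pages hold 150 posts each.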
Archiving and contributing
Phase 0: Initial crawl
- Grab a seed list of users from the location pages. Done! 1,621,618 usernames (location pages + files.myopera.com initial grab + attempted forum grab + wallpaper grab)
- Grab a list of links to all forum topics and all pages. (In progress by User:Mithrandir.) Had some issues with this one (my fault); it is probably better to:
- Crawl all forum topics and pages. (Not in progress)
- Grab user-uploaded Opera-themed wallpapers. Done!
Phase 1: Username crawl
https://github.com/MithrandirAgain/myopera-username-grab
Fortunately, Opera was kind enough to provide us with a complete list of all non-banned users (thanks to Atluxity for setting this up!)
(Note: The original list had some encoding issues as well as some invalid usernames, but this was relatively easy to fix with the FTFY library. You can download the original, non-fixed list here and here.)
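The note above does not say what the encoding issues were. As an illustration only, the sketch below assumes one common kind of damage: UTF-8 bytes that were mis-decoded as Windows-1252. It uses only the standard library and is a much cruder stand-in for FTFY, not FTFY's algorithm; `repair_mojibake` is a hypothetical name.

```python
def repair_mojibake(text):
    """Undo one round of UTF-8 bytes having been mis-decoded as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or not damaged at all): leave it alone.
        return text
```

Usernames that are plain ASCII, or that are already correct non-ASCII text, fall through the `except` branch or round-trip unchanged, so the helper can be applied to a whole list.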
Phase 2: Content crawl
Please run your Warrior and select the My Opera project. It takes a while to install its extra dependencies, after which it should start downloading.
Alternatively, the scripts can be run manually; instructions are located here: https://github.com/ArchiveTeam/myopera-grab
Site structure notes
- As of 2009, there were around 16 million users.
- LOTS of old data, abandoned accounts, etc.
- Forum topics go all the way back to Post #1 dated 7 September, 2001.
- It looks like there are nearly 2 million topics.
 
- There's a sub-domain that houses user-uploaded data.
- Initial Bing-crawled list here (5572 URLs)
- Tons of webpages, PDFs, images, archives, etc.
- Some of this data is linked to in blog posts, so we should crawl blogs for this as well.
 
- Pretty much any UTF-8 (?) character can be used in a username (e.g. §|-|€€PJU|)g³ or ္္္္္္္floppyeye)
- Each user can have:
- An about page (http://my.opera.com/USERNAME/about)
- A blog (http://my.opera.com/USERNAME/blog)
- Blog archive page (http://my.opera.com/USERNAME/archive)
 
- A photo album (http://my.opera.com/USERNAME/albums)
- Friends (http://my.opera.com/USERNAME/friends)
- Favorite users/blog posts/photos (http://my.opera.com/USERNAME/favorites)
- Links page (http://my.opera.com/USERNAME/links)
- Site featured wallpaper (http://my.opera.com/community/opera/wallpapers/)
- 2 GB of space for files, except executables.
- Posts on the Opera forums (easier just to crawl all the topics instead)
- Groups (listed on the about page)
- Group members are located at http://my.opera.com/GROUPNAME/members
- Groups are basically the same as users.
 
- Recent visitors box on the user page.
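The per-user URL patterns listed above can be enumerated for a given username along these lines. This is a sketch with one notable assumption: usernames containing non-ASCII or reserved characters are percent-encoded as UTF-8, which these notes do not confirm. The helper names are invented for the example.

```python
from urllib.parse import quote

BASE = "http://my.opera.com"

# Page names taken from the list of per-user pages above.
USER_PAGES = ["about", "blog", "archive", "albums", "friends", "favorites", "links"]


def user_urls(username):
    """Return the known page URLs for one user (or group, which behaves the same)."""
    name = quote(username, safe="")  # assumption: UTF-8 percent-encoding
    return [f"{BASE}/{name}/{page}" for page in USER_PAGES]


def group_members_url(groupname):
    """Return the member-list URL for a group."""
    return f"{BASE}/{quote(groupname, safe='')}/members"
```

Since groups are basically the same as users, `user_urls` can be run on group names as well, with `group_members_url` covering the one group-only page.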
 
Username discovery
- This is a little tricky, as the only pages with a large list of usernames are the location pages.
- Each country list is paginated, showing at most 72 users per page.
- For large countries, going past page ~325 returns many 503 errors.
- Assuming an average of 100 retrievable pages per country and 246 countries, that's 72*100*246 = 1,771,200 users.
- This excludes users who haven't set their location, of which there seem to be many.
- This should be good for a seed list.
 
- Forum posts, blog comments, friend lists, and groups seem to be the best ways to get the most usernames. (Maybe throw in the recent visitors list too.)
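The seed-list estimate above is simple enough to check in a few lines. The defaults below are the figures from the list (72 users per page, an assumed average of 100 retrievable pages per country, 246 countries); the function name is made up for this sketch.

```python
USERS_PER_PAGE = 72


def estimated_seed_users(pages_per_country=100, countries=246):
    """Rough upper bound on usernames obtainable from the location pages."""
    return USERS_PER_PAGE * pages_per_country * countries


# Reproduces the 72*100*246 estimate from the list above.
print(estimated_seed_users())
```

By the same arithmetic, a large country that stops responding around page 325 yields at most about 72 * 325 = 23,400 usernames, however many users it actually has.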
References
External Links