{{Infobox project
| title = Yahoo! Groups
| url = http://groups.yahoo.com/
| image = groups-yahoo-com.png
| logo = yahoo-groups-logo.png
| project_status = {{offline}}
| archiving_status = {{partiallysaved}}
| tracker = [https://tracker.archiveteam.org/yahoogroups/ yahoogroups], [http://tracker-test.ddns.net/yahoo-groups-api/ yahoo-groups-api]
| source = [https://github.com/ArchiveTeam/yahoogroups-grab yahoogroups-grab], [https://github.com/ArchiveTeam/yahoo-group-archiver yahoo-group-archiver]
| irc = yahoosucks
| data = {{IA id|archiveteam_yahoogroups}}
}}

'''Yahoo! Groups''' was [[Yahoo!]]'s combination mailing list service/web forum, the result of the acquisition of eGroups and some other Yahoo! stuff. In addition to archives of and a web interface for mailing lists, it offered file uploads, photo uploads, links, polls, and an events calendar. It had been stable since the late 90s, long enough for some specialised software to be developed to do backups of it. (Not many other websites can say ''that''.) It was shuttered in stages over the course of 2019–2020.

Uploading of new content was disabled on 2019-10-28, and all content, including message history, was made unavailable on 2019-12-14.<ref>https://web.archive.org/web/20201126125219/https://help.yahoo.com/kb/groups/SLN31010.html</ref> Group content was hidden from the web interface by 2019-12-21. After negative media attention, Yahoo announced that they were extending the deadline for users to use their official "GetMyData" export tool (which missed a plethora of attachments, databases, polls, photos, and metadata) to 2020-01-31.<ref>{{URL|https://www.theverge.com/2019/12/10/21004883/yahoo-groups-extend-deadline-download-data-date-time}}</ref><ref>{{URL|https://twitter.com/YahooCare/status/1204312076379926528}}</ref> They stopped accepting GMD requests on 2020-02-04.

Groups continued to function solely as mailing lists for a short period. However, the creation of new groups was disabled on 2020-10-12, and the web interface and mailing lists were shut down on 2020-12-16.<ref name="websiteshutdown">{{URL|https://help.yahoo.com/kb/groups/SLN35505.html}}</ref>

Group admins and members, please see [[Yahoo! Groups/Archiving Project FAQ for group members]] or join IRC if you have questions.

== The state of preservation ==

[[File:Yahoo Groups provenance.png|1000px]]

=== Summary ===
The remnants of Yahoo Groups information are split among several pieces, some of which are uploaded to IA, some of which will hopefully be publicly uploaded in the future, and some of which are not suitable to be made public and will presumably be kept darked or otherwise restricted on IA. The difficulty of sorting out group data between the last two categories is the chief obstacle to uploading what we have. There are also some privacy-ok files that never got uploaded, some of which we have, but some of which we have been unable to locate.

The ArchiveTeam capture of Yahoo Groups took place in 4 parts:
* [[#2015-2018 API grab|A proactive API grab, done before there was any indication of a shutdown]]
* [[#2019 normal webpage grab|A normal DPoS capture]]
* [[#2019 API grab|A DPoS grab (though not wget-lua) of the API, done after the web interface was shut down while the previous capture was still incomplete]], the source of the files that are now missing
* [[#ArchiveTeam GetMyData|A somewhat haphazard gathering of data from an export feature]], the locus of the privacy issues
Additionally, we worked closely with a "fandom project" that made [[#Fandom GetMyData|its own GetMyData capture]] - which has already been shared with us insofar as it is going to be shared - and received [[#PGOffline etc.|a few miscellaneous archives mailed in by various people]].

=== Group Publicity ===
From our standpoint a group could be in one of the following categories with respect to privacy:
* Public groups.
* Groups we requested to join, whose join request was still pending and had not yet expired. (Obviously there are none of these left, but for reasons explained below this stage is of great importance.)
* Groups we requested to join and got accepted to, either manually or automatically. ("Actively accepted")
* Groups we requested to join and got rejected from. ("Actively denied" or "denied")
* Groups we requested to join, where the join request languished and expired after 2 weeks. ("Expired", "passively denied", or "timed out")
In any of the latter 3 cases the account requesting to join got an email. The privacy difficulty we have had with [[#The GetMyData process|the GetMyData-derived archives]] arises from two points:
* Many accounts used for that stage of the project were accounts at mail.com, which automatically deletes mailboxes after a period of inactivity, meaning the accept/deny/expire emails got deleted before we could gather them.
* GetMyData incorrectly sent out GetMyData archives of groups to accounts that were in the "pending" stage of joining them.

=== The GetMyData process ===
After Yahoo closed [[#2019 normal webpage grab|the normal Groups interface]] as well as [[#2019 API grab|the API]], there was only one avenue to continue to get information from it: "GetMyData", a process intended for people who were already in groups to get relevant records. ArchiveTeam, and parallel to it the fandom project, exploited this in order to try to get better coverage. "Get My Data files are a set of .zip files up to 2GB each. Each one has a variable number of groups in them (however many would fit in the 2GB). Each group has a messages .zip file with a number of .mbox files, a files .zip file with a backup of the group's file section, and a links .zip file with the group's links section. Yahoo unfortunately didn't bother sending other data in most cases, like attachments and photos, unless it was something the user requesting the backup personally posted".<ref>#yahoosucks, September 2022, two messages merged together here</ref>

"The way Yahoo implemented their Get My Data utility, you would get backups for any group you were a member of and [[#Group Publicity|any group for which you had a pending request to join]]. So, we would occasionally get back data for groups that later actively denied the join request, and we would quite commonly get back data for groups that never responded, given that most groups were abandoned by 2019. This behavior was extremely consistent and applies to all private groups".<ref>#yahoosucks, September 2022, two messages merged together here</ref>

=== Yahoo Groups ===
Forum/mailing list host that shut down in a process spanning late 2019 to early 2020.

=== 2015-2018 API grab ===
Scrape of the Yahoo Groups API. Led, or perhaps done entirely, by PurpleSym. There is one WARC file per group, and several group-WARCs per IA item. The items are located in [https://archive.org/details/archiveteam_yahoogroups the IA collection archiveteam_yahoogroups], and are distinguishable from everything else in that collection by their upload date not later than 2018, and by their thumbnails being a photo of a "Yahoo!" sign (with the exception of [https://archive.org/details/yahoogroup_info_grab something else from about 2 years later]). Although they are in WARC format, they seem to use the resource record-type with synthetic URIs, meaning they will not work in the Wayback Machine.


=== Doranwen's metadata upload ===
An IA item located [https://archive.org/details/Yahoo_Groups_Metadata here] created by the fandom project's Doranwen. Contains tables of group metadata, parsed from the [[#2015-2018 API grab|2015-2018 API grab]] and from [[#Combined GetMyData possessed by Doranwen|their GetMyData collection]].

=== Fandom GetMyData, not shared ===
Groups from the [[#Fandom GetMyData|fandom project's GetMyData effort]] which were either [[#Group Publicity|"actively denied"]] or originated outside the normal GMD process, such as by a group member sending it to them. This also includes GMDs made by the fandom project on behalf of members of private groups, which it retains only as a backup in case the member loses their copy.<ref name="fandompubpriv">#yahoosucks, October 2022, "OrIdow6: The only part of that that..."</ref> [[#Combined GetMyData possessed by Doranwen|These have been kept with the fandom project]].

=== Fandom GetMyData, shared ===
Groups from the [[#Fandom GetMyData|fandom project's GetMyData effort]] which it decided were public enough to send to us, namely those which [[#Group Publicity|were public, accepted/approved, and "accepted" through the Groups permissions bug]], as well as GMDs sent in by outside people to the fandom project<ref name="fandompubpriv" />. [[#Combined GetMyData possessed by lennier1|A copy of these was sent to lennier1]].
 
=== Fandom GetMyData ===
The fandom project conducted its own effort to collect GetMyData archives. Some of these the fandom project, acting by its own standards, has kept entirely [[#Fandom GetMyData, not shared|private]]; some [[#Fandom GetMyData, shared|it has given to us]]. There was some overlap in Yahoo accounts (AKA IDs) with ArchiveTeam, possibly leading to mixing of data with [[#ArchiveTeam GetMyData|our GetMyData process]].<ref>#yahoosucks, September 2022, search "i am sure that at least some ids whose info"</ref>
 
=== 2019 normal webpage grab ===
A regular, [[DPoS]] attempt at a grab of the Yahoo Groups website. As with many DPoS projects, navigation is somewhat broken in the WBM, but it does play back.<ref>"(even the 'html' yahoogroups-grab", parenthesis and single quotes in original, #yahoosucks, September 2022</ref> Nonetheless, it is the closest thing to sanity of any of the archiving attempts.
 
The repository is at [https://github.com/ArchiveTeam/yahoogroups-grab ArchiveTeam/yahoogroups-grab]. The data is on IA, in archiveteam_yahoogroups, prefixed simply by "Archive Team Yahoo! Groups", e.g. [https://archive.org/details/archiveteam_yahoogroups_20191214012211_24a04e36 archiveteam_yahoogroups_20191214012211_24a04e36].
 
=== Combined GetMyData possessed by lennier1 ===
A collection of GetMyData output files currently held by lennier1 and not uploaded anywhere. Contains the results from [[#ArchiveTeam GetMyData|the ArchiveTeam GetMyData effort]], as well as [[#Fandom GetMyData, shared|what the fandom project was willing to share with us]]. This, with the addition of the data the fandom project didn't share with us, would become the contents of [[#Combined GetMyData possessed by Doranwen|Doranwen's GetMyData holdings]].
 
Not all this data can be made public. A bug in Yahoo Groups allowed execution of GetMyData on any restricted group merely by applying to join it, before being accepted or rejected. Additionally some of these files "were contributed by people who were ok making some groups (or data types) public but not others"<ref>#yahoosucks, September 2022</ref>. As such the plan is to separate this out into [[#Public ArchiveTeam GMD upload|a public segment]] and [[#Darked ArchiveTeam GMD upload|a darked segment]] before uploading both to IA.
 
=== Public ArchiveTeam GMD upload ===
A planned IA upload of GetMyDatas from [[#Combined GetMyData possessed by lennier1|the ones ArchiveTeam possesses]], after it is sorted out which ones are public and private. As more than one group could fit into a GetMyData zip, presumably the raw files we received cannot be uploaded; rather it will be necessary to extract the individual groups.
 
=== Darked ArchiveTeam GMD upload ===
A planned IA upload of GetMyDatas from [[#Combined GetMyData possessed by lennier1|the ones ArchiveTeam possesses]], after it is sorted out which ones are public and private. Presumably just the zip files, but "darked", i.e. inaccessible to everything but privileged IA accounts (employees, and knowing them probably some other people as well).
 
=== PGOffline etc. ===
"[M]iscellaneous stuff like [http://www.personalgroupware.com/ PGOffline] data and people running the archiving program manually" sent in to us, currently not uploaded, and held by lennier1. Presumably [[#Upload of PGOffline to IA|to be uploaded]]. As PGOffline did not suffer from the permissions bug, presumably these are sufficiently privacy-safe.<ref>#yahoosucks, "The API archiving program...", September 2022</ref>
 
=== Upload of PGOffline to IA ===
Planned upload of [[#PGOffline etc.|data from PGOffline and other miscellaneous sources]] to IA.
 
=== Combined GetMyData possessed by Doranwen ===
Combination of the fandom project's [[#Fandom GetMyData, shared|public-access-ok]] and [[#Fandom GetMyData, not shared|public-access-not-necessarily-ok]] GetMyData sets, as well as [[#ArchiveTeam GetMyData|the ArchiveTeam one]]. This is the most comprehensive GetMyData collection there is; lennier1 has a version without the non-public material. It appears this is eventually to be split up into [[#Doranwen's organized upload|a fairly processed public upload]] as well as [[#Private groups kept by Doranwen|a non-public set]].
 
=== ArchiveTeam GetMyData ===
GetMyData archives collected by ArchiveTeam. Volunteers signed up for groups and then made GetMyData requests on those accounts; the results came by email and were forwarded to Marked. These are currently [[#Combined GetMyData possessed by lennier1|held by lennier1]].
 
=== Doranwen's organized upload ===
The planned upload, by the fandom project leader Doranwen, of the GetMyData archives they [[#Combined GetMyData possessed by Doranwen|possess]] and wish to make public by their criteria (presumably a subset of [[#Fandom GetMyData, shared|what they've given to us]]). Much of the discussion in #yahoosucks in 2021 and 2022 has concerned cleaning up and categorizing this data, hence this page's label of it as "organized".
 
=== Private groups kept by Doranwen ===
The subset of the GetMyData archives [[#Combined GetMyData possessed by Doranwen|possessed by Doranwen/the fandom project]] that they do not want to be made public. Indications are that these will be kept with people personally in perpetuity.
 
=== 2019 API grab ===
Technically unusual Seesaw/[[tracker]]/[[DPoS]] project written and led by Marked to get data from the Groups API after the normal interface, and with it [[#2019 normal webpage grab|the DPoS project that gathered from it]], had shut down. The GitHub repository is [https://github.com/ArchiveTeam/yahoo-group-archiver here]. Marked has said that, of the data produced by this, [[#2019 API grab, portion "in australia"|"1/3 is in australia]], [[#2019 API grab, portion with marked|1/3 with me]], and [[#2019 API grab, portion on IA|1/3 on IA"]]<ref>Quoted by thuban in #yahoosucks September 2022; timestamp indicates Marked originally sent this in March 2020</ref>. As of September 2022, neither of the first two parts has been uploaded.
 
=== 2019 API grab, portion with marked ===
Of the data from the [[#2019 API grab|2019 API grab]], "1/3" was with Marked. At some point between late 2020 and late 2022 [[#2019 API grab, portion with lennier1|this made its way to lennier1]].
 
=== 2019 API grab, portion with lennier1 ===
[[#2019 API grab, portion with marked|Marked's 1/3]] of the [[#2019 API grab|2019 API grab]] data, sent to lennier1. Not yet uploaded; will hopefully be sent to IA eventually.
 
=== 2019 API grab, portion on IA ===
Of the data from the [[#2019 API grab|2019 API grab]], "1/3" had been uploaded to IA in early 2020, and as of September 2022 that portion has remained unchanged. It is intended that [[#2019 API grab, portion with lennier1|the third originally with Marked and sent to lennier1]], and if it can be located [[#2019 API grab, portion "in australia"|the third "in australia"]], be merged into this. These can be found in [https://archive.org/details/archiveteam_yahoogroups?sort=-publicdate archiveteam_yahoogroups] as items prefixed by "archiveteam_yahoogroups_api", e.g. [https://archive.org/details/archiveteam_yahoogroups_api_20191217011957_8a14e083 archiveteam_yahoogroups_api_20191217011957_8a14e083].
 
=== 2019 API grab, portion "in australia" ===
Of the data from the [[#2019 API grab|2019 API grab]], "1/3" was, per a few enigmatic remarks from Marked, "in australia". "[T]here was a volunteered target in Australia, I forgot their username atm"<ref>#yahoosucks, June 2021</ref>. We have been unable to determine who this was, except that it seems unlikely to have been Kiska.
 
== Submitting group data to the public archive ==
 
Were you a member of a public group (one that did not require administrator approval to join)? Were you an admin of a private group whose members consent to be part of the public archive? Did you save the group yourself, using GetMyData or any other method?
 
If so, we'd love to have your archives. Upload to a fileshare such as WeTransfer, Dropbox, Google Drive, or Mega.nz and [mailto:archiveteamprivateyahoogroup@gmail.com email us] a link.
 
Feel free to remove data which should remain private (such as private groups in mixed public/private GetMyData results, or message history from private groups whose members wish to make only files and photos public) before sending us a copy.
 
However, try to make sure the data is otherwise unmodified! In particular, there may be old malware in GMD ZIP files. Modern email software and operating systems are expected to be resistant to this old malware, but some antivirus software may see it and attempt to modify or delete the ZIP file. Please be careful of this!
 
== Project history ==
 
Data collection for this project is over. Yahoo! Groups content is now inaccessible; although we continue to accept individual archives made by group members and admins, we can no longer archive additional groups.
 
While the project was active, volunteers could help in the following ways:
 
=== Nominating non-private groups for archival ===
 
Groups could be nominated for archival using [https://tinyurl.com/savegroups this form]. This was not used for groups that required administrator approval to join.
 
=== Submitting private groups for public archival ===
 
Administrators could request that their private group (we considered a private group to be one that required administrator approval of new members) be included in the public archive. We requested admins to ensure that the members of the group were happy about being part of the public archive.
 
To submit a group for archival, admins could [https://web.archive.org/web/20201020005238/https://help.yahoo.com/kb/SLN2567.html send a membership invite] to the email ''archiveteamprivateyahoogroup@gmail.com'' (''without'' selecting the "Add only to mailing list" option). We monitored that email regularly to accept any membership requests we received, and scheduled the group for archival once our Yahoo account was a member.
 
=== Joining groups and submitting data ===
 
We used [https://github.com/davidferguson/yahoogroups-joiner an extension for Chromium-based browsers] to partially automate the process of joining groups (at first since some groups only made message history visible to members, and later because after the closure of the web interface GetMyData was the only way to access group content). There was at one time a [https://df58.host.cs.st-andrews.ac.uk/yahoogroups/leaderboard leaderboard].
 
Volunteers who joined groups also made GetMyData requests from the accounts they used to join groups (in some cases, multiple requests, if they received results in time to continue joining groups or not all groups were included in the initial result). GMD requests could take up to 10 days to be processed; results were split into 2 GB ZIP files.
 
GMD results were emailed or rsynced to ArchiveTeam.
 
== Private groups of interest ==
 
[https://groups.yahoo.com/neo/groups/numberactivation/info numberactivation] (see all [https://reclaimthenet.org/ofcom-oftel-uk-phone-numbers-yahoo-groups/ the] [https://www.axios.com/yahoo-groups-ofcom-cell-phone-number-porting-51949f81-446e-4b4b-82eb-26790146e9a0.html press] [https://techupdatess.com/some-of-the-uks-phone-number-infrastructure-relies-on-yahoo-groups-the-verge/ coverage]; [https://www.whatdotheyknow.com/request/all_data_held_in_yahoo_groups_us FOI request]). Some external lists: [https://fanlore.org/wiki/Category:Yahoo!_Groups List of groups with Fanlore pages] (contains both private and public groups), [https://archivetransyahoo.noblogs.org/list-of-known-trans-groups/ Archive Trans Yahoo's list] (all private at last check), [https://yahoogroups.southasianamerican.org/ Archive South Asian American Yahoo Groups] (all public), and [https://queerdigital.com/ygpresproject Queer Digital History Project] (no groups listed, presumably all private).


== Statistics ==
[[File:Yahoo_groups_post_date.png‎]]


== Site structure ==
 
There was a convenient JSON API, most endpoints of which are now down. Some endpoints require logged-in group membership or other permissions (depending on group settings).
 
===Groups===
 
* https://groups.yahoo.com/api/v1/search/groups (search)
:- Known params: maxHits, offset, query, sortBy (values: OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST)
 
* https://groups.yahoo.com/api/v1/dir/categories/0/ (list of subcategories and discoverable groups under the root)
:- Known params: start, intlCode (au, in, sg, uk, us; ar, e1, es, mx; br; cf, fr; de; hk; it...)
:- Pagination: Page size is 10. Does ''not'' have a count param. start is the result index, not the group id. start values 500 and up all return the same set of results.
: Groups are listed in fixed but arbitrary order. /0/ is a special value that shows the root node; subcategories can be accessed by using the subcategory id instead (the full "idList" value is not required).
: Defaults to the US view of the English directory tree. Different languages have different directory trees. Supplying a different intlCode parameter (list not exhaustive, must be lower case) accesses the corresponding view of the appropriate language's tree. Subcategory ids are language-specific and must be used with an appropriate intlCode. The intlCode -> language mapping may be checked at the /0/ endpoint; the root "name" is always "ROOT", but "id" is language-specific.<ref>This id can also be accessed with an appropriate intlCode, but contains the same twelve groups for all languages: the groups in the categories for musical artists "Roots, The" and "Rusted Root", three groups which appear to be Yahoo tests, and one group which appears to be a spam test.</ref> Different intlCode views of the same language list groups in a different order, may have slightly different category names, and appear to have slightly different numbers of categories in the full tree; their group overlap is about 99%.
: The "count" field appears totally inaccurate.
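Given the fixed page size of 10 and the start cap of 500, a full crawl of one view of the root listing reduced to a bounded set of URLs. A sketch (the API is long gone; this only illustrates the pagination bounds described above):

```python
BASE = "https://groups.yahoo.com/api/v1/dir/categories/0/"

def category_page_urls(intl_code=None, page_size=10, cap=500):
    """Yield the directory-listing URLs worth requesting: start values
    0..cap-1 in steps of the fixed page size. start values of 500 and
    up all returned the same results, so cap defaults to 500."""
    for start in range(0, cap, page_size):
        url = "%s?start=%d" % (BASE, start)
        if intl_code:
            url += "&intlCode=" + intl_code
        yield url
```

Repeating this per intlCode (and per subcategory id in place of /0/) covers the different language views of the directory tree.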
 
* https://groups.yahoo.com/api/v1/groups/concatenative/ (specific group information)
* https://groups.yahoo.com/api/v1/groups/concatenative/statistics (more group information, with partial overlap)
 
===Messages===
 
* https://groups.yahoo.com/api/v1/groups/concatenative/messages (list)
:- Known params: count, start, sortOrder (ASC, DESC), direction (1, -1)
:- Pagination: Page size defaults to 10, with no known limit. start is the message id, not the result index. sortOrder adjusts the order of results in the json response's array, whereas direction determines which way to iterate through ids from start (default: DESC, -1).
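Because start is a message id rather than a result index, walking the full message list means feeding the edge of each page back in as the next start value. A sketch of that bookkeeping under the semantics above (a hypothetical helper, not part of any of the archivers):

```python
def next_start(page_ids, direction=-1):
    """Given the message ids returned in one page, compute the start id
    for the next request: one step past the lowest id seen when walking
    toward older messages (direction=-1, the default), or past the
    highest when walking toward newer ones (direction=1)."""
    if not page_ids:
        return None  # empty page: nothing further to fetch
    edge = min(page_ids) if direction == -1 else max(page_ids)
    return edge + direction
```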
 
* https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/ (specific message)
* https://groups.yahoo.com/api/v1/groups/concatenative/messages/1/raw (specific message, raw content including headers)
: Original email is largely recoverable from ''rawEmail'' field.
: Message headers and textual body parts have email addresses redacted, with the hosts replaced with "...". For example, "From: ceo@ford.com" and "From: ceo@toyota.com" both get turned into "From: ceo@..." Some addresses may not have been redacted correctly.
: Some messages may have encoding issues.<ref>https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean</ref> Sometimes (as in the linked case) the non-raw endpoint has the correct characters, sometimes it does not; this is likely related to the originating email client. Removing non-ASCII characters and ^M characters from the 7-bit text should result in valid RFC822 emails.
: Some emails longer than 64kb (minus attachments) may be truncated. This truncation affects not just plain text, but also HTML and encoded Base64 content. Deleting the string "\n(Message over 64 KB, truncated)" from the end of the message part may help prevent parser breakage.
: All attachments are separated, with attachment bodies replaced with the string "[ Attachment content not displayed ]". Recovering the emails involves finding those MIME parts, looking at the filenames, comparing with the list of filenames listed in the "attachmentInfo" section, matching on similarity, and replacing the contents with the downloaded attachments. In very rare cases where a matching MIME section isn't found, it may be necessary to append those attachments as new MIME attachments to the email while reconstructing.
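The truncation and redaction quirks above can be normalised mechanically before handing the rawEmail text to a MIME parser. A sketch of the two simplest fixes (the attachment re-matching step is considerably more involved and is not shown):

```python
import re

TRUNCATION_MARKER = "\n(Message over 64 KB, truncated)"
# Yahoo's redaction keeps the local part and replaces the host with "...".
REDACTED_RE = re.compile(r"[A-Za-z0-9._%+-]+@\.\.\.")

def strip_truncation_marker(raw_email):
    """Drop the truncation notice so parsers don't trip over it; the
    truncated content itself is unrecoverable."""
    return raw_email.replace(TRUNCATION_MARKER, "")

def redacted_addresses(raw_email):
    """List addresses redacted to 'localpart@...' form."""
    return REDACTED_RE.findall(raw_email)
```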
 
* https://groups.yahoo.com/api/v1/groups/concatenative/history (calendar summary)
:- Known params: ts, tz, chrome
:- Redundancy: Generatable from /messages data.
 
===Topics===
 
* https://groups.yahoo.com/api/v1/groups/concatenative/topics (list)
:- Known params: count, startTopicId, sortOrder (ASC, DESC), direction (1, -1)
:- Pagination: Page size defaults to 25, with a limit of 100. sortOrder and direction as for messages.
 
* https://groups.yahoo.com/api/v1/groups/concatenative/topics/1 (specific topic)
:- Known params: maxResults.
:- Pagination: Page size defaults to 30 (messages in topic), with no known limit (maximum tested: 57). No known start param.
:- Redundancy: Generatable from /messages data.
: "messages" field is an array, each element of which seems to have the same contents as the corresponding /message/<id>/ (non-raw) endpoint; metadata ("totalMsgInTopic", "prevTopicId", "nextTopicId") could be reconstructed. Not known whether a message can fail to be associated with any topic.
 
===Attachments===
 
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/attachments (list)
:- Known params: count, start, sort (TITLE, TIME), order (ASC, DESC)
:- Pagination: Page size defaults to 20, with no known limit (maximum tested: 93).
 
* https://groups.yahoo.com/api/v1/groups/<groupname>/attachments/<attachmentId> (specific attachment)
Attachments may be of several types: photo, file, ...?
 
===Files===
 
* https://groups.yahoo.com/api/v2/groups/a_furrys_world/files (list)
:- Known params: sfpath (pass in a pathURI to retrieve the file listings of this subdirectory)
:- Pagination: None.
: Entries with "type" 0 are files; 1, directories.
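Since the listing is per-directory with no pagination, a full file inventory requires recursing on sfpath. A sketch, with `fetch(params)` standing in for a GET of the (long-dead) /files endpoint; the `"dirEntries"` and `"pathURI"` field names are assumptions:

```python
def walk_files(fetch, sfpath=None):
    params = {"sfpath": sfpath} if sfpath else {}
    for entry in fetch(params).get("dirEntries", []):
        if entry["type"] == 0:    # file
            yield entry
        elif entry["type"] == 1:  # directory: recurse using its pathURI
            yield from walk_files(fetch, entry["pathURI"])
```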
 
===Photos===
 
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/photos (list of photos)
:- Known params: count, start, orderBy (MTIME), sortOrder (ASC, DESC), ownedByMe (TRUE, FALSE), lastFetchTime, photoFilter (ALL, PHOTOS_WITH_EXIF "Originals", PHOTOS_WITHOUT_EXIF "Shared")
:- Pagination: Page size defaults to 20, with no known limit.
: "totalPhotos" field in response gives total in group.
 
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/albums (list of albums)
:- Known params: count, start, albumType (PHOTOMATIC, NORMAL), orderBy (MTIME, TITLE), sortOrder (ASC, DESC)
:- Pagination: Page size defaults to 12, with no known limit.
: albumType defaults to NORMAL. PHOTOMATIC albumType requires the "READ" permission for "ATTACHMENTS". "total" field in response gives total number of albums of the selected type in group; however, this seems to have an off-by-one error for the NORMAL type of albums.
 
* https://groups.yahoo.com/api/v3/groups/a_furrys_world/albums/1841906391 (specific album)
:- Known params: similar to /photos and /albums endpoints, with additional ordinal sortOrder option
: Photomatic albums ''must'' be loaded with the albumType parameter set to PHOTOMATIC.
 
===Links===
 
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/links (list)
:- Known params: linkdir
:- Pagination: None.
: linkdir takes the folder parameter from a dir. Nested folders should be joined with '/'. You need to keep track of the path to a given folder yourself (e.g., linkdir + '/' + folder).
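The path bookkeeping described above can be sketched as a recursive walk. `fetch(params)` stands in for a GET of the (long-dead) /links endpoint, and the `"links"`/`"folders"` response keys are assumptions:

```python
def walk_links(fetch, linkdir=""):
    params = {"linkdir": linkdir} if linkdir else {}
    data = fetch(params)
    for link in data.get("links", []):
        yield linkdir, link
    for folder in data.get("folders", []):
        # The API never reports full paths, so the caller builds them.
        sub = linkdir + "/" + folder if linkdir else folder
        yield from walk_links(fetch, sub)
```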
 
===Polls===
 
* https://groups.yahoo.com/api/v1/groups/relationship-poll/polls (list)
:- Known params: count, start
:- Pagination: Page size defaults to 10, with no known limit. There is no "total" field in the response.
 
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/polls/3549106 (specific poll)
: Polls return all votes cast, non-anonymised, including identifying metadata for all viewers.
 
===Databases===
 
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database (list of tables)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/ (specific table)
* https://groups.yahoo.com/api/v1/groups/a_furrys_world/database/1/records (table contents)
:- Pagination: None.
 
* https://groups.yahoo.com/neo/groups/groupname/database/1/records/export (export target)
:- Known params: format (CSV, TSV)
 
===Members===
 
* https://groups.yahoo.com/api/v1/groups/iswipe/members/confirmed (list of confirmed members)
:- Known params: count, start, sortBy, sortOrder, ts, tz, chrome
:- Pagination: Page size defaults to 10, with a limit of 100. No known limit on total results.
: May be blocked for normal members (as may all the other members endpoints). Includes moderators and bouncing members, with identifying metadata.
* https://groups.yahoo.com/api/v1/groups/iswipe/members/moderators (list of moderators)
* https://groups.yahoo.com/api/v1/groups/iswipe/members/bouncing (list of bouncing members)
* https://groups.yahoo.com/api/v1/groups/iswipe/members/suspended (list of suspended members)
: Very often (always?) blocked for normal members.
* https://groups.yahoo.com/api/v1/groups/iswipe/members/banned (list of banned members)
: Very often (always?) blocked for normal members.
 
===Events===
 
Overlaps with the Yahoo Calendar API; check the yahoo-group-archiver code.
 
== Software for archiving groups ==
 
=== Python ===
 
* '''[https://github.com/ArchiveTeam/yahoo-group-archiver yahoo-group-archiver]''' scraped a group using the JSON API and (for private endpoints) the two cookies Yahoo uses to verify a logged-in user. Optionally, it could produce WARCs. ArchiveTeam's preferred tool and fully featured at the time of closure.
** [https://github.com/anirvan/yahoo-group-archive-tools Yahoo Group Archive Tools] (a Perl script) converts yahoo-group-archiver output into clean rfc822 and mbox files, with separated attachments correctly reattached, and many Yahoo truncation/redaction bugs corrected. It also turns list archives into PDF, using [https://github.com/andrewferrier/email2pdf email2pdf], which many non-technical list owners prefer.
 
* [https://github.com/andrewferguson/YahooGroups-Archiver YahooGroups-Archiver] is similar, but scraped only messages (not files or any other data). It has been deprecated in favor of the above.
 
* [https://github.com/csaftoiu/yahoo-groups-backup yahoo-groups-backup] scraped a group's messages and files (but not any other data) using Selenium, storing message info and metadata (both rendered message body and raw email) into a Mongo database. It also provides a script to dump its data to static HTML pages that can be viewed in the browser.
 
=== Other ===
 
* [http://www.personalgroupware.com/ PGOffline]: Windows, proprietary. 14-day free trial, after which download and export are disabled (but viewing still works). Included attachments. Stores data in a SQLite database internally.
** [https://github.com/nsapa/pgo2mbox/ pgo2mbox] converts PGOffline pg4 files to mbox.
* [http://yahoogroupedia.pbworks.com/w/page/93006447/Chrome%20Application%20To%20Download%20Messages Yahoo Messages Export]: Chrome extension. Messages only. Saves as mbox.
* [https://sourceforge.net/projects/grabyahoogroup/ Yahoo Group Archiver]: Perl, defunct.
 
== Software for viewing archives (in mbox format) ==
 
* [https://www.thunderbird.net/ Mozilla Thunderbird]
** Method using addon: https://addons.thunderbird.net/en-US/thunderbird/addon/importexporttools-ng/
** Method without addon: https://commons.lbl.gov/display/~jwelcher@lbl.gov/Reading+an+mbox+file+with+Thunderbird
* [https://sylpheed.sraoss.jp/en/ Sylpheed]
** Instructions: https://sylpheed.sraoss.jp/doc/manual/en/sylpheed-15.html
** Detailed instructions from Doranwen: https://docs.google.com/document/d/1dXeXfY5Huri_8NTUn4hl-iUZq9MMRL1Qbo7bp5YZpmE/edit
* [https://kde.org/applications/internet/org.kde.kmail2 KMail]
* [http://www.mutt.org/ Mutt]
* [https://neomutt.org/ NeoMutt]
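Since these archives are plain mbox files, they can also be inspected programmatically rather than in a mail client; a minimal sketch with Python's standard mailbox module:

```python
import mailbox

def list_messages(path):
    """Return (from, subject) pairs for every message in an mbox file."""
    return [(m["from"], m["subject"]) for m in mailbox.mbox(path)]
```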
 
== Other archiving efforts ==
 
* Yahoo Groups Fandom Rescue Project
** https://discord.gg/DyCNddf
**archiver1.fandom@gmail.com
* Mods and Members
** https://modsandmembersblog.wordpress.com/
** https://mmsanctuary.groups.io/g/main
** https://twitter.com/featheredleader


== External Links ==


* https://archive.org/details/yahoo_groups
* {{IA item|archiveteam_yahoogroups}}
 
== Coverage ==
 
* https://www.usatoday.com/story/tech/talkingtech/2019/10/17/yahoo-groups-online-forum-shutdown/4007150002/


== References ==

Latest revision as of 20:23, 9 October 2022



Groups continued to function solely as mailing lists for a short period. However, the creation of new groups was disabled on 2020-10-12, and the web interface and mailing lists were shut down on 2020-12-16.[4]

Group admins and members, please see Yahoo! Groups/Archiving Project FAQ for group members or join IRC if you have questions.

The state of preservation

Yahoo Groups provenance.png

Summary

The remnants of Yahoo Groups information are split among several pieces: some have been uploaded to IA, some will hopefully be publicly uploaded in the future, and some are not suitable to be made public and will presumably be kept darked or otherwise restricted on IA. The difficulty of sorting out group data between the last two categories is the chief obstacle to uploading what we have. There are also some privacy-ok files that never got uploaded; some of these we have, but others we have been unable to locate.

The ArchiveTeam capture of Yahoo groups took place in 4 parts: the 2015-2018 API grab, the 2019 normal webpage grab, the 2019 API grab, and the ArchiveTeam GetMyData effort, each described below.

Additionally, we worked closely with a "fandom project" that made its own GetMyData capture - which has already been shared with us insofar as it is going to be shared - and received a few miscellaneous archives mailed in from various people.

Group Publicity

From our standpoint a group could be in one of the following categories with respect to privacy:

  • Public groups.
  • Groups we requested to join, whose join request was still pending and had not yet expired. (Obviously there are none of these left, but for reasons explained below this stage is of great importance.)
  • Groups we requested to join and got accepted to, either manually or automatically. ("Actively accepted")
  • Groups we requested to join and got rejected from ("Actively denied" or "denied")
  • Groups we requested to join, where the join request languished and expired after 2 weeks. ("Expired", "passively denied", or "timed out")

In any of the latter 3 cases the account requesting to join got an email. The privacy difficulty we have had with the GetMyData-derived archives arises from two points:

  • Many accounts used for that stage of the project were accounts at mail.com, which automatically deletes mailboxes after a period of inactivity, meaning the accept/deny/expire emails got deleted before we could gather them.
  • GetMyData incorrectly sent out GetMyData archives of groups to accounts that were in the "pending" stage of joining them.

The GetMyData process

After Yahoo closed the normal Groups interface as well as the API, there was only one avenue to continue to get information from it: "GetMyData", a process intended for people who were already in groups to get relevant records. ArchiveTeam, and parallel to it the fandom project, exploited this in order to try to get better coverage. "Get My Data files are a set of .zip files up to 2GB each. Each one has a variable number of groups in them (however many would fit in the 2GB). Each group has a messages .zip file with a number of .mbox files, a files .zip file with a backup of the group's file section, and a links .zip file with the group's links section. Yahoo unfortunately didn't bother sending other data in most cases, like attachments and photos, unless it was something the user requesting the backup personally posted".[5]

"The way Yahoo implemented their Get My Data utility, you would get backups for any group you were a member of and any group for which you had a pending request to join. So, we would occasionally get back data for groups that later actively denied the join request, and we would quite commonly get back data for groups that never responded, given that most groups were abandoned by 2019. This behavior was extremely consistent and applies to all private groups".[6]
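The zip layout quoted above can be explored with a short script. The one-folder-per-group naming assumed here follows that description, but member naming inside real GMD zips may vary:

```python
import zipfile

def groups_in_gmd(path_or_file):
    """List the top-level group folders inside one GetMyData zip."""
    with zipfile.ZipFile(path_or_file) as z:
        return sorted({name.split("/", 1)[0]
                       for name in z.namelist() if "/" in name})
```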

Yahoo Groups

Forum/mailing list host that shut down in a process spanning late 2019 to early 2020.

2015-2018 API grab

Scrape of the Yahoo Groups API. Led, or perhaps done entirely, by PurpleSym. There is one WARC file per group, and several group-WARCs per IA item. The items are located in the IA collection archiveteam_yahoogroups, and are distinguishable from everything else in that collection by their upload date not later than 2018, and by their thumbnails being a photo of a "Yahoo!" sign (with the exception of something else from about 2 years later). Although they are in WARC format, they seem to use the resource record-type with synthetic URIs, meaning they will not work in the Wayback Machine.

Doranwen's metadata upload

An IA item located here created by the fandom project's Doranwen. Contains tables of group metadata, parsed from the 2015-2018 API grab and from their GetMyData collection.

Fandom GetMyData, not shared

Groups from the fandom project's GetMyData effort which were either "actively denied" or originated outside the normal GMD process, such as by a group member sending it to them. This also includes GMDs made by the fandom project on behalf of members of private groups, which it retains only as a backup in case the member loses their copy.[7] These have been kept with the fandom project.

Fandom GetMyData, shared

Groups from the fandom project's GetMyData effort which it decided were public enough to send to us, namely those which were public, accepted/approved, and "accepted" through the Groups permissions bug, as well as GMDs sent in by outside people to the fandom project.[7] A copy of these was sent to lennier1.

Fandom GetMyData

The fandom project conducted its own effort to collect GetMyData archives. Some of these the fandom project, acting by its own standards, has kept entirely private; some it has given to us. There was some overlap in Yahoo accounts (AKA IDs) with ArchiveTeam, possibly leading to mixing of data with our GetMyData process.[8]

2019 normal webpage grab

A regular, DPoS attempt at a grab of the Yahoo Groups website. As with many DPoS projects, navigation is somewhat broken in the WBM, but it does play back.[9] Nonetheless, it is the closest thing to sanity of any of the archiving attempts.

The repository is at ArchiveTeam/yahoogroups-grab. The data is on IA, in archiveteam_yahoogroups, prefixed simply by "Archive Team Yahoo! Groups", e.g. archiveteam_yahoogroups_20191214012211_24a04e36.

Combined GetMyData possessed by lennier1

A collection of GetMyData output files currently held by lennier1 and not uploaded anywhere. Contains the results from the ArchiveTeam GetMyData effort, as well as what the fandom project was willing to share with us. This, with the addition of the data the fandom project didn't share with us, would become the contents of Doranwen's GetMyData holdings.

Not all this data can be made public. A bug in Yahoo groups allowed execution of GetMyData on any restricted group merely by applying to join it, before being accepted or rejected. Additionally, some of these files "were contributed by people who were ok making some groups (or data types) public but not others".[10] As such, the plan is to separate this out into a public segment and a darked segment before uploading both to IA.

Public ArchiveTeam GMD upload

A planned IA upload of GetMyDatas from the ones ArchiveTeam possesses, after it is sorted out which ones are public and private. As more than one group could fit into a GetMyData zip, the raw files we received presumably cannot be uploaded as-is; rather, it will be necessary to extract the individual groups.

Darked ArchiveTeam GMD upload

A planned IA upload of GetMyDatas from the ones ArchiveTeam possesses, after it is sorted out which ones are public and private. Presumably just the zip files, but "darked", i.e. inaccessible to everything but privileged IA accounts (employees, and knowing them probably some other people as well).

PGOffline etc.

"[M]iscellaneous stuff like PGOffline data and people running the archiving program manually" sent in to us, currently not uploaded, and held by lennier1. Presumably to be uploaded. As PGOffline did not suffer from the permissions bug, presumably these are sufficiently privacy-safe.[11]

Upload of PGOffline to IA

Planned upload of data from PGOffline and other miscellaneous sources to IA.

Combined GetMyData possessed by Doranwen

Combination of the fandom project's public-access-ok and public-access-not-necessarily-ok GetMyData sets, as well as the ArchiveTeam one. This is the most comprehensive GetMyData collection there is; lennier1 has a version without the no-public access material. It appears this is eventually to be split up into a fairly processed public upload as well as a non-public set.

ArchiveTeam GetMyData

GetMyData archives collected by ArchiveTeam. Volunteers signed up for groups and then made GetMyData requests on those accounts; the results, which arrived by email, were sent to Marked. These are currently held by lennier1.

Doranwen's organized upload

The planned upload, by the fandom project leader Doranwen, of the GetMyData archives they possess, and wish to make public by their criteria (presumably a subset of what they've given to us). Much of the discussion in #yahoosucks in 2021 and 2022 has concerned cleaning up and categorizing this data, hence this page's label of it as "organized".

Private groups kept by Doranwen

The subset of the GetMyData archives possessed by Doranwen/the fandom project that they do not want to be made public. Indications are that these will be kept with people personally in perpetuity.

2019 API grab

Technically unusual Seesaw/tracker/DPoS project written and led by Marked to get data from the Groups API after the normal interface, and with it the DPoS project that gathered from it, had shut down. The GitHub repository is here. Marked has said that, of the data produced by this, "1/3 is in australia, 1/3 with me, and 1/3 on IA"[12]. As of September 2022 neither of the first two parts have been uploaded.

2019 API grab, portion with marked

Of the data from the 2019 API grab, "1/3" was with Marked. At some point between late 2020 and late 2022 this made its way to lennier1.

2019 API grab, portion with lennier1

Marked's 1/3 of the 2019 API grab data, sent to lennier1. Not yet uploaded; will hopefully be sent to IA eventually.

2019 API grab, portion on IA

Of the data from the 2019 API grab, "1/3" had been uploaded to IA in early 2020, and as of September 2022 that portion has remained unchanged. It is intended that the third originally with Marked and sent to lennier1, and if it can be located the third "in australia", be merged into this. These can be found in archiveteam_yahoogroups as items prefixed by "archiveteam_yahoogroups_api", e.g. archiveteam_yahoogroups_api_20191217011957_8a14e083.

2019 API grab, portion "in australia"

Of the data from the 2019 API grab, "1/3" was, per a few enigmatic remarks from Marked, "in australia". "[T]here was a volunteered target in Australia, I forgot their username atm"[13]. We have been unable to determine who this was, except that it seems unlikely to have been Kiska.

Submitting group data to the public archive

Were you a member of a public group (one that did not require administrator approval to join)? Were you an admin of a private group whose members consent to be part of the public archive? Did you save the group yourself, using GetMyData or any other method?

If so, we'd love to have your archives. Upload to a fileshare such as WeTransfer, Dropbox, Google Drive, or Mega.nz and email us a link.

Feel free to remove data which should remain private (such as private groups in mixed public/private GetMyData results, or message history from private groups whose members wish to make only files and photos public) before sending us a copy.

However, try to make sure the data is otherwise unmodified! In particular, there may be old malware in GMD ZIP files. Modern email software and operating systems are expected to be resistant to this old malware, but some antivirus software may see it and attempt to modify or delete the ZIP file. Please be careful of this!

Project history

Data collection for this project is over. Yahoo! Groups content is now inaccessible; although we continue to accept individual archives made by group members and admins, we can no longer archive additional groups.

While the project was active, volunteers could help in the following ways:

Nominating non-private groups for archival

Groups could be nominated for archival using this form. This was not used for groups that required administrator approval to join.

Submitting private groups for public archival

Administrators could request that their private group (we considered a private group to be one that required administrator approval of new members) be included in the public archive. We requested admins to ensure that the members of the group were happy about being part of the public archive.

To submit a group for archival, admins could send a membership invite to the email archiveteamprivateyahoogroup@gmail.com (without selecting the "Add only to mailing list" option). We monitored that email regularly to accept any membership requests we received, and scheduled the group for archival once our Yahoo account was a member.

Joining groups and submitting data

We used an extension for Chromium-based browsers to partially automate the process of joining groups (at first because some groups only made message history visible to members, and later because, after the closure of the web interface, GetMyData was the only way to access group content). There was at one time a leaderboard.

Volunteers who joined groups also made GetMyData requests from the accounts they used to join groups (in some cases, multiple requests, if they received results in time to continue joining groups or not all groups were included in the initial result). GMD requests could take up to 10 days to be processed; results were split into 2 GB ZIP files.

GMD results were emailed or rsynced to ArchiveTeam.

Private groups of interest

numberactivation (see all the press coverage; FOI request). Some external lists: List of groups with Fanlore pages (contains both private and public groups), Archive Trans Yahoo's list (all private at last check), Archive South Asian American Yahoo Groups (all public), and Queer Digital History Project (no groups listed, presumably all private).

Statistics

As of 2019-10-16 the directory lists 5,619,351 groups. 2,752,112 of them have been discovered. 1,483,853 (54%) have public message archives with an estimated total of 2.1 billion messages (an average of 1,389 messages per group so far). 1.8 billion messages (86%) have been archived as of 2019-10-28.

The following graphs are slightly outdated:

Yahoo groups date created.png Yahoo groups messages per group.png Yahoo groups post date.png

Site structure

There was a convenient JSON API, most endpoints of which are now down. Some endpoints require logged-in group membership or other permissions (depending on group settings).

Groups

- Known params: maxHits, offset, query, sortBy (values: OLDEST, RELEVANCE, MEMBERS, LATEST_ACTIVITY, NEWEST)
- Known params: start, intlCode (au, in, sg, uk, us; ar, e1, es, mx; br; cf, fr; de; hk; it...)
- Pagination: Page size is 10. Does not have a count param. start is the result index, not the group id. start values 500 and up all return the same set of results.
Groups are listed in fixed but arbitrary order. /0/ is a special value that shows the root node; subcategories can be accessed by using the subcategory id instead (the full "idList" value is not required).
Defaults to the US view of the English directory tree. Different languages have different directory trees. Supplying a different intlCode parameter (list not exhaustive, must be lower case) accesses the corresponding view of the appropriate language's tree. Subcategory ids are language-specific and must be used with an appropriate intlCode. The intlCode -> language mapping may be checked at the /0/ endpoint; the root "name" is always "ROOT", but "id" is language-specific.[14] Different intlCode views of the same language list groups in a different order, may have slightly different category names, and appear to have slightly different numbers of categories in the full tree; their group overlap is about 99%.
The "count" field appears totally inaccurate.
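Iterating the directory under the constraints noted above can be sketched as follows. The endpoint is long gone, so `fetch(params)` stands in for a GET of the directory URL; the `"groups"` response key is an assumption. The page size is fixed at 10, and since start values of 500+ all repeat the same page, iteration stops there:

```python
def iter_directory(fetch, intl_code="us"):
    for start in range(0, 500, 10):  # hard cap: results repeat from 500 on
        page = fetch({"start": start, "intlCode": intl_code}).get("groups", [])
        if not page:
            return
        yield from page
```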

Messages

- Known params: count, start, sortOrder (ASC, DESC), direction (1, -1)
- Pagination: Page size defaults to 10, with no known limit. start is the message id, not the result index. sortOrder adjusts the order of results in the json response's array, whereas direction determines which way to iterate through ids from start (default: DESC, -1).
The original email is largely recoverable from the rawEmail field.
Message headers and textual body parts have email addresses redacted, with the hosts replaced with "...". For example, "From: ceo@ford.com" and "From: ceo@toyota.com" both get turned into "From: ceo@..." Some addresses may not have been redacted correctly.
Some messages may have encoding issues.[15] Sometimes (as in the linked case) the non-raw endpoint has the correct characters, sometimes it does not; this is likely related to the originating email client. Removing non-ASCII characters and ^M characters from the 7-bit text should result in valid RFC822 emails.
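A minimal sketch of the cleanup suggested above, stripping ^M (carriage return) characters and non-ASCII bytes from a 7-bit rawEmail value; real messages may need gentler, charset-aware handling, and some parsers want CRLF line endings preserved:

```python
def clean_raw_email(raw):
    """Strip carriage returns and non-ASCII characters from rawEmail text."""
    return "".join(ch for ch in raw.replace("\r", "") if ord(ch) < 128)
```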

References

  1. https://web.archive.org/web/20201126125219/https://help.yahoo.com/kb/groups/SLN31010.html
  2. https://www.theverge.com/2019/12/10/21004883/yahoo-groups-extend-deadline-download-data-date-time
  3. https://twitter.com/YahooCare/status/1204312076379926528
  4. https://help.yahoo.com/kb/groups/SLN35505.html
  5. #yahoosucks, September 2022, two messages merged together here
  6. #yahoosucks, September 2022, two messages merged together here
  7. 7.0 7.1 #yahoosucks, October 2022, "OrIdow6: The only part of that that..."
  8. #yahoosucks, September 2022, search "i am sure that at least some ids whose info"
  9. "(even the 'html' yahoogroups-grab", parenthesis and single quotes in original, #yahoosucks, September 2022
  10. #yahoosucks, September 2022
  11. #yahoosucks, "The API archiving program...", September 2022
  12. Quoted by thuban in #yahoosucks September 2022; timestamp indicates Marked originally sent this in March 2020
  13. #yahoosucks, June 2021
  14. This id can also be accessed with an appropriate intlCode, but contains the same twelve groups for all languages: the groups in the categories for musical artists "Roots, The" and "Rusted Root", three groups which appear to be Yahoo tests, and one group which appears to be a spam test.
  15. https://yahoo.uservoice.com/forums/209451-us-groups/suggestions/9644478-displaying-raw-messages-is-not-8-bit-clean
