Mozilla Addons
URL | https://addons.mozilla.org/ |
Status | Special case |
Archiving status | Saved! (addon files) Saved! (website) |
Archiving type | Unknown |
IRC channel | #outofammo (on hackint) |
Project lead | User:JustAnotherArchivist |
Data | #Archival |
Mozilla Addons, also known as AMO (from its domain, addons.mozilla.org), is a website run by the Mozilla Foundation which hosts extensions and themes for Firefox, Thunderbird, and other Mozilla software.
Extensions used to be based on XPI until the introduction of WebExtensions around 2016. Since Firefox 57, only WebExtensions are supported. XPI-based addons (called "legacy") were deprecated but still supported until the end-of-life of Firefox 52 ESR in September 2018. The legacy addons were planned to be removed from AMO in early October 2018[1][2].
Website structure
As of September 2018, there are two different versions of AMO: the old version, called "classic desktop" on the website, and a redesigned new site. The two mostly serve the same content; the most important difference is that the new site does not serve user profile pages for non-developers while the old site does. The switching between the two sites happens through a cookie called `mamo` (modern AMO?); when it is set to `off`, the old site is served; when it's `on` or unset, the new site is served.
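As a minimal sketch of the cookie switch described above, the following builds (but does not send) a request that asks for the old site. The helper name `amo_request` is made up for illustration, and the site may no longer honour this cookie.

```python
from urllib.request import Request


def amo_request(url, classic=False):
    """Build a request for AMO; classic=True asks for the old site."""
    req = Request(url)
    if classic:
        # mamo=off selects the old "classic desktop" site;
        # mamo=on or no cookie at all selects the new site.
        req.add_header("Cookie", "mamo=off")
    return req


req = amo_request("https://addons.mozilla.org/en-US/firefox/", classic=True)
print(req.get_header("Cookie"))
```

Leaving the cookie unset rather than sending `mamo=on` keeps the default request identical to what a fresh browser would send.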
AMO uses numeric IDs and slugs for addon identification. (GUIDs are also used, but only in the API and internally in Firefox.) These IDs are shared with Thunderbird and Seamonkey addons, which used to be hosted on AMO but have since been moved to addons.thunderbird.net (which only exists in the "old" form; there is a "view the new site" link in the footer, but it doesn't have any effect as of 2018-09-30).
To track addon installations, AMO uses a `src` parameter everywhere on the site. There are at least 59 possible values for this parameter[3].
Addon download links have the general format `https://addons.mozilla.org/firefox/downloads/file/$FILEID/$FILENAME?src=$SRC`. Note that file IDs are separate from addon and version IDs. The filename typically contains the slug and a version identifier. When AMO detects that you're using a version of Firefox that is incompatible with an addon, it displays a "download anyway" link, which in addition contains a `type:attachment` path segment between the file ID and the filename (i.e. `.../file/$FILEID/type:attachment/$FILENAME...`). All download URLs redirect to a CDN at addons.cdn.mozilla.net; the `type:attachment` is also reflected in that CDN URL as `_attachments` (which then inserts a `Content-Disposition` header); the `src` parameter is not included in the redirect target.
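The download URL format above can be illustrated with a small builder. This is only a sketch: the function name, the file ID, the filename, and the `src` value are placeholders, not values taken from AMO.

```python
from urllib.parse import quote, urlencode

BASE = "https://addons.mozilla.org/firefox/downloads/file"


def download_url(file_id, filename, src=None, attachment=False):
    """Assemble a download link in the format described above."""
    parts = [BASE, str(file_id)]
    if attachment:
        # "download anyway" links carry this extra path segment
        # between the file ID and the filename.
        parts.append("type:attachment")
    parts.append(quote(filename))
    url = "/".join(parts)
    if src:
        url += "?" + urlencode({"src": src})
    return url


# Placeholder values, for illustration only.
print(download_url(12345, "example-1.0.xpi", src="example-src"))
print(download_url(12345, "example-1.0.xpi", attachment=True))
```

The redirect target on the CDN is deliberately not constructed here, since its exact layout is not spelled out beyond the `_attachments` marker.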
Besides the actual addon files, AMO also hosts preview screenshots, reviews, version history (including changelogs), statistics, and in some cases additional pages (e.g. privacy policy) for each addon. The review page only displays the most recent review of any particular user, and one needs to follow an extra link to discover a user's earlier reviews for the same addon.
Note that AMO does not only host extensions but also themes. These consist simply of a JSON object which provides the URLs for the relevant images and some additional settings (e.g. text colour), i.e. there is no real download for them.
The AMO API versions 3 and 4 are documented here and here, respectively.
Utilities
- amo-links-getter: Both Wget and the Warrior are ineffective at downloading the site completely; in addition, many redundant links are not recognised as redirects, which causes the same content to be downloaded several times. This is a set of scripts that store all the links in a SQLite database to be downloaded later.
- Here's a list of discovered links (direct download link); see the discussion page on how to download it with Wget.
Archival
- There were two (proper) attempts to archive AMO through ArchiveBot. job:4aa66jgox1pg1gp6gxzkgthiq ran from 2017-08-29 until early December 2017, and job:xew9sjj59osltx5oyjr6n9rg was started on 2018-07-29 and vanished sometime in August 2018.
- All addon files (both from AMO for Firefox/Firefox Android and from addons.thunderbird.net for Thunderbird/Seamonkey) were downloaded by User:JustAnotherArchivist between 2018-09-14 and 2018-09-16.
- The amo-links-getter list linked above was downloaded through ArchiveBot as job:akifc65k7kfhpdhfbveh79v1c (started on 2018-09-30, finished on 2018-10-07).
- The old, "classic desktop" AMO website was grabbed by User:JustAnotherArchivist in October/November 2018.
  - The website – minus downloads and `src` parameter variations, but including version history, reviews, and API data – was grabbed between 2018-09-30 and 2018-10-13 (see below for details).
  - The `src` parameter variations and downloads as well as addon collections were grabbed between 2018-10-15 and 2018-10-20 (see below for details).
  - A wpull grab of the skeleton of the old website (with some special handling of the locale variations in the URLs) was done between 2018-10-15 and 2018-10-19. "Skeleton" here means the categories, tags, etc.; the addons themselves as well as user profiles are excluded.
    - Specifically, case variations of `/en-US/` are normalised to this capitalisation. There is some bug in AMO which leads to links using `/en-us/`, `/eN-uS/`, etc. Unfortunately, this means that some links will be broken, but that's unavoidable without retrieving the entire site 16 times...
    - Any URLs with a path starting with `/en-US/(firefox|android)/(addon|user)/` or `/(firefox|android)/downloads/` as well as all locales other than en-US (af, ar, bg, ...) are ignored.
    - In the search, combinations between the filters on the left or with the sorting are ignored.
  - All of this data can be found on the Internet Archive at addons.mozilla.org_legacy_201810.
- A warrior project for the website was in the works (repository) but never active.
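The locale normalisation and ignore rules of the skeleton grab could look roughly like the sketch below. It is not the code that was actually used with wpull; the function names and the example paths are hypothetical, and the filter for other locales is left out because the full locale list is not given here.

```python
import re

# "en-US" has four letters, so there are 2**4 = 16 case variations.
LOCALE_RE = re.compile(r"^/en-us/", re.IGNORECASE)

# The two path patterns named above, applied after normalisation.
IGNORED_RE = re.compile(
    r"^/en-US/(firefox|android)/(addon|user)/"
    r"|^/(firefox|android)/downloads/"
)


def normalise(path):
    """Rewrite any case variation of the /en-US/ prefix to /en-US/."""
    return LOCALE_RE.sub("/en-US/", path, count=1)


def is_ignored(path):
    """True if the skeleton grab would skip this path."""
    return IGNORED_RE.match(normalise(path)) is not None


print(normalise("/eN-uS/firefox/extensions/"))
print(is_ignored("/en-us/firefox/addon/some-slug/"))
```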
JustAnotherArchivist's website grab, part 1
General notes:
- Any URL starting with `https://addons.mozilla.org/en-US/firefox/addon/ADDONID/` redirects to a URL using the slug instead. Only the `ADDONID` URLs are listed below for brevity, but of course the redirect target with the slug was also grabbed in all cases.
- For all API resources, both the v3 and the v4 versions were retrieved, but only the v3 URL is given below for brevity. Unless otherwise noted, you can simply replace `v3` with `v4` in those URLs to get the v4 URL.
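A small sketch of that v3-to-v4 mapping follows. The function name is made up; the one exception it handles is the review endpoints listed further down, where the v4 API uses `ratings/rating` instead of `reviews/review`.

```python
def to_v4(v3_url):
    """Derive the v4 API URL from a v3 URL as listed on this page."""
    url = v3_url.replace("/api/v3/", "/api/v4/", 1)
    # Reviews are the noted exception: v4 calls them ratings.
    return url.replace("/reviews/review/", "/ratings/rating/", 1)


print(to_v4("https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/"))
print(to_v4("https://services.addons.mozilla.org/api/v3/reviews/review/?addon=ADDONID"))
```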
For all addon IDs between 0 and 1009999 (largest existing ID as of 2018-10-13 is 1003947), these URLs are covered:
- addon detail API endpoint (`https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/`)
- addon page (`https://addons.mozilla.org/en-US/firefox/addon/ADDONID/`)
  - This URL may redirect to addons.thunderbird.net for Thunderbird addons. In that case, all redirects on addons.mozilla.org are kept, but the addons.thunderbird.net page itself is not grabbed, and the addon is ignored.
  - If this URL returns a 404 or another error (e.g. disabled addon), the addon is ignored.
- the "more" subpage which is loaded through JavaScript (`https://addons.mozilla.org/en-US/firefox/addon/ADDONID/more`, must be requested with the header `X-Requested-With: XMLHttpRequest`)
) - the addon-specific images, i.e. icons (in both resolutions, 32x32 px and 64x64 px) and preview images (thumbnail and full resolution), extracted from both the page and the API response (just to be sure)
- addon detail API endpoint with the slug and/or the GUID instead of the addon ID if possible (i.e. if the slug and/or GUID could be determined)
- version history
  - initial page (`https://addons.mozilla.org/en-US/firefox/addon/ADDONID/versions/`)
  - pagination (`https://addons.mozilla.org/en-US/firefox/addon/ADDONID/versions/?page=N`; page=1 always retrieved even if there is no pagination)
  - API endpoint (`https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/` and `https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/?page=1` + all following pages until the `next` field is empty/null)
- versions
  - API endpoint for each version (`https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/VERSIONID/`, where the version IDs were collected from the API history pagination)
  - page redirect for each version (`https://addons.mozilla.org/en-US/firefox/addon/SLUG/versions/VERSIONSTRING`, collected during the pagination traversal on the website)
- reviews/ratings
  - initial page + pagination as described above for the version history (`https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/[?page=N]`)
  - API endpoint including further pages according to the `next` field (`https://services.addons.mozilla.org/api/v3/reviews/review/?addon=ADDONID` and `https://services.addons.mozilla.org/api/v4/ratings/rating/?addon=ADDONID`)
  - API endpoint for each version of the addon + further pages according to `next` (`https://services.addons.mozilla.org/api/v3/reviews/review/?addon=ADDONID&version=VERSIONID`)
  - individual review page (`https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/REVIEWID/`)
  - individual review API endpoint (`https://services.addons.mozilla.org/api/v3/reviews/review/REVIEWID/` and `https://services.addons.mozilla.org/api/v4/ratings/rating/REVIEWID/`)
  - page(s) for users who wrote multiple reviews for an addon (`https://addons.mozilla.org/en-US/firefox/addon/ADDONID/reviews/user:USERID`; also pagination with `?page=N` if available, though that doesn't seem to be the case anywhere)
- statistics
  - page (`https://addons.mozilla.org/en-US/firefox/addon/ADDONID/statistics/`)
  - data (`https://addons.mozilla.org/en-US/firefox/addon/SLUG/statistics/DATASET-day-YEAR0101-YEAR1231.json`)
    - Here, `DATASET` was each of `('overview', 'apps', 'locales', 'os', 'versions', 'statuses', 'sources', 'downloads')`, and `YEAR` started from 2018 and went back until the returned data was empty.
- any other subpage of the addon which is linked on the addon page and starts with `https://addons.mozilla.org/en-US/firefox/addon/ADDONID/` or `https://addons.mozilla.org/en-US/firefox/addon/SLUG/`, e.g. privacy policy
- feature compatibility API endpoint (`https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/feature_compatibility/`)
- EULA and privacy policy API endpoint (`https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/eula_policy/`)
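Two of the traversal rules in the list above, following the API `next` field and walking the statistics back year by year, can be sketched as follows. This is an illustration rather than the grab's actual code: the function names are made up, and the fetcher is passed in as a parameter so the logic can be tried without network access.

```python
import json
from urllib.request import urlopen

DATASETS = ("overview", "apps", "locales", "os",
            "versions", "statuses", "sources", "downloads")


def fetch_json(url):
    """Default fetcher: download a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_api_pages(first_url, fetch=fetch_json):
    """Yield each API page, following `next` until it is empty/null."""
    url = first_url
    while url:
        page = fetch(url)
        yield page
        url = page.get("next")


def iter_statistics(slug, dataset, fetch=fetch_json, start_year=2018):
    """Yield (year, data), going back until a year returns no data."""
    year = start_year
    while True:
        url = (f"https://addons.mozilla.org/en-US/firefox/addon/{slug}"
               f"/statistics/{dataset}-day-{year}0101-{year}1231.json")
        data = fetch(url)
        if not data:
            break
        yield year, data
        year -= 1
```

A call such as `iter_api_pages("https://services.addons.mozilla.org/api/v3/addons/addon/ADDONID/versions/")` would then cover the version history endpoint, with `ADDONID` replaced by a real ID.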
Furthermore, during the relevant stages above (addon page, "more", addon detail API endpoint, and reviews pages and API endpoints), usernames were extracted, and the user profiles were afterwards retrieved as well:
- user profile page using the username (`https://addons.mozilla.org/en-US/firefox/user/USERNAME/`)
- if it can be found on that page, the same thing with the user ID (`https://addons.mozilla.org/en-US/firefox/user/USERID/`; the abuse report button is used for extracting the user ID)
- avatar if provided (somewhere under `https://addons.cdn.mozilla.net/user-media/userpics/`)
- pagination for reviews, if necessary (`https://addons.mozilla.org/en-US/firefox/user/USERNAME/?page=N` and `https://addons.mozilla.org/en-US/firefox/user/USERID/?page=N`)
JustAnotherArchivist's website grab, part 2
This grab covers the variations of the `src` URL parameter on the addon page and the downloads themselves with that parameter. It again operates on addon IDs. It also covers collections.
src variations and downloads
- For each addon ID, it's checked whether the addon needs to be processed in this way. This could've been integrated into part 1, but it's tricky and time-consuming to do these checks after the fact, so we simply re-retrieve the API addon detail endpoint. Nonexistent and theme addons are skipped; note that themes do not use the `src` tracking parameter since their installation works very differently and there are no downloadable files either, so everything below is unnecessary for them.
- For each variation of `src`, `https://addons.mozilla.org/en-US/firefox/addon/ADDONID/?src=SRC` is retrieved. `SRC` is empty or one of the 58 values listed in the documentation with the exception of `collection` and `version-history`; the former is handled below, and the latter is only used on the version history page but not on links to the addon page. (`version-history` is implicitly handled below.)
- The version history page(s) are retrieved as described in part 1: `https://addons.mozilla.org/en-US/firefox/addon/ADDONID/versions/[?page=N]`
- From all of the above pages, download links are collected. There are a few different formats:
  - `https://addons.mozilla.org/firefox/downloads/latest/SLUG/addon-ADDONID-latest.EXT?src=SRC` – this is used by the install button at the top of the addon page and also on other pages (e.g. category listings).
  - `https://addons.mozilla.org/firefox/downloads/file/FILEID/FILE.EXT?src=SRC` – this appears in the version information at the bottom of the addon page and in the version history.
  - For both of these formats, there also exist URLs containing a `type:attachment` path segment. These are "download anyway" links for when a browser is incompatible with an addon version.
  - All four URLs are actually redirects to the CDN; the `src` parameter is fortunately not passed on to the CDN, so only two requests to the CDN (for the presence and absence of `type:attachment`) are necessary. The file is identical in both cases; the only difference is a `Content-Disposition` header to force a download.
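To illustrate how the collected download links differ, here is a hedged sketch that classifies a link by format and notes whether it carries `type:attachment`. The function name and the example URL are hypothetical, and the check deliberately makes no assumption about where in the path the `type:attachment` segment sits.

```python
from urllib.parse import parse_qs, urlsplit


def classify_download(url):
    """Return format details for a download link, or None if it is not one."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Expected shape: (firefox|android)/downloads/(latest|file)/KEY/...
    if len(segments) < 4 or segments[1] != "downloads":
        return None
    kind = segments[2]
    if kind not in ("latest", "file"):
        return None
    return {
        "kind": kind,             # "latest" or "file"
        "key": segments[3],       # slug for "latest", file ID for "file"
        "attachment": "type:attachment" in segments,
        "src": parse_qs(parts.query).get("src", [""])[0],
    }


info = classify_download(
    "https://addons.mozilla.org/firefox/downloads/file/123/"
    "type:attachment/example.xpi?src=version-history"
)
print(info)
```

Since `src` is dropped by the redirect, grouping links by `(kind, key, attachment)` is one way to avoid requesting the same CDN URL once per `src` value.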
Collections
Collection retrieval operates on users and is based on the users discovered in part 1 (i.e. covers all addon developers and reviewers).
- The list of collections by a user is retrieved: `https://addons.mozilla.org/en-US/firefox/collections/USERNAME/[?page=N]`
- Each collection: `https://addons.mozilla.org/en-US/firefox/collections/USERNAME/COLLSLUG/[?page=N]`
- Each addon page linked from the collection and containing a `src` parameter is retrieved; this covers URLs such as `https://addons.mozilla.org/en-US/firefox/addon/decentraleyes/?src=collection&collection_id=4a02c848-8be7-44ff-bc1c-f1c2d8dddf86` from this collection.
- For each download link appearing in either the collection or on the addon page, the redirect to the CDN is retrieved (but not followed).
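As a small illustration of the collection links mentioned above, the sketch below pulls the collection ID out of an addon URL that was reached from a collection. The function name is made up; the example URL is the one quoted in the list.

```python
from urllib.parse import parse_qs, urlsplit


def collection_source(addon_url):
    """Return the collection_id if the link came from a collection, else None."""
    query = parse_qs(urlsplit(addon_url).query)
    if query.get("src") != ["collection"]:
        return None
    ids = query.get("collection_id")
    return ids[0] if ids else None


url = ("https://addons.mozilla.org/en-US/firefox/addon/decentraleyes/"
       "?src=collection&collection_id=4a02c848-8be7-44ff-bc1c-f1c2d8dddf86")
print(collection_source(url))
```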