Difference between revisions of "The Mail Archive"

From Archiveteam

Latest revision as of 15:54, 16 January 2017

The Mail Archive
URL: mail-archive.com
Status: Online!
Archiving status: Not saved yet
Archiving type: Unknown
IRC channel: #archiveteam-bs (on hackint)

The Mail Archive is what it sounds like: an ad-supported mailing list archive to which users can add arbitrary mailing lists. Started in 1998, it held 121,034,946 archived postings across 4,314 mailing lists as of October 2015.

Possible leads

List of mailing lists (in OPML format): https://www.mail-archive.com/feeds/feeds.opml

We could use this as a starting point, parsing the OPML into a list of archive URLs (in effect, a sitemap) for future web scraping.

A Bash one-liner to extract the mailing list URLs from the OPML file:

   # Fetch the OPML list and print each htmlUrl value, one per line
   wget -qO- https://www.mail-archive.com/feeds/feeds.opml | \
   grep -oE 'htmlUrl="[^"]*"' | \
   sed -E 's/^htmlUrl="//; s/"$//'
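To turn the extracted URLs into a seed list a crawler can consume, here is a minimal sketch. It assumes each mailing list appears in the OPML as an `<outline>` element carrying an `htmlUrl` attribute, which the one-liner also relies on; the function name `opml_to_seeds` is hypothetical. It additionally decodes `&amp;` (XML requires a literal `&` in an attribute value to be escaped this way) and drops duplicate URLs with `sort -u`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not from the original page. Assumes each mailing list
# appears in the OPML as an <outline> element with an htmlUrl="..." attribute.

# opml_to_seeds: read OPML on stdin and print each unique htmlUrl value,
# one per line, sorted, with the XML entity &amp; decoded back to &.
opml_to_seeds() {
  grep -oE 'htmlUrl="[^"]*"' |
    sed -E 's/^htmlUrl="//; s/"$//; s/&amp;/\&/g' |
    sort -u
}
```

Example use against the live file (requires network access):

   wget -qO- https://www.mail-archive.com/feeds/feeds.opml | opml_to_seeds > seeds.txt
   wc -l < seeds.txt

The line count can be compared with the 4,314 lists reported for October 2015, and seeds.txt could then be handed to a crawler that accepts a URL list, such as GNU Wget via its --input-file option.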