Difference between revisions of "Elections/2021 UK elections"

From Archiveteam
(Updated link to twitter scrape with smaller list, de-duplicated from previous (2019 UK general election) scrape)
(Added more detail on ongoing archiving efforts)
 
After processing this data, the end result is as follows:


* 3904 [https://web.archive.org/web/20210509115449/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/twitter-usernames.txt Twitter usernames] of political parties / candidates<br />[[User:Betamax]] ran a manual [https://github.com/JustAnotherArchivist/snscrape scrape] to grab the tweets (with media and outlinks). This produced a [https://web.archive.org/web/20210511192808/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/uk_elections_2021_twitter_scrape.txt list] (900 MB in size) containing 17.8 million URLs (13.1 million tweets plus outlinks and media links).<br />The list was split into chunks of 1 million URLs, which were archived through [[ArchiveBot]] (still ongoing): [https://archive.fart.website/archivebot/viewer/job/9wnj1 0] [https://archive.fart.website/archivebot/viewer/job/5wi7a 1] [https://archive.fart.website/archivebot/viewer/job/bcwtd 2] [https://archive.fart.website/archivebot/viewer/job/f1ve6 3] [https://archive.fart.website/archivebot/viewer/job/9u6cp 4] [https://archive.fart.website/archivebot/viewer/job/bni74 5] [https://archive.fart.website/archivebot/viewer/job/2aywt 6]
* 4175 [https://web.archive.org/web/20210509115426/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/facebook-url.txt Facebook pages]<br />Rate limiting and banning by Facebook prevent these pages from being archived.
* 318 [https://web.archive.org/web/20210509115431/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/instagram-urls.txt Instagram profiles]<br />Rate limiting and banning by Facebook prevent these profiles from being archived.
* 89 [https://web.archive.org/web/20210509115454/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/youtube-channels.txt YouTube channels] (and [https://web.archive.org/web/20210509115358/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/dailymotion-channels.txt a Dailymotion channel]) used by parties / candidates
* 530 [https://web.archive.org/web/20210509115444/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/party_sites.txt Political Party websites]<br />This has been further processed down to 522 websites: [https://web.archive.org/web/20210510150746/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/party_sites_reprocessed.txt reprocessed list].
* 1273 [https://web.archive.org/web/20210509115304/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/candidate_sites.txt Candidate Websites]<br />This has been further processed down to 1101 websites: [https://web.archive.org/web/20210510150714/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/candidate_sites_reprocessed.txt reprocessed list].<br />Along with the Political Party websites (see above), these were processed as individual [[ArchiveBot]] jobs to prevent ArchiveBot's cross-linking / outlink handling from resulting in incomplete grabs.
* 2503 [https://web.archive.org/web/20210509115437/https://www.tardis.ed.ac.uk/~andrewferguson/uk_elections_2021_betamax/party_candidate_pages.txt Candidate Web pages] (these have been linked to separately and include manifestos, etc.; there will likely be a large overlap between these pages and the candidate websites, but I think it is best to archive these as well).<br />Completed as [[ArchiveBot]] Job ID [https://archive.fart.website/archivebot/viewer/job/32g60 32g60eq89ir3jt7xm7htfrk81]

Latest revision as of 15:46, 19 May 2021

The 2021 UK elections were not a general election but a series of elections held across the UK:

  • Council, Constabulary and Mayoral elections in England
  • Welsh Parliament elections in Wales
  • Scottish Parliament elections in Scotland

Work based upon Democracy Club information

User:Betamax collected and processed data from Democracy Club, a UK non-profit organisation with crowdsourced information on candidates and political parties for the elections. The Candidate Database was particularly helpful, although several other spreadsheets were also provided by members of Democracy Club.

After processing this data, the end result is as follows:

  • 3904 Twitter usernames of political parties / candidates
    User:Betamax ran a manual scrape to grab the tweets (with media and outlinks). This produced a list (900 MB in size) containing 17.8 million URLs (13.1 million tweets plus outlinks and media links).
    The list was split into chunks of 1 million URLs, which were archived through ArchiveBot (still ongoing): 0 1 2 3 4 5 6
  • 4175 Facebook pages
    Rate limiting and banning by Facebook prevent these pages from being archived.
  • 318 Instagram profiles
    Rate limiting and banning by Facebook prevent these profiles from being archived.
  • 89 YouTube channels (and a Dailymotion channel) used by parties / candidates
  • 530 Political Party websites
    This has been further processed down to 522 websites: reprocessed list.
  • 1273 Candidate Websites
    This has been further processed down to 1101 websites: reprocessed list.
    Along with the Political Party websites (see above), these were processed as individual ArchiveBot jobs to prevent ArchiveBot's cross-linking / outlink handling from resulting in incomplete grabs.
  • 2503 Candidate Web pages (these have been linked to separately and include manifestos, etc.; there will likely be a large overlap between these pages and the candidate websites, but I think it is best to archive these as well).
    Completed as ArchiveBot Job ID 32g60eq89ir3jt7xm7htfrk81
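
The chunking step mentioned in the Twitter entry above (splitting the scraped URL list into files of 1 million URLs each before handing them to ArchiveBot) could be sketched roughly as follows. This is only a minimal illustration, not the script that was actually used: the function names, the output file naming and the choice to skip blank lines are all assumptions.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List

CHUNK_SIZE = 1_000_000  # URLs per chunk, matching the figure given in the list above


def chunked(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items from `lines`."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def split_url_list(source: Path, out_dir: Path, size: int = CHUNK_SIZE) -> List[Path]:
    """Split a one-URL-per-line file into numbered chunk files.

    Hypothetical helper: output files are named <stem>_00.txt, <stem>_01.txt, ...
    Blank lines in the input are skipped (an assumption, not stated on this page).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    with source.open("r", encoding="utf-8") as handle:
        # Read lazily so a file of several hundred MB is never held in memory at once.
        urls = (line.strip() for line in handle if line.strip())
        for index, chunk in enumerate(chunked(urls, size)):
            target = out_dir / f"{source.stem}_{index:02d}.txt"
            target.write_text("\n".join(chunk) + "\n", encoding="utf-8")
            written.append(target)
    return written
```

Calling `split_url_list(Path("uk_elections_2021_twitter_scrape.txt"), Path("chunks"))` would write the chunk files into a `chunks` directory (a hypothetical location); each resulting file could then be submitted as its own ArchiveBot job.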