From Archiveteam

Maybe we should ask the US Government for that Twitter backup? --BlueMaxima 15:33, 24 January 2011 (UTC)

I don't believe they care enough. --ATrescue (talk) 01:50, 30 April 2019 (UTC)

Archiving tweet metadata as well.

Do the current methods of saving tweets (e.g. snscrape; see it on GitHub) also include metadata such as tweet source tags?[1][2]


The snscrape method puts the last 3,200 tweets (an API limit) of the archival target user into a URL list, uploads that list somewhere ArchiveBot can fetch it, and feeds it in with !a <. This also archives tweet replies, because the URL list contains them as well.
ArchiveBot also saves it into the Wayback Machine, which is great.
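The URL list handed to ArchiveBot is just one canonical tweet URL per line. A minimal sketch of that format (the username and tweet IDs below are made-up placeholders; in practice snscrape emits these URLs for you):

```python
# Build the kind of one-URL-per-line list that gets fed to ArchiveBot.
# Username and tweet IDs are made-up placeholders; snscrape normally
# produces these URLs directly.
def tweet_urls(username, tweet_ids):
    return [f"https://twitter.com/{username}/status/{tid}" for tid in tweet_ids]

urls = tweet_urls("example_user", [1120000000000000001, 1120000000000000002])
print("\n".join(urls))
```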


Another archival method is chromebot: run chromebot: a on a Twitter search for "from:" the target user, or on a search without "from:" to also include tweet replies and tweets mentioning the user in the thread.

But chromebot's infinite scroll might not reach as many tweets as snscrape does, and even fewer from the targeted account when other tweets appear in the search results as well.

In case of controversies, a chromebot: a job should be run as well.

Tweet Metadata

Because both chromebot and archivebot rely on Twitter's web interface, the amount of tweet metadata captured might be very limited.
Here is some documentation of Twitter's metadata API:

There should be a way to mass-grab this metadata as well. --ATrescue (talk) 01:49, 30 April 2019 (UTC)

snscrape grabs the entire history, not just the last 3200 tweets. But it doesn't include retweets. --JustAnotherArchivist (talk) 01:54, 10 May 2019 (UTC)

Twitter has changed its media section: it now shows a grid of images, and when a tweet has multiple images, the grid only shows the first.

For anyone trying to archive tweets by visiting Twitter while logged in and using scripts to extract the URLs loaded on the page: extracting URLs this way won't pick up the 2nd-4th images of a multi-image tweet until you click on the images and press left/right, which makes them appear in the HTML where the script can see them.
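As a sketch of why this matters: a URL-extraction script of the kind described above can only find pbs.twimg.com media URLs that are actually present in the page's HTML, so images Twitter hasn't rendered into the DOM yet are invisible to it. (The page fragment below is a made-up stand-in for a real saved page.)

```python
import re

# Media on Twitter's web UI lives under pbs.twimg.com/media/. A scraper
# like this only finds what is present in the HTML, which is exactly why
# the 2nd-4th images of a multi-image tweet are missed until they are
# clicked into the DOM.
MEDIA_RE = re.compile(r'https://pbs\.twimg\.com/media/[A-Za-z0-9_-]+\?[A-Za-z0-9=&_]+')

def extract_media_urls(html):
    return sorted(set(MEDIA_RE.findall(html)))

# Made-up fragment: only the first image of the tweet is in the DOM.
page = '<img src="https://pbs.twimg.com/media/AbCdEf123?format=jpg&name=small">'
print(extract_media_urls(page))
```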

Also, Twitter seems to be cracking down harder on scraping and/or tightening rate limits, with many save attempts resulting in "Something went wrong" (after Twitter's loading screen) when a tweet is archived. When this happens, there is no indication in the WBM that the save failed (it will not say "job failed" or "please try again in X min"), because the page itself did load — just not its contents (the tweet itself). You only notice once you visit the playback version.
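Since the error shell loads with a normal status code, detecting a bad capture has to be content-based rather than status-based. A crude heuristic sketch (the marker string is an assumption based on the error text described above, not a documented API):

```python
def capture_looks_failed(html):
    # Twitter's error shell loads "successfully", so a status-code check
    # passes; the only tell is the error text where the tweet body should
    # be. The marker string is an assumption based on the behaviour
    # described above.
    return "Something went wrong" in html

print(capture_looks_failed("<div>Something went wrong. Try reloading.</div>"))
print(capture_looks_failed("<article>actual tweet text</article>"))
```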

A detour for saving Twitter is to use the Nitter front end, but for that to work the Nitter instance must allow the WBM to save its pages (some instances aren't savable) and must not be rate-limited; 403s and any form of anti-bot blocking will break it. Unlike saving tweets directly on Twitter, a failing Nitter page emits errors the WBM can see, so the WBM displays its failure message without you having to check the playback link. --HHHB (talk) 18:11, 10 December 2023 (UTC)
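The detour above amounts to a simple URL rewrite, since Nitter mirrors Twitter's URL scheme. A sketch (the instance hostname is a placeholder, not a recommendation; it must be one that is savable and not rate-limited):

```python
def to_nitter(tweet_url, instance="nitter.example.net"):
    # Nitter mirrors Twitter's URL scheme, so pointing the WBM at the
    # rewritten URL captures the same tweet. The instance hostname is a
    # placeholder; pick one that lets the WBM through.
    return tweet_url.replace("https://twitter.com/", f"https://{instance}/", 1)

print(to_nitter("https://twitter.com/example_user/status/123"))
```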

Death of Nitter

I'd think that Nitter dying recently might have an impact on archiving:

Cooljeanius (talk) 01:52, 28 February 2024 (UTC)