SmolNet

From Archiveteam
Revision as of 13:46, 14 August 2024 by PaulWise (some updates)

The SmolNet consists of content available through alternative protocols outside the web, such as gemini://, gopher://, Gopher+, gophers://, finger://, spartan://, text://, SuperText, nex://, scorpion://, mercury://, titan://, guppy://, scroll://, molerat://, terse:// and fsp://. There is a summary of the main SmolNet protocols.

At this time neither the WARC format nor the Wayback Machine (WBM) supports these protocols, so the SmolNet cannot be archived natively and any such archives could not be accessed.

Fortunately, there are proxies to HTTP and HTML that can be used. The most prominent one is https://portal.mozz.us/; it doesn't require JavaScript, but it only supports the gemini://, gopher://, finger://, spartan://, text:// and nex:// protocols. Several proxies support only gopher (they usually run Gophernicus and often redirect to the proxied versions of their corresponding gopher sites): https://gopher.tildeverse.org/, https://gopher.envs.net/, https://gopherproxy.meulie.net/ and https://gopher.mills.io/. There is also a proxy from nex:// to gemini:// at gemini://gemini.lehmann.cx/cgi-bin/nex.php.
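As a rough illustration, a native SmolNet URL can be rewritten into a portal.mozz.us URL before queueing it in an ArchiveBot job. This Python sketch assumes the proxy uses a `https://portal.mozz.us/<scheme>/<host>[:port]/<path>` layout, which should be checked against the live site; the function name `to_portal_url` is hypothetical.

```python
from urllib.parse import urlsplit

# Schemes that portal.mozz.us is described as supporting.
PORTAL_SCHEMES = {"gemini", "gopher", "finger", "spartan", "text", "nex"}


def to_portal_url(native_url: str, portal: str = "https://portal.mozz.us") -> str:
    """Map e.g. gemini://host/path to <portal>/gemini/host/path.

    ASSUMPTION: the <portal>/<scheme>/<host>[:port]/<path> layout is a guess
    at how the proxy builds its URLs; verify it before relying on it.
    """
    parts = urlsplit(native_url)
    scheme = parts.scheme.lower()
    if scheme not in PORTAL_SCHEMES:
        raise ValueError(f"portal.mozz.us does not proxy {scheme}://")
    # Keep host and port together; default an empty path to "/".
    rest = parts.netloc + (parts.path or "/")
    if parts.query:
        rest += "?" + parts.query
    return f"{portal}/{scheme}/{rest}"


if __name__ == "__main__":
    for url in ["gemini://dataswamp.org/", "spartan://spartan.arkholt.com:3000"]:
        print(to_portal_url(url))
```

URLs for unsupported schemes such as scroll:// raise an error rather than producing a proxied URL that would only return errors.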

The portal links to a few seed SmolNet sites, and some pages on the SmolNet list many more SmolNet sites, so a lot of the SmolNet can be reached via the portal.mozz.us proxy.

Known lists of SmolNet sites include "SmolNet Portal", "Known Gemini Capsules", "SuperTXT known_hosts", "Scrollspace Index", "All the gopher servers (that we know of)", "The Observable Gopherspace Universe Project", triapul.cz and probably others. Use these to check whether a SmolNet site you discovered is already known before adding it to the list below.

Unfortunately, some sites do not allow their content to be downloaded via proxies, portal.mozz.us does not support Tor onion services or I2P services, and some sites are simply down, so a minority of sites just return errors. Some sites contain git commits; those are ignored in favour of Codearchiver/SWH. Since SmolNet folks are often data hoarders/packrats, some sites may host large archives of resources that have already been saved via HTTP; those should be ignored when they are encountered.
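Because portal.mozz.us cannot reach Tor onion services or I2P services, it may help to split those out of a URL list before queueing proxied URLs, so they can be handled some other way. This is a minimal Python sketch; the function name `split_by_proxy_reachability` is hypothetical, and it only checks host name suffixes.

```python
from urllib.parse import urlsplit

# Host suffixes that portal.mozz.us cannot reach, per the text above.
UNREACHABLE_SUFFIXES = (".onion", ".i2p")


def split_by_proxy_reachability(urls):
    """Return (queue, manual): URLs the proxy can fetch, and ones it cannot.

    ASSUMPTION: only the host name suffix is checked; sites that block
    proxies or are down will still end up in `queue` and return errors.
    """
    queue, manual = [], []
    for url in urls:
        host = urlsplit(url).hostname or ""  # hostname is already lowercased
        if host.endswith(UNREACHABLE_SUFFIXES):
            manual.append(url)
        else:
            queue.append(url)
    return queue, manual


if __name__ == "__main__":
    queue, manual = split_by_proxy_reachability(
        ["gemini://dataswamp.org/", "gopher://example.onion/"]
    )
    print("queue:", queue)
    print("manual:", manual)
```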

Many of the finger:// sites don't link to their user pages but just list the users in plain text, so those need to be extracted from the URL list and handled manually. Some of them don't offer a full list of users, only a list of recently active users or a way to get a random user.
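Extracting user pages from such a plain-text listing could look roughly like the Python sketch below. It assumes each user name appears as the first word of its own line, optionally prefixed with "~" as on tilde servers; real listings vary, header lines that happen to look like names will slip through, and the output should be reviewed by hand. The function name `finger_user_urls` and the name pattern are hypothetical.

```python
import re

# ASSUMPTION: a conservative user-name shape; adjust per server.
USER_RE = re.compile(r"~?([A-Za-z_][A-Za-z0-9_-]*)")


def finger_user_urls(listing: str, host: str) -> list[str]:
    """Turn a plain-text user listing into finger://host/user URLs."""
    users = []
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        match = USER_RE.fullmatch(tokens[0])
        if match:
            users.append(match.group(1))
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [f"finger://{host}/{user}" for user in dict.fromkeys(users)]


if __name__ == "__main__":
    sample = "~alice  Alice\nbob\n\n# not a user\n~alice\n"
    for url in finger_user_urls(sample, "tilde.pink"):
        print(url)
```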

ArchiveBot job 84rmt67gwgaah8r70zqtjzw84 crawled some of the SmolNet (and outlinks to HTTP) via the portal.mozz.us proxy, but crashed and had to be restarted.

ArchiveBot job eab5n85lfcb4dj9wrznwihcal crawled all of finger://db.debian.org/, including all users (finger://db.debian.org/$user, enumerated via LDAP) and their OpenPGP keys (finger://db.debian.org/$user/key), via the portal.mozz.us proxy. Outlinks were extracted manually and archived as single URLs in job eab5n85lfcb4dj9wrznwihcal.

Several other AB jobs crawled parts of the SmolNet via the portal.mozz.us proxy too.

The following additional sites were possibly missed by the portal.mozz.us proxy AB jobs due to various issues:

finger://thebackupbox.net/ring
finger://tilde.pink/$user
gemini://abyss.cinderblock.moe/
gemini://academia.fzrw.info/
gemini://dataswamp.org/
gemini://typed-hole.org
gopher://dfdn.line.pm/
gopher://gopher.viste.fr/
gopher://sdf.org/1/users/julienxx
gopher://tilde.32bit.cafe/1/~vivivi/gopher_in_practice
guppy://guppy.000090000.xyz/
nex://nex.arkholt.com
scorpion://zzo38computer.org/
scroll://auragem.letz.dev/
scroll://faris.mayvane.day/
scroll://letsdecentralize.org/
scroll://scholasticdiversity.us.to/
scroll://scrollprotocol.us.to/
spartan://spartan.arkholt.com:3000
ssh://supertxt.net/
terse://reports.frontline/aug2019austin
text://textprotocol.org/

Should the WARC format ever add support for the SmolNet protocols, all the URLs in the WBM that point at known SmolNet proxies could be used to seed a native recrawl of all known SmolNet sites, including those that block proxies to HTTP.
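A seed list of that kind could be derived with something like the Python sketch below, which turns portal.mozz.us URLs back into native SmolNet URLs. It assumes a `https://portal.mozz.us/<scheme>/<host>[:port]/<path>` layout, which should be verified, and it does not unwrap Wayback Machine URLs (`https://web.archive.org/web/<timestamp>/...`), which would need to be stripped first. The names `from_portal_url` and `native_seeds` are hypothetical.

```python
from typing import Iterable, List, Optional
from urllib.parse import urlsplit

PORTAL_HOST = "portal.mozz.us"


def from_portal_url(proxied: str) -> Optional[str]:
    """Map <portal>/<scheme>/<host>/<path> back to <scheme>://<host>/<path>.

    ASSUMPTION: the proxy path layout above; returns None for anything else.
    """
    parts = urlsplit(proxied)
    if parts.hostname != PORTAL_HOST:
        return None
    # "/gemini/host/path" -> ["gemini", "host/path"]
    pieces = parts.path.lstrip("/").split("/", 1)
    if len(pieces) < 2 or not pieces[0] or not pieces[1]:
        return None
    scheme, rest = pieces
    native = f"{scheme}://{rest}"
    if parts.query:
        native += "?" + parts.query
    return native


def native_seeds(proxied_urls: Iterable[str]) -> List[str]:
    """Deduplicated, sorted native URLs recovered from proxied ones."""
    return sorted({n for n in map(from_portal_url, proxied_urls) if n})


if __name__ == "__main__":
    print(native_seeds([
        "https://portal.mozz.us/gemini/dataswamp.org/",
        "https://portal.mozz.us/gemini/dataswamp.org/",
        "https://example.com/unrelated",
    ]))
```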

The SmolNet has the Delorean Time Machine for archiving Geminispace and there is a 2007 mirror of Gopherspace (on mozz.us, meulie.net, IA).