SmolNet
The SmolNet consists of content available through alternative protocols outside the web, such as gemini://, gopher://, Gopher+, gophers://, finger://, spartan://, text://, SuperText, nex://, scorpion://, mercury://, titan://, guppy://, scroll://, molerat://, terse:// and fsp://. There is a summary of the main SmolNet protocols. Related protocols include telnet://, nntp:// or news://, and dict://.
At this time neither the WARC format nor the WBM (Wayback Machine) supports these protocols, so the SmolNet is not fully archivable, nor can archives of it be accessed.
There are warc-specifications proposals for gemini:// and gopher:// WARC protocol additions.
There have been multiple Gopher archiving projects. John Goerzen wrote gopherbot, archived Gopherspace in 2006 and 2007 (mirrors on mozz.us and meulie.net) and blogged about it. In 2017 Ben Cartwright-Cox archived Gopherspace and blogged about building a Gopherspace search engine. The Raspberry Pi of Death gopher server (gopher://leveck.us) was saved to text files in 2018. preterhuman.net has a 2015 archive of several gopher sites.
There was a 2020 Geminispace crawl that used mozz-archiver to save gemini:// sites to archive.org items (parts 1 2 3), and the Delorean Time Machine is also archiving Geminispace.
In addition there are proxies to HTTP and HTML that can be used with ArchiveBot. The most prominent one is https://portal.mozz.us/, which doesn't require JavaScript but only supports the gemini://, gopher://, finger://, spartan://, text:// and nex:// protocols. Several proxies are gopher-only (usually running Gophernicus, and often redirecting to the proxied version of their corresponding gopher sites): https://gopher.tildeverse.org/, https://gopher.envs.net/, https://gopherproxy.meulie.net/ and https://gopher.mills.io/. There is a proxy from nex:// to gemini:// at gemini://gemini.lehmann.cx/cgi-bin/nex.php, a gemini proxy at https://skylarhill.me/x/, and a gopher and gemini proxy at https://geminiproxy.p.projectsegfau.lt/.
The portal links to a few seed SmolNet sites, and some pages on the SmolNet list many more SmolNet sites, so a lot of the SmolNet can be reached via the portal.mozz.us proxy.
Known lists of SmolNet sites include: ArchiveTeam Gopher page "SmolNet Portal", "Known Gemini Capsules", "Capsules in Lupa (crawler) database", "SuperTXT known_hosts", AuraGem Search "Scrollspace Index", AuraGem Search "Capsules", AuraGem Search "Mimetypes", "Hosts known to TLGS", "All the gopher servers (that we know of)", "The Observable Gopherspace Universe Project", triapul.cz more and probably others. Use these to find out if a SmolNet site you discovered is known, before adding it to the list below.
Unfortunately some sites do not allow their content to be downloaded via proxies, portal.mozz.us doesn't support Tor onion services or I2P services, and some sites are of course down, so a minority of sites just give errors. Some sites contain git commits; those are ignored in favour of Codearchiver/SWH. Since SmolNet folks are often data hoarders/packrats, there may be large archives of resources already saved via HTTP; those should be ignored when they are encountered.
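As a rough illustration of the limits described above, the following Python sketch sorts native SmolNet URLs into those that could be requested through portal.mozz.us and those that could not. The list of supported schemes comes from the text; the proxy URL layout (https://portal.mozz.us/<scheme>/<host>/<path>) and the function name to_portal_url are assumptions made for this example and should be checked against the live proxy before use.

```python
from urllib.parse import urlsplit

# Schemes that portal.mozz.us is said to support (from the text above).
PORTAL_SCHEMES = {"gemini", "gopher", "finger", "spartan", "text", "nex"}

# ASSUMPTION: the proxy exposes pages as <base>/<scheme>/<host>/<path>.
PORTAL_BASE = "https://portal.mozz.us"


def to_portal_url(native_url):
    """Return a proxy URL for native_url, or None if it cannot be proxied."""
    parts = urlsplit(native_url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    if scheme not in PORTAL_SCHEMES or not host:
        return None  # unsupported protocol (e.g. scroll://) or no host
    if host.endswith(".onion") or host.endswith(".i2p"):
        return None  # Tor onion and I2P services are not reachable
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    url = f"{PORTAL_BASE}/{scheme}/{netloc}{parts.path or '/'}"
    if parts.query:
        url += "?" + parts.query
    return url


if __name__ == "__main__":
    for u in ["gemini://dataswamp.org/", "scroll://letsdecentralize.org/"]:
        print(u, "->", to_portal_url(u))
```

URLs for which the function returns None would have to wait for native tooling or a different proxy; sites that block proxies or are down can only be detected by actually fetching them.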
Many of the finger:// sites don't link to their user pages, but just list them in plain text format, so those need to be extracted from the URL list and worked on manually. Some of them don't have a full list of users, but instead a list of recently active users, or a way to get a random user.
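The manual extraction step could be partly scripted. The sketch below is only a starting point: it assumes a hypothetical listing format in which each line starts with a login name, which will not hold for every finger server, and the function name finger_user_urls and the header words it skips are made up for this example.

```python
import re

# ASSUMPTION: login names look like typical Unix usernames.
LOGIN_RE = re.compile(r"^[a-z_][a-z0-9_.-]{0,31}$")

# Words that commonly appear as column headers rather than logins.
HEADER_WORDS = {"login", "user", "users", "name"}


def finger_user_urls(host, listing):
    """Build finger:// URLs from a plain-text user listing.

    Takes the first whitespace-separated token of each line as a
    candidate login and keeps it if it looks like a username.
    Duplicates are dropped and the original order is preserved.
    """
    urls = []
    seen = set()
    for line in listing.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # blank line
        candidate = tokens[0]
        if candidate.lower() in HEADER_WORDS or not LOGIN_RE.match(candidate):
            continue  # header row or something that is not a login
        if candidate not in seen:
            seen.add(candidate)
            urls.append(f"finger://{host}/{candidate}")
    return urls


if __name__ == "__main__":
    sample = "Login   Name\nalice   Alice A.\nbob     Bob\n"
    print(finger_user_urls("example.org", sample))
```

For servers that only show recently active or random users, the same function could be run over repeated fetches and the results merged, though that will never guarantee a complete list.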
ArchiveBot job 84rmt67gwgaah8r70zqtjzw84 crawled some of the SmolNet (and outlinks to HTTP) via the portal.mozz.us proxy, but crashed and had to be restarted. The second try died due to a pipeline going away.
ArchiveBot job eab5n85lfcb4dj9wrznwihcal crawled all of finger://db.debian.org/ including all users (finger://db.debian.org/$user) (enumerated via LDAP) and their OpenPGP keys (finger://db.debian.org/$user/key) via the portal.mozz.us proxy. Outlinks were manually extracted and the single URLs archived in job eab5n85lfcb4dj9wrznwihcal.
Several other AB jobs crawled parts of the SmolNet via the portal.mozz.us proxy too.
The ArchiveBot job for https://skylarhill.me/ discovered the proxy there, but links to it were ignored, so they need to be extracted from the job log.
These additional sites were possibly not found through the portal.mozz.us proxy AB jobs due to various issues:
finger://thebackupbox.net/ring finger://tilde.pink/$user gemini://abyss.cinderblock.moe/ gemini://academia.fzrw.info/ gemini://dataswamp.org/ gemini://typed-hole.org gopher://dfdn.line.pm/ gopher://gopher.viste.fr/ gopher://sdf.org/1/users/julienxx gopher://tilde.32bit.cafe/1/~vivivi/gopher_in_practice guppy://guppy.000090000.xyz/ nex://nex.arkholt.com scorpion://zzo38computer.org/ scroll://auragem.letz.dev/ scroll://faris.mayvane.day/ scroll://letsdecentralize.org/ scroll://scholasticdiversity.us.to/ scroll://scrollprotocol.us.to/ spartan://spartan.arkholt.com:3000 ssh://supertxt.net/ terse://reports.frontline/aug2019austin text://textprotocol.org/
Should the WARC format ever add support for the SmolNet protocols, then all the URLs in the WBM to known SmolNet proxies could be useful for seeding a native recrawl of all known SmolNet sites, including those that block proxies to HTTP.
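A seed list for such a recrawl might be derived by mapping archived proxy URLs back to native URLs. The following Python sketch shows one way this could work for portal.mozz.us only; it assumes the proxy path layout /<scheme>/<host>/<path>, which is not confirmed here, and the names to_native_url and seed_list are illustrative.

```python
from urllib.parse import urlsplit

PROXY_HOST = "portal.mozz.us"
# Schemes the proxy is said to support.
SCHEMES = {"gemini", "gopher", "finger", "spartan", "text", "nex"}


def to_native_url(proxy_url):
    """Map a proxy URL back to a native SmolNet URL, or return None.

    ASSUMPTION: proxy paths look like /<scheme>/<host>/<path>.
    """
    parts = urlsplit(proxy_url)
    if (parts.hostname or "").lower() != PROXY_HOST:
        return None  # not a URL on the proxy
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) < 2 or segments[0] not in SCHEMES or not segments[1]:
        return None  # proxy-internal page, not a proxied site
    scheme, host = segments[0], segments[1]
    rest = segments[2] if len(segments) == 3 else ""
    native = f"{scheme}://{host}/{rest}"
    if parts.query:
        native += "?" + parts.query
    return native


def seed_list(proxy_urls):
    """Return the sorted, de-duplicated native URLs found in proxy_urls."""
    return sorted({n for n in map(to_native_url, proxy_urls) if n})


if __name__ == "__main__":
    print(seed_list([
        "https://portal.mozz.us/gemini/dataswamp.org/",
        "https://portal.mozz.us/gopher/sdf.org/1/users/julienxx",
    ]))
```

Other proxies use their own URL layouts, so each would need its own mapping function before its archived URLs could feed the same seed list.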