Difference between revisions of "SmolNet"

From Archiveteam
(a gemini site not in the observatory known hosts)
(many updates)
Line 1: Line 1:
The SmolNet consists of content available through alternative protocols outside the web such as [https://geminiprotocol.net/ gemini://] [https://en.wikipedia.org/wiki/Gopher_(protocol) gopher://] gophers:// [https://en.wikipedia.org/wiki/Finger_(protocol)  finger://] [https://portal.mozz.us/spartan/spartan.mozz.us/ spartan://] [https://textprotocol.org/ text://] [https://nightfall.city/nex/info/specification.txt nex://] [https://github.com/zzo38/scorpion scorpion://] [https://portal.mozz.us/gemini/zaibatsu.circumlunar.space/~solderpunk/gemlog/the-mercury-protocol.gmi mercury://] [https://portal.mozz.us/gemini/transjovian.org/page/Titan titan://] [https://github.com/dimkr/guppy-protocol guppy://] scroll://.
The SmolNet consists of content available through alternative protocols outside the web such as [https://geminiprotocol.net/ gemini://] [https://en.wikipedia.org/wiki/Gopher_(protocol) gopher://] [gopher://gopher.floodgap.com/0/gopher/tech/gopherplus.txt Gopher+] gophers:// [https://en.wikipedia.org/wiki/Finger_(protocol)  finger://] [https://portal.mozz.us/spartan/spartan.mozz.us/ spartan://] [https://textprotocol.org/ text://] [https://supertxt.net/00-intro.html SuperText] [https://nightfall.city/nex/info/specification.txt nex://] [https://github.com/zzo38/scorpion scorpion://] [https://portal.mozz.us/gemini/zaibatsu.circumlunar.space/~solderpunk/gemlog/the-mercury-protocol.gmi mercury://] [https://portal.mozz.us/gemini/transjovian.org/page/Titan titan://] [https://github.com/dimkr/guppy-protocol guppy://] scroll:// [https://molerat.trinket.icu/ molerat://] [https://github.com/runvnc/tersenet/blob/master/README.md terse://] [https://sourceforge.net/p/fsp/code/ci/master/tree/doc/PROTOCOL fsp://]. There is a [https://dbohdan.com/archive/scorpion/zzo38computer.org/smallweb.txt summary] of the main SmolNet protocols.


At this time the [[WARC]] format does not support these protocols and the WBM does not support them, so the SmolNet is not archivable nor can archives be accessed.


Fortunately there are proxies to HTTP and HTML that can be used. The most prominent one is https://portal.mozz.us/ and it doesn't require JavaScript, but it doesn't support the scorpion:// mercury:// titan:// guppy:// scroll:// protocols. Several proxies that are only for gopher and redirect to the proxied version of their corresponding gopher sites are https://gopher.tildeverse.org/ https://gopher.envs.net/
Fortunately there are proxies to HTTP and HTML that can be used. The most prominent one is https://portal.mozz.us/, which doesn't require JavaScript but only supports the gemini:// gopher:// finger:// spartan:// text:// nex:// protocols. Several proxies are only for gopher (usually running Gophernicus, and often redirecting to the proxied version of their corresponding gopher sites): https://gopher.tildeverse.org/ https://gopher.envs.net/ https://gopherproxy.meulie.net/


The portal links to a few seed SmolNet sites and there are some pages on the SmolNet listing more SmolNet sites: [https://portal.mozz.us/gemini/kennedy.gemi.dev/observatory/known-hosts Known Gemini Capsules]
The portal links to a few seed SmolNet sites and there are some pages on the SmolNet listing many more SmolNet sites which means that a lot of the SmolNet can be reached via the portal.mozz.us proxy.


ArchiveBot job [https://archive.fart.website/archivebot/viewer/job/2024050700453584rmt 84rmt67gwgaah8r70zqtjzw84] is crawling most of the SmolNet (and outlinks to HTTP) via the portal.mozz.us proxy. Unfortunately some sites do not allow their content to be downloaded via proxies, and some sites are of course down, so a minority of sites just give errors. Some sites contain git commits; those are ignored in favour of [[Codearchiver]]/SWH. Since SmolNet folks are often data hoarders/packrats, there may be large archives of resources already saved via HTTP; those should be ignored when they are encountered. These additional sites are either not available or were possibly not found through the portal.mozz.us proxy AB job:
Known lists of SmolNet sites include [https://portal.mozz.us/ "SmolNet Portal"], [https://portal.mozz.us/gemini/kennedy.gemi.dev/observatory/known-hosts "Known Gemini Capsules"] and [https://supertxt.net/.ssh/known_hosts "SuperTXT known_hosts"]; there are probably others.
 
Unfortunately some sites do not allow their content to be downloaded via proxies, and some sites are of course down, so a minority of sites just give errors. Some sites contain git commits; those are ignored in favour of [[Codearchiver]]/SWH. Since SmolNet folks are often data hoarders/packrats, there may be large archives of resources already saved via HTTP; those should be ignored when they are encountered.
 
ArchiveBot job [https://archive.fart.website/archivebot/viewer/job/2024050700453584rmt 84rmt67gwgaah8r70zqtjzw84] is crawling the SmolNet (and outlinks to HTTP) via the portal.mozz.us proxy.
 
ArchiveBot job [https://archive.fart.website/archivebot/viewer/job/eab5n85lfcb4dj9wrznwihcal eab5n85lfcb4dj9wrznwihcal] crawled all of finger://db.debian.org/ including all users (finger://db.debian.org/$user) and their OpenPGP keys (finger://db.debian.org/$user/key) via the portal.mozz.us proxy. Outlinks were manually extracted and the single URLs archived in job [https://archive.fart.website/archivebot/viewer/job/eab5n85lfcb4dj9wrznwihcal eab5n85lfcb4dj9wrznwihcal].
 
These additional sites were possibly not found through the portal.mozz.us proxy AB jobs due to various issues:


<pre>
<pre>
gemini://abyss.cinderblock.moe/
gemini://abyss.cinderblock.moe/
scorpion://zzo38computer.org/specification.txt
gemini://gemini.ctrl-c.club/
guppy://guppy.000090000.xyz
scorpion://zzo38computer.org/
guppy://guppy.000090000.xyz/
finger://tilde.pink/$user
finger://tilde.pink/$user
finger://db.debian.org/$user
finger://db.debian.org/$user/key
gopher://aussies.space/1/~freet/
gopher://aussies.space/1/~freet/
gopher://gopher.viste.fr/
gopher://gopher.viste.fr/
Line 22: Line 29:
gopher://si3t.ch/
gopher://si3t.ch/
gopher://thunix.net/
gopher://thunix.net/
gopher://rak.ac
text://textprotocol.org/
ssh://supertxt.net/
terse://reports.frontline/aug2019austin
</pre>
</pre>



Revision as of 03:50, 7 August 2024

The SmolNet consists of content available through alternative protocols outside the web, such as gemini://, gopher://, Gopher+, gophers://, finger://, spartan://, text://, SuperText, nex://, scorpion://, mercury://, titan://, guppy://, scroll://, molerat://, terse:// and fsp://. There is a summary of the main SmolNet protocols.

At this time neither the WARC format nor the WBM (Wayback Machine) supports these protocols, so the SmolNet cannot be archived directly, nor could such archives be accessed.

Fortunately there are proxies to HTTP and HTML that can be used. The most prominent one is https://portal.mozz.us/, which doesn't require JavaScript but only supports the gemini:// gopher:// finger:// spartan:// text:// nex:// protocols. Several proxies are only for gopher (usually running Gophernicus, and often redirecting to the proxied version of their corresponding gopher sites): https://gopher.tildeverse.org/ https://gopher.envs.net/ https://gopherproxy.meulie.net/
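The proxy mapping can be illustrated with a minimal Python sketch. It assumes the portal exposes a SmolNet URL as https://portal.mozz.us/SCHEME/HOST/PATH, a pattern inferred only from the portal links used on this page (for example the known-hosts link); the helper name to_portal_url is hypothetical and nothing here is taken from the portal's own documentation.

```python
from urllib.parse import urlsplit

# Schemes that this page says portal.mozz.us supports.
PORTAL_SCHEMES = {"gemini", "gopher", "finger", "spartan", "text", "nex"}
PORTAL_BASE = "https://portal.mozz.us"


def to_portal_url(url):
    """Return the assumed portal.mozz.us URL for a SmolNet URL, or None.

    Assumption: the proxy path is /<scheme>/<host>/<path>, as seen in the
    portal links on this page. Unsupported schemes yield None.
    """
    parts = urlsplit(url)
    if parts.scheme not in PORTAL_SCHEMES or not parts.netloc:
        return None
    path = parts.path or "/"  # a bare host maps to the site root
    proxied = f"{PORTAL_BASE}/{parts.scheme}/{parts.netloc}{path}"
    if parts.query:
        proxied += "?" + parts.query
    return proxied


if __name__ == "__main__":
    print(to_portal_url("gemini://kennedy.gemi.dev/observatory/known-hosts"))
    print(to_portal_url("scorpion://zzo38computer.org/"))
```

Returning None for unsupported schemes keeps the caller responsible for deciding what to do with sites that have no HTTP proxy.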

The portal links to a few seed SmolNet sites, and some pages on the SmolNet list many more, so a lot of the SmolNet can be reached via the portal.mozz.us proxy.

Known lists of SmolNet sites include "SmolNet Portal", "Known Gemini Capsules" and "SuperTXT known_hosts"; there are probably others.

Unfortunately some sites do not allow their content to be downloaded via proxies, and some sites are of course down, so a minority of sites just give errors. Some sites contain git commits; those are ignored in favour of Codearchiver/SWH (Software Heritage). Since SmolNet folks are often data hoarders/packrats, there may be large archives of resources already saved via HTTP; those should be ignored when they are encountered.

ArchiveBot job 84rmt67gwgaah8r70zqtjzw84 is crawling the SmolNet (and outlinks to HTTP) via the portal.mozz.us proxy.

ArchiveBot job eab5n85lfcb4dj9wrznwihcal crawled all of finger://db.debian.org/ including all users (finger://db.debian.org/$user) and their OpenPGP keys (finger://db.debian.org/$user/key) via the portal.mozz.us proxy. Outlinks were manually extracted and the single URLs archived in job eab5n85lfcb4dj9wrznwihcal.
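As a rough illustration of the URL patterns above, the following Python sketch expands the $user placeholder into a per-user URL and a key URL. The usernames are made-up placeholders and the helper name finger_urls is hypothetical; this is not how the ArchiveBot job itself built its URL list.

```python
from string import Template

# URL patterns as written on this page; $user is the placeholder.
USER_URL = Template("finger://db.debian.org/$user")
KEY_URL = Template("finger://db.debian.org/$user/key")


def finger_urls(usernames):
    """Return the user URL and the key URL for each username, in order."""
    urls = []
    for name in usernames:
        urls.append(USER_URL.substitute(user=name))
        urls.append(KEY_URL.substitute(user=name))
    return urls


if __name__ == "__main__":
    # "alice" and "bob" are hypothetical usernames for illustration only.
    for url in finger_urls(["alice", "bob"]):
        print(url)
```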

These additional sites were possibly not found through the portal.mozz.us proxy AB jobs due to various issues:

gemini://abyss.cinderblock.moe/
gemini://gemini.ctrl-c.club/
scorpion://zzo38computer.org/
guppy://guppy.000090000.xyz/
finger://tilde.pink/$user
gopher://aussies.space/1/~freet/
gopher://gopher.viste.fr/
gopher://gopher.linuxgalaxy.org/
gopher://occ.deadnet.se/
gopher://si3t.ch/
gopher://thunix.net/
gopher://rak.ac
text://textprotocol.org/
ssh://supertxt.net/
terse://reports.frontline/aug2019austin
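
One of the possible issues is that the portal does not support a site's protocol at all. A small Python sketch, assuming only the list of supported schemes given earlier on this page, splits a list of URLs by whether their scheme could be proxied; the function name is hypothetical, and a supported scheme does not guarantee that the site allows proxy access or is up.

```python
from urllib.parse import urlsplit

# Schemes that this page says portal.mozz.us supports.
PORTAL_SCHEMES = {"gemini", "gopher", "finger", "spartan", "text", "nex"}


def split_by_portal_support(urls):
    """Split URLs into (proxyable, native_only) lists by URL scheme."""
    proxyable, native_only = [], []
    for url in urls:
        scheme = urlsplit(url).scheme
        if scheme in PORTAL_SCHEMES:
            proxyable.append(url)
        else:
            native_only.append(url)
    return proxyable, native_only


if __name__ == "__main__":
    sites = [
        "gemini://abyss.cinderblock.moe/",
        "scorpion://zzo38computer.org/",
        "guppy://guppy.000090000.xyz/",
        "gopher://si3t.ch/",
        "ssh://supertxt.net/",
        "terse://reports.frontline/aug2019austin",
    ]
    proxyable, native_only = split_by_portal_support(sites)
    print("proxyable:", proxyable)
    print("native only:", native_only)
```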

Should the WARC format ever add support for the SmolNet protocols, then the AB job could be useful for seeding a native recrawl of all known SmolNet sites, including those that block proxies to HTTP.

The SmolNet has the Delorean Time Machine for archiving Geminispace and there is a 2007 mirror of Gopherspace.