mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,

It would be very nice to have recognition of the Gemini protocol when parsing URIs.

Here is my guess at how it could go.

$ git diff
diff --git a/lib/URI/Heuristic.pm b/lib/URI/Heuristic.pm
index 1780f34..4694813 100644
--- a/lib/URI/Heuristic.pm
+++ b/lib/URI/Heuristic.pm
@@ -155,7 +155,7 @@ sub uf_uristr ($)
         if (/^(www|web|home)[a-z0-9-]*(?:\.|$)/i) {
             $_ = "http://$_";
 
-        } elsif (/^(ftp|gopher|news|wais|https|http)[a-z0-9-]*(?:\.|$)/i) {
+        } elsif (/^(ftp|gemini|gopher|news|wais|https|http)[a-z0-9-]*(?:\.|$)/i) {
             $_ = lc($1) . "://$_";
 
         } elsif ($^O ne "MacOS" &&
@@ -168,7 +168,7 @@ sub uf_uristr ($)
 
         } elsif ($^O eq "MacOS" && m/:/) {
             # potential MacOS file name
-            unless (m/^(ftp|gopher|news|wais|http|https|mailto):/) {
+            unless (m/^(ftp|gemini|gopher|news|wais|http|https|mailto):/) {
                 require URI::file;
                 my $a = URI::file->new($_)->as_string;
                 $_ = ($a =~ m/^file:/) ? $a : "file:$a";
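To illustrate what the first hunk is meant to do, here is a minimal standalone sketch. It copies only the two regex branches shown in the diff into a made-up helper, guess_scheme, which is not part of URI::Heuristic; the real uf_uristr has several more branches that are left out here.

```perl
use strict;
use warnings;

# Hypothetical helper: mirrors just the two branches from the diff above.
sub guess_scheme {
    my ($str) = @_;

    # Names such as www.example.org are assumed to be web servers.
    return "http://$str"
        if $str =~ /^(www|web|home)[a-z0-9-]*(?:\.|$)/i;

    # A leading label that names a scheme is taken as that scheme.
    return lc($1) . "://$str"
        if $str =~ /^(ftp|gemini|gopher|news|wais|https|http)[a-z0-9-]*(?:\.|$)/i;

    # Anything else is returned unchanged in this sketch.
    return $str;
}

print guess_scheme($_), "\n"
    for qw(gemini.example.org ftp.example.org www.example.org example.org);
```

With the patched pattern, a bare host name starting with "gemini" gets a gemini:// prefix in the same way "ftp.example.org" already gets ftp://.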
And in lib/URI/gemini.pm it could be something like this:
package URI::gemini; # https://geminiprotocol.net/

use strict;
use warnings;

our $VERSION = '0.1';

use parent 'URI::_server';

use URI::Escape qw(uri_unescape);

sub default_port { 1965 }

sub _gemini_type
{
    my $self = shift;
    my $path = $self->path_query;
    $path =~ s,^/,,;
    my $gtype = $1 if $path =~ s/^(.)//s;
    if (@_) {
        my $new_type = shift;
        if (defined($new_type)) {
            Carp::croak("Bad gemini type '$new_type'")
                unless length($new_type) == 1;
            substr($path, 0, 0) = $new_type;
            $self->path_query($path);
        } else {
            Carp::croak("Can't delete gemini type when selector is present")
                if length($path);
            $self->path_query(undef);
        }
    }
    return $gtype;
}

sub gemini_type
{
    my $self = shift;
    my $gtype = $self->_gemini_type(@_);
    $gtype = "1" unless defined $gtype;
    $gtype;
}

sub gtype { goto &gemini_type }  # URI::URL compatibility

sub selector { shift->_gfield(0, @_) }
sub search   { shift->_gfield(1, @_) }
sub string   { shift->_gfield(2, @_) }

sub _gfield
{
    my $self = shift;
    my $fno  = shift;
    my $path = $self->path_query;

    $path =~ s/\?/\t/;
    $path = uri_unescape($path);
    $path =~ s,^/,,;
    my $gtype = $1 if $path =~ s,^(.),,s;
    my @path = split(/\t/, $path, 3);
    if (@_) {
        # modify
        my $new = shift;
        $path[$fno] = $new;
        pop(@path) while @path && !defined($path[-1]);
        for (@path) { $_="" unless defined }
        $path = $gtype;
        $path = "1" unless defined $path;
        $path .= join("\t", @path);
        $self->path_query($path);
    }
    $path[$fno];
}

1;
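As a rough sketch of how such a class gets used: as far as I understand it, URI->new picks the implementing class from the scheme name, so a package called URI::gemini is found automatically. The stand-in below is defined inline and keeps only default_port from the module above; the host name and path are made up for the example.

```perl
package URI::gemini;    # illustrative stand-in, defined inline for this sketch
use strict;
use warnings;
use parent 'URI::_server';

sub default_port { 1965 }

package main;
use URI;

my $u = URI->new('gemini://example.org:1965/docs/index.gmi');

print ref($u), "\n";        # class chosen from the scheme
print $u->host, "\n";
print $u->port, "\n";
print $u->path, "\n";
print $u->canonical, "\n";  # canonical form drops the default port
```

Inheriting from URI::_server is what provides host, port and the default-port handling in canonical, so the scheme-specific part can stay very small.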

For that to be considered for addition, can I just throw it over the fence here?

Re: Adding recognition of Gemini to URI.pm?
by swl (Prior) on Oct 11, 2024 at 04:23 UTC

    For that to be considered for addition, can I just throw it over the fence here?

    The metadata for URI::Heuristic lists it as maintained at https://github.com/libwww-perl/URI, so you could work up a PR there. The contribution guidelines are on metacpan at https://metacpan.org/release/OALDERS/URI-5.30/contribute

        > there is no contact information listed there nor a way to submit the changes.

        Right from the bottom of the page:

        Contributions are preferred in the form of a Github pull request. See Using pull requests for further information. You can use the Github issue tracker to report issues without an accompanying patch.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Technobabble (was: Re: Adding recognition of Gemini to URI.pm?)
by Bod (Parson) on Oct 11, 2024 at 15:38 UTC

    This quote from the Gemini webpage is certainly a contender for the most meaningless piece of technobabble I've come across this year:

    That's not a new idea, but it's not old fashioned either. It's timeless, and deserves tools which treat it as a first class concept, not a vestigial corner case.

      The project's entire website is like that. It feels like an AI responding to a series of prompts or an expression of someone's undiagnosed or unmanaged autism.

      I tried to find a page that would show what the app layer protocol was like and clicked around until I found their 'Protocol design' (https://geminiprotocol.net/docs/faq-section-4.gmi) page. Way down in section 4.1.2, it finally gets to a somewhat less vague description:

      The Gemini network protocol looks kind of like something between HTTP 0.9 and HTTP 1.0. There's only one kind of request, analogous to GET, and the request itself is nothing but a URL. It's sort of like a HTTP request where the only header allowed is Host.

      More than 8000 words on that page alone and that's as close as they get to an actual drilled down description of what 'gemini://' is. No actual examples of what it would look like if someone implemented a client. No clear specification or explanation as to why it exists as an entirely new protocol fully incompatible with current browsers.
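For what it's worth, the exchange described in that quote can be sketched in a few lines. This is based on my reading of the Gemini specification, so treat the details as assumptions: the request is the absolute URL followed by CRLF (at most 1024 bytes), and the reply starts with a header line holding a two-digit status, a space and a meta field. The helper names build_request and parse_header are invented for this example, and the TLS connection to port 1965 is left out.

```perl
use strict;
use warnings;

# Build the single-line request: the absolute URL followed by CRLF.
sub build_request {
    my ($url) = @_;
    # Assumed limit from the spec; length() counts characters here,
    # which equals bytes only for plain ASCII URLs.
    die "URL too long for a Gemini request\n" if length($url) > 1024;
    return "$url\r\n";
}

# Split a response header line into its two-digit status and the meta field.
sub parse_header {
    my ($line) = @_;
    $line =~ /\A([1-6][0-9])(?: (.*?))?\r?\n?\z/s
        or die "Malformed response header\n";
    return { status => $1, meta => defined $2 ? $2 : '' };
}

my $req = build_request('gemini://example.org/docs/');
my $hdr = parse_header("20 text/gemini; charset=utf-8\r\n");

print "request: $req";
print "status=$hdr->{status} meta=$hdr->{meta}\n";
```

A status in the 2x range means success and the meta field then carries the MIME type of the body that follows; other ranges cover input prompts, redirects and failures.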

      The more I looked, the more weird stuff I found. Their page for 'Gemini-adjacent technologies and cultures' (https://geminiprotocol.net/docs/faq-section-6.gmi) links to a few other websites. The first link is to a so-called Transjovian Council (https://transjovian.org/view/index). Wtf is that? They explain it this way:

      This is “The Transjovian Council”, a group of people living in the outer reaches of our system, beyond Jupiter. Out here, the light of the sun is dim and we must make due with what we have, watched over by the stern gods of the soil, of the sky, of the sea, of the underworld: Saturn, Uranus, Neptune, Pluto… Here we are, in our ice mines, gene labs, in our generation habitats, and all we have is text over low bandwidth connections, with long delays. And yet! And yet, a council has formed: we take council with each other. We deliberate. As Thucydides had Pericles say in his funeral oration, thousands of years ago: “instead of looking on discussion as a stumbling-block in the way of action, we think it an indispensable preliminary to any wise action at all.”

      Is that a cult or a game? You get two guesses.

      I decided not to click their second link and went to YouTube. All I could find were a pair of videos simply saying that it deserves broad adoption because it forces privacy upon you. It lacks the ability to post data, store cookies or cache anything, and that somehow makes it private. To borrow a phrase, it insists upon itself. I hope it never makes it into URI.

      > (interconnected text documents). That's not a new idea, but it's not old fashioned either. It's timeless, and deserves tools which treat it as a first class concept

      This needs context:

      I remember people refusing to use the "new" Netscape (or even Mosaic) back then and sticking with a pure text browser like Lynx, w3m or Emacs for years...

      It resonates with me now that I forbid cookies and JavaScript by default over HTTP. This involves going through extra complications to add exceptions for certain sites like Perlmonks.

      That's of course not a 100% solution, because some sites simply won't work without me having to accept wastes of bandwidth, lag, stolen performance and attacks on privacy.¹

      Gemini - without having read all specifications for the protocol - sounds like a way to me to guarantee by restrictions that all sites comply to textual browsability.

      Tho I'm sceptical...

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

      Updates

      1) For example: I'm specifically avoiding MSN News because they automatically play videos, and my Android browsers have no option to forbid this.

        sounds like a way to me to guarantee by restrictions that all sites comply to textual browsability.

        True, but that could be better accomplished using a restrictive web client. There's no need to reinvent a whole networking architecture to refuse non-text documents. Then, instead of an unsupported Gemini protocol, you have a network of Gemini web sites that anyone can visit. And those that want to enforce a stay within the network can do so using a web browser plugin.

      The site about the Gemini protocol seems to have gotten worse, more loquacious, in its move away from gemini.circumlunar.space to its new location. However, the relevant specifications are quite simple:

        Gotta wonder about their (or my) technical competency.

        Response headers MUST be UTF-8 encoded text and MUST NOT begin with the Byte Order Mark U+FEFF.

        Considering U+FEFF is the UTF-16 BOM, not the UTF-8 BOM, I don't know why it ever would. lol

        Update: Well, wee-pee says "If the Unicode byte-order mark U+FEFF is at the start of a UTF-8 file, the first three bytes will be 0xEF, 0xBB, 0xBF" ... which doesn't quite make sense to me. "If it starts with A, it starts with B." Hmm...
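On the BOM question, it may help to see that U+FEFF is a code point and that each encoding serialises it to different bytes. A small sketch using the core Encode module (hex_bytes is a helper made up for this example):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes of a string as space-separated upper-case hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $bom = "\x{FEFF}";    # the code point itself, not yet encoded

for my $enc (qw(UTF-8 UTF-16BE UTF-16LE)) {
    printf "%-8s %s\n", $enc, hex_bytes(encode($enc, $bom));
}
# prints:
# UTF-8    EF BB BF
# UTF-16BE FE FF
# UTF-16LE FF FE
```

So a UTF-8 stream can begin with U+FEFF; it just shows up as the three bytes EF BB BF rather than FE FF.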