Re: Unicode infinity
by hv (Prior) on Jul 01, 2024 at 18:34 UTC
How likely is a patch for this to get accepted?
In June 2021 Nick Clark proposed an RFC process to answer that very question. The proposal was adopted, and the documents have since been renamed to PPCs ("Proposed Perl Changes"). A description of the process is on GitHub, and in that repository you can see existing PPCs along with any discussion and resolution.
My initial reaction is that this would be better as a module than as a core feature, certainly for initial acceptance and usability testing.
Re: Unicode infinity
by ikegami (Patriarch) on Jul 01, 2024 at 14:52 UTC
$ perl -Mv5.40 -e'my $x= Inf; say $x'
Bareword "Inf" not allowed while "strict subs" in use at -e line 1.
Execution of -e aborted due to compilation errors.
So we're talking about barewords, and barewords can only start with a subset of word characters. "∞" isn't a word character.
Does anyone else feel like this ought to get fixed? How likely is a patch for this to get accepted?
What fix are we talking about?
Changing the definition of bareword to include some symbols? Absolutely not.
Changing the type of character ∞ is in Perl to be different from what it is in Unicode? Absolutely not.
Changing the type of character ∞ is in Unicode? Absolutely not.
Perhaps you meant to compare to builtin::inf. It was added in 5.40, and it's still experimental, but I expect it to be imported by use v5.xx in the future. Adding ∞ as an alias for that is a completely different story. That would make more sense, and is entirely feasible.
|
Yeah, I commented above that I had forgotten strict and mistakenly thought Inf was parsed as a number.
But, this is my point - right now the character is illegal in a perl script, and there's no reason (that I can guess, having not looked at perl's internals yet) that the language parser couldn't use that as a permitted character for numeric literals. It would be a little different from 'inf' in that it would be parsed as the constant rather than a lexical function that returns a constant, so like "-∞" would immediately become an NV, where "-inf" is a negation of a function call that hopefully gets resolved to an NV during compiler optimizations.
Edit
I mean exactly this :-)
$ git diff --cached
diff --git a/toke.c b/toke.c
index e6ff0c4f74..4590774e44 100644
--- a/toke.c
+++ b/toke.c
@@ -9174,6 +9174,13 @@ yyl_try(pTHX_ char *s)
                 return tok;
             goto retry_bufptr;
         }
+        if (UTF && s + 2 < PL_bufend && *s == '\xE2' && s[1] == '\x88'
+            && s[2] == '\x9E') {
+            pl_yylval.opval = newSVOP(OP_CONST, 0, newSVnv(NV_INF));
+            s += 3;
+            if (PL_expect == XOPERATOR)
+                no_op("Number",s);
+            TERM(THING);
+        }
         yyl_croak_unrecognised(aTHX_ s);
     case 4:
$ PERLLIB=lib ./perl -E 'use utf8; say∞;'
Inf
$ PERLLIB=lib ./perl -E 'use utf8; say -∞;'
-Inf
|
$ perl -MO=Concise,-exec -Mv5.40 -Mbuiltin=inf -e'my $x = inf;'
Built-in function 'builtin::inf' is experimental at -e line 1.
1 <0> enter v
2 <;> nextstate(main 9 -e:1) v:us,*,&,{,$,fea=8
3 <$> const[NV Inf] s
4 <1> padsv_store[$x:9,10] vKS/LVINTRO
5 <@> leave[1 ref] vKP/REFC
-e syntax OK
Re: Unicode infinity
by haj (Vicar) on Jun 30, 2024 at 10:35 UTC
Well, feel free to try. I have used Inf rarely and I don't feel any urgency or importance here.
|
My new use case is for those APIs where you need to specify how many of something to return, and you want a symbolic constant for "all of them". Doubly useful when that symbolic constant always compares greater than any integer list length, and it would be pretty awesome if it didn't need to be imported into the current scope.
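A minimal sketch of that use case, under stated assumptions: the constant name ALL and the helper take() are hypothetical names chosen for illustration, and 9**9**9 is assumed to overflow to floating-point infinity (which holds for IEEE 754 doubles).

```perl
use strict;
use warnings;

# Assumption: 9**9**9 overflows to floating-point infinity on this platform.
use constant ALL => 9**9**9;

# Hypothetical helper: return at most $limit items from the list.
sub take {
    my ($limit, @items) = @_;
    # Infinity compares greater than any list length, so ALL selects everything.
    my $n = $limit < @items ? $limit : scalar @items;
    return @items[0 .. $n - 1];
}

my @first_two = take(2,   qw(a b c d));
my @all       = take(ALL, qw(a b c d));
print "@first_two\n";   # prints: a b
print "@all\n";         # prints: a b c d
```

The comparison in take() needs no special case for "no limit", which is the convenience being described.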
|
This is similar to where I use 'Inf'. I understand your point. The wording "ought to get fixed" seems a bit strong, though.
A patch might be accepted, I have no idea. Nerdy things are possible under use utf8; as of today. One can write valid code which PerlMonks won't even display correctly (I used a GitHub Page). Paul Evans is experimenting with Unicode characters as infix operators (e.g. Syntax::Operator::Identical), since according to his musings Perl is running short on unused ASCII characters for that purpose. Your proposal fits into the same category.
Re: Unicode infinity
by sectokia (Friar) on Jul 01, 2024 at 04:26 UTC
If a string contains only letters, digits, and underscores without starting with a digit, you can omit quotes.
The string 'Inf' complies with that, so you can write my $x= Inf; instead of my $x= "Inf";
A string containing only a character with value of 0x221e does not meet the requirement, so you need to quote it.
There is a near-zero chance of this being 'fixed', because Perl code is not UTF-8 and its base rules around strings without quotes are not likely to ever include any character outside the base printable ASCII range of 0x20 to 0x7E.
|
If you declare 'use utf8' then yes the perl script is UTF-8.
I don't want it to get parsed as a string, I want ∞ to be an official part of the numeric tokenizer that returns an NV float to perl internals. When printed, it would render as "Inf" because it's the actual floating-point infinity value.
(but actually you pointed out a misconception I had. I was thinking the token Inf was parsed as a float by the parser. It was because I didn't bother to enable strict and warnings)
|
'use utf8' just assumes UTF-8 encoding when converting input bytes to character values. It does not change Perl's syntax, and so it changes neither the parsing of literals nor which values are accepted as strings without quoting.
So basically what you are asking for is that Perl treat a specific character value (0x221e) as a numerical infinity (in all cases, or only in literals, or only in non-quoted literals, or only in non-quoted literals of one character in length?).
Note that the above has nothing to do with 'utf8', and would obviously break any code anytime a character has a value of 0x221e.
I guess what I'm saying is: 'Inf' is a string that can be treated as a special case in numerical context. An alternative using other Unicode characters would still need to be a string of more than one character; it can't be just one character, because every individual character is already mapped to a number.
Why does it make more sense for the Unicode infinity symbol to be treated as numerical infinity instead of... its Unicode value? And if we do, where do we stop? How many other symbols should be treated as special values instead of their Unicode values? Should we treat 0x03C0 as 3.14159... ? At what point is your request really just 'Perl should accept Unicode symbologies as syntax'?
(these are genuine questions - I find this very interesting, hopefully this is not coming across wrong :-)
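For reference, the relationship between the three bytes tested in the toke.c patch earlier in the thread and the single character value 0x221e can be sketched as follows. This is only an illustration: utf8::decode stands in here, roughly, for the decoding that 'use utf8' applies to source text.

```perl
use strict;
use warnings;

my $bytes = "\xE2\x88\x9E";   # the three UTF-8 bytes that encode U+221E
my $chars = $bytes;
utf8::decode($chars);         # decode in place: bytes become one character

printf "%d byte(s), %d character(s), U+%04X\n",
    length($bytes), length($chars), ord($chars);
# prints: 3 byte(s), 1 character(s), U+221E
```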
|
If a string contains only letters, digits, and underscores without starting with a digit, you can omit quotes.
Not if you use strict; (as one always should) or any version declaration of use 5.012 or newer, which implies strict.
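A small sketch of that difference, using string eval so that both cases can sit in one script. The variable names are only illustrative.

```perl
use strict;
use warnings;

# Without strict, the bareword Inf is simply the string "Inf".
my $loose = eval q{ no strict; my $x = Inf; $x };

# With strict, the same statement fails to compile.
my $strict_ok = eval q{ use strict; my $x = Inf; 1 };
my $strict_err = $@;

# A version declaration of 5.12 or newer enables strict implicitly.
my $v512_ok = eval q{ use v5.12; my $x = Inf; 1 };

print "no strict: '$loose'\n";
print "use strict: ", ($strict_ok ? "compiled" : "rejected"), "\n";
print "use v5.12:  ", ($v512_ok   ? "compiled" : "rejected"), "\n";
print $strict_err;
```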
Re: Unicode infinity
by Danny (Chaplain) on Jul 01, 2024 at 13:18 UTC
What's wrong with just doing something like:
perl -wE 'use strict; my %h = ("∞" => "Inf", "Inf" => "Inf"); my $x= $h{"∞"}; say $x / 5'
|
That kind of defeats the convenience, no? Even for older perls, that's more to type than sub inf { 0+"Inf" } say inf / 5;