RFC: The lightning-rod operator

"It's not as if it's a particularly nice house," [Mr. Prosser] said.
"I'm sorry, but I happen to like it."
"You'll like the bypass."

-- Douglas Adams, The Hitch-Hiker's Guide to the Galaxy

The Stage

Extending the Perl 5 language is not to be taken lightly. After more than two decades of stable releases, most basic design decisions have been settled, and there is ample evidence that the language is useful as it is.

Still, if a new feature promises benefit, is documented well, can't easily be covered by a module, is implemented by someone, breaks neither backwards compatibility nor the stability or efficiency of the compiler, and comes with an agreeable syntax, the Perl 5 Porters might be inclined to accept it. This can happen. Occasionally.

I'd like to find out what other Perl monks think about this new operator I have been fantasizing about.

The Need

There has already been some discussion about safe dereferencing. We have even had a poll on what an operator for that should look like. Safe dereferencing means dereferencing in a way that neither autovivifies undefined values nor chokes on attempts to call methods on undefined objects. It is a way to carry undefinedness through certain expressions with less clutter than explicit conditionals.

Examples:

  $name = $user[$idx]  -> {'name'};     # may create a hash
  $name = $user[$idx] ?-> {'name'};     # leaves @user alone
  $name = $person  -> name;             # chokes on undef
  $name = $person ?-> name;             # passes undef on

Note that the question-marked arrow here stands in for whatever symbol would have been chosen. I don't care.
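
For comparison, here is a minimal sketch of how the same four cases behave in Perl as it exists today, with the safe variants spelled out as explicit conditionals. The variable names @other, $safe_name and $pname are made up for this illustration, and the Person class is never loaded because the guarded method call is never reached.

```perl
use strict;
use warnings;

my @user;
my $idx = 3;

# Plain dereference of a nonexistent element: even in rvalue context
# this autovivifies $user[3] as an empty hash.
my $name = $user[$idx]->{'name'};
my $size_after = scalar @user;         # the array has grown to 4 elements

# Guarded dereference: test the element before looking inside it.
my @other;
my $safe_name = defined $other[$idx] ? $other[$idx]->{'name'} : undef;
my $other_size = scalar @other;        # still 0, nothing was created

# Guarded method call: skip the call when there is no object.
my $person;
my $pname = defined $person ? $person->name : undef;
```

This is the clutter that a safe dereferencing operator is meant to remove.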

The Generalization

I can see a more general concept behind the cases the safe dereferencing operator was meant to address. What if we had a way to safely bail out of all sorts of expressions upon a check for undefinedness? The situation where we want to perform safe dereferencing would be one possible occasion for this. Others could be arithmetic expressions or string manipulations, you name it. And bailing out here does not only mean carrying on with a value of undef, but actually cutting useless branches short, like boolean operators do.

The Details

I will call our new operator lightning-rod for reasons I'll elaborate on in a moment. It is a unary postfix operator like the postfix variant of the auto-increment operator ++, with which it shares its precedence level and some aspects of its visual appearance.

I am suggesting the symbol ^^ for this operator.

Its high precedence makes sure it attaches to a single term in a numerical, string or binary expression. As a unary operator it does not grab other terms to use as arguments. Being well-behaved if somewhat boring, it does not change its argument. It just checks whether the argument is defined.

If so, the expression is evaluated as if the lightning rod weren't there. If not, lightning strikes, hits the rod, and is safely grounded, where ground level means the precedence level of || and //. The effect is that the whole expression short-cuts to undef, unless it is an or-chain or some combination of parts of even lower precedence, in which case just the subexpression short-cuts.

Examples:

 
# If velocity is defined, use it, otherwise don't.
# Note that just the concatenation arg is guarded,
# so you still get an error if $b is not a hashref.

$a = $b->{'velocity'}^^ . ' mph' // 'unknown';

# Safe dereferencing of $d (it may be undefined).
# Note that the second dereference is normal,
# so parent() returning undef would trigger an error.

$c = $d^^ ->  parent  ->  name;

# Safe dereferencing of $d (it may be undefined).
# Note that the second dereference is normal,
# so a nonexistent 'parent' component would be created.

$c = $d^^ ->{'parent'}->{'name'};
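
To make the intended semantics concrete, the three examples can be written out in present-day Perl. This is only a sketch of my reading of the rules above, not an implementation of the operator. The sub names are hypothetical, and $rec stands in for $b to stay clear of sort's special variables.

```perl
use strict;
use warnings;

# $a = $b->{'velocity'}^^ . ' mph' // 'unknown';
# The rod grounds at the level of //, so only the concatenation is skipped.
sub velocity_text {
    my ($rec) = @_;
    my $v    = $rec->{'velocity'};
    my $text = defined $v ? $v . ' mph' : undef;
    return $text // 'unknown';
}

# $c = $d^^ -> parent -> name;
# Only $d is guarded; parent() returning undef would still be an error.
sub parent_name_via_methods {
    my ($d) = @_;
    return defined $d ? $d->parent->name : undef;
}

# $c = $d^^ ->{'parent'}->{'name'};
# Only $d is guarded; a missing 'parent' entry would still be autovivified.
sub parent_name_via_hashes {
    my ($d) = @_;
    return defined $d ? $d->{'parent'}->{'name'} : undef;
}

my $fast    = velocity_text({ velocity => 88 });   # '88 mph'
my $unknown = velocity_text({});                   # 'unknown'
my $nobody  = parent_name_via_hashes(undef);       # undef
```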

Notes

It is important to grasp the precedence level of short-cuts introduced by the lightning-rod operator. In particular, short-cuts won't escape out of parentheses or across commas. So, for example, you can't cut a function call short from within its argument list.

# these two are equivalent:
$a = foobar($b^^);
$a = foobar($b);

# this would be safer:
$a = defined($b)? foobar($b): undef;

If you do want to get out of some nested structure, you might have to wire more than one short-cut:

$f = ($g | $h^^ | $i)^^ & $j;
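
Spelled out with today's operators, the two rods in this expression would correspond to two separate definedness checks, one per nesting level. The sub name guarded_bits and the temporary $inner are hypothetical, and the sketch assumes that a short-cut inside the parentheses yields undef for the parenthesised part, as described above.

```perl
use strict;
use warnings;

# Present-day equivalent of:  $f = ($g | $h^^ | $i)^^ & $j;
sub guarded_bits {
    my ($g, $h, $i, $j) = @_;

    # Inner rod: an undefined $h short-cuts the parenthesised part only.
    my $inner = defined $h ? ($g | $h | $i) : undef;

    # Outer rod: an undefined parenthesised result short-cuts the rest.
    return defined $inner ? ($inner & $j) : undef;
}

my $f1 = guarded_bits(1, 2, 4, 6);       # (1|2|4) & 6 is 6
my $f2 = guarded_bits(1, undef, 4, 6);   # undef, and & is never evaluated
```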

I am aware that the other postfix operators, ++ and --, both have prefix variants, too. While it would be conceivable to make prefix ^^ legal and act like postfix ^^, it does not strike me as equally intuitive. Therefore I lean towards sticking with postfix only. But I don't have a strong opinion either way.

A lightning-rod at the very end of an expression is more than a no-op. As it short-cuts out to the boolean-or level, it may well guard against medium-precedence operations on undef, such as numerical or string operations.

# these two are equivalent:
$a =              $b . $c^^;
$a = defined($c)? $b . $c: undef;

The defined-and operator has also been suggested as a definedness-aware operator in the past. Defined-and can more easily be emulated using existing operators than either the lightning-rod or defined-or.

# these two are equivalent:
$a =         foo() ?? bar();
$a = defined(foo())?  bar(): undef;

Note that the double question mark here stands in for whatever symbol would have been chosen for defined-and.

Inside quoted strings, carets as well as lightning-rods should not have new special meanings. To guard against undef being expanded into a string, take it outside the quotes and stick a lightning-rod next to it:

$a = "hello $b";       # may trigger a warning
$a = "hello $b^^";     # does not help
$a = 'hello ' . $b^^;  # may evaluate to undef

While auto-increment and auto-decrement can only be applied to modifiable values (L-values), lightning-rod can adorn anything. Applied to an L-value, however, the result is no longer modifiable:

$a = $b^^++;        # error
$a = $b++^^ * $c;   # ok
$a = ++$b^^ * $c;   # ok but pointless

At least this is the behaviour I figure to be easiest to implement. I wouldn't mind making the first case legal, though. It would increment $b only if it was defined.
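
If the first case were made legal, I would expect it to behave like the following present-day code. This is a sketch of the intended behaviour only; $count and $old are hypothetical stand-ins for $b and $a.

```perl
use strict;
use warnings;

# Present-day spelling of the proposed  $a = $b^^++;
my $count = 5;
my $old   = defined $count ? $count++ : undef;   # $old is 5, $count is now 6

# With an undefined operand nothing is incremented at all.
my $unset;
my $none  = defined $unset ? $unset++ : undef;   # both remain undefined
```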

Maybe someone was hoping the double caret would some day be used for a medium-level precedence version of xor. Pardon me. I think that would be just as strange to C or bash programmers while much less useful to us.

In fact, since lightning-rod is a postfix operator, it would not completely rule out another infix operator using the same symbol, although I wouldn't want to consider that. Look at it this way: Being a postfix operator, it will not easily be mistaken for the other thing we don't want to talk about.

If you like the operator but not how I named it, I could also settle on "bat operator". A bat hangs somewhere beneath the ceiling and can grab something that is not too heavy for it to lift, and put that down elsewhere. Should we talk about a wumpus operator next? ☺

A more conservative name for ^^ might be "definedness guard".

Staying in the metaphor of electrical safety, the name "fuse operator" also has a nice ring to it. It would make a conveniently short token name, too.

The Conclusion

The defined-or operator // introduced in Perl v5.10 gave us a very useful option for short-cutting on definedness. It makes Perl expressions a tad more expressive. Expanding on this notion, I am now presenting a candidate for a fine companion. It would be more than a replacement for the safe dereferencing operator that seems to be so hard to agree upon.

Of course it will take some getting used to. And a proof that it can be implemented efficiently has yet to be given. I might look into that if there is sufficiently encouraging feedback while nobody else volunteers.

Update 1: typo (thanks, MidLifeXis)
Update 2: another name: fuse


Re: RFC: The lightning-rod operator by LanX (Saint) on Jan 23, 2016 at 16:55 UTC
I have several problems with the "The Need" paragraph already ...

1) You are proposing two "safe" operators in one: the first to avoid magic (autovivification at hash access), the second to introduce magic (ignore the error of a missing method). I don't find this consistent.

2) The number of edge cases for the large number of operators makes Perl hard to memorize. We are not only running out of keyboard characters, but learning Perl becomes increasingly complicated.

3) DWIM shouldn't violate orthogonality, i.e. the ability to understand the syntax logically by composing smaller concepts into a consistent whole.

If P5P wants to extend syntax, then better support for autoboxing would be the way I'd favour for efficiency and flexibility!

    use strict;
    use warnings;
    use Data::Dump qw/dd/;

    my $get = sub {
        my ($ref, @path) = @_;
        for my $member (@path) {
            return undef unless exists $ref->{$member};
            $ref = $ref->{$member};
        }
        return $ref;
    };

    my %hash = ( a => { b => { c => 666 } } );

    dd \%hash;
    dd $hash{a}->$get(qw/b c/);
    dd $hash{a}->$get(qw/x x x/);   # no autovivification
    #dd $hash{a}{x}{x}{x};          # autovivification
    dd \%hash;

->

    { a => { b => { c => 666 } } }
    666
    undef
    { a => { b => { c => 666 } } }

Please note that:

a) the name of the method is (or can be) self explanatory (`get` is just a guess)
b) this approach is backward compatible; the Perl implementation might be slow but will always work as a fallback
c) performance could be easily improved by XS code in an intermediate step
d) the solution is generic, i.e. any other "missing" operator could be implemented/experimented with, and hence orthogonal
e) any new syntax like `~>` or whatever could be additionally implemented with a clearly named twin autobox method

Sorry if I didn't read your whole post and didn't supply an example implementation for `$call`, but a flu with fever limits my attention span ATM ;-)

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

PS: As a side note, I was first thinking of realizing something like a $safe-method to handle both cases

    $name = $user[$idx] -> {'name'};            # may create a hash
    $name = $user[$idx] -> $safe -> {'name'};   # leaves @user alone
    $name = $person -> name;                    # chokes on undef
    $name = $person -> $safe -> name;           # passes undef on

But this would not only imply too much slow magic like a proxy class but also mangle two different concepts in one operator (see point 1).
Re^2: RFC: The lightning-rod operator by martin (Friar) on Jan 23, 2016 at 23:54 UTC
Thank you for your comments so far, and thank you for pointing out another technique to avoid autovivification (other than the autovivification pragma on CPAN). I suppose some of your points will become less of an issue once you get around to reading the rest of the article. In particular, you may rest assured that I don't want to suggest new dereferencing operators. On the contrary, the lightning-rod or fuse operator I was talking about would make such operators unnecessary.
Re^3: RFC: The lightning-rod operator by LanX (Saint) on Jan 24, 2016 at 21:43 UTC
Sorry but your ideas are not trivial or easy to grasp, so I have to go little by little. =)

I find the desired short cut behaviour to catch `undef` even more problematic than the rest. I'd rather prefer a `catch_undef { BLOCK }` command, because the block would be explicit about what is caught without much explanation.

This can be emulated (at least) with

    my $h_b = {};
    my $x = eval {
        use warnings FATAL => 'uninitialized';
        $h_b->{velocity} . " mph";
    } // 'unknown';
    print $x;

I tried to construct some syntactic sugar

    sub catch_undef (&) {
        my $code_ref = shift;
        eval {
            use warnings FATAL => 'uninitialized';
            $code_ref->()
        };
    }

but I'm running into two problems:

1) Obviously pragmas are lexically scoped. I seem to remember there are some obscure tricks to manipulate the warning flags of a coderef (something with $^H ?) but I'm too lazy at the moment.

2) There seems to be a bug in the parser, because I get a weird syntax error for

    catch_undef { $ref->{velocity} .'mph' } // "unknown";

    Too many arguments for main::catch_undef at /tmp/tst.pl line 29, near "// "unknown""

Using `||` instead solves the parsing problem (but not the task).

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!
Re^4: RFC: The lightning-rod operator by martin (Friar) on Jan 30, 2016 at 17:09 UTC
Re: RFC: The lightning-rod operator (safe deref) by Anonymous Monk on Jan 22, 2016 at 23:41 UTC
What operator should perl5porters use for safe dereferencing?
Re: RFC: The lightning-rod operator by martin (Friar) on Feb 07, 2016 at 10:31 UTC
Follow-Up: I have reworked the article and posted it on blogs.perl.org. What has changed is that I have settled on the name fuse operator and that `$x^^++` should be legal. Thanks again, everybody, for your feedback.

