Hello jo37,
Do you still think it's a bug, even though we seem to be able to get what we want with some Perl trickery? You reproduced the issue choroba raised, but we did get the answer he expected.

The empty pattern // has been around at least since 5.6: it is discussed in Programming Perl, 3rd edition, and again in the 4th (5.14), unfortunately without examples in either. The example in perlop is clear, but to me it does not suggest a real use case. I'm still looking for a definitive use case, or at least a realistic one.

I've come up with two. The first might be used by a grammarian or linguist researching comparative languages. The second extracts the string between HTML tags, although I also show how to do this with a much simpler plain-old regex. Methinks it's a stretch to use // when there are other ways to do a thing, but TMTOWTDI. One of my examples parses a string while the other uses an array; if (/this/../that/) {... almost demands an array.

I would really like to hear a war story or two about how // was used to solve some really gnarly problem. Here be my two examples:
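First, a minimal sketch of the mechanism both examples lean on, in case it helps anyone reading along. Outside of split, an empty pattern reuses the last successfully matched pattern, so a throwaway "seed" match decides what // will test afterwards. The helper name count_with_empty_pattern and the sample text are made up for illustration; this is not taken from perlop or from the script that follows.

```perl
use strict;
use warnings;

# Hypothetical helper: count the characters of $text that match $re,
# testing each character with the empty pattern // instead of $re itself.
sub count_with_empty_pattern {
    my ($re, $seed, $text) = @_;

    # The seed match must succeed; only a successful match becomes
    # the "last successful pattern" that // will reuse.
    $seed =~ $re or die "seed '$seed' does not match $re";

    my $count = 0;
    # Inside split, // is a special case: it splits into characters
    # and does not reuse the last successful pattern.
    for my $char (split //, $text) {
        $count++ if $char =~ //;    # reuses the seeded pattern
    }
    return $count;
}

my $text = "Rebecca, 1938";
printf "%d uppercase\n", count_with_empty_pattern(qr/[A-Z]/, "A", $text);  # 1 uppercase
printf "%d digits\n",    count_with_empty_pattern(qr/\d/,    "5", $text);  # 4 digits
```

The die on a failed seed matters: if the seed does not match, // silently falls back to whatever pattern matched last, which is the kind of surprise that started this thread.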
#!/usr/bin/env -S perl -w
##!/usr/bin/env -S perl -wd
use v5.30.0;
use strict;
use List::AllUtils qw( reduce );
my ($slurpee, $length, $sum);
{
local $/;
($slurpee) = <DATA>;
}
$length = length $slurpee;
my @regexes = (
[ qr/[A-Z]/, "uppercase characters", 0 ],
[ qr/[a-z]/, "lowercase characters", 0 ],
[ qr/\d/, "digits", 0 ],
[ qr/\s/, "whitespace characters", 0 ],
#
# Note: $ must be \$, and - must be first to avoid range interpretation.
#
[ qr/[-~`!@#\$%^&*()_+={}\[\]|\\:;"'<>,.?\/]/, "punctuation characters", 0 ],
);
#for my $c (split //, $slurpee) { print $c; }
for my $case (@regexes) {
say "seeding // with: $case->[0]";
"Aa5: " =~ $case->[0]; # seed the // iteration
say "matched: '$&'" if $&;
for (split //, $slurpee) {
// and $case->[2]++;
}
}
for my $case (@regexes) { printf("%4d %s\n", $case->[2], $case->[1]); }
$sum = reduce { $a + $b } (map $_->[2], @regexes);
printf(" sum and length: %3d and %3d\n", $sum, $length);
say "\nNow extract the string between HTML tags with //...";
my $str = "Before tag<i>between tags</i>after tag";
say "\n$str";
$str =~ s{ (?: (?<= \w) (?= <) | (?<= >) (?= \w) ) }{ }xg; # insert whitespace
say $str;
my @tokens = split / /, $str;
say "Tokens...\n";
for (@tokens) { say };
my $between;
for (@tokens) {
if (/<\w>/../<\/\w>/) {
$between .= "$_ " unless // and $&;
}
}
chop $between if $between;
say "'$between'";
$str = "\n'Before tag<i>between tags</i>after tag'";
say $str;
say "Parse it again with...";
my $regex = qr/ (<\w+>) (.*) (<\/\w+>) /x;
say $regex;
$str =~ $regex;
say "\$1: '$1'";
say "\$2: '$2'";
say "\$3: '$3'";
exit(0);
__END__
Last night I dreamt I went to Manderley again. This will come as a surprise to
Daphne since she did not write these lines. Here is a line containing stuff
,?- ! : that should/must be deleted/// ; : ! before using it as a one-time-pad.
A one-time-pad should contain only characters, no punctuation, no parentheticals like (this is bogus) or [(this is bogus, too)], or {also this}; no contractions, such as
I'll or it's or digits such as 0, 123, -75 or 8 P.M., and no numbers, such as $1,234.69. If
you want to use numbers in your message, spell them out; one-hundred dollars and sixty-nine cents, or theeepm. These non-alpha characters in the one-time-pad will be discarded, but they must be entered eactly as represented in the book used as the pad. Let the encoding program decide what to use and what to skip.
Some of the text is from "Rebecca", an out-of copyright but not out-of-print fictional
work that can be freely downloaded as an eBook from Project Gutenberg. I use it as the
raw source for one-time pads in a cryptologic research study; i.e., extract potential
pad bits from somewhere in the text, randomly chosen with seek from EOF. Munge the
characters, encrypt the message and delete the characters used for the pad. Since both
encoder and decoder use the same seek expression, both pads are guaranteed to be
identical, and since the characters used to create the pad are deleted, never to be seen
again, the pad is guaranteed to be used exactly once. Does not scale for large
organizations but works flawlessly for a small group of conspirators.
O U T P U T
seeding // with: (?^u:[A-Z])
matched: 'A'
seeding // with: (?^u:[a-z])
matched: 'a'
seeding // with: (?^u:\d)
matched: '5'
seeding // with: (?^u:\s)
matched: ' '
seeding // with: (?^u:[-~`!@#\$%^&*()_+={}\[\]|\\:;"'<>,.?/])
matched: ':'
26 uppercase characters
1168 lowercase characters
13 digits
283 whitespace characters
80 punctuation characters
sum and length: 1570 and 1570
Now extract the string between HTML tags with //...
Before tag<i>between tags</i>after tag
Before tag <i> between tags </i> after tag
Tokens...
Before
tag
<i>
between
tags
</i>
after
tag
'between tags'
'Before tag<i>between tags</i>after tag'
Parse it again with...
(?^ux: (<\w+>) (.*) (</\w+>) )
$1: '<i>'
$2: 'between tags'
$3: '</i>'
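For the flip-flop example, another way to drop the two tag tokens without involving // at all is to look at the value the range operator returns. In scalar context it yields a sequence number starting at 1, with "E0" appended on the last element of the range, so both endpoints can be recognised directly. The sketch below assumes whitespace-separated tokens like those printed above; the helper name between_tags is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the tokens strictly between an opening
# token such as <i> and the matching closing token such as </i>.
# Caveat: the flip-flop keeps its state between calls, so this assumes
# every opening token is eventually followed by a closing token.
sub between_tags {
    my @tokens = @_;
    my @inside;
    for (@tokens) {
        # Scalar-context range operator: '' outside the range,
        # 1, 2, 3, ... inside it, and e.g. '4E0' on the last element.
        my $seq = /^<\w+>$/ .. /^<\/\w+>$/;
        next unless $seq;            # outside the range
        next if $seq =~ /E0$/;       # last element: the closing tag
        next if $seq == 1;           # first element: the opening tag
        push @inside, $_;
    }
    return @inside;
}

my @tokens = split ' ', "Before tag <i> between tags </i> after tag";
print join(' ', between_tags(@tokens)), "\n";   # between tags
```

This is still more machinery than the plain capture-group regex, but it keeps the array-plus-flip-flop shape while removing the dependence on which pattern happened to match last.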