Hello jo37,

Do you still think it's a bug, even though we seem to be able to get what we want with some Perl trickery? You duplicated the issue choroba raised, but we did get the answer he expected. The empty pattern // has been around at least since 5.6, as it is discussed in PP 3rd as well as 4th (5.14), unfortunately without examples. The example in perlop is clear but does not suggest, to me, a real use case.

I'm still looking for a definitive use case, or at least a realistic one. I've come up with two. The first might be used by a grammarian or linguist researching comparative languages. The second extracts the string between HTML tags, although I also show how to do this with a much simpler plain-old regex. Methinks it's a stretch to use // when there are other ways to do a thing, but TMTOWTDI. One of my examples parses a string while the other uses an array; if (/this/../that/) {... almost demands an array. I would really like to hear a war story or two about how // was used to solve some really gnarly problem. Here be my two examples:

#!/usr/bin/env -S perl -w
##!/usr/bin/env -S perl -wd

use v5.30.0;
use strict;
use List::AllUtils qw( reduce );

my ($slurpee, $length, $sum);
{
    local $/;
    ($slurpee) = <DATA>;
}
$length = length $slurpee;

my @regexes = (
    [ qr/[A-Z]/,                                   "uppercase characters",   0 ],
    [ qr/[a-z]/,                                   "lowercase characters",   0 ],
    [ qr/\d/,                                      "digits",                 0 ],
    [ qr/\s/,                                      "whitespace characters",  0 ],
#
#   Note: $ must be \$, and - must be first to avoid range interpretation.
#
    [ qr/[-~`!@#\$%^&*()_+={}\[\]|\\:;"'<>,.?\/]/, "punctuation characters", 0 ],
);

#for my $c (split //, $slurpee) { print $c; }

for my $case (@regexes) {
    say "seeding // with: $case->[0]";
    "Aa5: " =~ $case->[0];       # seed the // iteration
    say "matched: '$&'" if $&;
    for (split //, $slurpee) {
        // and $case->[2]++;
    }
}
for my $case (@regexes) { printf("%4d %s\n", $case->[2], $case->[1]); }

$sum = reduce { $a + $b } (map $_->[2], @regexes);
printf(" sum and length: %3d and %3d\n", $sum, $length);

say "\nNow extract the string between HTML tags with //...";
my $str = "Before tag<i>between tags</i>after tag";
say "\n$str";
$str =~ s{ (?: (?<= \w) (?= <) | (?<= >) (?= \w) ) }{ }xg;    # insert whitespace
say $str;
my @tokens = split / /, $str;
say "Tokens...\n";
for (@tokens) { say };

my $between;
for (@tokens) {
    if (/<\w>/../<\/\w>/) {
        $between .= "$_ " unless // and $&;
    }
}
chop $between if $between;
say "'$between'";

$str = "\n'Before tag<i>between tags</i>after tag'";
say $str;
say "Parse it again with...";
my $regex = qr/ (<\w+>) (.*) (<\/\w+>) /x;
say $regex;
$str =~ $regex;
say "\$1: '$1'";
say "\$2: '$2'";
say "\$3: '$3'";

exit(0);
__END__
Last night I dreamt I went to Manderley again. This will come as a surprise to
Daphne since she did not write these lines.  Here is a line containing stuff
   ,?- ! : that should/must be deleted/// ; : ! before using it as a one-time-pad.
A one-time-pad should contain only characters, no  punctuation, no parentheticals like (this is bogus) or [(this is bogus, too)], or {also this}; no contractions, such as
I'll or it's or digits such as 0, 123, -75 or 8 P.M., and no numbers, such as $1,234.69.  If
you want to use numbers in your message, spell them out; one-hundred dollars and sixty-nine cents, or theeepm.  These non-alpha characters in the one-time-pad will be discarded, but they must be entered eactly as represented in the book used as the pad.  Let the encoding program decide what to use and what to skip.

Some of the text is from "Rebecca", an out-of copyright but not out-of-print fictional
work that can be freely downloaded as an eBook from Project Gutenberg. I use it as the
raw source for one-time pads in a cryptologic research study; i.e., extract potential
pad bits from somewhere in the text, randomly chosen with seek from EOF. Munge the
characters, encrypt the message and delete the characters used for the pad. Since both
encoder and decoder use the same seek expression, both pads are guaranteed to be
identical, and since the characters used to create the pad are deleted, never to be seen
again, the pad is guaranteed to be used exactly once. Does not scale for large
organizations but works flawlessly for a small group of conspirators.

O U T P U T

  seeding // with: (?^u:[A-Z])
  matched: 'A'
  seeding // with: (?^u:[a-z])
  matched: 'a'
  seeding // with: (?^u:\d)
  matched: '5'
  seeding // with: (?^u:\s)
  matched: ' '
  seeding // with: (?^u:[-~`!@#\$%^&*()_+={}\[\]|\\:;"'<>,.?/])
  matched: ':'
    26 uppercase characters
  1168 lowercase characters
    13 digits
   283 whitespace characters
    80 punctuation characters
   sum and length: 1570 and 1570

  Now extract the string between HTML tags with //...

  Before tag<i>between tags</i>after tag
  Before tag <i> between tags </i> after tag

  Tokens...

  Before
  tag
  <i>
  between
  tags
  </i>
  after
  tag
  'between tags'

  'Before tag<i>between tags</i>after tag'
  Parse it again with...
  (?^ux: (<\w+>) (.*) (</\w+>) )
  $1: '<i>'
  $2: 'between tags'
  $3: '</i>'
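
For anyone who wants to poke at the two ideas in isolation, here is a minimal sketch, not the code from the post. It assumes the perlop behaviour that an empty pattern reuses the last successfully matched regex, which is what the seeding step in the program relies on. The sub names count_with_empty_pattern and between_tags are my own inventions for illustration. The second sub swaps the greedy (.*) for a non-greedy (.*?) plus a backreference, so a string with more than one tagged span stops at the first closing tag.

```perl
use strict;
use warnings;

# Hypothetical helper: count the characters of $text that match $seed_re.
# The seed match makes $seed_re the last successfully matched pattern,
# which the empty pattern // then reuses inside the loop.
sub count_with_empty_pattern {
    my ($seed_re, $seed_text, $text) = @_;
    $seed_text =~ $seed_re
        or die "seed text must match, or // will not reuse this regex\n";
    my $count = 0;
    for (split //, $text) {    # split // is special: it splits into characters
        $count++ if //;        # empty pattern: reuses the seeded regex on $_
    }
    return $count;
}

# Hypothetical helper: text between the first pair of matching simple tags.
# (.*?) is non-greedy and \1 requires the closing tag to name the opening one.
sub between_tags {
    my ($html) = @_;
    return $html =~ m{<(\w+)>(.*?)</\1>}s ? $2 : undef;
}

printf "%d uppercase\n", count_with_empty_pattern(qr/[A-Z]/, "A", "Hello World");
printf "'%s'\n", between_tags("Before tag<i>between tags</i>after tag");
```

With the greedy (.*) from the post, a string such as <b>one</b> and <b>two</b> would capture everything up to the last closing tag; the non-greedy form returns just the first span.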

In reply to Re^2: Empty pattern in regex [updated] by perlboy_emeritus
in thread Empty pattern in regex by choroba
