
That's got it Athanasius, thank you. Once you explained what I was doing and showed the folly of capturing what I didn't need, it all came together.

0. Amber B. was harmed by J.
1. Kim B. was harmed by B F K.
2. Kim W. was harmed by A I J.

The regexes are much tidier now. I still think I need to strip off whitespace on the RHS. I suppose I could try to roll it all into one if I get ambitious, but I think it adds legibility to make it a different step:

for (@name) {
    s/\s+$//;
    my $int = s/^(\d+\.\s+\w+\s+\w).*$/$1\. /;
    say "int is $int";
}
for (@harm) {
    s/\s+$//;
    my $int = s/^\d+\.\s+(\w.*)$/was harmed by $1\./;
    say "schmint is $int";
}

This will do nicely for now, but I'm open to any other opinions.

Re^3: combining lists along with a regex
by Athanasius (Archbishop) on May 09, 2015 at 08:19 UTC
    I think it adds legibility to make it a different step

    I totally agree, in the general case. In this specific case, however, the first substitution — explicitly stripping off trailing whitespace — is in fact not needed at all, because the second substitution does that anyway:

    18:14 >perl -wE "my $s = '2. Kim Washington '; $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /; say qq['$s'];"
    'Kim W. '

    18:14 >

    Note also that in a substitution, only the left-hand part is a regex; the right-hand part is just an (interpolated) string, so the . doesn’t need to be escaped.
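
A minimal sketch of both points, using a sample line modelled on the one-liner above: the greedy .* consumes the trailing space, and the replacement produces the same string whether or not the period is escaped. The helper names shorten_escaped and shorten_plain are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replacement side written with an escaped period.
sub shorten_escaped {
    my ($line) = @_;
    $line =~ s/^\d+\.\s+(\w+\s+\w).*$/$1\. /;
    return $line;
}

# Hypothetical helper: replacement side written with a plain period.
sub shorten_plain {
    my ($line) = @_;
    $line =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /;
    return $line;
}

my $input = '2. Kim Washington ';    # assumed sample: trailing space, no newline

printf "escaped: '%s'\n", shorten_escaped($input);    # escaped: 'Kim W. '
printf "plain:   '%s'\n", shorten_plain($input);      # plain:   'Kim W. '
```

This only holds for input without a trailing newline; a line that still ends in "\n" behaves differently, because . does not match a newline by default.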

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Hmmm. Not meaning to quibble, but I notice your inputs don't have the newlines that mine do. I don't want those newlines in the output: I would rather supply my own than reproduce whatever was in the input, which may or may not be Windows-appropriate, and Windows is where these lists ultimately get printed. I've read (on PerlMonks) that s/\s+$//; is the best way to strip newlines and extra spacing on the RHS. Without those statements I get this:

      0. Amber B. was harmed by J. 1. Kim B. was harmed by B F K. 2. Kim W. was harmed by A I J.

      Now that I look at this, it isn't at all clear to me how I would accomplish that in one step, given the greediness of the .*. Indeed, I don't understand why, without the s/\s+$//;, the trailing newline isn't matched by .* and discarded along with everything else:

      for (@name) {
          s/\s+$//;
          my $int = s/^(\d+\.\s+\w+\s+\w).*$/$1. /;
          say "int is $int";
      }
      for (@harm) {
          s/\s+$//;
          my $int = s/^\d+\.\s+(\w.*)$/was harmed by $1./;
          say "schmint is $int";
      }

      0. Amber B. was harmed by J.
      1. Kim B. was harmed by B F K.
      2. Kim W. was harmed by A I J.

      The point about the period not needing to be escaped in the right hand part of the substitution is well-taken and reflected in the above. Thanks again.
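
For comparison, here is a rough sketch of how s/\s+$// and chomp differ on a line that carries a Windows-style "\r\n" ending. The sample string and the helper names strip_trailing and chomp_only are assumptions made for illustration, and the string literally contains "\r" (as it might after reading a Windows file on another platform).

```perl
use strict;
use warnings;

# Hypothetical helper: remove every trailing whitespace character,
# which covers spaces, "\r" and "\n" alike.
sub strip_trailing {
    my ($line) = @_;
    $line =~ s/\s+$//;
    return $line;
}

# Hypothetical helper: chomp only, for comparison.
sub chomp_only {
    my ($line) = @_;
    chomp $line;    # removes one trailing $/ ("\n" by default) and nothing else
    return $line;
}

my $windows_line = "0. Amber Brown \r\n";    # assumed sample with a CRLF ending

printf "original length: %d\n", length $windows_line;                    # 17
printf "after chomp:     %d\n", length chomp_only($windows_line);        # 16, space and "\r" remain
printf "after s/\\s+\$//:  %d\n", length strip_trailing($windows_line);  # 14
```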

        You’re right, I didn’t take newlines into account. You could always chomp the line before the substitution. However, to do it all in one step, you need to add an /s modifier to the (regex part of the) substitution:

        12:51 >perl -wE "my $s = qq[Hello world. \n]; $s =~ s/^.*(world).*$/monks/; say qq[|$s|];"
        |monks
        |

        12:51 >perl -wE "my $s = qq[Hello world. \n]; $s =~ s/^.*(world).*$/monks/s; say qq[|$s|];"
        |monks|

        12:51 >

        See perlre#Modifiers.
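
Applied to the two substitutions from this thread, a one-step version might look like the sketch below. The name substitution only needs /s added. The harm substitution is deliberately altered, not just given /s: because its capture group ends in a greedy .*, it would swallow the trailing whitespace under /s, so this sketch makes the capture lazy and matches \s* before the $. The input lines and the helper names short_name and harm_clause are assumptions, not the original data.

```perl
use strict;
use warnings;

# Hypothetical helper: the thread's name substitution with /s added,
# so .* also consumes the trailing newline.
sub short_name {
    my ($line) = @_;
    $line =~ s/^(\d+\.\s+\w+\s+\w).*$/$1. /s;
    return $line;
}

# Hypothetical helper: a modified harm substitution. The lazy capture plus
# \s* keeps trailing spaces and the newline out of $1.
sub harm_clause {
    my ($line) = @_;
    $line =~ s/^\d+\.\s+(\w.*?)\s*$/was harmed by $1./s;
    return $line;
}

my @name = ("0. Amber Brown \n", "1. Kim Black \n");    # assumed sample input
my @harm = ("0. J \n",           "1. B F K \n");        # assumed sample input

for my $i (0 .. $#name) {
    # Supply our own line ending instead of reusing the input's.
    print short_name($name[$i]), harm_clause($harm[$i]), "\n";
}
# 0. Amber B. was harmed by J.
# 1. Kim B. was harmed by B F K.
```

Whether this is more legible than a separate s/\s+$// step is a matter of taste; the two-step form keeps each regex simpler.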

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,