in reply to combining lists along with a regex

Hello Datz_cozee75,

I'd like the second word of the name to be rendered as only a first initial. I'm completely mystified by the omission of the second letter.

For the input line 2. Kim Washington, do you want the output to be Kim W or K Washington? You say “the second word of the name,” so I’m assuming the former. Here is your regex, with the captures numbered:

my $int = s/^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/$3$4$5/;
#            1      2    3    4    5   6

For the given input line, captures are as follows:

^(2.)( )(Kim)( )(W)(a)shington
 1   2  3    4  5  6

The substitution says: match the expression in the left-hand side (regex), then replace the matched part with the right-hand side. The former (i.e., the match) is 2. Kim Wa. The latter (i.e., the replacement) is $3$4$5, which expands to Kim W. This replaces the matched text within the string, so 2. Kim Wa becomes Kim W and the rest of the string (shington) is unaffected. And that’s why the second letter disappears!
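To see this effect in isolation, here is a minimal sketch. The sub names apply_original and apply_simpler are hypothetical wrappers added only for illustration; the two substitutions inside them are the ones quoted in this post.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the six-capture substitution from the question.
# Only the matched prefix "2. Kim Wa" is replaced; "shington" is left alone.
sub apply_original {
    my ($line) = @_;
    $line =~ s/^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/$3$4$5/;
    return $line;
}

# Hypothetical wrapper around the single-capture version, where .*$ makes
# the match cover the rest of the line so that it is replaced as well.
sub apply_simpler {
    my ($line) = @_;
    $line =~ s/^\d+\.\s+(\w+\s+\w).*$/$1/;
    return $line;
}

print apply_original('2. Kim Washington'), "\n";   # prints "Kim Wshington"
print apply_simpler('2. Kim Washington'),  "\n";   # prints "Kim W"
```

The only difference that matters is how much of the line the pattern consumes: whatever falls outside the match survives the substitution untouched.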

For this substitution, I would use a simpler regex (only one capture), like this:

s/^\d+\.\s+(\w+\s+\w).*$/$1/;

For example:

13:08 >perl -wE "my $s = '2. Kim Washington'; $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1/; say $s;"
Kim W

13:09 >

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: combining lists along with a regex
by Aldebaran (Curate) on May 09, 2015 at 06:05 UTC

    That's got it Athanasius, thank you. Once you explained what I was doing and showed the folly of capturing what I didn't need, it all came together.

    0. Amber B. was harmed by J.
    1. Kim B. was harmed by B F K.
    2. Kim W. was harmed by A I J.

    The regexes are much tidier now. I still think I need to strip off whitespace on the RHS. I suppose I could try to roll it all into one if I get ambitious, but I think it adds legibility to make it a different step:

    for (@name) {
        s/\s+$//;
        my $int = s/^(\d+\.\s+\w+\s+\w).*$/$1\. /;
        say "int is $int";
    }
    for (@harm) {
        s/\s+$//;
        my $int = s/^\d+\.\s+(\w.*)$/was harmed by $1\./;
        say "schmint is $int";
    }

    This will do nicely for now, but I'm open to any other opinions.

      I think it adds legibility to make it a different step

      I totally agree, in the general case. In this specific case, however, the first substitution — explicitly stripping off trailing whitespace — is in fact not needed at all, because the second substitution does that anyway:

      18:14 >perl -wE "my $s = '2. Kim Washington '; $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /; say qq['$s'];"
      'Kim W. '

      18:14 >

      Note also that in a substitution, only the left-hand part is a regex; the right-hand part is just an (interpolated) string, so the . doesn’t need to be escaped.
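A small sketch of that point, assuming the same input line as in the one-liner. The sub names with_escaped_dot and with_plain_dot are hypothetical and exist only to put the two spellings side by side.

```perl
use strict;
use warnings;

# Replacement written with an escaped period, as in the earlier post.
sub with_escaped_dot {
    my ($s) = @_;
    $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1\. /;
    return $s;
}

# Replacement written with a bare period. The right-hand side is an
# interpolated string, so "." has no special meaning there.
sub with_plain_dot {
    my ($s) = @_;
    $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /;
    return $s;
}

my $input = '2. Kim Washington ';
print "'", with_escaped_dot($input), "'\n";   # prints 'Kim W. '
print "'", with_plain_dot($input),   "'\n";   # prints 'Kim W. '
```

Both forms give the same string; the unescaped one is simply easier to read.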

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        Hmmm. Not meaning to quibble, but I notice your inputs don't have the newlines that mine do. I don't want those in the output: I'd rather supply my own line endings than reproduce whatever was in the input, which may or may not be Windows-appropriate, and Windows is where these lists ultimately get printed. I've read (on PerlMonks) that s/\s+$//; is the best way to strip newlines and extra spacing on the RHS. Without those substitutions I get this:

        0. Amber B. was harmed by J. 1. Kim B. was harmed by B F K. 2. Kim W. was harmed by A I J.

        Now that I look at this, it isn't at all clear to me how I would accomplish that in one step, given the greediness of the .*. Indeed, I don't understand why the newline isn't matched by .* and discarded when I leave out the s/\s+$//;:

        for (@name) {
            s/\s+$//;
            my $int = s/^(\d+\.\s+\w+\s+\w).*$/$1. /;
            say "int is $int";
        }
        for (@harm) {
            s/\s+$//;
            my $int = s/^\d+\.\s+(\w.*)$/was harmed by $1./;
            say "schmint is $int";
        }
        0. Amber B. was harmed by J.
        1. Kim B. was harmed by B F K.
        2. Kim W. was harmed by A I J.

        The point about the period not needing to be escaped in the right hand part of the substitution is well-taken and reflected in the above. Thanks again.
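On the question of why the newline survives without s/\s+$//;: by default . does not match a newline, and $ can match just before a newline at the end of the string, so the match stops short of it and the newline is left in place. The sketch below illustrates this. The sub names shorten and shorten_one_step are hypothetical, and the /s variant is only one possible way to fold both steps together, not something proposed in the thread.

```perl
use strict;
use warnings;

# Same substitution as in the @name loop, without the separate strip step.
# ".*" stops at the newline and "$" matches just before it, so trailing
# spaces are swallowed but the newline itself is kept.
sub shorten {
    my ($line) = @_;
    $line =~ s/^(\d+\.\s+\w+\s+\w).*$/$1. /;
    return $line;
}

# One-step variant (an assumption, not the thread's code): with /s the
# "." also matches newlines, so ".*" runs to the very end of the string
# and the trailing newline is replaced along with everything else.
sub shorten_one_step {
    my ($line) = @_;
    $line =~ s/^(\d+\.\s+\w+\s+\w).*$/$1. /s;
    return $line;
}

my $input = "2. Kim Washington  \n";
print "[", shorten($input), "]\n";            # newline still inside the brackets
print "[", shorten_one_step($input), "]\n";   # prints [2. Kim W. ]
```

Keeping s/\s+$//; as its own step, as in the loops above, stays the more readable choice; the /s form is just there to show where the newline goes.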