Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I'm fumbling with another regex. I have been weighing the time it costs to write a proper PerlMonks post against digging myself in further with errant attempts, and I wish I had given up earlier. I'm combining two lists:

$ cat name1.txt
0. Amber BYU
1. Kim BGSU
2. Kim Washington
$ cat harm1.txt
0. J
1. B F K
2. A I J
$

In the first list, I'd like the second word of the name to be rendered as only a first initial. I'm completely mystified by the omission of the second letter. In the second list, I want the numbers omitted, and the regex fails on the first item of the list, whether it is 0 or 54. I hope that I have enough to illustrate my intent. What follows are caller, callee, and output.

#!/usr/bin/perl -w
use strict;
use 5.010;
use lib "template_stuff";
use steps1;

say "enter basename for file";
my $word = <>;
chomp $word;

# main data structure
my %vars = (
    name => 'name1.txt',
    harm => 'harm1.txt',
    diff => 'diff1.txt',
    word => $word . '.rtf',
);
my $rvars = \%vars;
my $return = pop_texts( $rvars );
say "returned was \n $$return";
my $return2 = format_texts( $rvars, $return );
__END__

sub pop_texts {
    use strict;
    use 5.010;
    use File::Slurp;
    my ($rvars) = shift;
    my %vars = %$rvars;
    my @name = read_file( $vars{name} );
    my @harm = read_file( $vars{harm} );
    for (@name) {
        s/\s+$/ /;
        my $int = s/^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/$3$4$5/;
        say "int is $int";
        say "six was $6";
    }
    for (@harm) {
        my $int = s/(^\d+\.)(\s+)(\w)(.)/was harmed by $3$4/;
        say "schmint is $int";
    }
    my $text1 = '';
    for my $i ( 0 .. $#name ) {
        $text1 = $text1 . $name[$i] . $harm[$i] . "\n";
    }
    my $reftext = \$text1;
    return $reftext;
}

$ perl hears1.pl
enter basename for file
rt7
int is 1
six was Y
int is 1
six was G
int is 1
six was a
schmint is
schmint is 1
schmint is 1
returned was
 Amber BU 0. J
Kim BSU was harmed by B F K
Kim Wshington was harmed by A I J

Thx for your comment.

Re: combining lists along with a regex
by Athanasius (Archbishop) on May 09, 2015 at 03:12 UTC

    Hello Datz_cozee75,

    I'd like the second word of the name to be rendered as only a first initial. I'm completely mystified by the omission of the second letter.

    For the input line 2. Kim Washington, do you want the output to be Kim W or K Washington? You say “the second word of the name,” so I’m assuming the former. Here is your regex, with the captures numbered:

    my $int = s/^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/$3$4$5/;
    #             1      2    3    4    5   6

    For the given input line, captures are as follows:

    ^(2.)( )(Kim)( )(W)(a)shington
      1   2   3   4  5  6

    The substitution says: match the expression in the left-hand side (regex), then replace the matched part with the right-hand side. The former (i.e., the match) is 2. Kim Wa. The latter (i.e., the replacement) is $3$4$5, which expands to Kim W. This replaces only the matched text within the string, so 2. Kim Wa becomes Kim W and the rest of the string is unaffected. And that’s why the second letter disappears!
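To see the split between "matched" and "untouched" text directly, here is a minimal sketch. The variable names ($matched, $rest, $t) are only illustrative; the pattern is the one from the question, and the offsets come from Perl's built-in @- and @+ arrays.

```perl
use strict;
use warnings;
use 5.010;

my $s = '2. Kim Washington';
my ( $matched, $rest ) = ( '', '' );

# Same pattern as in the question; only the text it matches gets replaced.
if ( $s =~ /^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/ ) {
    $matched = substr( $s, $-[0], $+[0] - $-[0] );    # the whole match
    $rest    = substr( $s, $+[0] );                   # the untouched tail
}
say "matched: '$matched'";    # matched: '2. Kim Wa'
say "rest:    '$rest'";       # rest:    'shington'

# Apply the substitution to a copy: the replacement is glued onto the tail.
( my $t = $s ) =~ s/^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/$3$4$5/;
say $t;                       # Kim Wshington
```

The sixth capture swallows the "a", and since $6 never appears in the replacement, that letter is simply dropped.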

    For this substitution, I would use a simpler regex (only one capture), like this:

    s/^\d+\.\s+(\w+\s+\w).*$/$1/;

    For example:

    13:08 >perl -wE "my $s = '2. Kim Washington'; $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1/; say $s;"
    Kim W
    13:09 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      That's got it Athanasius, thank you. Once you explained what I was doing and showed the folly of capturing what I didn't need, it all came together.

      0. Amber B. was harmed by J.
      1. Kim B. was harmed by B F K.
      2. Kim W. was harmed by A I J.

      The regexes are much tidier now. I still think I need to strip off whitespace on the RHS. I suppose I could try to roll it all into one if I get ambitious, but I think it adds legibility to make it a different step:

      for (@name) {
          s/\s+$//;
          my $int = s/^(\d+\.\s+\w+\s+\w).*$/$1\. /;
          say "int is $int";
      }
      for (@harm) {
          s/\s+$//;
          my $int = s/^\d+\.\s+(\w.*)$/was harmed by $1\./;
          say "schmint is $int";
      }

      This will do nicely for now, but I'm open to any other opinions.
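For the record, a quick check suggests why the original harm pattern failed on the first item. This is a sketch under one assumption: the lines still carry their trailing newline, as read_file returns them in list context. The names $old, $new, @old_hits and @new_hits are made up for the illustration. The old pattern ends in (\w)(.), so it needs two characters after the whitespace, and "." does not match a newline. A single-letter entry such as "0. J" therefore cannot match, whatever the number in front of it.

```perl
use strict;
use warnings;
use 5.010;

# Lines as read from harm1.txt, newline included (assumption).
my @harm = ( "0. J\n", "1. B F K\n", "2. A I J\n" );

my $old = qr/(^\d+\.)(\s+)(\w)(.)/;    # pattern from the question
my $new = qr/^\d+\.\s+(\w.*)$/;        # pattern from the revised loop

my ( @old_hits, @new_hits );
for my $line (@harm) {
    push @old_hits, ( $line =~ $old ? 1 : 0 );
    push @new_hits, ( $line =~ $new ? 1 : 0 );
}
say "old: @old_hits";    # old: 0 1 1
say "new: @new_hits";    # new: 1 1 1
```

The revised pattern only requires one word character, with .* free to match nothing, so the single-letter line goes through.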

        I think it adds legibility to make it a different step

        I totally agree, in the general case. In this specific case, however, the first substitution — explicitly stripping off trailing whitespace — is in fact not needed at all, because the second substitution does that anyway:

        18:14 >perl -wE "my $s = '2. Kim Washington '; $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /; say qq['$s'];"
        'Kim W. '
        18:14 >

        Note also that in a substitution, only the left-hand part is a regex; the right-hand part is just an (interpolated) string, so the . doesn’t need to be escaped.
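A small sketch of that point, with illustrative variable names ($plain, $escaped): the escaped and unescaped dot give the same result in the replacement, while on the pattern side the dot is a metacharacter and the backslash changes what matches.

```perl
use strict;
use warnings;
use 5.010;

my $plain   = '2. Kim Washington';
my $escaped = '2. Kim Washington';

$plain   =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /;     # unescaped dot in the replacement
$escaped =~ s/^\d+\.\s+(\w+\s+\w).*$/$1\. /;    # escaped dot, same result

say "'$plain'";      # 'Kim W. '
say "'$escaped'";    # 'Kim W. '

# On the left-hand side the dot matches any character unless it is escaped.
say 'unescaped . matches "2x"' if     '2x Kim' =~ /^\d+./;
say 'escaped \. does not'      unless '2x Kim' =~ /^\d+\./;
```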

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: combining lists along with a regex
by CountZero (Bishop) on May 09, 2015 at 10:25 UTC
    I think using split will be much easier:
    while (<DATA>) {
        my ( $name, $initial ) = ( split / / )[ 1, 2 ];
        $initial = substr $initial, 0, 1;
        say "$name $initial";
    }
    __DATA__
    0. Amber BYU
    1. Kim BGSU
    2. Kim Washington
    A split-based solution for the second file is even easier:
    while (<DATA>) {
        chomp;
        ( undef, my $line ) = split / /, $_, 2;
        say $line;
    }
    __DATA__
    0. J
    1. B F K
    2. A I J

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics