combining lists along with a regex

Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks

I'm fumbling with another regex. I have been weighing the time it costs to write a proper PerlMonks post against digging myself in further with errant attempts, and I wish I had given up earlier. I'm combining two lists:

$ cat name1.txt
0. Amber BYU
1. Kim BGSU
2. Kim Washington
$ cat harm1.txt
0. J
1. B F K
2. A I J
$

In the first list, I'd like the second word of the name to be rendered as only a first initial. I'm completely mystified by the omission of the second letter. In the second list, I want the numbers omitted, and the regex fails on the first item of the list, whether it is 0 or 54. I hope that I have enough to illustrate my intent. What follows are caller, callee, and output.

#!/usr/bin/perl -w
use strict;
use 5.010;
use lib "template_stuff";
use steps1;

say "enter basename for file";
my $word = <>;
chomp $word;

# main data structure
my %vars = (
  name  => 'name1.txt',
  harm => 'harm1.txt',
  diff => 'diff1.txt',
  word  => $word .'.rtf',
);
my $rvars  = \%vars;
my $return = pop_texts( $rvars );
say "returned was \n $$return";
my $return2 = format_texts( $rvars, $return );
__END__

sub pop_texts {
  use strict;
  use 5.010;
  use File::Slurp;

  my ($rvars) = shift;
  my %vars    = %$rvars;
  my @name   = read_file( $vars{name} );
  my @harm  = read_file( $vars{harm} );
  for (@name) {
    s/\s+$/ /;
    my $int = s/^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/$3$4$5/;
    say "int is $int";
    say "six was $6";
  }
  for  (@harm) {
    my $int = s/(^\d+\.)(\s+)(\w)(.)/was harmed by $3$4/;
    say "schmint is $int";
  }


  my $text1 = '';
  for my $i ( 0 .. $#name ) {
    $text1 = $text1 . $name[$i] . $harm[$i] . "\n";
  }
  my $reftext = \$text1;
  return $reftext;
}

$ perl hears1.pl
enter basename for file
rt7
int is 1
six was Y
int is 1
six was G
int is 1
six was a
schmint is 
schmint is 1
schmint is 1
returned was 
 Amber BU 0. J

Kim BSU was harmed by B F K

Kim Wshington was harmed by A I J

Thx for your comment.


Re: combining lists along with a regex
by Athanasius (Archbishop) on May 09, 2015 at 03:12 UTC

Hello Datz_cozee75,

I'd like the second word of the name to be rendered as only a first initial. I'm completely mystified by the omission of the second letter.

For the input line 2. Kim Washington, do you want the output to be Kim W or K Washington? You say “the second word of the name,” so I’m assuming the former. Here is your regex, with the captures numbered:

my $int = s/^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/$3$4$5/;
#            1      2    3    4    5   6

For the given input line, captures are as follows:

^(2.)( )(Kim)( )(W)(a)shington
 1   2  3    4  5  6

The substitution says: match the expression on the left-hand side (the regex), then replace the matched part with the right-hand side. The former (i.e., the match) is 2. Kim Wa. The latter (i.e., the replacement) is $3$4$5, which expands to Kim W. This replaces the matched text within the string, so 2. Kim Wa becomes Kim W and the rest of the string is unaffected. And that’s why the second letter disappears!
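A minimal sketch that makes this visible, assuming the input line has no trailing newline. The variable names ($line, $matched, $kept, $result) are illustrative and not from the original script:

```perl
use strict;
use warnings;
use 5.010;

my $line = '2. Kim Washington';

# Same pattern as in the question, used as a plain match first.
my ( $matched, $kept );
if ( $line =~ /^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/ ) {
    $matched = $&;          # everything the regex consumed
    $kept    = "$3$4$5";    # what the replacement puts back
    say "matched: '$matched'";    # matched: '2. Kim Wa'
    say "kept:    '$kept'";       # kept:    'Kim W'
}

# The substitution swaps the matched span for the kept part;
# the unmatched tail 'shington' is left where it was.
( my $result = $line ) =~ s/^(\d+\.)(\s+)(\w+)(\s+)(\w)(.)/$3$4$5/;
say $result;                      # Kim Wshington
```

The 'a' is lost because capture 6 is part of the match but is not part of the replacement.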

For this substitution, I would use a simpler regex (only one capture), like this:

s/^\d+\.\s+(\w+\s+\w).*$/$1/;

For example:

13:08 >perl -wE "my $s = '2. Kim Washington'; $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1/; say $s;"
Kim W

13:09 >
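The same kind of reasoning appears to explain why the second regex in the question fails on the first item of harm1.txt. The line 0. J has a single letter followed directly by the newline, and (.) needs one more character that is not a newline. A sketch, assuming the lines still carry their newlines as read from the file; the replacement pattern shown is a hypothetical alternative, not the one from the script:

```perl
use strict;
use warnings;
use 5.010;

# Lines as they would arrive from the file, newline included.
my @harm = ( "0. J\n", "1. B F K\n" );

# Pattern from the question: after (\w) takes the letter, (.) must
# match one more non-newline character. "0. J\n" has none.
my @matched;
for my $line (@harm) {
    push @matched, $line =~ /(^\d+\.)(\s+)(\w)(.)/ ? 1 : 0;
}
say "original pattern: @matched";    # original pattern: 0 1

# Hypothetical alternative: remove only the leading number, capture nothing.
my @fixed;
for my $line (@harm) {
    ( my $copy = $line ) =~ s/^\d+\.\s+//;
    push @fixed, $copy;
}
print "was harmed by $_" for @fixed;
```

This would fit the empty "schmint is" line in the output: the substitution returns an empty string when nothing matches.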

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: combining lists along with a regex

by Aldebaran (Curate) on May 09, 2015 at 06:05 UTC

That's got it Athanasius, thank you. Once you explained what I was doing and showed the folly of capturing what I didn't need, it all came together.

0. Amber B. was harmed by J.
1. Kim B. was harmed by B F K.
2. Kim W. was harmed by A I J.

The regexes are much tidier now. I still think I need to strip off whitespace on the RHS. I suppose I could try to roll it all into one if I get ambitious, but I think it adds legibility to make it a different step:

  for (@name) {
    s/\s+$//;
    my $int = s/^(\d+\.\s+\w+\s+\w).*$/$1\. /;
    say "int is $int"; 
  }

  for  (@harm) {
    s/\s+$//;
    my $int = s/^\d+\.\s+(\w.*)$/was harmed by $1\./;
    say "schmint is $int";
  }

This will do nicely for now, but I'm open to any other opinions.


Re^3: combining lists along with a regex

by Athanasius (Archbishop) on May 09, 2015 at 08:19 UTC

I think it adds legibility to make it a different step

I totally agree, in the general case. In this specific case, however, the first substitution — explicitly stripping off trailing whitespace — is in fact not needed at all, because the second substitution does that anyway:

18:14 >perl -wE "my $s = '2. Kim Washington      '; $s =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /; say qq['$s'];"
'Kim W. '

18:14 >
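One caveat that may be worth checking: the one-liner above uses a string with no newline. A line read from a file ends in a newline, which . does not match by default, and $ matches just before it, so the newline survives the single substitution. A small sketch, with illustrative variable names:

```perl
use strict;
use warnings;
use 5.010;

my $from_file = "2. Kim Washington   \n";    # as read from a file

# One substitution only: trailing spaces go, the newline stays.
( my $one_step = $from_file ) =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /;

# Trailing-whitespace strip first, then the same substitution.
( my $two_step = $from_file ) =~ s/\s+$//;
$two_step =~ s/^\d+\.\s+(\w+\s+\w).*$/$1. /;

say $one_step eq "Kim W. \n" ? "one step keeps the newline" : "one step drops the newline";
say $two_step eq "Kim W. "   ? "two steps drop the newline" : "two steps keep the newline";
```

So when the lines come straight from the file, either the separate s/\s+$// step or a chomp is still needed to get rid of the newline.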

Note also that in a substitution, only the left-hand part is a regex; the right-hand part is just an (interpolated) string, so the . doesn’t need to be escaped.
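A quick way to confirm this, as a sketch with made-up variable names: the escaped and unescaped replacements give the same string.

```perl
use strict;
use warnings;
use 5.010;

my $escaped   = 'Kim W';
my $unescaped = 'Kim W';

# The replacement side is treated like a double-quoted string,
# so "\." and "." both produce a literal dot.
$escaped   =~ s/W$/W\./;
$unescaped =~ s/W$/W./;

say "'$escaped' vs '$unescaped'";    # 'Kim W.' vs 'Kim W.'
```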

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^4: combining lists along with a regex

by Aldebaran (Curate) on May 10, 2015 at 01:37 UTC

Re^5: combining lists along with a regex

by Athanasius (Archbishop) on May 10, 2015 at 02:54 UTC

Re: combining lists along with a regex
by CountZero (Bishop) on May 09, 2015 at 10:25 UTC

Using split, for the list of names:

use strict;
use warnings;
use 5.010;    # enables say

while (<DATA>) {
    my ( $name, $initial ) = ( split / / )[ 1, 2 ];
    $initial = substr $initial, 0, 1;
    say "$name $initial";
}

__DATA__
0. Amber BYU
1. Kim BGSU
2. Kim Washington

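And split again, this time with a limit, for the second list:

use strict;
use warnings;
use 5.010;    # enables say

while (<DATA>) {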
    chomp;
    ( undef, my $line ) = split / /, $_, 2;
    say $line;
}

__DATA__
0. J
1. B F K
2. A I J
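The two split snippets could be combined roughly as follows to produce the merged lines shown earlier in the thread. This is only a sketch: it assumes both lists are already in arrays and line up by index, and the names @names, @harms and @combined are illustrative.

```perl
use strict;
use warnings;
use 5.010;

# Stand-ins for the contents of name1.txt and harm1.txt.
my @names = ( "0. Amber BYU\n", "1. Kim BGSU\n", "2. Kim Washington\n" );
my @harms = ( "0. J\n",         "1. B F K\n",    "2. A I J\n" );

my @combined;
for my $i ( 0 .. $#names ) {
    chomp( my $name = $names[$i] );
    chomp( my $harm = $harms[$i] );

    # Number, first word, second word of the name line.
    my ( $num, $first, $second ) = split ' ', $name;

    # Discard the number, keep the rest of the harm line in one piece.
    my ( undef, $letters ) = split ' ', $harm, 2;

    push @combined, sprintf '%s %s %s. was harmed by %s.',
        $num, $first, substr( $second, 0, 1 ), $letters;
}
say for @combined;
```

Splitting on ' ' rather than / / skips leading whitespace and treats runs of whitespace as one separator, which is a little more forgiving of irregular input.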

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics
