apt_get has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I am stuck and need your help. I have the following code:
#!/usr/bin/perl
use warnings;
use strict;

my @names=("Sam Adams","Bud weiser","Ice House");
my ($name, $temp);
foreach $name (@names){
    $temp=$name;
    $temp=~s/(\w)(\s)(\w)/$1 . "+" . $2 . "+" . $3/eig;
    print "$temp\n";
    $name=~s/(\w)(\s)(\w)/$1/eig;
    print "$name\n";
}
print "\n";
__END__
The output is like:
Sam+ +Adams
Samdams
Bud+ +weiser
Budeiser
Ice+ +House
Iceouse
Why is it not like
Sam+ +Adams
Sam
Bud+ +weiser
Bud
Ice+ +House
Ice
Any ideas? Thanks in advance

Re: $1 $2 Weirdness?
by graff (Chancellor) on Jun 27, 2005 at 00:11 UTC
    This substitution:
    $name=~s/(\w)(\s)(\w)/$1/eig;
    is only removing the space and the first character of the second word. What you want instead is something like this (note that capturing parens are not needed):
    $name =~ s/\s.*//; # replace space and all following characters with empty string
    And you better read up on your regex qualifiers -- the "eig" at the end of that substitution is completely unnecessary:
    • "e" means treat the replacement string as executable code -- I don't think that's what you really want
    • "i" means use case-insensitive matching -- but the pattern (the match part of the s/// operator) contains no literal upper or lower case letters, so it changes nothing
    • "g" means apply the match/replacement globally -- but your sample input suggests that a single application will suffice.

    update: forgot to mention, the first substitution could be a lot simpler as well -- again, no capturing parens needed:

    $temp =~ s/\s/+ +/;
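For readers who want to see the three modifiers in isolation, here is a minimal sketch that is not part of the original reply. The input strings ("n=3", "a b c") and the variable names are made up for illustration; only core Perl is assumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# /e: the replacement is evaluated as Perl code, so 2 * $1 is computed.
(my $with_e = "n=3") =~ s/(\d+)/2 * $1/e;        # "n=6"

# Without /e the replacement is an ordinary interpolated string.
(my $without_e = "n=3") =~ s/(\d+)/2 * $1/;      # "n=2 * 3"

# /i only matters when the pattern contains literal letters.
(my $with_i = "Sam Adams") =~ s/sam/Bob/i;       # "Bob Adams"
(my $without_i = "Sam Adams") =~ s/sam/Bob/;     # no match, unchanged

# /g replaces every match; without it only the first match is replaced.
(my $with_g = "a b c") =~ s/\s/+/g;              # "a+b+c"
(my $without_g = "a b c") =~ s/\s/+/;            # "a+b c"

print "$_\n" for $with_e, $without_e, $with_i, $without_i,
                 $with_g, $without_g;
```

In the OP's second substitution the replacement is just $1, which is why /e happens to make no visible difference there.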
      Thanks graff for pointing out the error of my ways. BTW, I know what the "e", "i" and "g" qualifiers are, but thanks anyways.
Re: $1 $2 Weirdness?
by sk (Curate) on Jun 27, 2005 at 00:11 UTC
    Modifying only the second portion

    $name=~s/(\w+)(\s+)(\w+)/$1/ig; print "$name\n";
    \w looks for just one word-character. You need to look for one or more.

    with the fix output -

    Sam+ +Adams
    Sam
    Bud+ +weiser
    Bud
    Ice+ +House
    Ice

    cheers

    SK

Re: $1 $2 Weirdness?
by GrandFather (Saint) on Jun 27, 2005 at 00:11 UTC

    Because in s/(\w)(\s)(\w)/$1/eig the first (\w) matches one character, (\s) matches one character and the second (\w) matches one character. So in "Sam Adams", "m A" gets matched and replaced with "m".
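A small sketch, not from the original post, that makes the three-character match visible by printing what the pattern actually captured. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "Sam Adams";
my ($matched, @groups);

# Plain match first, so we can inspect what the pattern grabs.
if ($name =~ /(\w)(\s)(\w)/) {
    $matched = $&;              # the whole matched substring
    @groups  = ($1, $2, $3);    # one character per capture group
    print "matched: '$matched'\n";              # matched: 'm A'
    print "groups:  '$1' '$2' '$3'\n";          # groups:  'm' ' ' 'A'
}

# Replacing those three characters with $1 leaves only the "m".
(my $result = $name) =~ s/(\w)(\s)(\w)/$1/;
print "$result\n";                              # Samdams
```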

    What you more likely want is:

    $name=~s/(\w*?)(\s).*/$1/eig;

    Perl is Huffman encoded by design.
      What you more likely want is:
      $name=~s/(\w*?)(\s).*/$1/eig;
      I don't think so. Yet more cargo cult regexing, huh? *sigh*

      Let's strike all the unnecessary cruft.

      • He wants +, not *. What if his string started with a space?
      • The ? is unnecessary
      • I see no reason to replace \w+ with .*
      • The /e modifier is totally useless, though by accident, harmless
      • The /i modifier is useless
      • If you use .*, then the /g modifier is useless

      Granted, you inherited the modifiers from the OP, but for the rest I see very little excuse.

      IMO this is what the OP is after:

      $name=~s/(\w+)(\s)(\w+)/$1/g;
      or even
      $name=~s/(\w+)\s\w+/$1/g;
      though it would likely also do what he wants without the /g.
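To make the difference between the two patterns concrete, here is a hedged sketch comparing them on a few inputs. The helper names (first_word_lazy, first_word_plus) and the inputs " Adams" and "Ice House Beer" are invented for this example; the modifiers are dropped from both patterns so that only the pattern itself differs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the criticised pattern, without modifiers.
sub first_word_lazy {
    my ($s) = @_;
    $s =~ s/(\w*?)(\s).*/$1/;
    return $s;
}

# Hypothetical helper: the \w+ version suggested above, without /g.
sub first_word_plus {
    my ($s) = @_;
    $s =~ s/(\w+)\s\w+/$1/;
    return $s;
}

for my $in ("Sam Adams", " Adams", "Ice House Beer") {
    printf "%-16s lazy='%s' plus='%s'\n",
        "'$in'", first_word_lazy($in), first_word_plus($in);
}
# "Sam Adams"      : both give "Sam"
# " Adams"         : lazy gives "" (everything is eaten), plus leaves it alone
# "Ice House Beer" : lazy gives "Ice", plus gives "Ice Beer"
```

The leading-space case is the one raised in the first bullet: with * the first group can match nothing, so the whole string is replaced by an empty string. The three-word case shows that .* and \w+ are not interchangeable either, so which one is right depends on what the input can look like.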

        Yes to all of that. I was cruising through the afternoon and spotted his immediate problem without actually engaging my brain and looking at the big picture. sigh. By the time I noticed it was too late to retract and others had pointed out the problems anyway.


      Thanks all, for your replies. For some brain dead reason, I was assuming the "\w" to match a "more than one character" word, instead of just a single character. Now I know.