cbolcato has asked for the wisdom of the Perl Monks concerning the following question:

I am having a problem matching the end of a string with a pattern when the string is read from STDIN and chomped. If I add a newline to the string or remove the chomp, it works fine. I have tried $, \z and \Z.
#!/usr/bin/perl
print "get some string: ";
chomp($string = <STDIN>);
#$string = $ARGV[0];
#chomp($string);
$string =~ m/\/([[:alnum:]]+)_.*\.(.+)$/;
print "$1\n";
$type = $2;
print "$type\n";
$string =~ m/(.+)\.${type}$/;
#$string =~ m/(.+)\.${type}\Z/;
#$string =~ m/(.+)\.${type}\z/;
#$string =~ m/(.+)\.${type}/;
print "$1\n";
exit 0;

OUTPUT - which is wrong

get some string: /xxxx/yyyy/ZZZ_xxxx.CCC
ZZZ
CCC
ZZZ

BUT IF I:

#!/usr/bin/perl
#print "get some string: ";
#chomp($string = <STDIN>);
$string = $ARGV[0];
chomp($string);
$string =~ m/\/([[:alnum:]]+)_.*\.(.+)$/;
print "$1\n";
$type = $2;
print "$type\n";
$string =~ m/(.+)\.${type}$/;
#$string =~ m/(.+)\.${type}\Z/;
#$string =~ m/(.+)\.${type}\z/;
#$string =~ m/(.+)\.${type}/;
print "$1\n";
exit 0;

OUTPUT - IS CORRECT

ZZZ
CCC
/xxxx/yyyy/ZZZ_xxxx

Replies are listed 'Best First'.
Re: perl pattern match for end of string using STDIN and chomp
by ikegami (Patriarch) on Oct 08, 2009 at 17:49 UTC
    You are mistaken. Your first program works fine for the input you specified:
    $ perl 800041a.pl
    get some string: /xxxx/yyyy/ZZZ_xxxx.CCC
    ZZZ
    CCC
    /xxxx/yyyy/ZZZ_xxxx

    Note that you probably want to use \Q$type\E in patterns, not just $type, since $type doesn't contain a regex pattern but text to match literally.
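
A minimal sketch of why the quoting matters. The values of $type and $string here are hypothetical, picked only because the extension contains a regex metacharacter; they do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical values for illustration; they do not appear in the thread.
my $type   = 't.t';             # contains '.', a regex metacharacter
my $string = '/tmp/data.txt';

# Unquoted: the '.' inside $type matches any character, so ".txt" is accepted.
my $unquoted = $string =~ m/\.$type$/     ? 1 : 0;

# Quoted: \Q...\E escapes metacharacters, so only a literal ".t.t" would match.
my $quoted   = $string =~ m/\.\Q$type\E$/ ? 1 : 0;

print "unquoted: $unquoted\n";
print "quoted:   $quoted\n";
```

With these inputs the unquoted pattern matches and the quoted one does not, which is the kind of silent false positive that \Q...\E is meant to prevent.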

Re: perl pattern match for end of string using STDIN and chomp
by kennethk (Abbot) on Oct 08, 2009 at 17:37 UTC
    Running the first example code you posted and entering your string does not yield your posted output; rather, it yields the correct output from your second code sample. Have you tested your posted code as written?

    #!/usr/bin/perl
    print "get some string: ";
    chomp($string = <STDIN>);
    #$string = $ARGV[0];
    #chomp($string);
    $string =~ m/\/([[:alnum:]]+)_.*\.(.+)$/;
    print "$1\n";
    $type = $2;
    print "$type\n";
    $string =~ m/(.+)\.${type}$/;
    #$string =~ m/(.+)\.${type}\Z/;
    #$string =~ m/(.+)\.${type}\z/;
    #$string =~ m/(.+)\.${type}/;
    print "$1\n";
    exit 0;

    ~/sandbox$ perl junk.pl
    get some string: /xxxx/yyyy/ZZZ_xxxx.CCC
    ZZZ
    CCC
    /xxxx/yyyy/ZZZ_xxxx
      On perl 5.8.0 running on Linux 2.4.21-40.ELsmp it DOES NOT WORK. You are correct: I ran it on perl 5.8.7 on my Windows machine and it worked fine. I am going to look for a more current version on our Linux machine to find out whether this is a Perl version problem or something directly related to Linux.

        Works for me in 5.8.0

        get some string: /xxxx/yyyy/ZZZ_xxxx.CCC
        ZZZ
        CCC
        /xxxx/yyyy/ZZZ_xxxx

        And there's no reason it shouldn't. $ not matching the end of the string would have been caught by tests. Your build is very broken if the last pattern doesn't match given the specified input.

        Ditto on ikegami above: I tested v5.8.8 built for x86_64-linux-gnu-thread-multi on Ubuntu 8.04 LTS, as well as on Windows. What happens when you run this?

        #!/usr/bin/perl
        print "get some string: ";
        $dropped = chop($string = <STDIN>);
        print ord $dropped, " ", ord $/, "\n";
        #$string = $ARGV[0];
        #chomp($string);
        $string =~ m/\/([[:alnum:]]+)_.*\.(.+)$/;
        print "$1\n";
        $type = $2;
        print "$type\n";
        $string =~ m/(.+)\.${type}$/;
        #$string =~ m/(.+)\.${type}\Z/;
        #$string =~ m/(.+)\.${type}\z/;
        #$string =~ m/(.+)\.${type}/;
        print "$1\n";
        exit 0;
Re: perl pattern match for end of string using STDIN and chomp
by AnomalousMonk (Archbishop) on Oct 08, 2009 at 21:48 UTC
    I tend to agree with the suggestions of others that the second regex in the chomped <STDIN> section of the OP's code is simply failing to match, and the previous value of $1 is persisting.

    Try inserting the statement
        print qq{\$1 reset: '$1'} if 'foo' =~ /(foo)/;
    between the first and second regex in that section and see what happens. If the second 'ZZZ' becomes 'foo', you will know what is happening (although not why the second regex fails to match, which I cannot understand myself).

      This "previous value of $1" can be problematic. One practice I often use in my code is to NOT use $1, $2, etc. directly. I like to assign $1 right away to a variable that has more contextual meaning (like $name or $cust_id). But why have a separate $name = $1; statement at all?

      One way to do this is illustrated below: put the match into list context and use a list slice to get the captures. If the match fails, $thing gets undef in this case, not the previous value of $1.

      print "match failed\n" unless $string =~ m/(.+)\.BBB$/;
      print "dollar $1:\n";    #prints previous $1 value
      my ($thing) = ($string =~ m/(.+)\.BBB$/)[0];
      print "thing =$thing\n"; #$thing is undef
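
A related sketch, not taken from the thread: list assignment can also serve as the success test itself, so the capture is only used when the match succeeded. The helper name strip_ext is hypothetical, and the \Q...\E quoting follows ikegami's earlier suggestion.

```perl
use strict;
use warnings;

my $string = '/xxxx/yyyy/ZZZ_xxxx.CCC';   # sample input from the thread

# Hypothetical helper: returns the part before ".$ext", or undef on no match.
sub strip_ext {
    my ($str, $ext) = @_;
    # A list assignment in boolean context yields the number of values the
    # match returned: 0 on failure, so a stale $1 is never consulted.
    if (my ($base) = $str =~ m/(.+)\.\Q$ext\E$/) {
        return $base;
    }
    return;
}

for my $ext ('CCC', 'BBB') {
    my $base = strip_ext($string, $ext);
    print defined $base ? "$ext: $base\n" : "$ext: no match\n";
}
```

For 'CCC' this prints the path without its extension; for 'BBB' it reports no match instead of reusing an older capture.
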
      It seems pretty clear that the $1 in the code is falling through because the second regex is not matching, so I will use some of the code examples to avoid that in the future. However, the big question is why the second regex does not match a chomped <STDIN> but does match a chomped $ARGV[0]. It only appears to be an issue with 5.8.0 and appears to have been fixed by at least 5.8.3. Can anyone explain this behavior? I may need some ammo to get our Linux support to upgrade our standard Perl version.
Re: perl pattern match for end of string using STDIN and chomp
by Marshall (Canon) on Oct 08, 2009 at 21:06 UTC
    I did some testing on my Perl 5.10 Win XP system. At first all appeared to be ok. Then I tried to replicate the output of the poster and found that I could do it if the second match failed.

    It appears that if a match fails, $1 remains the same as it was, i.e. $1 is only valid if the match succeeds.
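
A minimal sketch of that behavior, using the sample string from the thread with the input hard-coded rather than read from STDIN. The variable names $first and $after_failure are chosen here for illustration.

```perl
use strict;
use warnings;

my $string = '/xxxx/yyyy/ZZZ_xxxx.CCC';    # sample input from the thread

$string =~ m/\/([[:alnum:]]+)_.*\.(.+)$/;  # succeeds: $1 is 'ZZZ', $2 is 'CCC'
my $first = $1;

$string =~ m/(.+)\.BBB$/;                  # fails: capture variables are left alone
my $after_failure = $1;                    # still holds the earlier 'ZZZ'

print "after first match:  $first\n";
print "after failed match: $after_failure\n";
```

Both lines print ZZZ, which reproduces the third line of the original poster's "wrong" output.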

    I experimented with ${type} vs. just $type and found that both worked on my Perl version. I wonder whether something about this is different on Perl 5.8.0, so that the second match does not succeed and the old $1 is still there. I would suggest trying some of my experiments below to see what happens on the OP's system. I think the second match is failing for some reason and the "old $1" is getting printed.

    #!/usr/bin/perl -w
    print "get some string: ";
    ($string = <STDIN>);
    #note: chomp not necessary, $ should count \n as
    #end of string. should work with or without chomp.
    $string =~ m/\/([[:alnum:]]+)_.*\.(.+)$/;
    print "dollar 1: $1\n";
    $type = $2;
    print "type: $type\n";

    #something weird here....
    $string =~ m/(.+)\.BBB$/;
    #get some string: /xxxx/yyyy/ZZZ_xxxx.CCC
    #dollar 1: ZZZ
    #type: CCC
    #dollar 1:ZZZ

    print "match failed\n" unless $string =~ m/(.+)\.BBB$/;
    #$string =~ m/(.+)\.${type}$/; #ok
    #$string =~ m/(.+)\.$type$/; #ok also
    print "dollar 1:$1\n";
    exit 0;
    __END__
    This is with the match failed code:

    C:\TEMP>regextest.pl
    get some string: /xxxx/yyyy/ZZZ_xxxx.CCC
    dollar 1: ZZZ
    type: CCC
    match failed
    dollar 1:ZZZ

    This is from:
    #$string =~ m/(.+)\.${type}$/; #ok
    #$string =~ m/(.+)\.$type$/; #ok also

    C:\TEMP>regextest.pl
    get some string: /xxxx/yyyy/ZZZ_xxxx.CCC
    dollar 1: ZZZ
    type: CCC
    dollar 1:/xxxx/yyyy/ZZZ_xxxx

    C:\TEMP>perl -v
    This is perl, v5.10.0 built for MSWin32-x86-multi-thread
    (with 5 registered patches, see perl -V for more detail)
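
The note in the script above about $ and the trailing newline can be checked with a small sketch comparing the three end anchors the original poster tried. The helper name end_anchors is hypothetical, and the string is a shortened form of the thread's sample input.

```perl
use strict;
use warnings;

# Hypothetical helper: report which end anchors accept ".CCC" at the end of $s.
sub end_anchors {
    my ($s) = @_;
    return (
        '$'  => ($s =~ /\.CCC$/  ? 1 : 0),  # end, or just before a final newline
        '\Z' => ($s =~ /\.CCC\Z/ ? 1 : 0),  # same as $ when /m is not in effect
        '\z' => ($s =~ /\.CCC\z/ ? 1 : 0),  # only the very end of the string
    );
}

for my $s ("ZZZ_xxxx.CCC", "ZZZ_xxxx.CCC\n") {
    my %hit   = end_anchors($s);
    my $label = $s =~ /\n\z/ ? 'unchomped' : 'chomped  ';
    print "$label  ", join('  ', map { "$_=$hit{$_}" } '$', '\Z', '\z'), "\n";
}
```

On a correctly working perl, all three anchors match the chomped string, and only \z rejects the unchomped one. That is why neither the presence of the chomp nor the choice of anchor should make the original pattern fail.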