bowei_99 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm having a problem with multiline regexes. When I put text in a heredoc and run a multiline regex against it, it matches. However, when I type that same text into a text file, read it in, and use the same regex, it does *not* match. Does anybody have any thoughts on why? I'm running on Linux, so I know it can't be because of MS line terminators. Below are my code and the results.

Code:

#!/usr/bin/perl -w
use strict;

my $test = <<"TEST";
Line1

Line3
TEST
;

#this works - it matches and prints result
if ($test =~ m{ \w+\n \n \w+ }msx) {
    print "Heredoc test: The line \n$test\nmatches.\n";
}

open (TEST, "testfile") or die "cannot open testfile testfile - $!";
print "\n==================\nReading testfile\n";

#but this doesn't show a match ... why?
while (<TEST>) {
    if (m{ \w+\n \n \w+ }msx) {
        print "reading file test: The line \n$_\nmatches.\n";
    }
}
close (TEST);

where testfile contains the text (with special characters shown, i.e. using 'set list' in vi):

Line1$
$
Line3$

As you can see, there's nothing special about this file.

Results:

perl test2.pl
Heredoc test: The line
Line1

Line3

matches.

==================
Reading testfile

Re: multiline regex: heredoc vs. reading file
by japhy (Canon) on Jan 25, 2006 at 17:47 UTC
    Because you've only read ONE line from the file.

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart

      Let's elaborate. <FILE> in scalar context will only read one line. By default, that means it will only read until (and including) the next \n. How can a line match \w+\n\n\w+ if a line can't contain \n other than at the end?
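To make the point above concrete, here is a minimal sketch. It assumes an in-memory filehandle ($data is a hypothetical stand-in for testfile, holding the same three lines) and collects the records in list context, which yields the same one-line records that the while loop sees one at a time. It then compares that against slurping everything into a single string.

```perl
use strict;
use warnings;

# Hypothetical stand-in for testfile: same content, read via an in-memory filehandle.
my $data = "Line1\n\nLine3\n";

# Default $/ is "\n", so every record ends at the first newline it meets.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @records = <$fh>;    # "Line1\n", "\n", "Line3\n"
close($fh);

# No single record can contain "\n\n" followed by more text.
my $line_hits = grep { /\w+\n\n\w+/ } @records;
printf "line by line: %d records, %d matching\n", scalar(@records), $line_hits;

# With $/ undefined, one read returns the whole file as a single string.
open($fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $whole = do { local $/; <$fh> };
close($fh);

my $slurp_hit = ($whole =~ /\w+\n\n\w+/) ? 1 : 0;
print "slurped: ", ($slurp_hit ? "match" : "no match"), "\n";
```

With this sample data the first report should show 3 records and 0 matches, while the slurped string matches.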

      The fix would be to read the whole file in at once, as follows:

      my $text;
      {
          open(my $test_fh, '<', 'testfile')
              or die "Unable to open testfile: $!\n";
          local $/;    # Read to end of file.
          $text = <$test_fh>;
      }

      if ($text =~ /\w+\n\n\w+/) {
          print "reading file test:\n$text\nmatches.\n";
      }

      Note: The m modifier on your regexp is useless since you don't use ^ or $. The s modifier on your regexp is useless since you don't use . (dot).

      Update: If you want to find all matches, use the following:

      ...
      while ($text =~ /\w+\n\n\w+/g) {
          print "reading file test:\n$text\nmatches.\n";
      }

      One line? From page 147 of 'Programming Perl' -

      /m Let ^ and $ match next to embedded \n.
      /s Let . match newline and ignore deprecated $* variable.

      Wouldn't that mean it would look for multiple lines?

        /m changes where ^ and $ can match; /s changes what . can match. Since you don't have any of ^, $, or . in your regex, the flags do nothing.
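A small sketch of what the two flags actually change, using a hypothetical sample string with the same shape as the test data. The variable names are made up for illustration. The last pair of checks uses the pattern from the question, which contains no ^, $, or ., so adding the flags should make no difference.

```perl
use strict;
use warnings;

# Hypothetical sample string shaped like the heredoc in the question.
my $text = "Line1\n\nLine3\n";

# /m: ^ may also match just after an embedded newline.
my @no_m   = $text =~ /^(\w+)/g;     # only the word at the start of the string
my @with_m = $text =~ /^(\w+)/mg;    # a word at the start of each line

# /s: . may also match a newline.
my $no_s   = ($text =~ /Line1..Line3/)  ? 1 : 0;
my $with_s = ($text =~ /Line1..Line3/s) ? 1 : 0;

# The pattern from the question uses none of ^, $, or .
my $plain   = ($text =~ /\w+\n\n\w+/)   ? 1 : 0;
my $flagged = ($text =~ /\w+\n\n\w+/ms) ? 1 : 0;

print "without /m: @no_m\n";
print "with /m:    @with_m\n";
print "dot without /s: $no_s, with /s: $with_s\n";
print "question's pattern: plain=$plain, with /ms=$flagged\n";
```

The flags only matter for the first two pairs; the question's pattern gives the same result either way.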

        The problem is that you have a regex that only matches multiple lines, but you are trying to match each line of the file against it individually, and of course none of them do match.

        The problem is not with the regexp. The problem is that $_ only contains one line. See my earlier post in this discussion for more details.