multiline regex: heredoc vs. reading file

bowei_99 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm having a problem with multiline regexes. When I put text in a heredoc and run a multiline regex against it, it matches. However, when I type the same text into a text file, read it in, and use the same regex, it does *not* match. Does anybody have any thoughts on why? I'm running on Linux, so I know it can't be because of MS line terminators. Below are my code and the results.

Code:

#!/usr/bin/perl -w
use strict;

my $test = <<"TEST";
Line1

Line3
TEST
;

#this works - it matches and prints the result
if ($test =~ m{
                                \w+\n
                                \n
                                \w+
                          }msx) {
        print "Heredoc test: The line \n$test\nmatches.\n";
}

open (TEST, "testfile")
        or die "cannot open testfile - $!";

print "\n==================\nReading testfile\n";


#but this doesn't show a match ... why?
while (<TEST>) {
        if (m{
                  \w+\n
                  \n
                  \w+
                 }msx) {
                print "reading file test: The line \n$_\nmatches.\n";
        }

}
close (TEST);

where testfile contains the text (with special characters shown, i.e. using 'set list' in vi):

Line1$
$
Line3$

As you can see, there's nothing special about this file.

Results:

perl test2.pl
Heredoc test: The line
Line1

Line3

matches.

==================
Reading testfile


Replies are listed 'Best First'.
Re: multiline regex: heredoc vs. reading file by japhy (Canon) on Jan 25, 2006 at 17:47 UTC
Because you've only read ONE line from the file.
Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re^2: multiline regex: heredoc vs. reading file by ikegami (Patriarch) on Jan 25, 2006 at 17:55 UTC
Let's elaborate. `<FILE>` in scalar context reads only one line. By default, that means it reads up to and including the next `\n`. How can a line match `\w+\n\n\w+` if a line can't contain `\n` other than at the end? The fix is to read the whole file in at once, as follows:

my $text;
{
    open(my $test_fh, '<', 'testfile')
        or die "Unable to open testfile: $!\n";
    local $/;  # Read to end of file.
    $text = <$test_fh>;
}
if ($text =~ /\w+\n\n\w+/) {
    print "reading file test:\n$text\nmatches.\n";
}

Note: The `m` modifier on your regexp is useless since you don't use `^` or `$`. The `s` modifier on your regexp is useless since you don't use `.`.

Update: If you want to find all matches, use the following:

...
while ($text =~ /\w+\n\n\w+/g) {
    print "reading file test:\n$text\nmatches.\n";
}
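To see the difference side by side, here is a minimal sketch that runs the same pattern against line-by-line reads and against a slurped read. It assumes an in-memory filehandle (opened on a reference to `$data`) in place of the real testfile so that it is self-contained; the names `$data`, `$line_hits`, and `$slurp_hit` are illustrative and not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same three lines as testfile, held in memory (an assumption made so
# the sketch runs without any file on disk).
my $data = "Line1\n\nLine3\n";
my $re   = qr/\w+\n\n\w+/;

# Line-by-line: each record ends at the first "\n", so no single
# record can ever contain "\n\n".
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ $re;
}
close($fh);

# Slurp: with $/ undefined inside the do block, one read returns
# the whole file as a single string.
open($fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close($fh);
my $slurp_hit = ($text =~ $re) ? 1 : 0;

print "line-by-line matches: $line_hits\n";   # prints 0
print "slurped text matches: $slurp_hit\n";   # prints 1
```

The `do { local $/; <$fh> }` form keeps the change to `$/` confined to that one read, which is the same effect as the bare block in the reply above.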
Re^2: multiline regex: heredoc vs. reading file by bowei_99 (Friar) on Jan 25, 2006 at 17:59 UTC
One line? From page 147 of 'Programming Perl':
/m Let ^ and $ match next to embedded \n.
/s Let . match newline and ignore deprecated $* variable.
Wouldn't that mean it would look for multiple lines?
Re^3: multiline regex: heredoc vs. reading file by ysth (Canon) on Jan 25, 2006 at 18:06 UTC
/m changes where ^ and $ can match; /s changes what . can match. Since you don't have any of ^, $, or . in your regex, the flags do nothing. The problem is that you have a regex that only matches multiple lines, but you are trying to match each line of the file against it individually, and of course none of them do match.
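A short sketch of what the two flags actually change may help here. It uses the same three-line text as the question; the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Line1\n\nLine3\n";

# Without /m, ^ matches only at the very start of the string.
my $caret_plain = ($text =~ /^Line3/)  ? 1 : 0;
# With /m, ^ also matches right after an embedded newline.
my $caret_m     = ($text =~ /^Line3/m) ? 1 : 0;

# Without /s, . refuses to match "\n".
my $dot_plain = ($text =~ /Line1.+Line3/)  ? 1 : 0;
# With /s, . matches any character, newline included.
my $dot_s     = ($text =~ /Line1.+Line3/s) ? 1 : 0;

# The pattern from the question has no ^, $, or ., so the flags
# make no difference to whether it matches.
my $no_flags   = ($text =~ /\w+\n\n\w+/)   ? 1 : 0;
my $with_flags = ($text =~ /\w+\n\n\w+/ms) ? 1 : 0;

print "^ plain=$caret_plain, ^ with /m=$caret_m\n";
print ". plain=$dot_plain, . with /s=$dot_s\n";
print "question regex: no flags=$no_flags, with /ms=$with_flags\n";
```

Neither flag affects how much text ends up in the string being matched; that is decided entirely by how the file is read.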
Re^3: multiline regex: heredoc vs. reading file by ikegami (Patriarch) on Jan 25, 2006 at 18:02 UTC
The problem is not with the regexp. The problem is that `$_` only contains one line. See my earlier post in this discussion for more details.
Re^4: multiline regex: heredoc vs. reading file by bowei_99 (Friar) on Jan 25, 2006 at 18:10 UTC
Re^5: multiline regex: heredoc vs. reading file by ikegami (Patriarch) on Jan 25, 2006 at 18:13 UTC