in reply to Re: Matching regular expression over multiple lines
in thread Matching regular expression over multiple lines

Thank you for the welcome and the very clear explanation! This worked brilliantly for me (and will probably save me a lot of time in the future).

Just out of curiosity, I went back and tried to solve the original problem with the regex, following your tip that the "while" loop only reads one line at a time. You were absolutely right, and I should have been writing the following:

open( FILE, "C:/Users/li/data_collection/posts/165644996453.html" )
    || die "couldn't open\n";
while ( <FILE> ) {
    $data .= $_;
}
if ( $data =~ m/(?<=<p>)(.*)(?=<\/p>\s+<footer>)/g ) {
    print "$1\n";
}
(code taken from dsb's answer in Re: Apply regex to entire file, not just individual lines ?).

Thanks again!

Re^3: Matching regular expression over multiple lines
by haukex (Archbishop) on Oct 16, 2017 at 11:59 UTC
    while ( <FILE> ) { $data .= $_; }

    That'll work, but it's not particularly efficient, because it chops the file up line by line and then puts it back together. You could use the same "slurp" idiom I showed (do { local $/; <$fh> }), which reads the entire file in one go.
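For illustration, here is a minimal sketch of how the slurp idiom could replace the while loop. The helper names slurp_file and last_paragraph are hypothetical and not from the thread; the regex is the one from the post above, minus the /g modifier, which a single match does not need. The lexical filehandle and three-argument open are also a swapped-in convention, not something the original code used.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "couldn't open $path: $!\n";
    # With $/ (the input record separator) undefined, <$fh> returns
    # the rest of the file as one string. "local" restores the old
    # value of $/ as soon as the do-block ends.
    my $data = do { local $/; <$fh> };
    close $fh;
    return $data;
}

# Hypothetical helper: return the text of the <p> element on the line
# just before <footer>, or undef if nothing matches. Because "." does
# not match a newline here, the paragraph must sit on a single line.
sub last_paragraph {
    my ($data) = @_;
    return $data =~ m/(?<=<p>)(.*)(?=<\/p>\s+<footer>)/ ? $1 : undef;
}
```

It could then be called as last_paragraph( slurp_file($path) ), where $path stands for whatever file is being processed.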

    an alternative to using a regex [quoted from here]

    I just wrote about this in general here: Parsing HTML/XML with Regular Expressions

      Ah, that makes more sense! Thanks a lot.