in reply to regexp over multiple lines

Yes, it's possible. First you have to slurp in your input (i.e., don't read it line by line). Next, you need to set the /s modifier, and possibly /m, on your regexp. /s tells the RE engine that '.' should match any character, including a newline. /m tells the RE engine that ^ and $ should match at the beginning and end of each line rather than only at the beginning and end of the string (that's what \A and \Z are for; note that \Z also matches just before a trailing newline, while \z matches only at the very end).
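Here is a minimal sketch of those two steps. The sample data and variable names are made up for illustration, and an in-memory filehandle stands in for a real file:

```perl
use strict;
use warnings;

# Hypothetical sample data; a real program would open a file instead.
my $data = "<tag>first\nsecond</tag>\nthird line\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# Slurp: with $/ undefined, a single read returns the whole input.
my $text = do { local $/; <$fh> };
close $fh;

# Without /s, '.' will not cross the newline, so this does not match.
my $without_s = ( $text =~ m{<tag>(.*)</tag>} ) ? 1 : 0;

# With /s, '.' matches newlines too, so the element is found.
my ($body) = $text =~ m{<tag>(.*)</tag>}s;

# With /m, ^ matches at the start of every line, not just the string.
my @line_starts = $text =~ m{^(\w+)}mg;

print "without /s: ", ( $without_s ? "match" : "no match" ), "\n";
print "with /s: [$body]\n";
print "line starts: @line_starts\n";
```

The first line of the sample begins with '<', which \w does not match, so only the second and third lines contribute to @line_starts.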

Also keep in mind that quantifiers such as * and + are greedy, so .* probably won't do what you want it to do when you hand it the following:

<tag>asdf</tag><tag>ghjkl</tag>

Unless what you want is for it to be greedy.

    my $string = "<tag>asdf</tag><tag>ghjkl</tag>";
    if( $string =~ m{<tag>(.*)</tag>} ) {
        print $1, "\n";
    }
    __END__
    asdf</tag><tag>ghjkl

Woops!
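One common fix, assuming you want each element separately, is the non-greedy quantifier .*?, which matches as little as possible. A small sketch using the same string:

```perl
use strict;
use warnings;

my $string = "<tag>asdf</tag><tag>ghjkl</tag>";

# .*? is non-greedy: it stops at the first </tag> it can reach.
my ($first) = $string =~ m{<tag>(.*?)</tag>};

# In list context, /g returns the capture from every match.
my @all = $string =~ m{<tag>(.*?)</tag>}g;

print "$first\n";
print "@all\n";
__END__
asdf
asdf ghjkl
```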

Now introducing perlre! :)

Expect some replies telling you to use a proper XML parser, such as XML::Twig, XML::Parser, XML::LibXML, XML::Simple, etc. And they're right. It's better to let a well-tested solution do the work for you.


Dave

Re^2: regexp over multiple lines
by liverpaul (Acolyte) on Aug 03, 2011 at 07:29 UTC
    Thanks for the reply :-) I'm a novice in Perl so I went about this in a very different way! Instead of using an XML parser (which I wasn't aware of), I processed each XML file by replacing each ">" with ">\n" so that I ended up with a file with multiple lines instead of everything on just one line. Since my program has to parse data from XML files and normal HTML files, I would like to avoid using an XML parser because my code is set up the wrong way. I'm going to have to read your advice a few more times because it doesn't immediately make sense to me. I'll try a few tests in my program to see if I can get things working and increase my understanding. I'll post back here for further help :-)

      The monks have been helping you solve the individual problem that you've defined, but missed the very important point you made here: that your files are XML. Using an XML parser, whether XML::Twig, XML::Simple, XML::LibXML, or something else, is THE way to process XML files (and to head off the argument: Yes, even you can use CPAN). Trying to do so via regular expressions is simply madness. You're recreating tools that have already been created and debugged, and replacing them with half-baked code that will no doubt miss many edge cases.

Re^2: regexp over multiple lines
by liverpaul (Acolyte) on Aug 03, 2011 at 07:56 UTC
    I will fix the greedy/non-greedy issue. I process the file line by line in a for loop. This is necessary because I sometimes need to check a few lines ahead. If I'm forced to process the file line by line instead of "slurp"ing the file, does that mean I can't regexp over multiple lines?
      "If I'm forced to process the file line by line instead of "slurp"ing the file, does that mean I can't regexp over multiple lines?"
      If you are visiting your relatives one by one, are you having a family reunion?

      Unless you concatenate the lines yourself, it's not going to work. There's no magic in the regular expression engine that says, "hmmm, I'm not going to match this line, I'm just going to read one more line from the input to see whether it matches now".
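A rough sketch of concatenating the lines yourself while still looping line by line. The @lines array and the <tag> element are stand-ins for illustration, not anything from the original program:

```perl
use strict;
use warnings;

# Hypothetical input, already split into lines as in a line-by-line loop.
my @lines = (
    "<tag>asdf\n",
    "more text</tag>\n",
    "<tag>ghjkl</tag>\n",
);

my @found;
my $buffer = '';
for my $line (@lines) {
    $buffer .= $line;    # concatenate the lines ourselves

    # Remove and record every complete element now in the buffer.
    # An element still missing its closing tag stays buffered until
    # a later line completes it.
    while ( $buffer =~ s{\A.*?<tag>(.*?)</tag>}{}s ) {
        push @found, $1;
    }
}

print scalar(@found), " elements found\n";
```

Here the first element spans two lines and is only matched once the second line has been appended. The buffer keeps growing if a closing tag never arrives, so real input may need a limit on how far ahead to look.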

        I have to visit my relatives one by one, but sometimes I need to check ahead on Grandma because she's old :-)

        I think I'll have a look at concatenating the lines myself, that might be what I'm after. Thanks.