Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Say you need to match content from a file stored in a scalar. How would you do this if the text you're matching breaks onto a new line in arbitrary places?
abcdefg hijk lm no pqr stu vwkyz
Without changing the formatting at all, how can I match, say, hijklmnop? In my original file the lines break pretty much anywhere, and I need to figure out a way to still match things.

Re: Matching regex on multilines
by davido (Cardinal) on Nov 15, 2004 at 08:05 UTC

    So you're saying you want to match 'hijklmnop', with or without any amount of whitespace appearing between any of the characters, including the possibility of newlines?

        use strict;
        use warnings;

        my $string = <<HERE;
        abcdefg
        hijk
        lm
        no
        pqr
        stu
        vwkyz
        HERE

        my $find = 'hijklmnop';
        my $re = join '\s*', split( //, $find );
        print "Matched!\n" if $string =~ $re;

    In this situation it's not necessary to use the /s switch, but in some situations, where you're trying to match a pattern across multiple lines, you'll want to use the /s regexp modifier. It causes '.' to match any character including newlines. In the example above, however, \s is used, and that matches whitespace including newlines. This and more is discussed in perlre.
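A minimal sketch of the difference between `.` with and without /s, and `\s`. The two-line sample string is made up for illustration and is not from the original question.

```perl
use strict;
use warnings;

# Made-up sample: "hijk" and "lm" separated by a single newline.
my $text = "hijk\nlm";

# Without /s, '.' refuses to match the newline between "hijk" and "lm".
my $dot_plain = $text =~ /hijk.lm/  ? 1 : 0;

# With /s, '.' matches any character, newline included.
my $dot_s     = $text =~ /hijk.lm/s ? 1 : 0;

# \s matches the newline with or without /s.
my $ws_plain  = $text =~ /hijk\slm/ ? 1 : 0;

print "dot without /s: $dot_plain\n";   # 0
print "dot with /s:    $dot_s\n";       # 1
print "\\s without /s:  $ws_plain\n";   # 1
```

In other words, /s only changes what `.` means; it has no effect on a pattern built purely from literal characters and `\s*`.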

    Ultimately, it seems pretty awkward building up a regexp with \s* between each character you're trying to match. It might be wiser to preprocess your text, eliminating newlines, or insignificant whitespace.
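If you do go the regexp route, wrapping the construction in a small sub keeps the awkwardness in one place. This is only a sketch: `loose_regex` is a hypothetical helper name, the line breaks in the sample string are assumed, and `quotemeta` is added so that characters such as `.` or `+` in the search text are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: build a compiled regex that matches $find with
# optional whitespace (including newlines) between any two characters.
sub loose_regex {
    my ($find) = @_;
    # quotemeta escapes regex metacharacters so each character is literal
    my $pattern = join '\s*', map { quotemeta($_) } split //, $find;
    return qr/$pattern/;
}

# Sample text; the positions of the line breaks are an assumption.
my $string = "abcdefg\nhijk\nlm\nno\npqr\nstu\nvwkyz\n";

my $re = loose_regex('hijklmnop');
print "Matched!\n" if $string =~ $re;
```

Compiling once with `qr//` also means the pattern can be reused against many strings without being rebuilt.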


    Dave

Re: Matching regex on multilines
by BUU (Prior) on Nov 15, 2004 at 09:40 UTC
    To highlight something davido mentions in passing: if the whitespace is insignificant to your pattern, that is, it doesn't matter at all and actively impedes your processing, why not simply remove the whitespace and then match as usual?
      As davido and BUU suggested, here's an example.
      If your input file isn't too big, you could do something like this:
          #!/usr/bin/perl -W
          use strict;

          my $string = <<HERE;
          abc
          defg
          hijk
          lm
          no
          pqr
          stu
          vwkyz
          HERE

          my $find = "hijklmnop";

          # work on a copy to avoid modifying the original
          $_ = $string;

          # remove all whitespace; \s already matches newlines, so the /s
          # modifier davido mentioned (it only affects '.') is not needed here
          s/\s//g;

          print "Matched!\n" if $_ =~ $find;

          # to see what the substitution did
          print $_;
      Not sure which is more efficient though...
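One way to find out is to measure with the core Benchmark module. The sketch below is only a starting point: the padded sample text, the sub names `match_loose` and `match_stripped`, and the labels are all made up for illustration, and the outcome will depend heavily on input size and on where the match sits in the text.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up input: filler lines followed by the sample text, so that the
# match is near the end rather than at the very start.
my $string = ("xyz\n" x 200) . "abc\ndefg\nhijk\nlm\nno\npqr\nstu\nvwkyz\n";
my $find   = 'hijklmnop';

# Approach 1: regex with \s* between every character of $find.
my $loose    = join '\s*', map { quotemeta($_) } split //, $find;
my $loose_re = qr/$loose/;
sub match_loose { return $string =~ $loose_re ? 1 : 0 }

# Approach 2: strip whitespace from a copy, then do a plain substring search.
sub match_stripped {
    (my $copy = $string) =~ s/\s+//g;
    return index($copy, $find) >= 0 ? 1 : 0;
}

# Run each approach for at least one CPU second and print a comparison table.
cmpthese(-1, {
    loose_regex => \&match_loose,
    strip_first => \&match_stripped,
});
```

Run it against text that resembles your real file before drawing conclusions.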

      Also, it seems like you could do something like this, never modifying the original string...
      But I couldn't get it to work. Maybe the gurus can help...
          my $result;
          $_ = $string;
          ($result) = /((\w+(?=\s*))+)/s;
          # ((1 char) followed by
          #  (zero or more whitespace [don't return])
          #  match one or more)
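A likely reason this attempt cannot do what was intended: a capture group always returns one contiguous piece of the original string, and the lookahead `(?=\s*)` checks for whitespace without consuming it, so `\w+` can never continue past the first newline. A sketch of one alternative that leaves the original untouched: match with the `\s*`-joined pattern, then strip whitespace from the captured copy only. The sample line breaks and variable names here are assumptions for illustration.

```perl
use strict;
use warnings;

# Sample text; the positions of the line breaks are an assumption.
my $string = "abc\ndefg\nhijk\nlm\nno\npqr\nstu\nvwkyz\n";
my $find   = 'hijklmnop';

# Allow optional whitespace between every character of $find.
my $pattern = join '\s*', map { quotemeta($_) } split //, $find;

my ($start, $raw, $clean);
if ($string =~ /($pattern)/) {
    $start = $-[1];                # offset of the match in the original string
    $raw   = $1;                   # matched text, embedded whitespace included
    ($clean = $raw) =~ s/\s+//g;   # strip whitespace from the copy only
    print "Found '$clean' at offset $start\n";   # Found 'hijklmnop' at offset 9
}
```

Because only `$clean` is modified, `$string` keeps its original formatting, and `$start` still refers to a position in the unmodified text.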