in reply to Grabbing part of an HTML page

Why not just use one regular expression?
#!/usr/bin/perl
use strict;

my $start_pattern = '<!-- start section-->';
my $end_pattern   = '<!-- end section -->';
my @files_to_look_in = ("/path/to/files1.html", "/path/to/files2.html");

for (@files_to_look_in) {
    local $/;    # undefine the input record separator so <HTM_FILE> slurps the whole file
    open(HTM_FILE, '<', $_) || die "Can't open file: $!";
    my $file = <HTM_FILE>;
    close(HTM_FILE);
    if ($file =~ /$start_pattern(.*)$end_pattern/s) {
        print "$1";
    }
}

Note: If I hadn't localized $/ (the input record separator), I would have needed to add a /s modifier to the regular expression to match on the whole file.

cLive ;-)

update: 1) oops, not paying attention: "while" is now "for". 2) made a boo-boo - see below...

Re: Re: Grabbing part of an HTML page
by graff (Chancellor) on Mar 29, 2004 at 01:53 UTC
    For the single regex to work, you would need to add the "s" modifier at the end, so that your ".*" doesn't stop matching at the first line-break character. Also, depending on the nature of the OP's data, it may need to be a non-greedy match:
    # ...
    if ( $file =~ /$start_pattern(.*?)$end_pattern/s ) {
        print $1;
    }
    # ...
    update: Forgot to mention: if the OP's data happens to contain more than one "start ... end" sequence within the same file, this would have to be structured as a loop -- something like:
    #
    while ( $file =~ /$start_pattern(.*?)$end_pattern/gs ) {
        print "$1\n";
    }
    #
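
    To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The sample string and its markers are made up for illustration, and the \Q...\E quoting is an extra precaution that is not in the code above; it only ensures the markers are matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical markers and sample input containing two marked sections.
my $start_pattern = '<!-- start section -->';
my $end_pattern   = '<!-- end section -->';
my $html = join "\n",
    $start_pattern, 'first',  $end_pattern,
    'filler',
    $start_pattern, 'second', $end_pattern;

# Greedy (.*): a single match that runs from the first start marker
# to the LAST end marker, swallowing everything in between.
my ($greedy) = $html =~ /\Q$start_pattern\E(.*)\Q$end_pattern\E/s;

# Non-greedy (.*?) with /g in list context: one capture per section.
my @sections = $html =~ /\Q$start_pattern\E(.*?)\Q$end_pattern\E/gs;

print "greedy capture includes 'filler': ",
    ( $greedy =~ /filler/ ? "yes" : "no" ), "\n";
print "non-greedy sections found: ", scalar(@sections), "\n";
```

    In list context the /g match returns all captures at once, which is an alternative to the while loop when you just want the sections in an array.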

      What does the /s modifier do if $/ is undefined? :)

      cLive ;-)

        Well, the current value of $/ has nothing to do with whether or not "." in a regex will match "\n" in a given string value; only the "s" modifier on the regex will allow "." to match "\n", no matter what $/ is. Try this out:
        #!/usr/bin/perl
        $/ = undef;
        $_ = <DATA>;
        while ( /begin(.*?)end/g ) {
            print "\n--- found without 's': ---\n$1\n";
        }
        while ( /begin(.*?)end/gs ) {
            print "\n--- found using 's': ---\n$1\n";
        }
        __DATA__
        blah
        begin
        foo
        bar
        baz
        end
        begin
        foo2
        bar2
        baz2
        end
Re: Re: Grabbing part of an HTML page
by pbeckingham (Parson) on Mar 28, 2004 at 22:30 UTC

    You have written another infinite loop - the while (@files_to_look_in) is always true, and no part of the code shifts or pops values from the array.
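
    As a rough illustration of the point (the @queue, @seen_for and @seen_while names are made up for this sketch), a while loop over an array only terminates if the body shrinks the array, whereas for simply visits each element once:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @files_to_look_in = ( "/path/to/files1.html", "/path/to/files2.html" );

# for aliases $_ to each element in turn and stops at the end of the list.
my @seen_for;
for (@files_to_look_in) {
    push @seen_for, $_;
}

# while (@queue) tests the array in scalar context (its element count),
# so it ends only because shift removes one element per pass.
my @queue = @files_to_look_in;    # work on a copy; the original stays intact
my @seen_while;
while (@queue) {
    my $file = shift @queue;
    push @seen_while, $file;
}

print "for visited:   @seen_for\n";
print "while visited: @seen_while\n";
print "left in queue: ", scalar(@queue), "\n";
```

    Without the shift, @queue would never become empty and the loop would run forever.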