Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Can someone give me a regex to read from my file below and slurp up everything BEFORE the separator below into $1 and everything after it until the end of file into $2? This is the separator the script needs to look for:
<!-- SCRIPT TEMPLATE BREAKS HERE -->
My best attempt is:
use LWP::Simple;
my $content = get("http://www.url.com/scripttemplate.shtml");
my $breakline = "<!-- SCRIPT TEMPLATE BREAKS HERE -->";
while ($content) {
    $content =~ /(.*)$breakline(.*)/);
}

Replies are listed 'Best First'.
Re: Simple regex
by brian_d_foy (Abbot) on Jan 18, 2005 at 04:56 UTC

    You might also consider using split() with its optional third parameter, which tells it to break the string into at most two chunks.

    my( $before, $after ) = split /$breakline/, $content, 2;

    If your $breakline might have regex special characters in it, you might want to escape those with either quotemeta() or \Q.

    my( $before, $after ) = split /\Q$breakline/, $content, 2;
    --
    brian d foy <bdfoy@cpan.org>
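
To illustrate the two points above (the limit argument and \Q), here is a minimal sketch. The separator and sample content are made up for the demonstration; the brackets in the separator are there only to show what can go wrong when it is interpolated unescaped.

```perl
use strict;
use warnings;

# Hypothetical separator containing regex metacharacters ("[" and "]").
my $breakline = '<!-- BREAK [1] -->';

# Made-up content in which the separator appears twice.
my $content = "header\n$breakline\nbody\n$breakline\nfooter\n";

# \Q...\E escapes the metacharacters so the separator is matched literally.
# The limit of 2 stops splitting after the first separator, so the second
# separator stays inside $after.
my ( $before, $after ) = split /\Q$breakline\E/, $content, 2;

print "before: [$before]\n";
print "after:  [$after]\n";

# Without \Q, "[1]" is a character class that matches a bare "1", so the
# pattern looks for "<!-- BREAK 1 -->", never matches, and split returns
# the whole string as a single field.
my @unescaped = split /$breakline/, $content, 2;
print "fields without \\Q: ", scalar(@unescaped), "\n";
```

With the separator from the original question, which has no special characters, both forms should behave the same; \Q only matters once the separator can contain characters the regex engine treats specially.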
Re: Simple regex
by larryp (Deacon) on Jan 18, 2005 at 03:14 UTC

    Try something more like this:

    #!/usr/bin/perl
    use strict;
    #use LWP::Simple;
    #my $content = get("http://www.url.com/scripttemplate.shtml");
    my $breakline = "<!-- SCRIPT TEMPLATE BREAKS HERE -->";
    my $content = "This is the first part\n" . $breakline . "This is the last part\n";
    $content =~ m|(.*)$breakline(.*)|s;
    print "First Var contains: $1";
    print "Second Var contains: $2";

    This produces

    First Var contains: This is the first part
    Second Var contains: This is the last part

    You don't need the while loop, since you're not checking line by line. You want to search the entire string in one pass. That's what the /s modifier is doing in the regex...it's ignoring the line endings.

    HTH,

    Larry

    UPDATE: ysth's explanation is much better than the one I provided here. Not only that, it's correct, too. :) The /s modifier is not ignoring the newlines. It's allowing the . to match newline characters as well. My haste caused me to misrepresent what was actually happening. :)

      More precisely, the /s modifier tells . to match even newline characters. Without it, your $1 and $2 will only get what is before and after $breakline on the same line.
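
The effect of /s described above can be seen by running the same pattern with and without the modifier. This is only a sketch: the multi-line sample content is invented for the demonstration, and \Q is added as a precaution even though this particular separator has no special characters.

```perl
use strict;
use warnings;

my $breakline = "<!-- SCRIPT TEMPLATE BREAKS HERE -->";

# Made-up content: the separator sits in the middle of the second line.
my $content = "line one\nline two $breakline line three\nline four\n";

# Without /s, "." does not match a newline, so each (.*) is confined
# to the line that contains the separator.
my ( $before_plain, $after_plain ) = $content =~ /(.*)\Q$breakline\E(.*)/;

# With /s, "." also matches newlines, so the captures extend to the
# start and the end of the whole string.
my ( $before_s, $after_s ) = $content =~ /(.*)\Q$breakline\E(.*)/s;

print "without /s: [$before_plain] [$after_plain]\n";
print "with /s:    [$before_s] [$after_s]\n";
```

Assigning the match in list context, as done here, also avoids reading $1 and $2 after a failed match: if the pattern does not match, the variables are simply left undefined.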