Simple regex

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Can someone give me a regex that reads my file and captures everything BEFORE the separator below in $1, and everything after it until the end of the file in $2? This is the separator the script needs to look for:

<!-- SCRIPT TEMPLATE BREAKS HERE -->

My best attempt is

use LWP::Simple;
my $content = get("http://www.url.com/scripttemplate.shtml");

my $breakline = "<!-- SCRIPT TEMPLATE BREAKS HERE -->";

while ($content)
{
  $content =~ /(.*)$breakline(.*)/);
}


Replies are listed 'Best First'.
Re: Simple regex by brian_d_foy (Abbot) on Jan 18, 2005 at 04:56 UTC
You might also consider using split() with its optional third parameter, which tells it to break the string into at most two chunks.

my( $before, $after ) = split /$breakline/, $content, 2;

If your `$breakline` might have regex special characters in it, you might want to escape those with either quotemeta() or \Q.

my( $before, $after ) = split /\Q$breakline/, $content, 2;

-- brian d foy <bdfoy@cpan.org>
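For anyone who wants to try the split() approach without fetching the page, here is a minimal sketch. The sample text and the `split_template` helper name are made up for illustration; only the separator string comes from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text before and after the FIRST
# occurrence of $breakline. \Q...\E makes the separator match literally,
# and the limit of 2 stops split from cutting at later occurrences.
sub split_template {
    my ($content, $breakline) = @_;
    my ($before, $after) = split /\Q$breakline\E/, $content, 2;
    return ($before, $after);
}

# Sample data standing in for the fetched page (an assumption, not the
# real template).
my $breakline = "<!-- SCRIPT TEMPLATE BREAKS HERE -->";
my $content   = "header line 1\nheader line 2\n"
              . "$breakline\n"
              . "footer line 1\nfooter line 2\n";

my ($before, $after) = split_template($content, $breakline);
print "Before: [$before]\n";
print "After: [$after]\n";
```

If the separator never appears, split returns the whole string as the first element and `$after` is left undefined, so it is probably worth checking `defined $after` before using it.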
Re: Simple regex by larryp (Deacon) on Jan 18, 2005 at 03:14 UTC
Try something more like this:

#!/usr/bin/perl
use strict;

#use LWP::Simple;
#my $content = get("http://www.url.com/scripttemplate.shtml");

my $breakline = "<!-- SCRIPT TEMPLATE BREAKS HERE -->";
my $content = "This is the first part\n" . $breakline . "This is the last part\n";

$content =~ m|(.*)$breakline(.*)|s;

print "First Var contains: $1";
print "Second Var contains: $2";

This produces

First Var contains: This is the first part
Second Var contains: This is the last part

You don't need the while loop, since you're not checking line by line. You want to search the entire string in one pass. That's what the `/s` modifier is doing in the regex...it's ignoring the line endings.

HTH,

Larry

UPDATE: ysth's explanation is much better than the one I provided here. Not only that, it's correct, too. :) The `/s` modifier is not ignoring the newlines. It's allowing the . to match newline characters as well. My haste caused me to misrepresent what was actually happening. :)
Re^2: Simple regex by ysth (Canon) on Jan 18, 2005 at 04:22 UTC
More precisely, the /s modifier tells . to match even newline characters. Without it, your $1 and $2 will only get what is before and after $breakline on the same line.
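To see the difference the /s modifier makes, a small sketch along these lines may help. The sample string and the two helper names (`capture_without_s`, `capture_with_s`) are assumptions for illustration, and \Q...\E is added so the separator is matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: match WITHOUT /s, so . cannot cross a newline.
sub capture_without_s {
    my ($content, $breakline) = @_;
    return $content =~ /(.*)\Q$breakline\E(.*)/ ? ($1, $2) : ();
}

# Hypothetical helper: match WITH /s, so . also matches newlines.
sub capture_with_s {
    my ($content, $breakline) = @_;
    return $content =~ /(.*)\Q$breakline\E(.*)/s ? ($1, $2) : ();
}

# Sample text (an assumption): the separator sits in the middle of a line,
# with other lines above and below it.
my $breakline = "<!-- SCRIPT TEMPLATE BREAKS HERE -->";
my $content   = "top 1\ntop 2 $breakline bottom 1\nbottom 2\n";

my @plain = capture_without_s($content, $breakline);
my @multi = capture_with_s($content, $breakline);

print "without /s: [$plain[0]] [$plain[1]]\n";
print "with /s:    [$multi[0]] [$multi[1]]\n";
```

Without /s the captures should hold only the pieces on the separator's own line; with /s they should hold everything before and after it. Note that the greedy `(.*)` runs up to the last occurrence of the separator, which matters only if the separator can appear more than once.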