Re: REGEX different on Linux & Win32!
by Abigail-II (Bishop) on Feb 24, 2003 at 23:34 UTC
Well, you didn't include the two results you were getting
and which one you thought was correct. It always helps if
you tell us what you get; we're not omniscient.
Anyway, I refuse to believe this is an OS issue. But what I
do believe is that it's a Perl version issue. Given that the code
is in the file x.pl:
$ /opt/perl/5.8.0/bin/perl x.pl
<<<
HTML1
>>>
<<< CODE1 >>>
<<< HTML2
>>>
<<< CODE2 >>>
<<< HTML3 >>>
$ /opt/perl/5.6.1/bin/perl x.pl
<<<
HTML1
>>>
<<< CODE1 >>>
<<< HTML3 >>>
$ /opt/perl/5.6.0/bin/perl x.pl
<<<
HTML1
>>>
<<< CODE1 >>>
<<< HTML2
>>>
<<< CODE2 >>>
<<< HTML3 >>>
$ /opt/perl/5.005_03/bin/perl x.pl
<<<
HTML1
>>>
<<< CODE1 >>>
<<< HTML2
>>>
<<< CODE2 >>>
<<< HTML3 >>>
So, it's my guess that the Linux box you tried this on has
perl 5.6.1 installed, and the Windows box has either a later
or an older version of Perl installed.
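A quick way to confirm which perl each box is really running is to print the version and platform from inside a script. This is only a minimal sketch: `$]` and `$^O` are standard built-in variables, but the `report_perl` helper name is made up here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a one-line summary of the running perl.
sub report_perl {
    # $] is the numeric perl version (e.g. 5.006001);
    # $^O is the name of the OS this perl was built for.
    return sprintf "perl %s on %s", $], $^O;
}

print report_perl(), "\n";
```

Running it with each of the perl binaries in question should show whether the two machines differ in version rather than in OS.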
Abigail
Sorry, my mistake! On Win32, where the result is right, with Perl 5.6.1 I get:
<<<
HTML1
>>>
<<< CODE1 >>>
<<< HTML2
>>>
<<< CODE2 >>>
<<< HTML3 >>>
On Linux, with Perl 5.6.1:
<<<
HTML1
>>>
<<< CODE1 >>>
<<< HTML3 >>>
Graciliano M. P.
"The creativity is the expression of the liberty".
Re: REGEX different on Linux & Win32!
by robartes (Priest) on Feb 24, 2003 at 22:56 UTC
I suspect diotalevi hit the nail on the head in the chatterbox. This is not a bug. In fact the regex is matching what one would expect it to match: you're searching for \n. If you type your script on Unix, line endings are \n; on Windows, they're \r\n. To get things to match correctly regardless of OS, try diotalevi's suggestion of first storing whatever is at the end of a line in a variable and putting that variable in the regex, or first normalize your input to either form, e.g.:
my $data=qq(One line
two line
three line
);
$data =~ s/\r\n/\n/g;  # /g so every line ending is converted, not just the first
# use your regex.
# code is untested
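The normalizing step could also be wrapped up so that it covers old Mac line endings as well. A minimal sketch, assuming the input may mix \r\n, \r and \n; the `normalize_newlines` name is hypothetical and not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: converts Windows (CR LF) and old Mac (CR) endings to LF.
sub normalize_newlines {
    my ($text) = @_;
    # \015\012? matches CR LF or a lone CR; /g applies it to every line.
    $text =~ s/\015\012?/\012/g;
    return $text;
}

my $data = "One line\015\012two line\015three line\012";
print normalize_newlines($data);
```

Octal escapes are used instead of \r and \n so the pattern means the same bytes whatever the platform thinks a newline is.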
CU Robartes-
I wrote ($nl) = $data =~ m{(\15\12?|\12)} because your usage of \n is still problematic - in this case the newline value for mac, *nix and windows is handled. Anyway, the whole point to this code makes my head hurt - I'm wondering why gmpassos didn't just use one of the existing template engines. A /better/ idea would be to use this more like a state machine - here's a sample implementation:
my $data = qq`\nHTML1\n<% CODE1 %>\nHTML2\n<% CODE2 %>\nHTML3\n`;
my $reader = get_reader( $data );
while (my $blob = $reader->()) {
    print "$blob->{'type'}: $blob->{'data'}\n";
}

sub get_reader {
    my $input = shift;
    my $state = 'plain';
    return sub {
        my $temp;
        return unless defined $input;
        if ($state eq 'plain') {
            if ($input =~ s/(.*?)<%//s) {
                $state = 'code';
                return { type => 'plain',
                         data => $1 };
            } else {
                $temp = $input;
                undef $input;
                return { type => 'plain',
                         data => $temp };
            }
        }
        else { # state eq 'code'
            if ($input =~ s/(.*?)\%>//s) {
                $state = 'plain';
                return { type => 'code',
                         data => $1 };
            } else {
                $temp = $input;
                undef $input;
                return { type => 'code',
                         data => $temp };
            }
        }
    }
}
__RETURNS__
plain:
HTML1
code: CODE1
plain:
HTML2
code: CODE2
plain:
HTML3
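The newline detection mentioned at the top of this post can be sketched on its own as follows. This is only an illustration: the `detect_newline` wrapper and the sample data are assumptions, and only the regex itself comes from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first line ending found in $text,
# or undef if the text contains none.
sub detect_newline {
    my ($text) = @_;
    # \15\12? is CR optionally followed by LF (Windows, old Mac);
    # \12 is a bare LF (Unix).
    my ($nl) = $text =~ m{(\15\12?|\12)};
    return $nl;
}

my $data = "HTML1\015\012<% CODE1 %>\015\012HTML2\015\012";
my $nl   = detect_newline($data);

# Interpolate the detected ending instead of hard-coding \n in the regex.
my @lines = split /\Q$nl\E/, $data;
print scalar(@lines), " lines\n";
```

The point of the design is that the pattern adapts to whatever ending the data actually uses, so the same script behaves identically on data from any platform.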
Seeking Green geeks in Minnesota
Man! I'm looking for \n? and not \n! And if you cut the \n? from the regex, the bug still exists! Second, the $data variable is declared in the script, and can only contain \n.
The problem is that the REGEX doesn't do the same thing on Linux and Win32. Some monks ran the test with the report script at the end of the node. The bug exists on OpenBSD too.
Update:
You can see in the report script at the end that I use:
my $data = qq`\nHTML1\n<% CODE1 %>\nHTML2\n<% CODE2 %>\nHTML3\n`;
And I still get reports of the bug here, on Linux and OpenBSD.
Graciliano M. P.
"The creativity is the expression of the liberty".
As seen below, this wasn't actually the problem. However, I do a lot of cross platform stuff and would suggest the following regexp for removing UNIX/Windows/Mac line endings:
$ending =~ s/\r?\n?$//;
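To see that substitution at work on all three kinds of line ending, here is a small sketch. The `strip_eol` name and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: removes one trailing line ending (\r\n, \n or \r).
sub strip_eol {
    my ($line) = @_;
    # Optional CR, optional LF, anchored at the end of the string.
    $line =~ s/\r?\n?$//;
    return $line;
}

# One sample line per platform convention.
print join(",", map { strip_eol($_) } "unix\n", "win\r\n", "mac\r"), "\n";
# prints "unix,win,mac"
```

Unlike chomp, this does not depend on the current value of $/, which is why it is handy for files that come from another platform.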
Re: REGEX different on Linux & Win32!
by cfreak (Chaplain) on Feb 24, 2003 at 23:06 UTC
I tested your snippet on Mandrake 9 without a problem. Most likely the problem is caused by something in your data. I would encourage you to post a sample of the actual data here.
One thing that might help you if you are parsing HTML would be to look into HTML::TokeParser on CPAN. It can recognize those CODE sections as well. Update: see tachyon's post below.
Hope that helps
Chris
Lobster Aliens Are attacking the world!
It can recognize those CODE sections as well.
Actually, good though HTML::TokeParser is, it does not recognise them:
$data = q`
HTML1
<% CODE1 %>
HTML2
<% CODE2 %>
HTML3
<p>foo</p>
`;
use Data::Dumper;
use HTML::TokeParser;
my $parser = HTML::TokeParser->new( \$data );
while ( my $token = $parser->get_token() ) {
print Dumper($token) if $token->[1] =~ m/<%/;
}
__DATA__
$VAR1 = [
'T',
'<% CODE1 %>
HTML2
',
''
];
$VAR1 = [
'T',
'<% CODE2 %>
HTML3
',
''
];
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print