in reply to REGEX different on Linux & Win32!

I suspect diotalevi hit the nail on the head in the chatterbox. This is not a bug; in fact, the regex is matching exactly what one would expect it to match: you're searching for \n. If you type your script on Unix, line endings are \n; on Windows, they're \r\n. To get things to match correctly regardless of OS, try diotalevi's suggestion of first storing whatever is at the end of a line in a variable and putting that variable in the regex, or first normalize your input to one form, e.g.:
my $data=qq(One line
two line
three line
);
$data =~ s/\r\n/\n/g;  # /g so every line ending is normalized, not just the first
# use your regex.
# code is untested

CU
Robartes-

Replies are listed 'Best First'.
Re: REGEX different on Linux & Win32!
by Abigail-II (Bishop) on Feb 24, 2003 at 23:24 UTC
    Uhm, no. Line endings are always \n. On both Windows and Unix (and VMS, etc), \n translates to the appropriate byte sequence on the platform.

    \n translates to "\x0A" on Unix, and also on Windows. (It's a lower level driver that translates "\x0A" to and from "\x0D\x0A" when writing to/reading from disk.) Problems only arise when moving files between Unix and Windows platforms - unless one uses FTP's ASCII transfer.
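
A minimal sketch of the translation described above, assuming Perl 5.8 or later on an ASCII platform, where that lower-level translation is available as the `:crlf` PerlIO layer. The in-memory filehandle and the sample bytes are assumptions made only to keep the example self-contained; they are not from the original script.

```perl
use strict;
use warnings;

# Inside a Perl string, "\n" is one character with code 0x0A,
# whether the script runs on Unix or on Windows.
my $nl_length = length "\n";
my $nl_code   = ord "\n";

# Hypothetical bytes as they would sit on disk in a Windows text file.
my $on_disk = "one\x0D\x0Atwo\x0D\x0A";

# Reading through the :crlf layer turns each "\x0D\x0A" pair into "\n",
# so the program itself only ever sees "\n".
open my $fh, '<:crlf', \$on_disk or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;

printf "length=%d code=0x%02X lines=%d\n", $nl_length, $nl_code, scalar @lines;
```

Opening the same data without the layer (plain `'<'` on Unix) would leave the "\x0D" in each line, which is the situation that arises when a file is moved between platforms in binary mode.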

    Abigail

Re^2: REGEX different on Linux & Win32!
by diotalevi (Canon) on Feb 24, 2003 at 23:42 UTC

    I wrote ($nl) = $data =~ m{(\15\12?|\12)} because your usage of \n is still problematic - in this case the newline value for mac, *nix and windows is handled. Anyway, the whole point to this code makes my head hurt - I'm wondering why gmpassos didn't just use one of the existing template engines.
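
A hedged sketch of how the captured newline might be used, assuming input with Windows-style endings. The sample `$data` and the `split` step are illustrative assumptions; only the `m{(\15\12?|\12)}` pattern comes from the post above.

```perl
use strict;
use warnings;

# Hypothetical sample input with CR LF line endings.
my $data = "HTML1\15\12<% CODE1 %>\15\12HTML2\15\12";

# Capture whichever newline sequence appears first: CR LF, bare CR, or bare LF.
my ($nl) = $data =~ m{(\15\12?|\12)};

# Interpolate the captured sequence into later patterns instead of a literal \n.
# \Q...\E is only a precaution; CR and LF are not regex metacharacters.
my @lines = defined $nl ? split(/\Q$nl\E/, $data) : ($data);

print scalar(@lines), " lines\n";
```

Because the alternation tries `\15\12?` first, a CR LF pair is captured whole rather than as a bare CR.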

    A /better/ idea would be to use this more like a state machine - here's a sample implementation:

    my $data = qq`\nHTML1\n<% CODE1 %>\nHTML2\n<% CODE2 %>\nHTML3\n`;
    my $reader = get_reader( $data );
    while (my $blob = $reader->()) {
        print "$blob->{'type'}: $blob->{'data'}\n";
    }

    sub get_reader {
        my $input = shift;
        my $state = 'plain';
        return sub {
            my $temp;
            return unless defined $input;
            if ($state eq 'plain') {
                if ($input =~ s/(.*?)<%//s) {
                    $state = 'code';
                    return { type => 'plain', data => $1 };
                }
                else {
                    $temp = $input;
                    undef $input;
                    return { type => 'plain', data => $temp };
                }
            }
            else {
                # state eq 'code'
                if ($input =~ s/(.*?)\%>//s) {
                    $state = 'plain';
                    return { type => 'code', data => $1 };
                }
                else {
                    $temp = $input;
                    undef $input;
                    return { type => 'code', data => $temp };
                }
            }
        }
    }
    __RETURNS__
    plain: HTML1
    code: CODE1
    plain: HTML2
    code: CODE2
    plain: HTML3

    Seeking Green geeks in Minnesota

Re: Re: REGEX different on Linux & Win32!
by gmpassos (Priest) on Feb 24, 2003 at 23:12 UTC
    Man! I'm looking for \n? and not \n! And if you cut the \n? from the regex, the bug still exists! Second, the $data variable is declared in the script and can only contain \n.

    The problem is that the REGEX doesn't do the same thing on Linux and Win32. Some monks ran the test with the report script at the end of the node. The bug exists on OpenBSD too.

    Update:
    You can see in the report script at the end that I use:

    my $data = qq`\nHTML1\n<% CODE1 %>\nHTML2\n<% CODE2 %>\nHTML3\n`;
    And I still have reports showing the bug here, on Linux and OpenBSD.

    Graciliano M. P.
    "The creativity is the expression of the liberty".

Re: Re: REGEX different on Linux & Win32!
by Cabrion (Friar) on Feb 25, 2003 at 03:59 UTC
    As seen below, this wasn't actually the problem. However, I do a lot of cross-platform stuff and would suggest the following regexp for removing UNIX/Windows/Mac line endings:
    $line =~ s/\r?\n?$//;
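
A small sketch of that idea in substitution form, assuming the intent is to strip one trailing line ending of any of the three kinds. The helper name `strip_eol` and the sample strings are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# strip_eol is a hypothetical helper name, not something from the thread.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n?$//;   # removes one trailing "\r\n", "\n", or "\r"
    return $line;
}

# Sample strings with Unix, Windows, old Mac, and no line ending.
for my $raw ("unix\n", "windows\r\n", "mac\r", "none") {
    printf "[%s]\n", strip_eol($raw);
}
```

Unlike `chomp`, which removes whatever `$/` is currently set to, this does not depend on the platform the script runs on.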