Regexp not performed when presented utf-8 data

muba has asked for the wisdom of the Perl Monks concerning the following question:

I have a set of HTML files that contain markers such as {{menu}}; think of them as template files. No, don't tell me I should use a real template engine; this is just a temporary workaround.

The first line of each HTML file is #!perl parse.pl, so the web server knows to hand the file to perl. It does so, and perl runs parse.pl, which receives the file.

Two things then have to happen: the shebang line has to be removed, and {{menu}} has to be replaced with the menu. Here is the code.

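#!perl

# Obligatory
use strict;
use warnings;

# Binary mode (no CRLF translation on Windows), then encode output as UTF-8
binmode STDOUT;
binmode STDOUT, ":utf8";

print "Content-Type: text/html\n\n";

# Slurp the whole menu file: "local $/;" (undef) reads to end of file,
# whereas "local $/ = 0;" would stop at the first "0" character
my $menu = do {
    open MENU, "<menu.htm" or die "Cannot open menu.htm: $!";
    local $/;
    <MENU>
};

# <> reads the file(s) named on the command line (or STDIN if none)
while (my $input = <>) {
    $input =~ s/\{\{menu\}\}/$menu/gi;   # replace every {{menu}} marker
    $input =~ s/^#!(.+?)$//g;            # blank out the shebang line
    print $input;
}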

When the script processes a plain (non-UTF-8) file, everything works fine: the #! line disappears and {{menu}} is replaced by the menu.
But when the file is encoded as UTF-8, the script still prints the page, yet neither regex takes effect.
As I read the documentation, regexes should work on both plain bytes and characters. Apparently they don't.
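One way to check that claim in isolation is a small sketch like the following. It runs the question's {{menu}} substitution over the same text held once as a character string and once as raw UTF-8 bytes (the helper name `fill_menu` is hypothetical, not from the post). The lengths differ (13 characters versus 14 bytes), but the ASCII-only pattern matches in both cases. That suggests the failure lies in what data actually reaches the regex, not in the regex engine itself.

```perl
use strict;
use warnings;

# The same text twice: once as characters, once as the UTF-8 bytes
# that a file opened without any encoding layer would deliver.
my $chars = "caf\x{e9} {{menu}}";     # "é" is one character, U+00E9
my $bytes = "caf\xc3\xa9 {{menu}}";   # "é" is two bytes, 0xC3 0xA9

# The substitution from the question, applied to a copy of its argument.
sub fill_menu {
    my ($s) = @_;
    $s =~ s/\{\{menu\}\}/MENU/gi;
    return $s;
}

for my $pair ([chars => $chars], [bytes => $bytes]) {
    my ($label, $text) = @$pair;
    printf "%-5s length %2d: %s\n", $label, length $text, fill_menu($text);
}
```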

What am I doing wrong?
Or, what should I do differently?

Oh yeah: This is perl, v5.8.3 built for MSWin32-x86-multi-thread

"2b"||!"2b";$$_="the question"
Besides that, my code is untested unless stated otherwise.
magnum unum bovem audivisti


Re: Regexp not performed when presented utf-8 data by Errto (Vicar) on Jun 13, 2005 at 02:01 UTC
Try changing your open line from `open MENU, "<menu.htm";` to `open MENU, "<:utf8", "menu.htm";`. Otherwise Perl treats the text read from that file as raw bytes rather than UTF-8 characters, and printing those bytes through the `:utf8` layer on STDOUT encodes them a second time.

Update: If what you meant is that the file being read from standard input is UTF-8 encoded, then what you need is `binmode STDIN, ":utf8";`. That only affects STDIN itself, though; it won't apply to files the magic filehandle `<>` opens from @ARGV. Actually, just to be safe, maybe you shouldn't use the magic filehandle.
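Putting those suggestions together, here is a minimal sketch, not tested against the original server setup, that opens both files explicitly with the `:utf8` layer instead of relying on `<>`. The helper name `render` is hypothetical. Stripping a leading byte-order mark (U+FEFF) is an added assumption: some Windows editors write one at the start of UTF-8 files, and it would stop `^#!` from matching on the first line.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the template from $in to $out, dropping the
# first-line shebang and replacing every {{menu}} marker with $menu.
sub render {
    my ($in, $out, $menu) = @_;
    while (my $line = <$in>) {
        if ($. == 1) {                      # $. is the line number of $in
            $line =~ s/^\x{FEFF}//;         # assumption: strip a BOM, if present
            next if $line =~ /^#!/;         # drop the "#!perl parse.pl" line
        }
        $line =~ s/\{\{menu\}\}/$menu/gi;
        print {$out} $line;
    }
}

# Run only when given a page, e.g. "perl parse.pl page.htm".
if (@ARGV) {
    binmode STDOUT, ':utf8';                # send UTF-8 to the browser

    open my $menu_fh, '<:utf8', 'menu.htm' or die "menu.htm: $!";
    my $menu = do { local $/; <$menu_fh> }; # slurp the whole menu

    open my $page_fh, '<:utf8', $ARGV[0] or die "$ARGV[0]: $!";
    print "Content-Type: text/html; charset=utf-8\n\n";
    render($page_fh, \*STDOUT, $menu);
}
```

Invoked as `perl parse.pl page.htm` (which is what the `#!perl parse.pl` line has the server do), it reads both page.htm and menu.htm as UTF-8 and writes UTF-8 back out, so bytes and characters are never mixed.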
Re: Regexp not performed when presented utf-8 data by muba (Priest) on Jun 13, 2005 at 01:31 UTC
I thought I had the solution. At least, I got it to work, so I posted the code here (it was nearly identical to the code above). But it turned out that I had somehow ruined my Unicode files and saved them as Latin-1 :)
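Since the culprit turned out to be files silently saved as Latin-1, a quick check of what a file actually contains can save some head-scratching. This sketch (the `classify_bytes` helper is hypothetical) reads each file named on the command line in raw mode and uses the built-in `utf8::decode`, which returns false for byte sequences that are not well-formed UTF-8. It is a heuristic: a short Latin-1 file can occasionally happen to be valid UTF-8 as well.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a byte string is pure ASCII,
# well-formed UTF-8, or something else (for example Latin-1 text).
sub classify_bytes {
    my ($bytes) = @_;
    return 'ascii' unless $bytes =~ /[\x80-\xFF]/;
    my $copy = $bytes;                      # utf8::decode works in place
    return utf8::decode($copy) ? 'utf-8' : 'not utf-8';
}

# Check each file named on the command line, read as raw bytes.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "$file: $!";
    my $bytes = do { local $/; <$fh> };
    print "$file: ", classify_bytes($bytes), "\n";
}
```

For example, `perl check_utf8.pl page.htm menu.htm` (the script name is arbitrary) prints one verdict per file.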