muba has asked for the wisdom of the Perl Monks concerning the following question:

I have a set of HTML files which have some markers in them, like {{menu}}. Think of them as template files. No, don't tell me I should use a real template engine; it is just a temporary work-around.

The first line of each HTML file is #!perl parse.pl, so the webserver knows to send the file to perl. It does so, and perl, and thus parse.pl, receives the file.

Now two things have to be done: the shebang line has to be removed, and {{menu}} has to be replaced with the menu. Here is the code.
#!perl

# Obligate
use strict;
use warnings;

binmode STDOUT;
binmode STDOUT, ":utf8";

print "Content-Type: text/html\n\n";

my $menu = do { open MENU, "<menu.htm"; local $/ = 0; <MENU> };

while (my $input = <>) {
    $input =~ s/\{\{menu\}\}/$menu/ieg;
    $input =~ s/^#!(.+?)$//g;
    print $input;
}
When this script has to process a normal text file, everything goes just fine. #! disappears, {{menu}} gets replaced by the menu.
But when the file is encoded as UTF-8, the script prints the page and neither regex matches.
As I read the documentation, regexes should work on both plain bytes and characters. Apparently, they don't.

What am I doing wrong?
Or, what should I do differently?

Oh yeah: This is perl, v5.8.3 built for MSWin32-x86-multi-thread

"2b"||!"2b";$$_="the question"
Besides that, my code is untested unless stated otherwise.
magnum unum bovem audivisti

Replies are listed 'Best First'.
Re: Regexp not performed when presented utf-8 data
by Errto (Vicar) on Jun 13, 2005 at 02:01 UTC

    Try changing your open line from

    open MENU, "<menu.htm";
    to
    open MENU, "<:utf8", "menu.htm";
    Otherwise Perl may well treat the text read from that file as raw bytes rather than as UTF-8 characters.

    Update: If what you meant is that the file being read in from standard input is UTF-8 encoded then what you need is

    binmode STDIN, ":utf8";
    This presumably won't work if you're using multiple files through the magic filehandle. Actually, just to be safe, maybe you shouldn't use the magic filehandle.
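A minimal sketch of the two suggestions above, assuming both the page and menu.htm really are UTF-8 encoded. The helper names slurp_utf8 and expand_template and the throwaway demo files are made up for illustration; they are not part of the original script. Each file is opened explicitly with the :utf8 layer instead of going through the magic filehandle.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read a whole file through the :utf8 layer.
sub slurp_utf8 {
    my ($path) = @_;
    open my $fh, '<:utf8', $path or die "Cannot open $path: $!";
    local $/;                          # undef (not 0), so <$fh> returns the whole file
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Hypothetical helper: drop a leading shebang line and expand {{menu}}.
sub expand_template {
    my ($text, $menu) = @_;
    $text =~ s/\A#![^\n]*\n?//;        # remove the first line only if it is a shebang
    $text =~ s/\{\{menu\}\}/$menu/gi;  # plain interpolation, no /e required
    return $text;
}

# Demo on throwaway files; the contents are invented for this example.
my $dir   = tempdir(CLEANUP => 1);
my %files = (
    "$dir/menu.htm" => "<ul><li>caf\x{e9}</li></ul>",
    "$dir/page.htm" => "#!perl parse.pl\n<body>{{menu}}</body>\n",
);
for my $path (keys %files) {
    open my $out, '>:utf8', $path or die "Cannot write $path: $!";
    print {$out} $files{$path};
    close $out or die "Cannot close $path: $!";
}

my $html = expand_template(slurp_utf8("$dir/page.htm"),
                           slurp_utf8("$dir/menu.htm"));

binmode STDOUT, ':utf8';               # emit characters as UTF-8 on the way out
print $html;
```

Opening each file by name keeps the input layer explicit, so nothing depends on how <> happens to open its files.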

Re: Regexp not performed when presented utf-8 data
by muba (Priest) on Jun 13, 2005 at 01:31 UTC
    I thought I had the solution.
    At least, I got it to work, so I posted the code here (which was almost identical to the code above).

    But it turned out that I somehow ruined my unicode files and saved them as latin-1 :)
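One way to catch this kind of mix-up early is to check whether a file's raw bytes decode cleanly as UTF-8. The following is only a sketch using the core Encode module; is_valid_utf8 is a hypothetical helper name and the byte strings are made-up samples. Pure ASCII also passes the check, so it can only rule UTF-8 out, not confirm that an editor saved the file the way you intended.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns 1 if the byte string is well-formed UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;                  # a copy, since decode() may modify its input
    # FB_CROAK makes decode() die on malformed input instead of substituting.
    my $ok = eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

my $utf8_bytes   = "caf\xC3\xA9";      # "cafe" with e-acute, encoded as UTF-8
my $latin1_bytes = "caf\xE9";          # the same text encoded as Latin-1

for my $sample ([ 'utf-8 sample', $utf8_bytes ], [ 'latin-1 sample', $latin1_bytes ]) {
    my ($label, $bytes) = @$sample;
    print "$label: ", (is_valid_utf8($bytes) ? "valid UTF-8" : "not valid UTF-8"), "\n";
}
```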
