Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've built a rather complex regex, and now it's not working for some reason. I'm sure it's something blatantly obvious I'm overlooking, but I just can't figure it out. What it's supposed to do is find a <font> HTML tag and pass the color, size, and face to a subroutine that returns some code to replace the <font> tag with.

this is the regex:

$instream =~ s/<FONT(.+?)COLOR\s?=\s?('|")?(\#......)\2(.+?)SIZE\s?=\s?('|")?(\d+)\5(.+?)FACE\s?=\s?('|")?(.+?)\8[^>]*>\s*/genStyleCSF($3,$6,$9)/ieg;
and this is the error message:
Backslash found where operator expected at line 257, near ")\"
    (Missing operator before \?)
Backslash found where operator expected at line 258, near ")\"
    (Might be a runaway multi-line ** string starting on line 257)
    (Missing operator before \?)
syntax error at line 257, near ")\"
Like I said, it's probably something I'm overlooking, but any help would be appreciated.

Re: regex confusion
by wog (Curate) on Oct 07, 2001 at 08:32 UTC
    The problem is not in this line of code, which parses fine with my perl. It is likely to be something on the line before it, especially considering the error message's mention of a multi-line *-delimited string starting on line 257.

    Besides that, it would probably be better and easier to use HTML::Parser or HTML::TokeParser for this, rather than dealing with the HTML manually.

      Well, this is the line previous to the regex:
      $instream =~ s/<\/H[1-6]>\s*/\}\n/ig;
      And I can't use HTML parsing modules... I need to chew on the HTML manually to replace it with the correct coding.
        The /x switch allows whitespace and comments within a regular expression. This lets you break the pattern into small pieces and comment out pieces until the syntax errors go away.
        use strict;
        use warnings;
        use diagnostics;

        my $instream = 'empty';
        $instream =~ s/
            <FONT
            (.+?)
            COLOR\s?=\s?
            ('|")?
            (\#......)
            \2
            (.+?)
            SIZE
            \s?=\s?
            ('|")?
            ################
            (\d+)\5     # \d+, not \d++
            (.+?)
            FACE
            \s?=\s?
            ('|")?
            (.+?)
            \8
            [^>]*>
            \s*
            /genStyleCSF($3,$6,$9)
            /iegx;
        Also, allow me to join the chorus suggesting an HTML parsing module.
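
To see the /x layout run against real input, here is a minimal, self-contained sketch. The body of genStyleCSF is a stand-in invented purely for illustration (the original routine was never posted), and the sample tag is made up. Note that, as written, the pattern only matches when all three attributes are quoted and appear in COLOR, SIZE, FACE order.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's genStyleCSF(); the real routine
# was not shown, so this one just returns a CSS-like fragment.
sub genStyleCSF {
    my ($color, $size, $face) = @_;
    return "{color:$color; size:$size; face:$face}";
}

# Made-up sample input for the demonstration.
my $instream = qq{<FONT COLOR="#FF0000" SIZE="3" FACE="Arial">  Hello};

$instream =~ s/
    <FONT (.+?)
    COLOR \s?=\s? ('|")? (\#......) \2   # capture 3: the colour
    (.+?)
    SIZE  \s?=\s? ('|")? (\d+) \5        # capture 6: the size
    (.+?)
    FACE  \s?=\s? ('|")? (.+?) \8        # capture 9: the face
    [^>]*> \s*
/genStyleCSF($3, $6, $9)/iegx;

print "$instream\n";    # prints {color:#FF0000; size:3; face:Arial}Hello
```

Because each attribute sits on its own line, any one of them can be commented out while hunting for the piece that breaks the match.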



        email: mandog

        The error you're seeing is a result of perl getting confused about which parts of your code are inside a substitution, and which are outside. Look for a regex, somewhere before line 257, where you're matching a slash, and forgot to escape it with a backslash.

        When you're matching slashes inside a regex, it's helpful to use a different delimiter for the regex, as in m,</html>, or s!</H1>!}\n! . This saves you from having to use all those backslashes.
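
As a small sketch of that advice, here is the closing-heading substitution from earlier in the thread rewritten with ! as the delimiter. The sample strings are made up for the demonstration.

```perl
use strict;
use warnings;

# Made-up sample input.
my $instream = "<H2>Title</H2>   text";

# Same substitution as s/<\/H[1-6]>\s*/\}\n/ig, but with ! as the
# delimiter, so the slash in the closing tag needs no backslash.
$instream =~ s!</H[1-6]>\s*!}\n!ig;

print "$instream\n";

# A match with comma delimiters works the same way.
my $found = ("<body></html>" =~ m,</html>,i) ? 1 : 0;
print "closing html tag found\n" if $found;
```

Any punctuation character that does not occur in the pattern or the replacement can serve as the delimiter.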

        But, as wog said, it really would be best to use an HTML parsing module.