rocketman has asked for the wisdom of the Perl Monks concerning the following question:

Hello,
I get the following error when running a regexp substitution. Can someone point me in the right direction?

Scalar found where operator expected at (eval 304) line 1, near "s/(.{$len}[^&#?(w{2,7});|\xeb|\x(w{2})|\\x(w{2})|\(w{2,7})|\'|\'|'|\"|\"|\"|\\|w{5}|[a-z]|<(.*)>|</?.*>])/$1"
The code that generates this:
if(eval("\$str =~ s/(.{\$len}[^&#?(\w{2,7});|\\xeb|\\x(\w{2})|\\\\x(\w{2})|\\(\w{2,7})|\\'|\\\'|\'|\\\"|\\\"|\\\"|\\\\|\w{5}|[a-z]|<(.*)>|<\/?.*>])/\$1<wbr>/mgi;")) {
    # ok
}
Thank you,
Cosmin

20070911 Janitored by Corion: Added formatting, code tags, as per Writeup Formatting Tips

Re: Strange regexp problem
by moritz (Cardinal) on Aug 28, 2007 at 08:29 UTC
    You seem to have a quoting problem, because the regex doesn't throw a syntax error without the string eval. (Why do you need that anyway?)

    I'd suggest writing the main part of the regex outside the string eval; then you don't have to care about quoting.

    my $main_re = qr/[^&#?(\w{2,7});|\\xeb|\\x(\w{2})|\\\\x(\w{2})|\\(\w{2,7})|\\'|\\\'|\'|\\"|\\"|\\"|\\\\|\w{5}|a-z|<(.*)>|<\/?.*>]/;

    Then you can interpolate $main_re into the regex.
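
    A minimal sketch of that interpolation, with no string eval involved. The values of $len and $str and the one-letter stand-in pattern are made up for illustration; they are not the original poster's data or regex:

```perl
use strict;
use warnings;

# Hypothetical inputs, chosen only for illustration.
my $len = 5;
my $str = "abcdefghijklmnop";

# A deliberately simple stand-in for the real pattern: one lowercase letter.
my $main_re = qr/[a-z]/;

# $len and $main_re interpolate directly into the pattern;
# no eval and no extra layer of backslashes is needed.
(my $wrapped = $str) =~ s/(.{$len}$main_re)/$1<wbr>/g;

print "$wrapped\n";   # prints abcdef<wbr>ghijkl<wbr>mnop
```

    Because the pattern is compiled by qr// and not pasted into a string, every backslash is written exactly once.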

    BTW your regex is absolutely horrible to read; you should use the /e modifier and insert whitespace, line breaks and comments.

    Update: You seem to have alternatives (|) inside a (negated) char class - that makes no sense at all. Or did the markup go astray?

      You probably meant the /x modifier.
        Yes, you're right. I think "extended syntax" and that starts with an e... ;-)
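
        For what it's worth, a small sketch of what /x buys you. The two alternatives (an HTML entity and a tag) are only a guess at what the original pattern was after, and the sample string is invented:

```perl
use strict;
use warnings;

# With /x, whitespace in the pattern is ignored and # starts a comment,
# so a literal # has to be written as \#.
my $token_re = qr/
      & \#? \w{2,7} ;     # an HTML entity such as &amp; or &#233;
    |
      < \/? [^>]* >       # an opening or closing tag
/x;

# Hypothetical sample input.
my @found = "a &amp; b <br> c &#233;" =~ /($token_re)/g;

print join(", ", @found), "\n";   # prints &amp;, <br>, &#233;
```

        Note that the alternation here sits at the top level of the pattern, not inside square brackets.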
Re: Strange regexp problem
by akho (Hermit) on Aug 28, 2007 at 09:14 UTC
    What should that do? It's scary. Messy. And you don't need the eval.
      I need the eval because I have to avoid breaking UTF-8 strings.
        Uhm, really?

        If you decode UTF-8 strings into Perl's internal format, you don't need that kind of hack:

        use Encode qw(encode decode);
        my $str = <STDIN>;              # or some other input method
        $str = decode("utf8", $str);    # UTF-8 bytes -> Perl's internal character string
        # do your pattern matching here
        print encode("utf8", $str);     # characters -> UTF-8 bytes for output

        Or look

        If you have non-ASCII characters in the source of your script, add use utf8; to it. And take a look at perluniintro and perlunicode.

        Or in what way do utf-8 strings "break" without the string eval?
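
        A small sketch of why the distinction between bytes and characters matters for something like .{$len}. The sample string is made up; the point is only that the same pattern counts bytes on undecoded input and characters on decoded input:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "\xC3\xA9" is the two-byte UTF-8 encoding of e-acute (U+00E9).
my $bytes = "caf\xC3\xA9s";
my $chars = decode("UTF-8", $bytes);

my $byte_len = length $bytes;   # 6: counts bytes
my $char_len = length $chars;   # 5: counts characters

# On the raw bytes, .{4} stops in the middle of the two-byte sequence.
my ($byte_head) = $bytes =~ /^(.{4})/;   # "caf\xC3", a broken UTF-8 string
# On the decoded string, .{4} takes four whole characters.
my ($char_head) = $chars =~ /^(.{4})/;   # "caf\x{E9}"

# Encode again only when writing the result out.
print encode("UTF-8", $char_head), "\n";
```

        So if the goal is to avoid cutting a multi-byte character in half, decoding first should be enough.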

        As moritz explains above, you don't.

        And we may give you a much better answer if you told us what you're trying to achieve with that regexp — few here understand what \x(w{2}) is supposed to mean or why that w had to be escaped inside the eval.

        Avoid the XY problem.

Re: Strange regexp problem
by graff (Chancellor) on Aug 29, 2007 at 01:30 UTC
    I'm trying my best to read your regex "correctly" (resorting to "View Page Source" in my browser until you or one of the janitors puts it into <code>...</code> tags), but I think you basically have it all completely wrong... It seems to start like this:
    s/(.{$len}[...
    (note the square bracket) and end like this:
    ...])/$1<wbr>/mgi;
    which means that all those backslashes, parens, curlies with digits, and vertical-bar alternations are actually inside a character class. (If you don't understand what that means, you need to spend some time with perlretut and perlre.)

    To make matters worse, you even seem to be trying to use embedded square brackets, which really cannot be right -- that in itself might not be a syntax error, but it simply will not work. (The PM formatting logic turned the square brackets surrounding "a-z" into an anchor tag, which wouldn't happen if you used code tags around the code.)
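
    To illustrate both points with throwaway patterns (none of these come from the original post): inside a character class the vertical bar is just another member of the set, and a "nested" bracket does not nest at all, because the class ends at the first closing bracket.

```perl
use strict;
use warnings;

# Inside [...] the characters | ( ) { } are ordinary literals.
my $class = qr/^[a|b]$/;      # one character: "a", "|" or "b"
my $group = qr/^(?:a|b)$/;    # alternation: "a" or "b"

my $class_matches_pipe = ("|" =~ $class) ? 1 : 0;   # 1: "|" is a member of the class
my $group_matches_pipe = ("|" =~ $group) ? 1 : 0;   # 0: here | separates alternatives

# This is the class [[a-z] (a "[" or a lowercase letter),
# followed by a literal "x" and a literal "]".
my $nested = qr/^[[a-z]x]$/;
my $nested_matches = ("qx]" =~ $nested) ? 1 : 0;    # 1

print "$class_matches_pipe $group_matches_pipe $nested_matches\n";   # prints 1 0 1
```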

    So, the regex in its original form is trash. Start over. What are you really trying to accomplish? What is the input data? What is the intended output?