Strange regexp problem

rocketman has asked for the wisdom of the Perl Monks concerning the following question:

Re: Strange regexp problem by moritz (Cardinal) on Aug 28, 2007 at 08:29 UTC
You seem to have a quoting problem, because the regex doesn't throw a syntax error without the string eval (why do you need that anyway?). I'd suggest writing the main part of the regex outside the string eval; then you don't have to care about quoting. `my $main_re = qr/[^&#?(\w{2,7});\|\\xeb\|\\x(\w{2})\|\\\\x(\w{2})\|\\(\w{2,7})\|\\'\|\\\'\|\'\|\\"\|\\"\|\\"\|\\\\\|\w{5}\|a-z\|<(.)>\|<\/?.>]/;` Then you can interpolate `$main_re` into the regex. BTW your regex is absolutely horrible to read; you should use the `/e` modifier and insert whitespace, line breaks and comments. Update: You seem to have alternatives (`\|`) inside a (negated) char class, which makes no sense at all. Or did the markup go astray? Perl 6 in German -- Difficult Sudoku
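A minimal sketch of the `qr//` approach moritz describes, assuming a hypothetical sub-pattern `$entity` for HTML entities (it is not taken from the original regex, whose intent is unclear). The pattern is compiled once, outside any string eval, written with `/x` so it can carry whitespace and comments, and then interpolated into a match:

```perl
use strict;
use warnings;

# Hypothetical sub-pattern, compiled once with qr// and no string eval.
# Under /x, whitespace is ignored and '#' starts a comment, so a literal
# '#' has to be escaped.
my $entity = qr/
    &        # literal ampersand
    \#?      # optional '#' for numeric entities
    \w{2,7}  # entity name or number
    ;        # terminating semicolon
/x;

# The qr// object interpolates into another pattern and keeps its own flags.
my @found = "caf&eacute; &#233; &amp" =~ /($entity)/g;
print "@found\n";    # prints "&eacute; &#233;"
```

Because `$entity` is an ordinary Perl value, no extra layer of backslash-escaping is needed, which is where the quoting trouble with a string eval usually comes from.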
Re^2: Strange regexp problem by akho (Hermit) on Aug 28, 2007 at 09:19 UTC
You probably meant the `/x` modifier.
Re^3: Strange regexp problem by moritz (Cardinal) on Aug 28, 2007 at 09:37 UTC
Yes, you're right. I was thinking "extended syntax", and that starts with an e... ;-)
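A small sketch of the `/e` versus `/x` distinction, using a made-up sample string: `/x` only changes how the pattern is parsed, while `/e` makes the replacement side of `s///` be evaluated as Perl code.

```perl
use strict;
use warnings;

my $s = "a1b22";

# /x: whitespace and comments in the PATTERN are ignored.
(my $x = $s) =~ s/
    (\d+)    # one or more digits
/<$1>/xg;
print "$x\n";    # prints "a<1>b<22>"

# /e: the REPLACEMENT is evaluated as Perl code.
(my $e = $s) =~ s/(\d+)/$1 * 2/eg;
print "$e\n";    # prints "a2b44"
```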
Re: Strange regexp problem by akho (Hermit) on Aug 28, 2007 at 09:14 UTC
What should that do? It's scary. Messy. And you don't need the eval.
Re^2: Strange regexp problem by rocketman (Initiate) on Aug 28, 2007 at 10:11 UTC
I need the eval because I have to avoid breaking UTF-8 strings.
Re^3: Strange regexp problem by moritz (Cardinal) on Aug 28, 2007 at 10:44 UTC
Uhm, really? If you decode UTF-8 strings into Perl's internal format, you don't need that kind of hack: `use Encode qw(encode decode);`, read your input into `$str` by whatever method you use, then `$str = decode("utf8", $str);`, do your pattern matching, and finally `print encode("utf8", $str);`. Also, if you have non-ASCII characters in the source of your script, `use utf8;`. And take a look at perluniintro and perlunicode. Or in what way do UTF-8 strings "break" without the string eval?
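A minimal sketch of why decoding first helps, assuming the goal is to insert a marker every N characters (a guess based on the `$1<wbr>` substitution quoted later in this thread; the sample input and the `|` marker are made up). On raw bytes, `.` matches a single byte and can split the two-byte UTF-8 sequence for é; on the decoded string it matches a whole character. This sketch uses Encode's strict `"UTF-8"` name; `"utf8"` is its laxer variant.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $bytes = "caf\xc3\xa9!";          # hypothetical input: UTF-8 bytes for "café!"
my $chars = decode("UTF-8", $bytes);  # decoded into a Perl character string

print length($bytes), " bytes, ", length($chars), " characters\n";   # 6 bytes, 5 characters

# Matching on bytes: '.' can cut the é sequence in half.
(my $broken = $bytes) =~ s/(.{2})/$1|/g;   # "ca|f\xc3|\xa9!|"

# Matching on characters: é stays intact.
(my $marked = $chars) =~ s/(.{2})/$1|/g;   # "ca|fé|!"

# Encode back to UTF-8 bytes only on output.
print encode("UTF-8", $marked), "\n";
```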
Re^3: Strange regexp problem by akho (Hermit) on Aug 28, 2007 at 11:52 UTC
As moritz explains above, you don't. And we could give you a much better answer if you told us what you're trying to achieve with that regexp: few here understand what `\x(w{2})` is supposed to mean or why that `w` had to be escaped inside the eval. Avoid the XY problem.
Re: Strange regexp problem by graff (Chancellor) on Aug 29, 2007 at 01:30 UTC
I'm trying my best to read your regex "correctly" (resorting to "View Page Source" in my browser until you or one of the janitors puts it into <code>...</code> tags), but I think you basically have it all completely wrong... It seems to start like this: `s/(.{$len}[...` (note the square bracket) and end like this: `...])/$1<wbr>/mgi;` which means that all those backslashes, parens, curlies with digits, and vertical-bar alternations are actually inside a character class. (If you don't understand what that means, you need to spend some time with perlretut and perlre.) To make matters worse, you even seem to be trying to use embedded square brackets, which really cannot be right -- that in itself might not be a syntax error, but it simply will not work. (The PM formatting logic turned the square brackets surrounding "a-z" into an anchor tag, which wouldn't happen if you used code tags around the code.) So, the regex in its original form is trash. Start over. What are you really trying to accomplish? What is the input data? What is the intended output?
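To illustrate graff's point, a small sketch with a made-up class (not the original regex): inside `[...]`, `\|`, parentheses and braces are just literal members of a set of single characters, a nested `[` does not open a second class, and real alternation needs a group outside any class.

```perl
use strict;
use warnings;

# Everything between [ and ] is a set of single characters: \| is a literal
# '|', and ( ) { } are literal too. No grouping or alternation happens.
my $class = qr/^[a\|b(){}]$/;

for my $s ('a', '|', '(', 'ab', 'a|b') {
    # Each string must be exactly ONE character from the set to match.
    printf "%-4s %s\n", $s, ($s =~ $class ? 'matches' : 'no match');
}

# Real alternation needs a group outside any character class.
my $alt = qr/^(?:a|b|xyz)$/;
print "xyz  ", ('xyz' =~ $alt ? 'matches' : 'no match'), "\n";

# A nested '[' does not open a second class: the first ']' closes the class,
# so this pattern is the class [<[a-z] followed by the literal text '>]'.
print "q>]  ", ('q>]' =~ /^[<[a-z]>]$/ ? 'matches' : 'no match'), "\n";
```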