"Perl gets stuck" means that when the process reaches the regex line in the code, it never returns and appears (according to ps) to be consuming large amounts of CPU (i.e. an endless loop). It only takes a few dozen of these to bring the system to its knees.

I won't post the entire table because it's long and rude. However, you'll quickly get the gist.

$evil=   ## list of re phrases
'barely legal
Unsensored pics
rated adult site
(find out|learn|discover) ANYTHING about anyone
(remove.*\@dcemail\.com)
bagboy\@burmeses\.net
\(a\)\s*\(2\)\s*\(C\).*1618
this limited time free offer
thousands of extra dollars
earn a great monthly income
\bfat absorber\b
chain letter.*pyramid scheme
pyramid scheme.*chain letter
e-mail\w* work\w*\!
Earn BIG \$\$\$
block this
remove account
quit watching others get rich
bulk email works!
firmer erections
vaginal lubrication
s e x drive
Enhances Orgasms
eraseus@yahoo.com
over \d+ million fresh email
content-\w+: .* .*name\s*=\s*".*\.(exe|scr|pif|vbs)"
And it\'s 100% LEGAL!
No Hidden Fees';   ## actually is 3 x this long
$evil=~s/\n/\|/g;
$evil=~s/ +/ /g;

# ... skipping ahead ...

while($l=&getnextline) {
    $_="$l$lastline";   ## combine this line and the last
    s/\s+/ /g;          ## simplify white space matching
    $isSpam = $isSpam || /$evil/io;
    # ... etc ...
}

I don't know what is special about the strings that trigger it. I spent some time trying to debug it, and when I found that I could just break the regular expression into parts (i.e. $evil1, $evil2, $evil3) and it started working again, I didn't spend much more time on it. I could see no clear pattern in which input sent it into the endless loop. I'd be happy to email you the complete code and the test data that causes it to break. I'm running Perl 5.005_03 (FreeBSD) and wondered whether upgrading to the new release of Perl would fix it (I read that it resolved some regex bugs).
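
For illustration, the split workaround looks roughly like the sketch below. This is not my actual script: the shortened phrase list, the chunk size of 2, and the names @evil_parts and is_spam are made up for the example, and it uses qr// to precompile each part instead of separate $evil1, $evil2, $evil3 variables. It only shows the idea of testing several smaller alternations rather than one huge one.

```perl
use strict;
use warnings;

# Hypothetical short phrase list, one pattern per line
# (the real table is much longer).
my $evil = 'barely legal
this limited time free offer
chain letter.*pyramid scheme
over \d+ million fresh email
No Hidden Fees';

# Assumed chunk size, chosen only for illustration.
my $chunk_size = 2;

# Split the table into phrases and compile one smaller
# alternation per chunk instead of a single giant regex.
my @phrases = grep { length } split /\n/, $evil;
my @evil_parts;
while (my @chunk = splice(@phrases, 0, $chunk_size)) {
    my $alt = join '|', @chunk;
    push @evil_parts, qr/$alt/i;
}

# Return 1 if any of the smaller regexes matches the line, else 0.
sub is_spam {
    my ($line) = @_;
    $line =~ s/\s+/ /g;    # simplify white space, as in the loop above
    for my $re (@evil_parts) {
        return 1 if $line =~ $re;
    }
    return 0;
}

print is_spam("Get OVER 5 million   fresh email addresses"), "\n";
print is_spam("See you at lunch tomorrow"), "\n";
```

Splitting this way is only a workaround. It does not explain why the single large pattern hangs on some input.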

Does that help?


In reply to Re: Re: regex is too long by jhanna
in thread regex is too long by jhanna
