<clippy>It looks like you are trying to build a negated character class. Would you like some help?</clippy>

You have the syntax for including several ranges within a character class wrong. Simple rule: within a character class, [, |, and whitespace (even with /x) are literal, and ] terminates the class (except in first position). So keep those square brackets out until you construct the class, and keep the pipes and whitespace out, period.
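A minimal sketch of that rule, in case it helps. The two patterns ($wrong and $right) are made up for illustration and are not taken from your code:

```perl
use strict;
use warnings;

# Intended meaning: "a digit or an uppercase letter".
# Mistake: a pipe and spaces inside one class. Even under /x they are
# literal members of the class, so this also matches ' ' and '|'.
my $wrong = qr/[0-9 | A-Z]/x;

# Correct: list the ranges side by side inside a single class.
my $right = qr/[0-9A-Z]/x;

for my $char ('7', 'Q', '|', ' ', 'q') {
    printf "%-4s wrong:%d right:%d\n", "'$char'",
        ($char =~ $wrong ? 1 : 0),
        ($char =~ $right ? 1 : 0);
}
```

The only difference between the two is the stray pipe and spaces, which is exactly what makes the first class accept characters you did not ask for.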

Ah, but then, clippy may have been wrong? What you are trying to do is combine (and then negate) multi-character sequences, right?

Oh my. Combining is easy. What you have even works, though I'd personally use qr//x instead:

my $shiftjis = qr{
    [\x30-\x39]
  | [\x41-\x59]
  | [\x61-\x7A]
  | [\x82-\x83][\x3F-\xFE]
  | [\x88][\x9E-\xFE]
  | [\x89-\xE9][\x3F-\xFE]
  | [\xEA][\x3F-\x9F]
}x;
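To see what the combined pattern accepts, here is a small sketch. The ranges are copied verbatim from above; only_allowed is a hypothetical helper name of mine, not anything from your code:

```perl
use strict;
use warnings;

# Ranges copied as-is from the pattern above.
my $shiftjis = qr{
    [\x30-\x39]
  | [\x41-\x59]
  | [\x61-\x7A]
  | [\x82-\x83][\x3F-\xFE]
  | [\x88][\x9E-\xFE]
  | [\x89-\xE9][\x3F-\xFE]
  | [\xEA][\x3F-\x9F]
}x;

# Hypothetical helper: true if the byte string consists entirely of
# sequences the pattern accepts. The qr// object keeps its own /x flag
# and grouping when interpolated, so the alternation stays contained.
sub only_allowed {
    my ($bytes) = @_;
    return $bytes =~ /\A (?: $shiftjis )* \z/x ? 1 : 0;
}

for my $sample ("abc123", "\x82\xA0", "a b", "Z") {
    printf "%-10s %s\n", join(' ', map { sprintf '%02X', ord } split //, $sample),
        only_allowed($sample) ? 'ok' : 'rejected';
}
```

Note that \x41-\x59 as written covers A through Y, so "Z" (\x5A) is rejected; whether that is intended is for you to decide.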

Negating is another subject altogether, since there is more than one set of semantics for such a negation. I'm not entirely sure which makes more sense here ... if any! Here's one example/guess though:

while(<STDIN>){
    chomp();
    # Oops, nope -- variable-length lookbehind:
    # s/(?!$shiftjis).(?<!$shiftjis)//ogx;
    # This runs, but doesn't do the job:
    # s/(?!$shiftjis).//ogx;
    # This should work:
    s/(?!$shiftjis).
      (?<![\x30-\x39\x41-\x59\x61-\x7A])
      (?<![\x82-\x83][\x3F-\xFE]
         |[\x88][\x9E-\xFE]
         |[\x89-\xE9][\x3F-\xFE]
         |[\xEA][\x3F-\x9F])
     //ogx;
    print $_."\n";
}
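Here is the same substitution wrapped in a sub and run on a literal byte string instead of STDIN, as a rough sketch of what it does. strip_disallowed is a hypothetical name, the sample bytes are my own, and I dropped /o since the pattern is already compiled with qr//:

```perl
use strict;
use warnings;

my $shiftjis = qr{
    [\x30-\x39]
  | [\x41-\x59]
  | [\x61-\x7A]
  | [\x82-\x83][\x3F-\xFE]
  | [\x88][\x9E-\xFE]
  | [\x89-\xE9][\x3F-\xFE]
  | [\xEA][\x3F-\x9F]
}x;

# Hypothetical helper: delete every byte that neither starts an accepted
# sequence (the lookahead) nor ends one (the two fixed-length lookbehinds).
sub strip_disallowed {
    my ($bytes) = @_;
    (my $copy = $bytes) =~ s/(?!$shiftjis).
        (?<![\x30-\x39\x41-\x59\x61-\x7A])
        (?<![\x82-\x83][\x3F-\xFE]
           |[\x88][\x9E-\xFE]
           |[\x89-\xE9][\x3F-\xFE]
           |[\xEA][\x3F-\x9F])
        //gx;
    return $copy;
}

# 'a', 'b', space, the two-byte sequence 82 A0, '!', 'c'
my $cleaned = strip_disallowed("ab \x82\xA0!c");
print join(' ', map { sprintf '%02X', ord } split //, $cleaned), "\n";
```

The space and the "!" are dropped, while the trailing byte \xA0 survives because the second lookbehind sees it as the end of an accepted two-byte sequence.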

But do you have to read STDIN as bytes? If you could read it as characters (in whatever encoding this is; see Encode), you'd be spared this mess.
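A sketch of the character-based route, assuming the input really is Shift-JIS ('shiftjis' is the name Encode uses for it). Which characters you actually want to keep is my guess here: ASCII alphanumerics plus hiragana, katakana and han, which is not necessarily what your byte ranges mean:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample bytes of my own: 'a', 'b', space, 82 A0 (hiragana A), '!', 'c'.
my $bytes = "ab \x82\xA0!c";

# Decode the bytes into characters first ...
my $chars = decode('shiftjis', $bytes);

# ... and then an ordinary negated character class does the job.
# The set of kept characters is an assumption, adjust to taste.
$chars =~ s/[^0-9A-Za-z\p{Hiragana}\p{Katakana}\p{Han}]//g;

# Encode again for output (UTF-8 here; use 'shiftjis' to round-trip).
print encode('UTF-8', $chars), "\n";
```

For real input you would more likely put an :encoding(shiftjis) layer on the filehandle with binmode and skip the explicit decode.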

print "Just another Perl ${\(trickster and hacker)},"
The Sidhekin proves Sidhe did it!


In reply to Re: regex: how to negate a set of character ranges? by Sidhekin
in thread regex: how to negate a set of character ranges? by kettle
