in reply to Re^2: regex: how to negate a set of character ranges?
in thread regex: how to negate a set of character ranges?
You might have to give up on combined character ranges altogether if you want to process the encoded data directly, and inverting ranges will be especially annoying. You could possibly match with something like /([\x00-\x40][\x56-\x90]|[\x50-\x60][\x56-\x90])*/ (numbers made up), but you can't (easily) invert that match. Also keep in mind that your regexes might shift (eh) out of alignment, since Shift-JIS mixes single-byte and multi-byte characters: a class like [\x00-\x40] might match the first byte of one character or a later byte of another.
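To make the alignment problem concrete, here is a small sketch. It assumes the core Encode module with its 'shiftjis' encoding, and the sample character is my own pick: katakana A (U+30A2), which Shift-JIS encodes as the two bytes 0x83 0x41, so its second byte is the same as ASCII "A".

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Katakana A (U+30A2) as Shift-JIS bytes: 0x83 0x41.
# The trail byte 0x41 is also the ASCII code for "A".
my $sjis = encode('shiftjis', "\x{30A2}");

# Byte-level match: the ASCII uppercase range hits the trail byte,
# although the data contains no ASCII letter at all.
my $byte_hit = ($sjis =~ /[\x41-\x5A]/) ? 1 : 0;

# Character-level match on the decoded string: no false hit.
my $text     = decode('shiftjis', $sjis);
my $char_hit = ($text =~ /[\x41-\x5A]/) ? 1 : 0;

print "byte-level match: $byte_hit, character-level match: $char_hit\n";
# prints "byte-level match: 1, character-level match: 0"
```

The same thing happens with any range that overlaps the trail-byte values, which is why a byte-level class can't be trusted unless the match is anchored to character boundaries.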
I still think that decoding the data into Perl's internal multi-byte encoding (i.e. UTF-8) will be a lot easier, but it depends on what exactly you're trying to do.
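A rough sketch of the decoded approach, again assuming the core Encode module. The sample string and the choice of the hiragana and katakana blocks as the "set of ranges" are just for illustration, not taken from your data. Once the string is decoded, negating several ranges is a plain negated character class:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample input (made up): katakana A, ASCII "A", hiragana a, as Shift-JIS bytes.
my $sjis = encode('shiftjis', "\x{30A2}A\x{3042}");

# Decode once, then work on characters instead of bytes.
my $text = decode('shiftjis', $sjis);

# Negate a set of ranges: everything that is neither
# hiragana (U+3040-U+309F) nor katakana (U+30A0-U+30FF).
my @others = $text =~ /([^\x{3040}-\x{309F}\x{30A0}-\x{30FF}])/g;

print scalar(@others), " other character(s): @others\n";
# prints "1 other character(s): A"
```

If the result has to go back out as Shift-JIS, encode('shiftjis', ...) on the way out should do it.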