In case of multi-byte encoding, what is /.{2}/ supposed to match? 2 bytes or 2 characters in the given encoding?
It's quite simple.
If you match against bytes (e.g. encoded text), it'll match two bytes.
If you match against chars (e.g. decoded text), it'll match two chars.
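For example, if you match against the four bytes of the UCS-2be encoding of 'AB', you'll get the two bytes 00 and 41.
"\x{00}\x{41}\x{00}\x{42}" =~ /(.{2})/s; # Two bytes 00 and 41.
Likewise, if you match against the four chars NUL, A, NUL, B, you'll get the two chars NUL and A.
"\x{00}\x{41}\x{00}\x{42}" =~ /(.{2})/s; # Two chars NUL and A
Note the double quotes: \x{...} escapes aren't interpolated in single-quoted strings. The string and the match are identical in both cases. The regex engine just sees a sequence of string elements; whether those elements are bytes or chars depends on what you put in the string.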
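To see the difference on the same text, here is a minimal sketch using the core Encode module. The variable names are made up for illustration, and it assumes the string 'AB' from the example above: the same pattern is applied once to the encoded bytes and once to the decoded chars.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = 'AB';                        # decoded text: two chars
my $bytes = encode('UCS-2BE', $chars);   # encoded text: four bytes 00 41 00 42

# Same pattern both times; only the input differs.
my ($two_bytes) = $bytes =~ /(.{2})/s;   # first two bytes of the encoding
my ($two_chars) = $chars =~ /(.{2})/s;   # first two chars of the text

printf "bytes: %s\n", unpack('H*', $two_bytes);   # prints "bytes: 0041"
printf "chars: %s\n", $two_chars;                 # prints "chars: AB"
```

So if you want /.{2}/ to mean two characters, decode the input before matching.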