in reply to Re^3: Building a new file by filtering a randomized old file on two fields
in thread Building a new file by filtering a randomized old file on two fields

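If the file uses UTF-8 encoding, then multibyte characters (or, more correctly, multibyte codepoints) won't prevent finding the end of line. Every byte of a multibyte codepoint is greater than 127: the lead byte is in the range 0xC2 to 0xF4 and each continuation byte is in the range 0x80 to 0xBF. Bytes less than 128 only ever encode single-byte (ASCII) codepoints, so a newline byte (0x0A) can never appear inside a multibyte codepoint. Thus you can always find the next codepoint, and the next end of line, even if your random position lands you in the middle of a codepoint. Perl will be able to find the end of line. Other multibyte encodings, such as UTF-16, are likely to cause problems.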

(FYI, there are multi-codepoint characters, such as a base letter followed by combining marks, but you only need to worry about those after you have found a whole line of text, if at all.)
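
A minimal sketch of the idea, in case it helps. The sub name line_after_offset and its arguments are made up for illustration, and it assumes the file is valid UTF-8 with "\n" line endings. It reads raw bytes, throws away whatever is left of the line the offset landed in, and only decodes once it has a whole line:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from the thread): return the first complete
# line that starts after byte offset $pos, decoded from UTF-8.
# Assumes valid UTF-8 input and "\n" line endings.
sub line_after_offset {
    my ($path, $pos) = @_;

    # Read raw bytes so seek() offsets and line scanning agree.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    seek $fh, $pos, 0 or die "Cannot seek in $path: $!";

    # We may be in the middle of a line, or even of a codepoint.
    # Discard everything up to and including the next newline byte;
    # 0x0A never occurs inside a multibyte UTF-8 sequence.
    if ($pos > 0) {
        my $partial = <$fh>;
    }

    my $line = <$fh>;
    close $fh;
    return unless defined $line;    # ran off the end of the file

    chomp $line;
    return decode('UTF-8', $line);  # decode only the whole line
}
```

You would call it with something like line_after_offset($file, int(rand(-s $file))). Note that with this approach the first line of the file is only returned for offset 0, and longer lines make the line after them more likely to be picked, so it is not a uniform sample.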
