in reply to Byte counts and Seek function
You're in for a world of pain if you try to mix byte counts with UTF-8, because a single UTF-8 character may be encoded as more than one byte. seek doesn't take variable-width encodings into account. It only counts bytes.
(I don't know what your utf8 function does, so I can't comment on what your call to encode does.)
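To make the mismatch concrete, here is a minimal sketch that compares the character count of a string with the byte count of its UTF-8 encoding. The sample string is made up for illustration; it is not the original poster's data.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: "naive cafe" with two accented letters,
# written with escapes so the source file's own encoding doesn't matter.
my $text  = "na\x{ef}ve caf\x{e9}";
my $bytes = encode('UTF-8', $text);

my $char_count = length $text;    # length of the character string
my $byte_count = length $bytes;   # length of the encoded byte string

# The two accented letters each take two bytes in UTF-8,
# so the byte count is larger than the character count.
print "characters: $char_count, bytes: $byte_count\n";
```

An offset computed from the character count would land seek two bytes short of the end of this string.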
Seems to me that it would be easier to call tell when you read in a sentence and keep that position around, rather than try to reconstruct it from the data you've read (and decoded, possibly normalized, et cetera).
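A rough sketch of that approach, assuming line-oriented input in place of sentences: record tell before each read, then hand the saved offset back to seek. The temporary file, its contents, and the variable names are all invented for the example. It reads raw bytes and decodes each line by hand so that the saved offsets are unambiguously byte offsets.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Write a small UTF-8 sample file (hypothetical data).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print {$tmp} encode('UTF-8', "caf\x{e9} au lait\nna\x{ef}ve\nplain\n");
close $tmp or die "close: $!";

open my $in, '<:raw', $path or die "open $path: $!";

my @offsets;    # byte offset at which each line starts
while (1) {
    my $pos  = tell $in;      # remember where this line begins
    my $line = <$in>;
    last unless defined $line;
    push @offsets, $pos;
}

# Later: jump straight back to the second line and decode it.
seek $in, $offsets[1], SEEK_SET or die "seek: $!";
my $second = decode('UTF-8', scalar <$in>);
chomp $second;
close $in or die "close: $!";

print "second line starts at byte $offsets[1]\n";
```

Because the offset comes from tell rather than from the decoded text, decoding or normalizing the line afterwards cannot throw it off.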