Re: Byte counts and Seek function

You're in for a world of pain if you try to mix byte counts with UTF-8, because a single UTF-8 character may be encoded as more than one byte. seek doesn't take variable-width encodings into account; it only counts bytes.
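
To make the mismatch concrete, here is a minimal sketch (the sample string is made up for illustration, not taken from the OP's data) comparing the character count of a decoded string with the byte count of its UTF-8 encoding:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample text: "\x{EF}" is the single character U+00EF.
my $text  = "na\x{EF}ve";
my $bytes = encode('UTF-8', $text);

my $char_count = length $text;    # counts characters
my $byte_count = length $bytes;   # counts bytes; U+00EF takes two bytes in UTF-8

print "characters: $char_count, bytes: $byte_count\n";
# prints "characters: 5, bytes: 6"
```

An offset computed from the character count would land seek one byte short for everything after that character.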

(I don't know what your utf8 function does, so I can't comment on what your call to encode does.)

Seems to me that it would be easier to use ~~pos~~ tell when you read in a sentence and keep that position around, rather than try to reconstruct it from the data you've read (and decoded, possibly normalized, et cetera).
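
A rough sketch of that approach, assuming the data is read one line at a time from a raw (undecoded) handle; the temporary file, the sample lines, and the `@sentences` structure are all hypothetical stand-ins for the OP's data:

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Write a small UTF-8 sample file (made-up data, for illustration only).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} encode('UTF-8', "caf\x{E9}\nna\x{EF}ve\nplain\n");
close $out or die "close: $!";

open my $in, '<:raw', $path or die "open $path: $!";

# Save the byte offset reported by tell *before* each read,
# then decode the line separately.
my @sentences;
while (1) {
    my $offset = tell $in;
    my $raw    = <$in>;
    last unless defined $raw;
    my $text = decode('UTF-8', $raw);
    chomp $text;
    push @sentences, { offset => $offset, text => $text };
}

# Later: jump straight back to the third sentence using the saved offset.
seek $in, $sentences[2]{offset}, 0 or die "seek: $!";
my $again = decode('UTF-8', scalar <$in>);
chomp $again;
print "offset $sentences[2]{offset}: $again\n";
close $in;
```

Because the offsets come from tell, they stay correct no matter how many multi-byte characters precede a sentence or what is done to the decoded text afterwards.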


Re^2: Byte counts and Seek function by choroba (Cardinal) on Aug 27, 2013 at 22:48 UTC
Are you sure you would use pos? I always thought seek should be used with tell. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^3: Byte counts and Seek function by chromatic (Archbishop) on Aug 27, 2013 at 23:18 UTC
Yes, you're right. I was thinking of `fgetpos` in C for some reason (and even there I'd use `ftell`, so I don't know what I was thinking at all).
Re^2: Byte counts and Seek function by AnomalousMonk (Archbishop) on Aug 27, 2013 at 22:40 UTC
utf8 (emphases added): utf8 - Perl pragma to enable/disable UTF-8 (or UTF-EBCDIC) in source code ... The "use utf8" pragma tells the Perl parser to allow UTF-8 in the program text in the current lexical scope ...
Re^3: Byte counts and Seek function by chromatic (Archbishop) on Aug 27, 2013 at 22:42 UTC
That's the utf8 pragma. I know what it does in the posted code: nothing, because there are no non-ASCII characters appearing literally in the source code. What does the `utf8` function in the OP's code do?
Re^4: Byte counts and Seek function by AnomalousMonk (Archbishop) on Aug 27, 2013 at 22:50 UTC
Oops. Visually scanned for it, but didn't see the `utf8` function call the first time through. Should have used a highlighting finder! (Damned human eyes...)
Re^5: Byte counts and Seek function by vsespb (Chaplain) on Sep 01, 2013 at 21:01 UTC
Re^6: Byte counts and Seek function by AnomalousMonk (Archbishop) on Sep 01, 2013 at 22:00 UTC