Re^7: Extracting Data from a second line

Re^8: Extracting Data from a second line by ikegami (Patriarch) on Feb 16, 2006 at 19:41 UTC
Of course it is. In text mode on Windows, \x0D\x0A is two bytes but a single character. But it's not just a Windows issue; it's also an encoding issue. If you set the encoding of DATA or of your FILE to a multi-byte encoding such as UTF-8, then you'll have a discrepancy between the number of characters and the number of bytes (even on unix) if you have any multi-byte characters in your DATA or in your FILE. The whole point is that there is no way of knowing the number of characters in a file without reading it, since there's no fixed relation between the size of the file and the number of characters in it. It is therefore useless to "optimize" the value passed as `read`'s third parameter, which is a number of characters (not bytes, except in `:raw` mode). Any value at least as large as the number of bytes in the file will do nicely.
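To make the byte/character discrepancy concrete, here is a minimal sketch. The helper name `byte_and_char_counts` and the sample string are made up for illustration, and an in-memory filehandle stands in for a real file. It asks `read` for as many characters as there are bytes, which is always enough.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return (byte count, character count) for $text
# when it is stored as UTF-8.
sub byte_and_char_counts {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);    # the bytes that would sit on disk

    # In-memory "file" that is decoded from UTF-8 as it is read.
    open my $fh, '<:encoding(UTF-8)', \$bytes
        or die "Cannot open in-memory handle: $!";

    # Request as many characters as there are bytes: an upper bound.
    my $chars = read($fh, my $buf, length $bytes);
    die "read failed: $!" unless defined $chars;
    close $fh;

    return (length $bytes, $chars);
}

# Sample text (an assumption): two of its characters need two bytes each.
my ($nbytes, $nchars) = byte_and_char_counts("na\x{EF}ve caf\x{E9}");
print "bytes=$nbytes chars=$nchars\n";
```

With this sample the byte count comes out larger than the character count, even though no Windows line endings are involved.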
Re^9: Extracting Data from a second line by kwaping (Priest) on Feb 16, 2006 at 20:04 UTC
So, what is the answer then? When reading DATA, do we continue to manually count the characters and use that? Or do we use a number that we know will be sufficient (1000000 or something similar) but is guaranteed too big? These are more rhetorical questions than anything. I would just like a way to find the size of DATA, programmatically.
Re^10: Extracting Data from a second line by ikegami (Patriarch) on Feb 16, 2006 at 20:07 UTC
If you wish to know the size of DATA, in characters, use the following. It will work with all line terminators and with all encodings. `$size_of_DATA = read(DATA, $buf='', -s DATA);` It's safe to use `-s FILE` because the number of characters in a file should never be bigger than the number of bytes in the file.
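The same idiom can be wrapped in a small sub and tried against an ordinary file. This is only a sketch: `count_chars` is a hypothetical name, the temp file and its contents are invented for the demo, and `:raw` is added to the layers purely so the byte count is the same on every platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: number of characters left to read on $fh, using
# the file's size in bytes as a safe upper bound for read().
sub count_chars {
    my ($fh) = @_;
    my $upper_bound = -s $fh;    # bytes in the file, never less than characters
    my $count = read($fh, my $buf, $upper_bound);
    die "read failed: $!" unless defined $count;
    return $count;
}

# Demo file (assumed content): two lines, one with a multi-byte character.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out, ':raw:encoding(UTF-8)';
print {$out} "caf\x{E9}\nsecond line\n";
close $out or die "close failed: $!";

open my $in, '<:raw:encoding(UTF-8)', $path
    or die "Cannot open $path: $!";
my $chars = count_chars($in);
close $in;

my $bytes = -s $path;
print "$bytes bytes on disk, $chars characters read\n";
```

Because `read` stops at end of file, passing an over-large length costs nothing beyond the read itself; the return value is the real character count.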