in reply to Re^2: unicode normalization layer
in thread unicode normalization layer

This is certainly the simplest approach I've seen so far, and I'll definitely keep it in mind for future use. However, I'm currently using something closer to graff's approach. I need a count of the invalid items encountered in the input stream, so I've defined a CHECK function for :encoding(utf8) to use. It increments a counter for each bad sequence it finds and then returns the Unicode WTF?! character to replace that sequence in the input stream.
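For anyone curious what a counting CHECK callback can look like, here is a minimal sketch built on Encode's support for a code reference as CHECK: the callback receives the ordinal value(s) of the offending input and returns the string to insert in its place. It is an assumption-laden illustration, not my actual code. It calls Encode::decode directly instead of going through the :encoding(utf8) layer, the helper name decode_counting is hypothetical, and it uses U+FFFD (REPLACEMENT CHARACTER) as a stand-in for whatever replacement you prefer. Encode decides how many bytes make up each malformed chunk, so the counter reflects what Encode reports rather than any higher-level notion of an "item".

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode UTF-8 bytes, substituting U+FFFD for
# anything malformed and counting how many substitutions were made.
sub decode_counting {
    my ($bytes) = @_;
    my $bad = 0;
    # A code-ref CHECK is called with the ordinal value(s) of the bad
    # input; whatever it returns is placed into the decoded string.
    my $text = decode('UTF-8', $bytes, sub { $bad++; return "\x{FFFD}" });
    return ($text, $bad);
}

# Sample input with two bytes that are not valid UTF-8 in context.
my $input = "ok \xE9 ok \xFF end";
my ($text, $bad) = decode_counting($input);
printf "%d invalid sequence(s); decoded %d characters\n", $bad, length $text;
```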

As for the relative speed of getline (<>) versus reading in blocks with read(): on a system I was working with recently, benchmarking showed a substantial difference between the two approaches (a factor of 7 to 8). That is why I wanted to avoid getline here, especially since my processing is not line-oriented.
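To make the two reading styles concrete, here is a minimal sketch that reads the same :encoding(UTF-8) handle once with fixed-size read() calls and once with <>. The helper names, the 4096-character block size, and the in-memory sample data are illustrative assumptions. The sketch only checks that both styles see the same number of characters; it makes no claim about which is faster on a given system.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: pull fixed-size chunks with read(), no line splitting.
sub count_chars_by_blocks {
    my ($fh, $block_size) = @_;
    my ($buf, $total) = ('', 0);
    while (1) {
        my $n = read($fh, $buf, $block_size);
        die "read failed: $!" unless defined $n;
        last if $n == 0;              # end of file
        $total += length $buf;        # characters, since the layer decodes
    }
    return $total;
}

# Hypothetical helper: the line-oriented equivalent using <>.
sub count_chars_by_lines {
    my ($fh) = @_;
    my $total = 0;
    while (my $line = <$fh>) {        # implicit defined() test
        $total += length $line;
    }
    return $total;
}

# Sample data: UTF-8 bytes in memory, read back through :encoding(UTF-8).
my $chars = "h\x{E9}llo w\x{F6}rld \x{263A}\n" x 1000;
my $bytes = encode('UTF-8', $chars);

open my $fh1, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $by_blocks = count_chars_by_blocks($fh1, 4096);
open my $fh2, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $by_lines = count_chars_by_lines($fh2);

print "blocks: $by_blocks, lines: $by_lines\n";
```

To time the two on real input, each helper could be wrapped in a closure and handed to cmpthese from the core Benchmark module. The resulting ratio will depend on the data, the I/O layers involved, and the Perl build.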

--DrWhy

"If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."