in reply to Re: Two octal values for eacute?
in thread Two octal values for eacute?
Thank you haukex!
I feel like I once skimmed that first link you posted, but that was years ago, so I do have some refreshing to do.
You are correct: I do not know the encodings of the text files that I'm reading. They were probably exported as CSV from Excel or created by a Perl script reading an Open Office .ods file. They are tab-delimited text files, created in different ways over the course of 20+ years. That would make sense, though, since it's the older files that have a single-byte eacute, and then all of a sudden the two-byte eacute is the only variety found.
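For anyone following along, here is a minimal sketch of how the two varieties could be read into the same character. It assumes the older files are Windows-1252 (a guess based on the Excel export, not something I have verified) and the newer ones are UTF-8. `decode_guess` is a made-up helper name; `decode` and `FB_CROAK` come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw bytes, preferring strict UTF-8 and
# falling back to Windows-1252 (an assumption about the older files).
sub decode_guess {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return $text if defined $text;
    return decode('cp1252', $bytes);
}

my $old = "caf\351";        # single-byte eacute (octal 351, hex E9)
my $new = "caf\303\251";    # two-byte eacute (octal 303 251), i.e. UTF-8

# Both byte strings decode to the same four characters.
print decode_guess($old) eq decode_guess($new) ? "same\n" : "different\n";
```

The fallback only works because a lone `\351` byte is not valid UTF-8, so the strict decode fails and the second attempt kicks in. A file that mixes both encodings on one line would need something smarter.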
I will read through the links, "best practices" and all. Much appreciated!!
Oh, and I was looking for a small set of extended ASCII characters to "flatten" (if you will) to an ASCII counterpart, as I could not reliably reproduce them, which again points to the fact that they were probably encoded differently. I used a small subroutine to turn the two differently encoded eacutes into an 'e' to mitigate these headaches. The same sub also translated ellipses to '...', curved left/right double-quotes to straight double-quotes, long dashes to normal dashes and so on: all of the things that a spreadsheet program automatically substitutes in while you're typing. I didn't think about the encoding so much, but instead found octal regexes that could pluck out each of these characters so that I could insert what I felt was a suitable replacement. Nothing personal against the eacute!
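The flattening described above could be sketched roughly like this. It is not the sub I actually used (that one matched octal byte sequences); this version assumes the text has already been decoded into Perl characters, so each character only needs one entry. The `%ASCII_FOR` table and the `flatten` name are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative mapping of "fancy" characters to ASCII stand-ins.
my %ASCII_FOR = (
    "\x{00E9}" => 'e',      # e with acute
    "\x{2026}" => '...',    # horizontal ellipsis
    "\x{201C}" => '"',      # left curly double quote
    "\x{201D}" => '"',      # right curly double quote
    "\x{2013}" => '-',      # en dash
    "\x{2014}" => '-',      # em dash
);

# None of the keys are regex metacharacters, so a plain alternation is safe.
my $pattern    = join '|', keys %ASCII_FOR;
my $FLATTEN_RE = qr/($pattern)/;

# Replace every mapped character in already-decoded text.
sub flatten {
    my ($text) = @_;
    $text =~ s/$FLATTEN_RE/$ASCII_FOR{$1}/g;
    return $text;
}

print flatten("caf\x{E9} \x{201C}ok\x{201D}\x{2026} a\x{2014}b"), "\n";
# prints: cafe "ok"... a-b
```

Working on decoded characters means the single-byte and two-byte eacutes are already the same thing by the time they reach the sub, so there is no need for two patterns per character.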
Thank you so much for your time and expertise!
Re^3: Two octal values for eacute?
by haukex (Archbishop) on May 24, 2020 at 14:25 UTC