pianomonious has asked for the wisdom of the Perl Monks concerning the following question:
Friends... I'm slowly losing my mind trying to figure out why I must capture an eacute character (as in café) via two octal regex patterns.
In short, I parse old text files and often come across some extended ASCII characters like en-dash, ellipsis, eacute, etc. which were encoded that way by some spreadsheet program like Excel or Open Office Calc.
Here are the two regexes that capture eacute for me:
if ($field =~ /\351/) { ... }
if ($field =~ /\303\251/) { ... }
The first variation (octal 351) agrees with the ASCII table shown here:
https://www.ascii-code.com
My terminal program cannot display this character, and this online octal-to-ascii converter cannot either:
https://onlineasciitools.com/convert-octal-to-ascii
Yet, my Firefox browser is able to render this eacute character properly, when reading it from a text file.
The second variation (octal 303 251) is not mentioned in any ASCII table, but the eacute symbol is rendered correctly by my terminal program and can be properly converted by the octal-to-ascii converter mentioned above. As well, Firefox can render this properly from a text file.
Could someone please shed some light on what is happening?
Thanks in advance, and my apologies if I'm missing something obvious.
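One way to see where the two patterns come from is to encode the same character (U+00E9) in two different encodings and print the resulting bytes in octal. The sketch below does that with the core Encode module. It also includes a hypothetical helper, `decode_field`, which is not from the original post: it tries strict UTF-8 first and falls back to Windows-1252, on the assumption (a guess about these files, not something stated above) that the single-byte files were written in that code page.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+00E9 (e-acute) as a Perl character string.
my $eacute = "\x{E9}";

# The same character encoded two different ways.
my $latin1 = encode('ISO-8859-1', $eacute);   # one byte
my $utf8   = encode('UTF-8',      $eacute);   # two bytes

# Render each byte of a byte string as a three-digit octal escape.
sub octal_bytes {
    my ($bytes) = @_;
    return join '', map { sprintf '\\%03o', ord($_) } split //, $bytes;
}

print octal_bytes($latin1), "\n";   # prints \351
print octal_bytes($utf8),   "\n";   # prints \303\251

# Hypothetical helper (an assumption, not part of the original post):
# decode raw bytes as strict UTF-8 if they are valid, otherwise treat
# them as Windows-1252. LEAVE_SRC keeps decode() from modifying $bytes.
sub decode_field {
    my ($bytes) = @_;
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return defined $text ? $text : decode('cp1252', $bytes);
}

# After decoding, one pattern covers both kinds of input.
for my $raw ("caf\351", "caf\303\251") {
    my $field = decode_field($raw);
    print "matched\n" if $field =~ /\x{E9}/;
}
```

With the fields decoded to character strings up front, a single `/\x{E9}/` replaces the two byte-level patterns. The fallback is only a heuristic: a single-byte file whose high bytes happen to form valid UTF-8 would be decoded as UTF-8.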
Re: Two octal values for eacute?
by haukex (Archbishop) on May 23, 2020 at 21:16 UTC
by pianomonious (Novice) on May 23, 2020 at 22:09 UTC
by haukex (Archbishop) on May 24, 2020 at 14:25 UTC
Re: Two octal values for eacute?
by Anonymous Monk on May 24, 2020 at 13:28 UTC
by pianomonious (Novice) on Jun 07, 2020 at 16:17 UTC