Friends... I'm slowly losing my mind trying to figure out why I must capture an eacute character (as in café) via two octal regex patterns.

In short, I parse old text files and often come across extended ASCII characters like en-dash, ellipsis, eacute, etc., which were encoded that way by a spreadsheet program such as Excel or OpenOffice Calc.

Here are the two regexes that capture eacute for me:

if ($field =~ /\351/) { ... }
if ($field =~ /\303\251/) { ... }

The first variation (octal 351) agrees with the ASCII table shown here:

https://www.ascii-code.com

My terminal program cannot display this character, and this online octal-to-ascii converter cannot either:

https://onlineasciitools.com/convert-octal-to-ascii

Yet, my Firefox browser is able to render this eacute character properly, when reading it from a text file.

The second variation (octal 303 251) is not mentioned in any ASCII table, but the eacute symbol is rendered correctly by my terminal program and can be properly converted by the octal-to-ascii converter mentioned above. As well, Firefox can render this properly from a text file.
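To compare the two variations side by side, the bytes of a field can be dumped in octal. This is only a minimal sketch: the helper name octal_bytes and the two sample strings are hypothetical, made up for illustration, and it assumes the field is handled as raw bytes (no decoding layer on the input).

```perl
use strict;
use warnings;

# Hypothetical helper: return each byte of a string as 3-digit octal,
# separated by spaces, so the raw encoding of a field can be inspected.
sub octal_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%03o', ord } split //, $bytes;
}

# Sample strings (assumed for illustration): "caf" followed by each variation.
my $one_byte = "caf\351";
my $two_byte = "caf\303\251";

print octal_bytes($one_byte), "\n";   # 143 141 146 351
print octal_bytes($two_byte), "\n";   # 143 141 146 303 251
```

Running the same helper on a real $field would show which of the two byte sequences a given file actually contains.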

Could someone please shed some light on what is happening?

Thanks in advance, and my apologies if I'm missing something obvious.


In reply to Two octal values for eacute? by pianomonious
