On this day, I'm thankful for PerlMonks. My brethren in the monastery have been very helpful to me this year and I'm grateful for their generosity, kindness and alacrity.

I have a Perl style question about parsing using a regular expression pattern. I'm matching mutually exclusive alternatives and then testing which of the four alternatives matched using the defined function and nested conditional statements.

    use strict;
    use warnings;
    use English qw( -no_match_vars );

    my $TOKEN_PATTERN = qr{
        ([^\\])              # 1 Literal character (g)
        |
        \\u([0-9a-f]{4})     # 2 Universal Character Name (\u263a)
        |
        \\(["^\\])           # 3 Literal character escape sequence (\")
        |
        \\([tnfr])           # 4 Control code escape sequence (\n)
    }x;

    my %CONTROL_CODE = (
        t => 0x09,
        n => 0x0a,
        f => 0x0c,
        r => 0x0d,
    );

    while (my $line = <>) {
        chomp $line;

        while ($line =~ m/$TOKEN_PATTERN/g) {
            my $token = $LAST_PAREN_MATCH;

            # Decode tokens...
            my $code = defined $1 ? ord $token
                     : defined $2 ? hex $token
                     : defined $3 ? ord $token
                     : defined $4 ? $CONTROL_CODE{$token}
                     :              undef
                     ;

            printf "U+%04x\n", $code if defined $code;
        }
    }

Is there a better way to do this? What I'm doing works, but it feels clunky. Any suggestions for improvement?
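
One possible direction, offered only as a sketch and not as a settled answer: give each alternative a named capture group and look up the decoder in a dispatch table keyed on that name, so the chain of defined tests goes away. The group names (literal, ucn, escaped, control), the %DECODE table and the decode_line function are hypothetical names made up for this illustration, and named captures assume Perl 5.10 or later.

```perl
use 5.010;    # named captures and %+ need Perl 5.10 or later
use strict;
use warnings;

# Same four alternatives as the original pattern, but each group has a
# (hypothetical) name instead of a number.
my $TOKEN_PATTERN = qr{
      (?<literal> [^\\] )              # Literal character
    | \\u (?<ucn>     [0-9a-f]{4} )    # Universal Character Name
    | \\  (?<escaped> ["^\\] )         # Literal character escape sequence
    | \\  (?<control> [tnfr] )         # Control code escape sequence
}x;

my %CONTROL_CODE = (
    t => 0x09,
    n => 0x0a,
    f => 0x0c,
    r => 0x0d,
);

# Dispatch table: one decoder per named alternative.
my %DECODE = (
    literal => sub { ord $_[0] },
    ucn     => sub { hex $_[0] },
    escaped => sub { ord $_[0] },
    control => sub { $CONTROL_CODE{ $_[0] } },
);

# Return the list of code points found in one line of input.
sub decode_line {
    my ($line) = @_;
    my @codes;

    while ($line =~ m/$TOKEN_PATTERN/g) {
        # The alternatives are mutually exclusive, so exactly one named
        # group is defined per match and %+ has exactly one key.
        my ($name) = keys %+;
        push @codes, $DECODE{$name}->( $+{$name} );
    }

    return @codes;
}

# Usage on a fixed sample string (single quotes keep the backslashes).
printf "U+%04x\n", $_ for decode_line('g\u263a\"\n');
# prints:
# U+0067
# U+263a
# U+0022
# U+000a
```

The trade-off is that the decoding rules now live in a table next to the pattern, which may or may not read better than the nested conditional for only four alternatives.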

Happy Thanksgiving!


In reply to Regex Parsing Style by Jim
