On this day, I'm thankful for PerlMonks. My brethren in the monastery have been very helpful to me this year, and I'm grateful for their generosity, kindness and alacrity.
I have a Perl style question about parsing with a regular expression. I'm matching four mutually exclusive alternatives and then testing which one matched, using the defined function and a chain of nested conditional (?:) operators.
```perl
use strict;
use warnings;
use English qw( -no_match_vars );

my $TOKEN_PATTERN = qr{
    ([^\\])             # 1 Literal character (g)
  | \\u([0-9a-f]{4})    # 2 Universal Character Name (\u263a)
  | \\(["^\\])          # 3 Literal character escape sequence (\")
  | \\([tnfr])          # 4 Control code escape sequence (\n)
}x;

my %CONTROL_CODE = (
    t => 0x09,
    n => 0x0a,
    f => 0x0c,
    r => 0x0d,
);

while (my $line = <>) {
    chomp $line;
    while ($line =~ m/$TOKEN_PATTERN/g) {
        my $token = $LAST_PAREN_MATCH;    # $+: text of the group that matched
        # Decode tokens...
        my $code = defined $1 ? ord $token
                 : defined $2 ? hex $token
                 : defined $3 ? ord $token
                 : defined $4 ? $CONTROL_CODE{$token}
                 :              undef
                 ;
        printf "U+%04x\n", $code if defined $code;
    }
}
```
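For comparison, here is a minimal sketch of one way to drop the chain of defined tests, assuming the same four alternatives. It relies on the special variable $#-, which perlvar describes as giving the last matched subgroup of the last successful match; because only one alternative can match, that number identifies which group fired and can index a table of decoders. The names @DECODERS and decode_line are hypothetical, and the sample input is a single-quoted string so its backslashes reach the regex literally.

```perl
use strict;
use warnings;

# Same four alternatives as in the question; each capture group's number
# is also its index into @DECODERS below.
my $TOKEN_PATTERN = qr{
    ([^\\])             # 1 Literal character
  | \\u([0-9a-f]{4})    # 2 Universal Character Name
  | \\(["^\\])          # 3 Literal character escape sequence
  | \\([tnfr])          # 4 Control code escape sequence
}x;

my %CONTROL_CODE = ( t => 0x09, n => 0x0a, f => 0x0c, r => 0x0d );

# Hypothetical dispatch table: element N decodes the text captured by
# group N. Element 0 is a placeholder because group 0 is the whole match.
my @DECODERS = (
    undef,
    sub { ord $_[0] },                # 1 literal character
    sub { hex $_[0] },                # 2 \uXXXX
    sub { ord $_[0] },                # 3 escaped literal
    sub { $CONTROL_CODE{ $_[0] } },   # 4 control code
);

# Hypothetical helper: returns the code points decoded from one line.
sub decode_line {
    my ($line) = @_;
    my @codes;
    while ($line =~ m/$TOKEN_PATTERN/g) {
        # $#- is the number of the last capture group that took part in
        # the match; with mutually exclusive alternatives exactly one did.
        my $group = $#-;
        my $token = $+;    # text captured by that group
        push @codes, $DECODERS[$group]->($token);
    }
    return @codes;
}

printf "U+%04x\n", $_ for decode_line('a\u263a\"\n');
```

The trade-off is that the order of @DECODERS has to stay in step with the group numbers in the pattern.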
Is there a better way to do this? What I'm doing works, but it feels clunky. Any suggestions for improvement?
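One possible variant, sketched here assuming Perl 5.10 or later, names each capture group and looks up a decoder by name. After a match, %+ lists only the named groups that actually captured, so with mutually exclusive alternatives it yields exactly one key. The group names, %DECODE and decode_line are hypothetical choices for illustration.

```perl
use strict;
use warnings;

# Same alternatives as in the question, but each group is named (5.10+).
my $TOKEN_PATTERN = qr{
    (?<literal>[^\\])         # Literal character
  | \\u(?<ucn>[0-9a-f]{4})    # Universal Character Name
  | \\(?<escaped>["^\\])      # Literal character escape sequence
  | \\(?<control>[tnfr])      # Control code escape sequence
}x;

my %CONTROL_CODE = ( t => 0x09, n => 0x0a, f => 0x0c, r => 0x0d );

# Hypothetical dispatch table keyed by capture-group name.
my %DECODE = (
    literal => sub { ord $_[0] },
    ucn     => sub { hex $_[0] },
    escaped => sub { ord $_[0] },
    control => sub { $CONTROL_CODE{ $_[0] } },
);

# Hypothetical helper: returns the code points decoded from one line.
sub decode_line {
    my ($line) = @_;
    my @codes;
    while ($line =~ m/$TOKEN_PATTERN/g) {
        # %+ holds only the named groups that captured, so exactly one
        # key is present after each match here.
        my ($kind) = keys %+;
        my $token  = $+{$kind};
        push @codes, $DECODE{$kind}->($token);
    }
    return @codes;
}

printf "U+%04x\n", $_ for decode_line('a\u263a\"\n');
```

Compared with numbered groups, this ties each alternative to its decoder by name, so reordering or adding alternatives does not shift the dispatch.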
Happy Thanksgiving!
In reply to Regex Parsing Style by Jim