Jim has asked for the wisdom of the Perl Monks concerning the following question:
On this day, I'm thankful for PerlMonks. My brethren in the monastery have been very helpful to me this year and I'm grateful for their generosity, kindness and alacrity.
I have a Perl style question about parsing with a regular expression. I'm matching mutually exclusive alternatives and then determining which of the four alternatives matched by testing each capture variable with the defined function in a chain of conditional (?:) operators.
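Because the four alternatives are mutually exclusive, exactly one capture group is defined after each successful match, and `$+` (`$LAST_PAREN_MATCH` under `use English`) holds that group's text. The following is a small standalone check of that behavior, using the same four alternatives. The helper name `which_group` is hypothetical and only for illustration. It relies on the standard `@-` array: `$-[n]` is undef for a group that did not participate, and `$#-` is the number of the highest-numbered group that matched.

```perl
use strict;
use warnings;

# The question's four alternatives, written on one line without comments.
my $TOKEN_PATTERN = qr{ ([^\\]) | \\u([0-9a-f]{4}) | \\(["^\\]) | \\([tnfr]) }x;

# which_group (hypothetical helper): match one token and report which
# alternative matched, which groups are defined, and the value of $+.
sub which_group {
    my ($input) = @_;
    return unless $input =~ $TOKEN_PATTERN;

    my $group   = $#-;                                 # highest-numbered group that matched
    my $text    = $+;                                  # text of that group ($LAST_PAREN_MATCH)
    my @defined = grep { defined $-[$_] } 1 .. $#+;    # every group that matched

    return ($group, \@defined, $text);
}

# Inputs are single-quoted, so '\n' is a backslash followed by "n".
for my $input ('g', '\u263a', '\"', '\n') {
    my ($group, $defined, $text) = which_group($input);
    printf "%-7s group %d, defined: (%s), \$+ = %s\n",
        $input, $group, "@$defined", $text;
}
```

Each line of output should list a single defined group, which is why testing `defined $1` through `defined $4` in turn is enough to tell the alternatives apart.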

    use strict;
    use warnings;
    use English qw( -no_match_vars );

    my $TOKEN_PATTERN = qr{
          ([^\\])              # 1 Literal character (g)
        | \\u([0-9a-f]{4})     # 2 Universal Character Name (\u263a)
        | \\(["^\\])           # 3 Literal character escape sequence (\")
        | \\([tnfr])           # 4 Control code escape sequence (\n)
    }x;

    my %CONTROL_CODE = (
        t => 0x09,
        n => 0x0a,
        f => 0x0c,
        r => 0x0d,
    );

    while (my $line = <>) {
        chomp $line;

        while ($line =~ m/$TOKEN_PATTERN/g) {
            my $token = $LAST_PAREN_MATCH;

            # Decode tokens...
            my $code = defined $1 ? ord $token
                     : defined $2 ? hex $token
                     : defined $3 ? ord $token
                     : defined $4 ? $CONTROL_CODE{$token}
                     :              undef
                     ;

            printf "U+%04x\n", $code if defined $code;
        }
    }

Is there a better way to do this? What I'm doing works, but it feels clunky. Any suggestions for improvement?
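One possible restructuring, offered as a sketch rather than a definitive answer, assumes Perl 5.10 or later: give each alternative a named capture and look up its decoder in a hash of code references. After each match, `%+` contains only the named groups that actually captured, so with mutually exclusive alternatives it has exactly one key. The group names (`literal`, `ucn`, `escape`, `control`), the hash `%DECODE` and the helper `decode_line` are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Same alternatives as the question, each tagged with a (hypothetical) name.
my $TOKEN_PATTERN = qr{
      (?<literal> [^\\] )             # Literal character
    | \\u (?<ucn> [0-9a-f]{4} )       # Universal Character Name
    | \\ (?<escape> ["^\\] )          # Literal character escape sequence
    | \\ (?<control> [tnfr] )         # Control code escape sequence
}x;

my %CONTROL_CODE = ( t => 0x09, n => 0x0a, f => 0x0c, r => 0x0d );

# One decoder per named alternative; each receives the captured text.
my %DECODE = (
    literal => sub { ord $_[0] },
    ucn     => sub { hex $_[0] },
    escape  => sub { ord $_[0] },
    control => sub { $CONTROL_CODE{ $_[0] } },
);

# decode_line (hypothetical helper): return the code points for one line.
sub decode_line {
    my ($line) = @_;
    my @codes;
    while ($line =~ m/$TOKEN_PATTERN/g) {
        # %+ holds only the groups that captured, so there is one key.
        my ($name) = keys %+;
        my $text   = $+{$name};
        push @codes, $DECODE{$name}->($text);
    }
    return @codes;
}

# Example: the line g\u263a\"\n written as a single-quoted string.
printf "U+%04x\n", $_ for decode_line('g\u263a\"\n');
```

Run as written, the example should print U+0067, U+263a, U+0022 and U+000a, one per line. To keep the question's input handling, `decode_line` can be called from the same `while (my $line = <>) { chomp $line; ... }` loop. The design trade-off is that adding a token type means adding one named alternative and one hash entry, instead of lengthening the chain of conditional operators and keeping group numbers in sync.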
Happy Thanksgiving!
Re: Regex Parsing Style
by ikegami (Patriarch) on Nov 25, 2010 at 23:30 UTC
by Jim (Curate) on Nov 26, 2010 at 00:35 UTC
by aquarium (Curate) on Nov 26, 2010 at 03:12 UTC
by Jim (Curate) on Nov 26, 2010 at 05:13 UTC
by aquarium (Curate) on Nov 28, 2010 at 23:44 UTC
by ikegami (Patriarch) on Nov 26, 2010 at 00:39 UTC
by Jim (Curate) on Nov 26, 2010 at 06:39 UTC
by ikegami (Patriarch) on Nov 26, 2010 at 06:50 UTC