Re^3: Leaking Regex Captures

Well, how about this....?

#!/usr/bin/perl -w
use strict;

while (<DATA>)
{
   print "testing: $_";
   chomp;
   my @pairs = m/(\d+)\s+(\w+)/g;
   print "@pairs\n\n";
}

#Prints:
#testing: beam 15 crew 5 wounded 2 critical to S.S.Kevorkian
#15 crew 5 wounded 2 critical
#
#testing: oh, my gosh, darn 5 killed 2 want_sex_change 10 drunk
#5 killed 2 want_sex_change 10 drunk
#
#testing: what a day:5 wounded 2 critical 20 crew
#5 wounded 2 critical 20 crew
#
#testing: 20 crew and 6 killed and 14 MIA
#20 crew 6 killed 14 MIA


__DATA__
beam 15 crew 5 wounded 2 critical to S.S.Kevorkian
oh, my gosh, darn 5 killed 2 want_sex_change 10 drunk
what a day:5 wounded 2 critical 20 crew
20 crew and 6 killed and 14 MIA


Re^4: Leaking Regex Captures by SuicideJunkie (Vicar) on Aug 05, 2009 at 17:02 UTC
That would involve a lot of post-processing to match up the numbers with the categories and to filter the categories down to just the valid ones ('crew', 'wounded' and 'crit'). It also can't be inserted into a larger regex match. That is a lot of work compared to just "passing $1, $2, ... $N and some constants into the addCommand() function if and only if the regex matches".

At the moment I have around 20-25 lines, each with a single regex guarding one call to addCommand(). I thus have a strong aversion to post-processing the matches, which would cause the code to balloon. As noted earlier in the thread, I do have a workaround which is suboptimal but adequate. Optimal would be if no post-processing were required, because the captures were not getting stomped on.
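For readers following along, the "one regex guarding one call" layout described above might look roughly like the sketch below. The real addCommand() and its arguments are not shown in the thread, so the stub, the parseLine() wrapper, the patterns and the constants here are all assumptions made for illustration.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical stand-in for addCommand(); the real signature is not shown
# in the thread. This one just records the arguments it was given.
my @commands;
sub addCommand { push @commands, [@_]; }

# Each line pairs one regex with one call; the call runs only on a match,
# so $1 is always the capture from the regex on that same line.
sub parseLine
{
   my ($line) = @_;
   addCommand('beam', 'crew', $1)    if $line =~ m/^beam\s+(\d+)\s+crew\b/;
   addCommand('beam', 'wounded', $1) if $line =~ m/^beam\s+(\d+)\s+wounded\b/;
}

parseLine('beam 15 crew to S.S.Kevorkian');
parseLine('launch 3 shuttles');   # matches nothing, so no command is added

print join(' ', @$_), "\n" foreach @commands;
```

Each rule stays on a single line, which is why adding a post-processing step per rule would roughly double the size of such a block.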
Re^5: Leaking Regex Captures by Marshall (Canon) on Aug 06, 2009 at 22:29 UTC
"That would involve a lot of post-processing to match up the numbers with the categories and filter the categories to just the valid ones ('crew', 'wounded' and 'crit'). And it can't be inserted into a larger regex match."

Of course I'm just seeing one part of the overall picture, but with a very minor modification to the code, I generate a hash table with the noun as the key and the number as the value. Aside from the printing, this is just a few lines of code. I would expect this to be a sub that you call and re-use many times. To check whether enough items are present, count the keys. Checking whether one of these nouns is invalid takes just 2 lines of code (see below).

Basically I would advocate some kind of data-table-driven approach, with subs applying rules to that tabular description. If you have a validate sub that takes a table of valid nouns, you can call that sub with other tables of valid nouns as the situation requires.

Validating user input is often harder than it first appears, and I wouldn't be overly concerned about 25 lines versus a whole page of code IF that page is clear. Clarity should be a higher priority than the number of lines, because it leads to less buggy code that is easier to maintain.

#!/usr/bin/perl -w
use strict;

my %valid = qw (crew 1 critical 1 wounded 1 killed 1);

while (<DATA>)
{
   print "testing: $_";
   chomp;
   my %hash = reverse(m/(\d+)\s+(\w+)/g);
   foreach my $key(keys %hash)
   {
      print "$key $hash{$key}\n";
   }
   my @invalid = grep {!$valid{$_}}keys %hash;
   print "invalid nouns: @invalid\n" if @invalid;
   print "\n";
}

# testing: beam 15 crew 5 wounded 2 critical to S.S.Kevorkian
# crew 15
# critical 2
# wounded 5
#
# testing: what a day:5 wounded 2 critical 20 crew
# critical 2
# crew 20
# wounded 5
#
# testing: 20 crew and 6 killed and 14 MIA
# crew 20
# killed 6
# MIA 14
# invalid nouns: MIA

__DATA__
beam 15 crew 5 wounded 2 critical to S.S.Kevorkian
what a day:5 wounded 2 critical 20 crew
20 crew and 6 killed and 14 MIA
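The idea of a validate sub that accepts different tables of valid nouns could be sketched as follows. This is only an illustration under assumptions: the sub name invalid_nouns() and the second table (%cargo_nouns) are hypothetical and do not come from the thread.

```perl
#!/usr/bin/perl -w
use strict;

# Returns, in sorted order, the nouns in %$counts that are absent from
# the %$valid table. Both arguments are hash references.
sub invalid_nouns
{
   my ($counts, $valid) = @_;
   my @bad = grep { !$valid->{$_} } keys %$counts;
   return sort @bad;
}

# Two tables of valid nouns; the second one is a made-up example.
my %casualty_nouns = map { $_ => 1 } qw(crew critical wounded killed);
my %cargo_nouns    = map { $_ => 1 } qw(crates barrels);

# Same capture trick as above: reverse() turns (number, noun) pairs
# into noun => number.
my %counts = reverse('20 crew and 6 killed and 14 MIA' =~ m/(\d+)\s+(\w+)/g);

my @bad = invalid_nouns(\%counts, \%casualty_nouns);
print "invalid nouns: @bad\n" if @bad;

my @bad_cargo = invalid_nouns(\%counts, \%cargo_nouns);
print "invalid cargo nouns: @bad_cargo\n" if @bad_cargo;
```

Passing the table as a parameter keeps the rule (every noun must appear in the table) in one place while the data varies per call site.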