in reply to Algorithm To Select Lines Based On Attributes
I'm not even sure where %rulelist or $rulenum are supposed to be set in the above.
Do negated defects just not add up, or do they actually remove a defect from the final count? Here I'll assume they just don't get added in.
If I'm not misunderstanding your spec, this does everything you need short of reading which defects interest you from another file:
    use strict;
    use warnings;

    my @defects_to_check = qw( ATTR1 ATTR3 ATTR7 );
    my $alternation = join '|(?<!!)', @defects_to_check;
    # previous and next lines use negative look-behind to ensure
    # only defects listed without '!' preceding them get matched
    my $regex = qr/(?<!!)$alternation/;

    open ( my $df, '<', 'defects_file' )
        or die "can't read defects_file: $!\n";

    my $total_defects = 0;
    while ( <$df> ) {
        next unless /^DEFECTID/;
        my @defects_found = $_ =~ m/$regex/g;
        $total_defects += scalar @defects_found;
        print "defects found this line: ", (join ', ', @defects_found), "\n";
        print "total defects so far: $total_defects\n";
    }
    close $df;
Given this input file for defects:

    DEFECTID ATTR1 ATTR7 ATTR4
    DEFECTID ATTR3 !ATTR1
    DEFECTID ATTR2 ATTR5 ATTR3
    DEFECTID ATTR4
    DEFECTID ATTR3

it produces this output:

    defects found this line: ATTR1, ATTR7
    total defects so far: 2
    defects found this line: ATTR3
    total defects so far: 3
    defects found this line: ATTR3
    total defects so far: 4
    defects found this line: 
    total defects so far: 4
    defects found this line: ATTR3
    total defects so far: 5
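The one piece left out so far is reading the defects of interest from another file. A minimal sketch of that step, assuming a file with one defect name per line where blank lines and `#` comments are ignored. That layout and the names `read_defects_to_check` and `build_defect_regex` are my own inventions for illustration, not anything from your spec:

```perl
use strict;
use warnings;

# Hypothetical helper: read one defect name per line from $path,
# skipping blank lines and lines starting with '#'.
sub read_defects_to_check {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "can't read $path: $!\n";
    my @names;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '' or $line =~ /^#/;
        push @names, $line;
    }
    close $fh;
    return @names;
}

# Hypothetical helper: build a pattern that matches any listed name
# not preceded by '!'. quotemeta makes each name match literally.
sub build_defect_regex {
    my @names = @_;
    die "no defects to check\n" unless @names;
    my $alternation = join '|', map { quotemeta($_) } @names;
    return qr/(?<!!)(?:$alternation)/;
}
```

With those in place, `my $regex = build_defect_regex( read_defects_to_check('defects_to_check') );` would stand in for the hard-coded list, where `defects_to_check` is again just a made-up file name.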
Now, with a million lines, I'd probably not print the new defects found and the new total for every line. If you need to know which defects had what subtotals, you could accomplish that with a hash:
    use strict;
    use warnings;

    my @defects_to_check = qw( ATTR1 ATTR3 ATTR7 );
    my $alternation = join '|(?<!!)', @defects_to_check;
    # previous and next lines use negative look-behind to ensure
    # only defects listed without '!' preceding them get matched
    my $regex = qr/(?<!!)$alternation/;

    open ( my $df, '<', 'defects_file' )
        or die "can't read defects_file: $!\n";

    my $total_defects = 0;
    my %defect_subtotals;
    while ( <$df> ) {
        next unless /^DEFECTID/;
        my @defects_found = $_ =~ m/$regex/g;
        $total_defects += scalar @defects_found;
        $defect_subtotals{ $_ }++ for @defects_found;
    }
    close $df;

    print "Found $total_defects total defects.\nDefect breakdown follows:\n";
    print $_ . ":\t\t" . $defect_subtotals{$_} . "\n" for sort keys %defect_subtotals;
Given the same input file as above, it produces this output:
    Found 5 total defects.
    Defect breakdown follows:
    ATTR1:		1
    ATTR3:		3
    ATTR7:		1
A sample of input and a sample of output like this is very helpful in determining whether we're talking about the same spec. If I've made any incorrect assumptions about your spec, please give your own sample input and output so a monk can write a program to match.
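One caveat, in case your real attribute names can share a prefix (say `ATTR1` and `ATTR10`, which is only a guess since nothing like that appears in the sample): a pattern without word boundaries would also find `ATTR1` inside `ATTR10`. A sketch of a stricter pattern, using a made-up input line:

```perl
use strict;
use warnings;

my @defects_to_check = qw( ATTR1 ATTR3 ATTR7 );
my $alternation = join '|', map { quotemeta($_) } @defects_to_check;

# (?<!!) rejects names preceded by '!'; the \b anchors stop ATTR1
# from matching inside a longer name such as ATTR10.
my $regex = qr/(?<!!)\b(?:$alternation)\b/;

# Hypothetical line, not taken from the sample input.
my $line = 'DEFECTID ATTR10 !ATTR3 ATTR7';
my @defects_found = $line =~ m/$regex/g;
print "defects found: @defects_found\n";    # prints "defects found: ATTR7"
```

Here `ATTR10` is skipped because no word boundary follows the `1`, and `!ATTR3` is skipped by the look-behind, so only `ATTR7` is counted.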