in reply to Self-Populating Tree Data Structure
Just concatenate your three fields together and use the result as a hash key. The de-dup then becomes a single pass over the input, with one O(1) hash lookup per record:
    my %lookup;
    open IN, '<', ...
    open OUT, '>', ...
    while( <IN> ) {
        my( $file, $line, $rule ) = m[...(...)...(...)...(...)]
            or warn("Bad data: '$_' line: $.\n") and next;
        if( exists $lookup{ join $;, $file, $line, $rule } ) {
            ## Duplicate, discard
            next;
        }
        else {
            ## New, record.
            $lookup{ join $;, $file, $line, $rule } = 1;
            print OUT; ## output $_
        }
    }
    close IN;
    close OUT;
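For something you can run as-is, here is a minimal sketch of the same idea working on an in-memory list. The `file:line:rule` record format, the regex that parses it, and the `dedup_records` name are all assumptions made up for illustration; substitute whatever matches your real data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the records whose (file, line, rule)
# triple has not been seen before, in their original order.
sub dedup_records {
    my @records = @_;
    my( %seen, @unique );
    for my $record ( @records ) {
        # Assumed record format "file:line:rule" (illustration only).
        my( $file, $line, $rule ) = $record =~ m{^([^:]+):(\d+):(.+)\z}
            or warn( "Bad data: '$record'\n" ) and next;
        # Joining on $; (the subscript separator, "\034" by default)
        # keeps the field boundaries inside the composite key.
        my $key = join $;, $file, $line, $rule;
        next if $seen{ $key }++;    ## Duplicate, discard
        push @unique, $record;      ## New, record
    }
    return @unique;
}

print "$_\n" for dedup_records(
    'a.c:10:R1',
    'a.c:10:R1',    # exact duplicate, dropped
    'a.c:1:0R1',    # same characters, different fields, kept
);
```

The last record shows why the separator matters: joining the fields with an empty string would give `a.c10R1` for both triples and wrongly discard the second one. Joining on `$;` avoids that as long as the fields themselves never contain that character.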