in reply to Re^2: I sense there is a simpler way...
in thread I sense there is a simpler way...
Is gobbling an entire file into an array considered bad form? . . .
One should always be aware of the efficiency concern. If you're sure the file will never be "too big", slurping (as it's called) shouldn't be a problem. Otherwise, you'd do well to do per-record reading/processing where practical.
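To make the contrast concrete, here's a minimal sketch of both styles; the filename is just a placeholder:

```perl
use strict;
use warnings;

# Slurping: the entire file is read into memory at once.
open my $fh, '<', 'records.txt' or die "Cannot open records.txt: $!";
my @lines = <$fh>;
close $fh;
print "Slurped ", scalar @lines, " lines\n";

# Per-record processing: only the current line is held in memory.
open $fh, '<', 'records.txt' or die "Cannot open records.txt: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    # ... process $line here ...
}
close $fh;
```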
Calin's solution is good. If you want a little extra efficiency, you can buy it with memory, i.e. an extra data structure. The solution below maintains a separate hash for the keys that are known to be duplicates, so that at the end we iterate only over that hash. This pays off when the number of duplicate keys is significantly smaller than the total number of keys.
(Not tested.)

```perl
my( %keys, %dup );

while (<STDIN>) {
    chomp;
    if ( /PROBABLECAUSE\w*\((\d+),\s*\w*,\s+(\w*)/ ) {
        my( $id, $key ) = ( $1, $2 );

        if ( exists $dup{$key} ) {        # already found to be a dup
            push @{ $dup{$key} }, $id;
        }
        elsif ( exists $keys{$key} ) {    # only seen once before
            push @{ $dup{$key} }, delete($keys{$key}), $id;
        }
        else {                            # first time seen
            $keys{$key} = $id;
        }

        # check if any key has init caps (not allowed)
        if ( $key =~ /^[A-Z]\w*/ ) {
            print "Id: $id - $key\n";
        }
    }
}

print "\nDuplicated keys:\n\n";
for my $key ( keys %dup ) {
    print "Key: $key\n";
    print "\tId: $_\n" for @{$dup{$key}};
}
```
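The delete($keys{$key}) is what keeps %dup small: the first-seen id is moved into %dup only at the moment a second occurrence of the key shows up, so %dup ends up holding nothing but the keys that actually repeat. If you save the above as, say, find_dups.pl (name made up), you'd run it as perl find_dups.pl < yourfile.txt, since it reads from STDIN.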