in reply to I sense there is a simpler way...
Here is a beginning; this only parses the input data once. It could be more efficient: possibly the regexp could be turned into a split on /,\s*|\(/ followed by a test on the first part of the split. I am not sure what sort of data you are trying not to match, but possibly something like this:

    my ($test, $id, $key) = split /,\s*|\(/, $_, 4;
    next unless $test =~ /PROBABLECAUSE\w*/;

You may want to fix keys starting with a cap so they match duplicates without one. I put the processing in the loop reading STDIN so that I could test it easily by bashing a few lines in by hand. It may also be a little more efficient, as the script starts working as soon as it has its first line rather than waiting until all the input is in.
    #!/use/your/bin/perl -w
    use strict;

    my($line_count, @lines, %dup, @capkeys)=0;

    while (<STDIN>) {
        if (m/PROBABLECAUSE\w*\((\d+),\s*\w*,\s+(\w*)/) {
            my ($id, $key) = ($1, $2);

            # check if any key has init caps (not allowed)
            if ($key =~ m/^[A-Z]\w*/) {
                push @capkeys, "$line_count: $id - $key\n";
                # you may want to fix up caps here before you store
                # the key so Keysomething matches keysomething
            }

            if (exists $dup{$key}) {
                print "Duplicates found\n";
                print "$dup{$key}->[0]: $dup{$key}->[1]\n";
                print "$line_count: $_\n";
            }
            else {
                $dup{$key} = [$line_count, $_];   # store line
            }
            $line_count++;
        }
    }

    print "keys with initial caps\n" if @capkeys;
    foreach (@capkeys) {print}
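To check what the split idea actually yields, here is a minimal sketch run against a single made-up line. The sample line is an assumption, inferred only from the shape of the regexp above, so the real data may well look different. It compares a three-variable split, a split with an undef placeholder for the skipped field, and the regexp itself.

```perl
use strict;
use warnings;

# Hypothetical input line (an assumption, not real data).
my $sample = 'PROBABLECAUSE(42, major, linkDown, other text)';

# Three variables, limit 4: $key3 receives the field the regexp skips.
my ($test3, $id3, $key3) = split /,\s*|\(/, $sample, 4;

# undef placeholder for the skipped field, limit 5: $key is the fourth field.
my ($test, $id, undef, $key) = split /,\s*|\(/, $sample, 5;

# The regexp from the script, for comparison.
my ($re_id, $re_key) = $sample =~ m/PROBABLECAUSE\w*\((\d+),\s*\w*,\s+(\w*)/;

print "3 vars: id=$id3 key=$key3\n";      # 3 vars: id=42 key=major
print "4 vars: id=$id key=$key\n";        # 4 vars: id=42 key=linkDown
print "regexp: id=$re_id key=$re_key\n";  # regexp: id=42 key=linkDown
```

On this sample, only the split with the placeholder agrees with the regexp. Note also that if the key were the last field before the closing parenthesis, the split would leave the ")" attached to it, whereas (\w*) would not capture it.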
In the initial caps test, is the \w* really required, or would if ($key =~ m/^[A-Z]/) be OK? What if the entire key is in upper case, or will that never happen?

    my ($test, $id, undef, $key) = split /,\s*|\(/, $_, 5;
    next unless $test =~ /PROBABLECAUSE\w*/;
    #!/your/perl -w
    use strict;

    my($line_count, %dup)=0;

    while (<STDIN>) {
        my ($test, $id, undef, $key) = split /,\s*|\(/, $_, 5;
        next unless $test =~ /PROBABLECAUSE\w*/;

        # If I may be so bold...
        $key = lc $key;

        if (exists $dup{$key}) {
            print "Duplicates found\n";
            print "$dup{$key}->[0]: $dup{$key}->[1]";
            print "$line_count: $_";
        }
        else {
            $dup{$key} = [$line_count, $_];   # store No and line
        }
        $line_count++;
    }
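On the initial caps question, a quick sketch suggests the trailing \w* makes no difference to the test, since \w* is allowed to match nothing. The keys below are made up for illustration, and the two helper subs are hypothetical names, not part of the scripts above.

```perl
use strict;
use warnings;

# Test as written in the first script.
sub init_cap_long  { return $_[0] =~ m/^[A-Z]\w*/ ? 1 : 0 }

# Same test without the trailing \w*.
sub init_cap_short { return $_[0] =~ m/^[A-Z]/ ? 1 : 0 }

# Hypothetical keys: lower case, initial cap, all upper case, single letter.
my @keys = qw(linkDown LinkDown LINKDOWN L);

for my $key (@keys) {
    printf "%-9s long=%d short=%d lc=%s\n",
        $key, init_cap_long($key), init_cap_short($key), lc $key;
}
```

Both tests flag the same keys, including the all upper case one, and lc folds every variant to the same string, so the duplicate check in the second script would treat LinkDown and LINKDOWN as the same key as linkDown.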