Re: Re: Re: Re: Re: finding duplicate data

I didn't post any code. I think the code to which you refer was by borisz. Anyway, you have it a little wrong:

# loop over the array values, not indexes, making $key
# refer to each one in turn
foreach my $key (@hydrogen_split) {

# chomp is probably not needed if your data is already
# in an array.  chomp removes the end-of-line character
# from a line of input

# use the %hydrogen_split hash to keep track of which
# keys were seen how many times
    $hydrogen_split{$key}++;
}

# print in sorted order each key that was encountered more than once
foreach my $key ( sort grep { $hydrogen_split{$_} > 1 } keys %hydrogen_split ) {
    print "$key\n";
}

Note that I changed his != 1 to > 1. Do you understand why?

Re: Re: Re: Re: Re: Re: finding duplicate data by harry34 (Sexton) on Jan 21, 2004 at 13:30 UTC
I appreciate your help, but I still can't get it to do what I want. I have 30 separate files which are opened individually, and information is extracted from each file, e.g. for files 1 and 2: `file 1: H(15) H(16)` and `file 2: H(15) H(15) H(16)`. Note that all the information from all 30 files is stored in @hydrogen_split. So iterating over the information from file 1 should not output anything, but for file 2 H(15) should be displayed, as it is repeated. With the following code, H(15) H(16) is output. Is there something wrong? `foreach my $key (@hydrogen_split) { $hydrogen_split{$key}++; } foreach my $key ( sort grep { $hydrogen_split{$key} > 1 } keys %hydrogen_split ) { print "$key\n"; }` Thanks, a very confused Harry
Re: Re: Re: Re: Re: Re: Re: finding duplicate data by borisz (Canon) on Jan 21, 2004 at 13:50 UTC
If `@hydrogen_split` includes information from all files, then the result is the duplicates across all files too. So H(15) and H(16) is the result you asked for. If you want to get only H(15) from file 2, then you must fill `@hydrogen_split` with the split results for each file separately, and clear `@hydrogen_split` before you populate it with another file's data. Boris
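A minimal sketch of that per-file approach follows. It is not the original poster's script: the `%extracted` hash and the `duplicates` sub are made up for illustration, with `%extracted` standing in for whatever the real code pulls out of each of the 30 files. The point it tries to show is that both the list and the counting hash have to start empty for every file, otherwise counts from earlier files leak into later ones.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the values extracted from each file;
# the real script would fill this by opening and splitting each file.
my %extracted = (
    'file 1' => [ 'H(15)', 'H(16)' ],
    'file 2' => [ 'H(15)', 'H(15)', 'H(16)' ],
);

# Return, in sorted order, the values that occur more than once
# in the list passed in.
sub duplicates {
    my @values = @_;
    my %seen;                        # fresh, empty hash on every call
    $seen{$_}++ for @values;         # count each value
    my @dups = sort grep { $seen{$_} > 1 } keys %seen;
    return @dups;
}

foreach my $file ( sort keys %extracted ) {
    # Only this file's data; nothing carried over from the last file
    my @hydrogen_split = @{ $extracted{$file} };
    my @dups = duplicates(@hydrogen_split);
    print "$file: @dups\n" if @dups;
}
```

With the sample data above this prints only `file 2: H(15)`, since nothing is repeated within file 1. Declaring `%seen` with `my` inside the sub is one way to get the "clear it before the next file" behaviour without an explicit reset.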