in reply to Re: Re: Re: Re: finding duplicate data
in thread finding duplicate data

I didn't post any code. I think the code to which you refer was by borisz. Anyway, you have it a little wrong:
# Loop over the array values, not indexes, making $key
# refer to each one in turn.
foreach my $key (@hydrogen_split) {
    # chomp is probably not needed if your data is already
    # in an array. chomp removes the end-of-line character
    # from a line of input.

    # Use the %hydrogen_split hash to keep track of which
    # keys were seen how many times.
    $hydrogen_split{$key}++;
}

# Print in sorted order each key that was encountered more than once.
# Inside the grep block the current element is $_, not $key.
foreach my $key ( sort grep { $hydrogen_split{$_} > 1 } keys %hydrogen_split ) {
    print "$key\n";
}
Note that I changed his != 1 to > 1. Do you understand why?

Re: Re: Re: Re: Re: Re: finding duplicate data
by harry34 (Sexton) on Jan 21, 2004 at 13:30 UTC
    I appreciate your help, but I still can't get it to do what I want.
    I have 30 separate files which are opened individually, and information is extracted from each file, e.g. for files 1 and 2:
    file 1: H(15) H(16)
    file 2: H(15) H(15) H(16)
    Note that all the information from all 30 files is stored in @hydrogen_split.

    So iterating over the information from file 1 should not output anything, but for file 2, H(15) should be displayed, as it is repeated.
    With the following code, H(15) H(16) is output. Is there something wrong?
    foreach my $key (@hydrogen_split) {
        $hydrogen_split{$key}++;
    }
    foreach my $key ( sort grep { $hydrogen_split{$_} > 1 } keys %hydrogen_split ) {
        print "$key\n";
    }
    Thanks, a very confused Harry
      If @hydrogen_split includes information from all files, then the result is the duplicates across all files too. So H(15) and H(16) is the result you asked for. If you want only H(15) from file 2, then you must process the split results in @hydrogen_split for each file separately, and clear @hydrogen_split (along with the %hydrogen_split hash of counts) before you populate it with another file's data.
      Boris
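
The per-file approach Boris describes could be sketched roughly as follows. This is only an illustration under stated assumptions: the %values_by_file hash and the duplicates helper are hypothetical names that stand in for however the 30 files are actually opened and split, and the sample values are the ones Harry gave for files 1 and 2. Because the count hash is declared inside the helper, it starts empty for every file, which takes the place of clearing it by hand.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real extraction step: each entry holds
# the values already split out of one file.
my %values_by_file = (
    'file 1' => [ 'H(15)', 'H(16)' ],
    'file 2' => [ 'H(15)', 'H(15)', 'H(16)' ],
);

# Return, in sorted order, the values that occur more than once
# in the list passed in.
sub duplicates {
    my %count;                 # fresh, empty hash on every call
    $count{$_}++ for @_;       # tally each value
    my @dupes = grep { $count{$_} > 1 } sort keys %count;
    return @dupes;
}

# Check each file on its own, so counts never carry over between files.
for my $file ( sort keys %values_by_file ) {
    my @dupes = duplicates( @{ $values_by_file{$file} } );
    print "$file: @dupes\n" if @dupes;
}
```

With the sample data above, nothing is printed for file 1 and the only line of output is "file 2: H(15)".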