in reply to Finding duplicates

What have you tried already? What data structures in Perl do you think may be useful here? (Hint: when you hear 'duplicates' or 'unique', you should think hash.)

I would say more, but this sounds like homework. Show us you've done some thinking first. See "How to ask questions the smart way."

--
[ e d @ h a l l e y . c c ]

Re: Re: Finding duplicates
by Anonymous Monk on Aug 12, 2003 at 18:28 UTC
    Here is what I attempted and it is not working:
    $db = "textfile.txt";
    open(DATA, "$db") or die "cant open: $!\n";
    @dat = (<DATA>);
    close(DATA);
    open(DATA, "$db") || die "cant open: $!\n";
    foreach $line (@dat) {
        if($line =~ /87/g) # I tried this just to see if I could fetch any data in my text file
        {
            print "test\n";
        }
    }
    close(DATA);
      Okay, you have combined two separate methods of reading the lines in the file. Pick one. They produce the same result, but I recommend the latter because it reads one line at a time and doesn't require the WHOLE file to be in memory at any given time. (Note that it uses while; writing foreach $line (<DATA>) instead would read every line into a list before the loop starts.)

          ...
          $db = "textfile.txt";
          open(DATA, $db) or die "cant open: $!\n";
          @dat = <DATA>;
          close(DATA);
          foreach $line (@dat) {
              ...
          }

          ...
          $db = "textfile.txt";
          open(DATA, $db) || die "cant open: $!\n";
          while ($line = <DATA>) {
              ...
          }
          close(DATA);

      The instances of ... mark the areas where you're hoping for some help. You only care about fields 2 and 5 of each line. You either want to print any line that has already been seen, or you want to print any line that has not already been seen.

      Break down the problem further.

      • You need to keep track of what's been seen in some kind of data structure. (I hinted at a hash.)
      • You need to test each line in the file against the data structure to see if it's been seen before, or not.
      • You need to decide whether to print the line or not.
      • You need to add the crucial fields to the data structure so your future iterations have something to check.
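
      To make the hash hint concrete, here is a minimal sketch of the "seen" idiom on a plain in-memory list. It deliberately leaves out the file reading and the choice of fields, which are still yours to work out. The names @items, %seen, and @repeats and the sample words are made up for illustration.

      ```perl
      use strict;
      use warnings;

      # Hypothetical sample data; in the real problem the key would be
      # built from the fields you care about on each line of the file.
      my @items = qw(apple pear apple fig pear apple);

      my %seen;       # keys are the values encountered so far
      my @repeats;    # items that had already been seen when we reached them

      foreach my $item (@items) {
          if ($seen{$item}) {          # test the item against the data structure
              push @repeats, $item;    # decide: this one is a repeat
          }
          $seen{$item}++;              # record it so later iterations can check
      }

      print "$_\n" for @repeats;
      ```

      Flipping the test (unless instead of if) collects the first occurrence of each item rather than the repeats.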

      Again, I'm treating this like it's homework, and drawing you through the thinking process, rather than just handing you a solution. If you just want to be given code, I'm sure some other folks are happy to grant your wish.

      --
      [ e d @ h a l l e y . c c ]