Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have data that goes into a text file from a CGI form. The problem is I don't want duplicates in the second and fifth fields. The second field is a date and the fifth field is a number that should always be unique. I will put this script in my CGI form so that after the form is submitted it will open this text file and check for duplicates in the second and fifth fields only.
121|20030831|lkj|lkjlkj|65
122|20030801|qq|www|43
123|20030812|qq|aah|43
124|20030812|uiy|kjh|87
125|20030812|iuy|kjh|87 #duplicate here

Replies are listed 'Best First'.
Re: Finding duplicates
by halley (Prior) on Aug 12, 2003 at 18:18 UTC
    What have you tried already? What data structures in Perl do you think may be useful here? (Hint: when you hear 'duplicates' or 'unique', you should think hash.)

    I would say more, but this sounds like homework. Show us you've done some thinking first. See How to ask questions the smart way.

    --
    [ e d @ h a l l e y . c c ]

      Here is what I attempted and it is not working:
      $db = "textfile.txt";
      open(DATA, "$db") or die "cant open: $!\n";
      @dat = (<DATA>);
      close(DATA);
      open(DATA, "$db") || die "cant open: $!\n";
      foreach $line (@dat)
      {
          if($line =~ /87/g)  # I tried this just to see if I could fetch any data in my text file
          {
              print "test\n";
          }
      }
      close(DATA);
        Okay, you have combined two separate methods of reading the lines in the file. Pick one. They are functionally identical, but I recommend the latter because it doesn't require the WHOLE file to be in memory at any given time.
        ...
        $db = "textfile.txt";
        open(DATA, $db) or die "cant open: $!\n";
        @dat = <DATA>;
        close(DATA);
        foreach $line (@dat)
        {
            ...
        }
        ...
        $db = "textfile.txt";
        open(DATA, $db) || die "cant open: $!\n";
        while ($line = <DATA>)   # while, not foreach: foreach (<DATA>) would still slurp the whole file
        {
            ...
        }
        close(DATA);
        The instances of ... mark the areas where you're hoping for some help. You only care about fields 2 and 5 of each line. You either want to print any line that has already been seen, or you want to print any line that has not already been seen.

        Break down the problem further.

        • You need to keep track of what's been seen in some kind of data structure. (I hinted a hash.)
        • You need to test each line in the file against the data structure to see if it's been seen before, or not.
        • You need to decide whether to print the line or not.
        • You need to add the crucial fields to the data structure so your future iterations have something to check.

        Again, I'm treating this like it's homework, and drawing you through the thinking process, rather than just handing you a solution. If you just want to be given code, I'm sure some other folks are happy to grant your wish.

        --
        [ e d @ h a l l e y . c c ]
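Putting those four steps together, a minimal hash-based sketch (the variable names are my own, and the records are taken from the sample data in the original post):

```perl
use strict;
use warnings;

# Sample records from the original post.
my @lines = (
    "121|20030831|lkj|lkjlkj|65",
    "122|20030801|qq|www|43",
    "123|20030812|qq|aah|43",
    "124|20030812|uiy|kjh|87",
    "125|20030812|iuy|kjh|87",   # same date and number as the line above
);

my %seen;    # the data structure: keys are "field2|field5" pairs
my @dups;    # lines whose key has been seen before

for my $line (@lines) {
    # Fields 2 and 5 are indices 1 and 4 after split.
    my $key = join('|', (split /\|/, $line)[1, 4]);
    push @dups, $line if $seen{$key}++;
}

print "duplicate: $_\n" for @dups;
```

The `$seen{$key}++` test-and-increment does steps two and four in one expression: it is false the first time a key appears and true every time after.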

Re: Finding duplicates
by flounder99 (Friar) on Aug 12, 2003 at 19:56 UTC
    Maybe this code will start you in the right direction:
    use strict;

    my $newdata = '124|20030812|uiy|kjh|87';
    my $db = "textfile.txt";
    open (DATAFILE, "+<$db") or die $!;
    my $key = join("|", (split /\|/, $newdata)[1,4]);
    while (<DATAFILE>) {
        chomp;
        if (join("|", (split /\|/)[1,4]) eq $key) {
            close DATAFILE;
            exit;
        }
    }
    print DATAFILE $newdata,"\n";
    close DATAFILE;
    $newdata will be added to the text file if it is not already there. I would probably put this in a sub and do a return instead of an exit. This consumes no extra memory no matter how big textfile.txt gets.
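    That refactor might look like this (the sub name is my own; the seek before writing is an addition, since perlfunc recommends a seek when switching between reading and writing on the same handle):

```perl
use strict;
use warnings;

# flounder99's approach as a sub: returns 0 if the record's
# date/number pair already exists, 1 after appending it.
sub add_unless_dup {
    my ($db, $newdata) = @_;
    my $key = join('|', (split /\|/, $newdata)[1, 4]);
    open(my $fh, '+<', $db) or die $!;
    while (<$fh>) {
        chomp;
        if (join('|', (split /\|/)[1, 4]) eq $key) {
            close $fh;
            return 0;            # duplicate found, nothing written
        }
    }
    seek $fh, 0, 2;              # position at end of file before writing
    print $fh $newdata, "\n";
    close $fh;
    return 1;
}
```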

    --

    flounder

      thanks
Re: Finding duplicates
by johndageek (Hermit) on Aug 12, 2003 at 18:24 UTC
    So tell me, what does your current code, or code attempt, look like?
    Please post it.

    If you have no code, try looking up the following: split(), arrays, sorting. And if you feel like fun, try hashes.
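    As a starting point, here is split() on one record from the original post (note the pipe must be escaped, since an unescaped | means alternation in a regex):

```perl
use strict;
use warnings;

# One record from the sample data; fields are pipe-delimited.
my $record = "124|20030812|uiy|kjh|87";
my @fields = split /\|/, $record;
print "date: $fields[1], number: $fields[4]\n";   # prints "date: 20030812, number: 87"
```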

    Enjoy!
    John