HTTP-404 has asked for the wisdom of the Perl Monks concerning the following question:

Hello. Thank you very much for your replies yesterday. Now I have another question. I have a big flat-file database that stores one record per line, about 50,000 lines in total, and some lines are repeated frequently. How can I make it so that each line comes up only once? So far I have only this:
open (DB,"flat-file.txt") or die; @content=<DB>;
Thank You very much in advance

Re: Killing dupes
by japhy (Canon) on Aug 20, 2001 at 08:14 UTC
      Yes, I have seen it, but I still see dupes with this code:
      $file_vol="data-vol1.txt";
      open (DB,"$file_vol") or die;
      @content=<DB>;
      $prev = 'nonesuch';
      @out = grep($_ ne $prev && ($prev = $_), @content);
      $vallid=@out;
      $y=0;
      open("LIST", ">$file_vol.flt") or die "Can't Open File: $!";
      while($y<=$vallid){
          print LIST "$out[$y]";
      }
      $y++;
      }
      Thank You
        That code assumes that the lines are in some way ordered before your code is run.
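
        If you want to keep that $prev comparison, one option (just a sketch) is to sort the data first so that duplicate lines end up adjacent:

        # Sketch: sort first so duplicates become adjacent; the $prev
        # comparison then throws the repeats away (same idea as the original code).
        my $prev = 'nonesuch';
        my @out  = grep { $_ ne $prev && ($prev = $_) } sort @content;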

        The quickest way I can see to modify your code is:

        open (DB,"flat-file.txt") or die "couldn't open flat-file.txt: $!";
        my %seen;
        @content = grep { !$seen{$_}++ } (<DB>);
        (<DB>) gives us all the lines of the file as a list, which we grep for unique elements. Really we just throw out the duplicates: a line is discarded once it already exists as a key in %seen.

        Though it would probably be better not to slurp the entire file into memory unless you really need to; a line-at-a-time version is sketched below.
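
        A minimal line-at-a-time sketch (same %seen idea; the output file name here is just an assumption):

        # Sketch: read line by line instead of slurping the whole file,
        # printing each line only the first time it is seen.
        open (DB, "flat-file.txt")   or die "couldn't open flat-file.txt: $!";
        open (OUT, ">flat-file.flt") or die "couldn't open flat-file.flt: $!";
        my %seen;
        while (my $line = <DB>) {
            print OUT $line unless $seen{$line}++;
        }
        close(DB);
        close(OUT);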

        -Blake

        Just a comment regarding your loop: In Perl you can often avoid indexing back into the array via an increment variable. You wrote:
        @out = grep($_ ne $prev && ($prev = $_), @content);
        $vallid=@out;
        $y=0;
        open("LIST", ">$file_vol.flt") or die "Can't Open File: $!";
        while($y<=$vallid){
            print LIST "$out[$y]";
        }
        $y++;
        }
        First, this loop would never actually end: you test $y, which starts at 0, against $vallid, which is the size of the array @out, but you increment $y outside the while loop, so it never changes inside the loop. You could rewrite this as:
        @out = grep($_ ne $prev && ($prev = $_), @content);
        foreach my $line (@out) {
            print $line;
        }
        Notice that you do not need to set $vallid or $y at all, and can avoid dealing with any kind of incrementing. Since you are not really doing much inside the loop, however, you could take advantage of how Perl prints arrays and simply put:
        @out = grep($_ ne $prev && ($prev = $_), @content); print @out;
        And finally, at the risk of getting a little compressed, you can get rid of @out and @content entirely:
        open (DB,"$file_vol") or die; print grep($_ ne $prev && ($prev = $_), <DB>); close(DB);
        This works because print can take the list returned by grep without it being stored in an array first, and grep can read directly from the filehandle for you.
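
        Applied back to the original file names (again just a sketch, and it still assumes the lines are sorted so that duplicates are adjacent), that could look like:

        # Sketch: the compressed version, but writing to the .flt file the
        # original code used; only *adjacent* duplicates are removed.
        my $file_vol = "data-vol1.txt";
        my $prev     = 'nonesuch';
        open (DB, $file_vol)          or die "Can't open $file_vol: $!";
        open (LIST, ">$file_vol.flt") or die "Can't open $file_vol.flt: $!";
        print LIST grep($_ ne $prev && ($prev = $_), <DB>);
        close(DB);
        close(LIST);
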
Re: Killing dupes
by maverick (Curate) on Aug 20, 2001 at 08:21 UTC
    Here's a non-Perl solution if you're on a Un*x box. Since you already have the data in a file, one record per line, you could do something like:
    sort flat-file.txt | uniq > uniqued.txt

    /\/\averick
    perl -l -e "eval pack('h*','072796e6470272f2c5f2c5166756279636b672');"

      Or, without the pipe, there's (normally) a command-line switch for unique sorting: sort -u flat-file.txt > uniqued.txt

      -- Hofmator