HTTP-404 has asked for the wisdom of the Perl Monks concerning the following question:

Hello. Thank you very much for your replies yesterday. Now I have another question. I have a big flat-file database that stores one record per line, about 50,000 lines in total, and some lines are repeated frequently. How can I make it so that each line comes up only once? So far I have only this:
open (DB,"flat-file.txt") or die; @content=<DB>;
Thank You very much in advance

Re: Killing dupes
by japhy (Canon) on Aug 20, 2001 at 08:14 UTC
      Yes, I have seen it, but I still see dupes with this code:
      $file_vol="data-vol1.txt";
      open (DB,"$file_vol") or die;
      @content=<DB>;
      $prev = 'nonesuch';
      @out = grep($_ ne $prev && ($prev = $_), @content);
      $vallid=@out;
      $y=0;
      open("LIST", ">$file_vol.flt") or die "Can't Open File: $!";
      while($y<=$vallid){
          print LIST "$out[$y]";
      }
      $y++;
      }
      Thank You
        That code assumes that the lines are in some way ordered before your code is run.
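
        If you want to keep that $prev comparison, one option (just a sketch) is to sort the data first so that duplicate lines end up adjacent:

        # Sketch: sort first so duplicates become adjacent; the $prev
        # comparison then throws the repeats away (same idea as the original code).
        my $prev = 'nonesuch';
        my @out  = grep { $_ ne $prev && ($prev = $_) } sort @content;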

        The quickest way I can see to modify your code is:

        open (DB,"flat-file.txt") or die "couldn't open flat-file.txt: $!";
        my %seen;
        @content = grep { !$seen{$_}++ } (<DB>);
        (<DB>) gives us all the lines of the file as a list, which we grep for unique elements. Really we just throw out the duplicates: a line is discarded once it already exists as a key in %seen.

        Though it would probably be better not to slurp the entire file into memory unless you really need to; a line-at-a-time version is sketched below.
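
        A minimal line-at-a-time sketch (same %seen idea; the output file name here is just an assumption):

        # Sketch: read line by line instead of slurping the whole file,
        # printing each line only the first time it is seen.
        open (DB, "flat-file.txt")   or die "couldn't open flat-file.txt: $!";
        open (OUT, ">flat-file.flt") or die "couldn't open flat-file.flt: $!";
        my %seen;
        while (my $line = <DB>) {
            print OUT $line unless $seen{$line}++;
        }
        close(DB);
        close(OUT);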

        -Blake

        Just a comment regarding your loop: In Perl you can often avoid indexing back into the array via an increment variable. You wrote:
        @out = grep($_ ne $prev && ($prev = $_), @content);
        $vallid=@out;
        $y=0;
        open("LIST", ">$file_vol.flt") or die "Can't Open File: $!";
        while($y<=$vallid){
            print LIST "$out[$y]";
        }
        $y++;
        }
        First, this loop would never actually end: you test $y, which starts at 0, against $vallid, which is the size of the array @out, but you increment $y outside the while loop, so it never changes inside the loop. You could rewrite this as:
        @out = grep($_ ne $prev && ($prev = $_), @content);
        foreach my $line (@out) {
            print $line;
        }
        Notice that you do not need to set $vallid or $y at all, and can avoid dealing with any kind of incrementing. Since you are not really doing much inside the loop, however, you could take advantage of how Perl prints arrays and simply put:
        @out = grep($_ ne $prev && ($prev = $_), @content); print @out;
        And finally, at the risk of getting a little compressed, you can get rid of @out and @content entirely:
        open (DB,"$file_vol") or die; print grep($_ ne $prev && ($prev = $_), <DB>); close(DB);
        This works because print can take the list returned by grep without it being stored in an array first, and grep can read directly from the filehandle for you.
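
        Applied back to the original file names (again just a sketch, and it still assumes the lines are sorted so that duplicates are adjacent), that could look like:

        # Sketch: the compressed version, but writing to the .flt file the
        # original code used; only *adjacent* duplicates are removed.
        my $file_vol = "data-vol1.txt";
        my $prev     = 'nonesuch';
        open (DB, $file_vol)          or die "Can't open $file_vol: $!";
        open (LIST, ">$file_vol.flt") or die "Can't open $file_vol.flt: $!";
        print LIST grep($_ ne $prev && ($prev = $_), <DB>);
        close(DB);
        close(LIST);
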
Re: Killing dupes
by maverick (Curate) on Aug 20, 2001 at 08:21 UTC
    Here's a non-Perl solution if you're on a Un*x box. Since you already have the data in a file, one record per line, you could do something like:
    sort flat-file.txt | uniq > uniqued.txt

    /\/\averick
    perl -l -e "eval pack('h*','072796e6470272f2c5f2c5166756279636b672');"

      Or, without the pipe, there's (normally) a command-line switch for unique sorting: sort -u flat-file.txt > uniqued.txt

      -- Hofmator