Levan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Fellow Monks, I have a question regarding how to use regular expressions to get rid of repeated cases. Firstly, I have used some regular expressions to get the following list of C function names:

RcvChar
checkeof
SerialOutputString
SerialOutputString
SerialOutputString
GetFile
ChksumByteByByte
GetChksum
SerialOutputString
SerialOutputString

These are actually function names that I need to use for stubbing in order to do some unit testing. So is there any way that I can get rid of the repeated cases?
The code that I used to get the list is:

open (Mer, "Extracted.c");
for $mer (<Mer>) {
    $control = 0;
    while ($control != 15) {
        $mer =~ s/ //;
        $control++;
    }
    print $mer;
}

update (broquaint): title change (was Getting Rid of Repeated Case)

Replies are listed 'Best First'.
Re: Removing duplicates from list
by davido (Cardinal) on Oct 14, 2003 at 02:48 UTC
    That's what hashes are good at. Since keys have to be unique, just plop your parsed version of $mer into a hash as the hash's key. Value is unimportant, though you could use it as a counter.

    Example just tweaking your code slightly:

    use strict;
    use warnings;

    open (MER, "Extracted.c") or die "Can't open input file. $!\n";

    my %cases;
    for my $mer (<MER>) {
        my $control = 0;
        while ($control != 15) {
            $mer =~ s/ //;
            $control++;
        }
        $cases{$mer}++;
    }

    foreach my $key ( keys %cases ) {
        print $key, "\n";
    }

    I'm not really clear on what the inner loop is for. You only want to remove the first fifteen spaces from $mer? ...ok. I guess that's working out ok. You could alternatively use something like:

    substr($mer,0,15) =~ s/ //g;

    That would eliminate the while loop and $control counter.


    Dave


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
      hi,
      Thanks for your advice, I have got it to work. By the way, the while loop is to help me align the list to the left as there are some spaces in front. Your code really helps!!!
      a thousand thanks!!!
      Levan
Re: Removing duplicates from list
by Zaxo (Archbishop) on Oct 14, 2003 at 02:48 UTC

    The usual way of removing duplicates from a list is to form a hash. Instead of printing in line 8, do

        $fnames{$mer} = '';

    Then you can extract sort keys %fnames to be printed, or for whatever other use you have.
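    The hash approach above can be sketched end to end; the name list here is just a stand-in for the parsed output from Extracted.c:

```perl
use strict;
use warnings;

# Stand-in for the function names parsed out of the C file
my @names = qw(RcvChar checkeof SerialOutputString
               SerialOutputString GetFile SerialOutputString);

# Hash keys are unique, so assigning each name as a key
# collapses the duplicates automatically.
my %fnames;
$fnames{$_} = '' for @names;

# Print each unique name once, sorted
print "$_\n" for sort keys %fnames;
```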

    After Compline,
    Zaxo

Re: Removing duplicates from list
by Roger (Parson) on Oct 14, 2003 at 02:48 UTC
    All you need to do is put the names in a hash and print them out later (the quick fix) -
    open (Mer, "Extracted.c");
    my %vars;
    for $mer (<Mer>) {
        $control = 0;
        while ($control != 15) {
            $mer =~ s/ //;
            $control++;
        }
        $vars{$mer} = 1;
    }
    print "$_" foreach (keys %vars);
    Another method is to use the hash to eliminate duplicates on the fly -
    open (Mer, "Extracted.c");
    my %vars;
    for $mer (<Mer>) {
        $control = 0;
        while ($control != 15) {
            $mer =~ s/ //;
            $control++;
        }
        next if $vars{$mer};
        $vars{$mer} = 1;
        print $mer;
    }
    The following is how I would do this -
    use strict;
    use IO::File;

    my $Mer = new IO::File "Extracted.c", "r"
        or die "Can not open file!";

    my %vars;
    while (my $mer = <$Mer>) {
        ...
    }
    Try to use IO::File to open a file in perl. It's the preferred method. ;-)

      That last statement is pretty odd. open() has had new enhancements in just about every perl release... why would that happen if it were deprecated?

      Try to use IO::File to open a file in perl. It's the preferred method. ;-)

      It is? It may be one method but I don't think it's the preferred method.

      -- vek --
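      For comparison, the three-argument open with a lexical filehandle (available since perl 5.6) is another common alternative to both bareword handles and IO::File; a minimal sketch, assuming Extracted.c exists as in the original post:

```perl
use strict;
use warnings;

# Three-argument open with a lexical filehandle: no bareword,
# no package-global handle, and the mode is stated explicitly.
open my $mer_fh, '<', 'Extracted.c'
    or die "Can't open Extracted.c: $!";

my %seen;
while ( my $line = <$mer_fh> ) {
    print $line unless $seen{$line}++;   # print each line only once
}

close $mer_fh;
```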
Re: Removing duplicates from list
by etcshadow (Priest) on Oct 14, 2003 at 02:59 UTC
    (Not a perl answer, but still:) Pipe the output to uniq:
    perl my_script.pl | uniq
    Perl is a fantastic language for writing one-liners, scripts, and full-on applications, but the shell and common shell utilities are incredibly useful, too.

    Correction:

    perl my_script.pl | sort | uniq

    ------------
    :Wq
    Not an editor command: Wq

      uniq will only work if the items are sorted (or at least if all of the identical items are consecutive).

      $ printf "a\nb\na\n" |uniq
      a
      b
      a
      

      sort |uniq would work, though, or sort -u.
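      The same quick check confirms that both sorted variants collapse the duplicates:

```shell
printf "a\nb\na\n" | sort | uniq
printf "a\nb\na\n" | sort -u
```

      Each pipeline prints just `a` and `b`, one per line.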

Re: Removing duplicates from list
by thor (Priest) on Oct 14, 2003 at 12:50 UTC
    Could you use the ctags program? I think that ctags is one of those magical tools on Unix that doesn't get enough play.

    thor

Re: Removing duplicates from list
by flounder99 (Friar) on Oct 14, 2003 at 15:50 UTC
    davido has the best idea but if you want to use a regex you can use something like:
    my $list = "RcvChar checkeof SerialOutputString SerialOutputString SerialOutputString GetFile ChksumByteByByte GetChksum SerialOutputString SerialOutputString";
    while ($list =~ s/(\b\w+\b)(.*?)\s+\1\b/$1$2/s) {}
    print $list;
    outputs :
    RcvChar checkeof SerialOutputString GetFile ChksumByteByByte GetChksum

    --

    flounder

Re: Removing duplicates from list
by vek (Prior) on Oct 15, 2003 at 00:44 UTC

    You've received some good suggestions from other monks so I just had a couple of comments about your code.

    open (Mer, "Extracted.c");

    Always try and get in the habit of checking the return value of open:

    open (MER, "Extracted.c") || die "open: Extracted.c - $!\n";

    Your for loop looks a little out of place for processing each line in the file. I dunno, whatever floats your boat I suppose, but I think you'll probably see most people use a while loop:

    while (<MER>) {
        # do stuff
    }
    -- vek --
      hi,
      Actually the code that I have put up is a shorter version of what I actually had. Cos I think the full code is a bit too long and complicated to be put up on this thread, I shortened it. Thanks for the advice!!!
      Levan