how to remove duplicate based on the first column value

tariqahsan has asked for the wisdom of the Perl Monks concerning the following question:

Re: how to remove duplicate based on the first column value by arthas (Hermit) on Jun 10, 2003 at 15:03 UTC
Try the following code. The output goes to STDOUT, but you should have no problem modifying it to write to a file if you need to. The script uses a %seen hash as a lookup table to avoid printing duplicates.

```perl
#!/usr/bin/perl -Tw
use strict;

open(my $myfile, '<', "./prova.txt") or die "Cannot open ./prova.txt: $!";
my %seen;
while (<$myfile>) {
    chomp;
    my ($c1, $c2) = split(/\|/);
    # Print a line only the first time its first column is seen
    unless (defined $seen{$c1}) {
        print "$c1$c2\n";
        $seen{$c1} = 1;
    }
}
close($myfile);
```

Hope this helps! Michele.
Re: Re: how to remove duplicate based on the first column value by tariqahsan (Beadle) on Jun 10, 2003 at 19:53 UTC
Michele, thanks! Your script works. - Tariq
Re: how to remove duplicate based on the first column value by Enlil (Parson) on Jun 10, 2003 at 15:08 UTC
Here's one way.

```perl
use strict;
use warnings;

my %seen;
while ( <DATA> ) {
    print unless $seen{(split /\|/)[0]}++;
}

__DATA__
123|abc
123|cde
234|efg
456|hij
```

-enlil
Re: Re: how to remove duplicate based on the first column value by EvdB (Deacon) on Jun 10, 2003 at 15:17 UTC
Which could also be done as the following one-liner: `perl -e 'while (<>) { print unless $s{(split /\|/)[0]}++ }' < infile > outfile` where `infile` is the file with the values to parse and `outfile` is where you want the results. Indeed: `perl -i.bak -e 'while (<>) { print unless $s{(split /\|/)[0]}++ }' infile` will edit `infile` in situ (putting a backup in `infile.bak`). Update: I have assumed that you are using a shell with redirection, such as bash. I have been told off for this sort of assumption before, so best to make it clear. --`tidiness is the memory loss of environmental mnemonics`
Re^3: how to remove duplicate based on the first column value by revdiablo (Prior) on Jun 10, 2003 at 16:59 UTC
Another nice Perl command-line switch is `-n`. This builds the `while(<>)` loop for you. Example: `perl -i.bak -ne 'print unless $s{(split /\|/)[0]}++' infile` Note: more of these can be found in perlrun.
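As a rough sketch of what `-n` does, here is the one-liner written out as a script with the `LINE: while (<>)` loop that perlrun describes. The temp file, its sample contents, and the `@kept` array are assumptions added only so the sketch runs on its own and its result can be inspected; they are not part of the original one-liner.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample input, written to a temp file so no real `infile` is needed.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "123|abc\n123|cde\n234|efg\n456|hij\n";
close $tmp_fh or die "close failed: $!";

# Pretend the file was named on the command line.
@ARGV = ($tmp_name);

# This is the loop that -n wraps around the program text.
my (%s, @kept);
LINE:
while (<>) {
    next LINE if $s{ (split /\|/)[0] }++;   # same test as the one-liner
    push @kept, $_;                          # kept only so the result can be checked
    print;
}
```

With `-n` the loop and the `<>` handling come for free, which is why the one-liner body shrinks to the single `print unless ...` statement.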
Re: how to remove duplicate based on the first column value by cees (Curate) on Jun 10, 2003 at 15:55 UTC
```perl
my %seen;
my @data = grep { chomp; not $seen{(split /\|/)[0]}++ } <DATA>;
```

This solution loads the entire file at once, so if you are working with large files it is not the most memory-efficient approach. Whenever you need to remove elements from a list of items, think `grep`.
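For contrast with the memory point above, a minimal sketch of a streaming variant that holds only the set of keys in memory, not the lines themselves. `dedupe_stream` is a hypothetical helper name, and the in-memory filehandles stand in for real files.

```perl
use strict;
use warnings;

# dedupe_stream (hypothetical name): copy lines from $in to $out, keeping
# only the first line seen for each value of the first '|'-separated column.
sub dedupe_stream {
    my ($in, $out) = @_;
    my %seen;
    while (my $line = <$in>) {
        my ($key) = split /\|/, $line, 2;    # first column only
        print {$out} $line unless $seen{$key}++;
    }
    return;
}

# Usage with in-memory filehandles (sample data is made up for illustration).
my $input = "123|abc\n123|cde\n234|efg\n456|hij\n";
my $result = '';
open(my $in,  '<', \$input)  or die "cannot read sample: $!";
open(my $out, '>', \$result) or die "cannot write sample: $!";
dedupe_stream($in, $out);
close $out;
print $result;
```

The `grep` version is shorter and fine for small inputs; the loop only pays off when the file is too large to hold comfortably as a list.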
Re: how to remove duplicate based on the first column value by cbro (Pilgrim) on Jun 10, 2003 at 15:05 UTC
```perl
#!/usr/local/bin/perl
use strict;
use warnings;

my %hash;
open(F, '<', "testers.txt") or die "Cannot open testers.txt: $!";
my @array = <F>;
close(F);

foreach (@array) {
    chomp;    # drop the newline so it is not stored in the value
    my ($key, $value) = split(/\|/);
    next if (exists $hash{$key});
    $hash{$key} = $value;
}

# use this to verify
while (my ($fkey, $fval) = each %hash) {
    print "$fkey|$fval\n";
}
```

I hope I didn't just do somebody's homework <g>
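One thing to be aware of with the hash-based approach: `each %hash` returns entries in no particular order, so the output order may differ from the input order. If that matters, a small sketch along these lines records the keys in the order they first appear. `first_by_key` is a hypothetical helper name and the sample lines are made up.

```perl
use strict;
use warnings;

# first_by_key (hypothetical name): return "key|value" for the first line
# seen with each key, in the order the keys first appeared.
sub first_by_key {
    my @lines = @_;
    my (%value_for, @order);
    for my $line (@lines) {
        chomp(my $copy = $line);             # work on a copy, leave the caller's data alone
        my ($key, $value) = split /\|/, $copy, 2;
        next if exists $value_for{$key};
        $value_for{$key} = defined $value ? $value : '';
        push @order, $key;                   # remember first-seen order
    }
    return map { "$_|$value_for{$_}" } @order;
}

print "$_\n" for first_by_key("456|hij\n", "123|abc\n", "123|cde\n", "234|efg\n");
```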
Re: how to remove duplicate based on the first column value by perlguy (Deacon) on Jun 10, 2003 at 18:08 UTC
You could also do it with `substr()`: `my %seen; print join '', grep { !$seen{substr($_, 0, 1)}++ } <DATA>;` Note that this keys on the first character only, so it assumes the first column is a single character.
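If the first column can be longer than one character, `substr()` can be paired with `index()` to take everything before the first `|`. This is a sketch under that assumption; `first_column` is a hypothetical helper name and the sample lines are made up so that two different keys share a first character.

```perl
use strict;
use warnings;

# first_column (hypothetical name): text before the first '|',
# or the whole chomped line if there is no '|'.
sub first_column {
    my ($line) = @_;
    chomp $line;
    my $pos = index $line, '|';
    return $pos < 0 ? $line : substr($line, 0, $pos);
}

my %seen;
my @lines = ("123|abc\n", "123|cde\n", "134|efg\n", "456|hij\n");
print grep { !$seen{ first_column($_) }++ } @lines;
```

Here `123` and `134` both start with `1` but are treated as different keys, so only the second `123` line is dropped.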