remove duplicates with hash and grep

Smith has asked for the wisdom of the Perl Monks concerning the following question:

I'm still pretty weak on using hashes, so I am hoping someone can help. I'm opening a file with domains in it and pulling out only the root domain from each line. Then I am trying to use a hash and grep to remove any duplicates, but all I get back are blank lines.

#!/usr/bin/perl 

$upload = "/var/tmp/work/upload"; 
$work = "/var/tmp/work/"; 
$input3 = "$upload/domain.csv"; 

system ("dos2unix $input3"); 

open (IN,"$input3"); 
open (OUT,">>$work/local.rules"); 
while (<IN>) { 
         chomp();

         if ($_ =~ /^.+\.([A-Za-z0-9-_]+\.[A-Za-z]{2,})$/){ 
                 $domain = $1; 
                 %seen = ();
                 @unique = grep { ! $seen{ $domain }++ } @array; 
                 print "@unique\n"; 
    }
}

Replies are listed 'Best First'.
Re: remove duplicates with hash and grep by LanX (Saint) on Dec 22, 2014 at 21:29 UTC
@array is empty because it is never populated. `warnings` and `strict` would have told you...

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
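To see what `strict` reports for this kind of mistake, here is a minimal sketch. It assumes a cut-down snippet (not the poster's full script) that uses an undeclared `@array`, and compiles it at run time with a string `eval` so the compile error can be captured and printed instead of aborting the program. The variable names `$snippet` and `$ok` are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical cut-down snippet mirroring the original mistake:
# @array is used but never declared or filled.
my $snippet = 'use strict; my %seen; my @unique = grep { !$seen{$_}++ } @array; 1;';

# A string eval compiles the snippet now; a compile error lands in $@.
my $ok = eval $snippet;

print "compiled: ", ($ok ? "yes" : "no"), "\n";
print "error: $@" if !$ok;
```

Under `strict` the snippet should fail to compile with a "Global symbol "@array" requires explicit package name" message, which points straight at the variable that was never set up.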
Re^2: remove duplicates with hash and grep by Smith (Initiate) on Dec 22, 2014 at 22:04 UTC
I changed it to this and am now getting the list, but it has not removed the duplicates. Still lost.

#!/usr/bin/perl
use warnings;
use strict;

my $upload = "/var/tmp/work/upload";
my $work = "/var/tmp/work/STAP-domain_clean_project";
my $input3 = "$upload/domain.csv";

system ("dos2unix $input3");

open (IN,"$input3");
open (OUT,">>$work/local.rules");
while (<IN>) {
         chomp();

         if ($_ =~ /^.+\.([A-Za-z0-9-_]+\.[A-Za-z]{2,})$/){
                 my @array = $1;
                 my %seen = ();
                 my @unique = grep { ! $seen{ @array }++ } @array;
                 print "@unique\n";
         }
}
Re^3: remove duplicates with hash and grep by GotToBTru (Prior) on Dec 22, 2014 at 22:18 UTC
Your code redefines %seen and @array each time you read a line from your file. You get duplicates because your hash is always empty when you test it for a value. The following will produce a list of the unique values from IN:

my (%seen,@unique);
while (<IN>) {
    chomp;
    $seen{$1}++ if ($_ =~ /^.+\.([A-Za-z0-9-_]+\.[A-Za-z]{2,})$/);
}
@unique = keys %seen;
printf "%s, ",$_ for @unique;
print "\n";

Updated for readability and coherence and typos (is it Monday already?)

1 Peter 4:10
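Note that `keys %seen` returns the domains in no particular order. If the order of first appearance matters, the hash can be combined with `grep`, as the original question intended. A minimal sketch, assuming the root domains have already been collected into a list; the sample domains and the helper name `unique_in_order` are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the distinct values of a list,
# keeping the order in which each value first appears.
sub unique_in_order {
    my %seen;    # declared once per call, so it remembers every value seen so far
    return grep { !$seen{$_}++ } @_;
}

# Made-up sample data standing in for the captured root domains.
my @domains = qw(example.com example.org example.com example.net example.org);

my @unique = unique_in_order(@domains);
print "$_\n" for @unique;
```

The key point is that `%seen` lives outside the per-element test and is indexed by `$_`, the element currently being checked by `grep`.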
Re^4: remove duplicates with hash and grep by Smith (Initiate) on Dec 23, 2014 at 15:19 UTC
Re^3: remove duplicates with hash and grep by AnomalousMonk (Archbishop) on Dec 23, 2014 at 04:25 UTC
"... now getting the list but it has not removed the duplicates. ..."

`my @array = $1;`

On every iteration through the `while`-loop, this statement creates a new array and initializes it with a single element: the string that was captured to `$1`. This string is, of course, unique!
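A related detail in the Re^2 code is the subscript `$seen{ @array }`. A hash subscript evaluates a lone array in scalar context, so the key becomes the number of elements rather than the domain string. A minimal sketch with a made-up domain value:

```perl
use strict;
use warnings;

my @array = ('example.com');    # one element, as in each loop iteration
my %seen;

# Inside the hash subscript, @array is evaluated in scalar context,
# so the key is the element count (1), not the domain name.
$seen{ @array }++;

print join(', ', keys %seen), "\n";
```

So even with `%seen` moved outside the loop, that subscript would not track individual domains; indexing by `$_` inside `grep` (or by `$1` directly) is what records each domain.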
Re: remove duplicates with hash and grep by toolic (Bishop) on Dec 22, 2014 at 21:29 UTC
Tip #1 from the Basic debugging checklist: warnings

`Name "main::array" used only once: possible typo at ..`

Your code never populates @array (it is empty).
Re: remove duplicates with hash and grep by Smith (Initiate) on Dec 22, 2014 at 21:30 UTC
Also, if I print right after I assign $domain, I do get the list of domains, so I know the data is there.