beanscake has asked for the wisdom of the Perl Monks concerning the following question:

Please, I need help with this: I want to avoid duplicates in an array, but I have tried endlessly without good results. Can someone please help me out? Thanks.

use strict;
use warnings;
#use List::MoreUtils qw(uniq);    # i have tried this to handle the duplicated emails
use Data::Dumper qw(Dumper);

my $Directory = $ARGV[0];    # directory to scan for emails
my $Filename  = $ARGV[1];    # where to write out found emails
                             # note: to avoid a forever loop, make sure it's not the same directory

my $success  = "\n [+] $0 is Scanning For E-mails \n\n";
my $tryagain = "\n [?] perl $0 Directory fileto.txt \n\n";

if (@ARGV != 2) {
    print $tryagain;
    exit();
}
else {
    print $success;
}

sub uniq {    # and this to handle the duplicated emails
    return keys %{ { map { $_ => 1 } @_ } };
}

#sub uniq {    # with this to handle the duplicated emails
#    my %seen;
#    grep !$seen{$_}++, @_;
#}

my $total_filesscanned = 0;
my $total_email        = 0;

my @files = grep { -f } <$Directory*.txt*>;    # scanning directory

open(my $fh, '>>', $Filename) or die $!;
foreach my $file (@files) {
    $total_filesscanned++;    # count the files scanned
    open my $open, '<', $file or die $!;
    while (<$open>) {
        chomp;
        my @findemails = split ' ';
        my @filtered   = uniq(@findemails);    # meant to avoid duplicates
        #my @filtered = join(" ", uniq(@findemails));    # also took this approach
        foreach my $emails (@filtered) {
            if ($emails =~ /^\w+\@([\da-zA-Z\-]{1,}\.){1,}[\da-zA-Z-]{2,6}$/) {    # grab the emails
                $total_email++;           # count the emails
                print $fh "$emails\n";    # write the emails to file
            }
        }
    }
    close $open;    # close the input file (was: close $file, which closes nothing)
    print "$file\n";
}
close $fh;    # close the output file

#my $removed = @findemails - @filtered;    # am expecting it to count the removed duplicates but it's not working

print "Files Scanned: $total_filesscanned\n";
print "E-mail Found: $total_email\n";
#print "Filtered Total: $removed\n";
print "done\n";
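For reference, the core of what the script is trying to do can be sketched without any file I/O. This is a minimal illustration, not the OP's exact script: a single %seen hash de-duplicates across the whole run (first occurrence wins, order preserved, unlike the keys-of-a-hash approach, which returns unique items in arbitrary hash order). The sample data here is made up for the demonstration.

```perl
use strict;
use warnings;

# Sample input standing in for the scanned files
my $input = <<'END';
aaaa@jjj.net bbbb@kkk.org aaaa@jjj.net
bbbb@kkk.org cccc@lll.com
END

my %seen;
my @unique;
for my $line (split /\n/, $input) {
    for my $email (split ' ', $line) {
        # keep only e-mail-shaped tokens
        next unless $email =~ /^\w+\@(?:[\da-zA-Z-]+\.)+[\da-zA-Z-]{2,6}$/;
        # push only the first occurrence of each address
        push @unique, $email unless $seen{$email}++;
    }
}
print "$_\n" for @unique;
```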

Replies are listed 'Best First'.
Re: Help with removing duplicates in array
by choroba (Cardinal) on Mar 27, 2015 at 12:07 UTC
    The uniq subroutine works correctly for strings, as you can easily verify:
    sub uniq {    # and this to handle the duplicated emails
        return keys %{ { map { $_ => 1 } @_ } };
    }
    say join ' ', uniq(qw( ABC DEF GHI JKL ABC DEF ABC ));

    Output:

    GHI DEF ABC JKL

    What do you mean by no "good results"? Note that $#filtered doesn't represent the number of duplicates removed, as the variable name suggests, but the number of unique e-mails minus 1. To get the number of removed ones, just use

    my $removed = @findemails - @filtered;
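    This works because an array in numeric context yields its element count. A small self-contained illustration (variable names borrowed from the OP's script, data made up):

```perl
use strict;
use warnings;

my @findemails = qw( aaaa@jjj.net aaaa@jjj.net bbbb@kkk.org );
my %seen;
my @filtered = grep { !$seen{$_}++ } @findemails;   # order-preserving uniq

# Arrays in numeric context give their counts: 3 - 2 = 1
my $removed = @findemails - @filtered;
print "Removed $removed duplicate(s)\n";
```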
      OK, thanks. I mean it's not removing duplicates as I intended it to. Please, if you can, put this in a txt file, run the script, and see that it's not removing these duplicates => 'aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net';
      The beginning of knowledge is the discovery of something we do not understand.
          Frank Herbert (1920 - 1986)

        Are you perhaps confusing a string with a list?

        use strict;
        use warnings;
        use feature 'say';

        sub uniq {    # and this to handle the duplicated emails
            return keys %{ { map { $_ => 1 } @_ } };
        }

        my $string = 'aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net';
        say "string: ", join ' ', uniq($string);
        say "list:   ", join ' ', uniq(split(/ /, $string));
Re: Help with removing duplicates in array
by hdb (Monsignor) on Mar 27, 2015 at 12:51 UTC

    Can you try with

    use strict; use warnings;

    As a minimum it will tell you that in the line where you use $#filtered, the underlying array @filtered is already out of scope. Probably you will see more problems with your code.

      Good, I have done that:
      use strict;
      use warnings;
      #use List::MoreUtils qw(uniq);    # i have tried this to handle the duplicated emails
      use Data::Dumper qw(Dumper);

      my $Directory = $ARGV[0];    # directory to scan for emails
      my $Filename  = $ARGV[1];    # where to write out found emails
                                   # note: to avoid a forever loop, make sure it's not the same directory

      my $success  = "\n [+] $0 is Scanning For E-mails \n\n";
      my $tryagain = "\n [?] perl $0 Directory fileto.txt \n\n";

      if (@ARGV != 2) {
          print $tryagain;
          exit();
      }
      else {
          print $success;
      }

      sub uniq {    # and this to handle the duplicated emails
          return keys %{ { map { $_ => 1 } @_ } };
      }

      #sub uniq {    # with this to handle the duplicated emails
      #    my %seen;
      #    grep !$seen{$_}++, @_;
      #}

      my $total_filesscanned = 0;
      my $total_email        = 0;

      my @files = grep { -f } <$Directory*.txt*>;    # scanning directory

      open(my $fh, '>>', $Filename) or die $!;
      foreach my $file (@files) {
          $total_filesscanned++;    # count the files scanned
          open my $open, '<', $file or die $!;
          while (<$open>) {
              chomp;
              my @findemails = split ' ';
              my @filtered   = uniq(@findemails);    # meant to avoid duplicates
              #my @filtered = join(" ", uniq(@findemails));    # also took this approach
              foreach my $emails (@filtered) {
                  if ($emails =~ /^\w+\@([\da-zA-Z\-]{1,}\.){1,}[\da-zA-Z-]{2,6}$/) {    # grab the emails
                      $total_email++;           # count the emails
                      print $fh "$emails\n";    # write the emails to file
                  }
              }
          }
          close $open;    # close the input file
          print "$file\n";
      }
      close $fh;    # close the output file

      #my $removed = @findemails - @filtered;    # am expecting it to count the removed duplicates but it's not working

      print "Files Scanned: $total_filesscanned\n";
      print "E-mail Found: $total_email\n";
      #print "Filtered Total: $removed\n";
      print "done\n";

        And what do you get? Does it run without warnings? Do you still get duplicates? Are you aware that you only remove duplicates within the same $file, but that there could be duplicates across files? Change your printing to

        print $fh "$file:$emails\n"; # write the emails to file

        to check which file each email was retrieved from.
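        One way to address the cross-file duplicates mentioned above is to keep a single %seen hash outside the file loop. A sketch, assuming the OP's overall structure (the sub name `unique_emails_from_files` is made up for illustration):

```perl
use strict;
use warnings;

# One %seen hash shared across all files removes duplicates across
# the whole run, not just within a single line or file.
sub unique_emails_from_files {
    my @files = @_;
    my %seen;        # shared across all files
    my @emails;
    for my $file (@files) {
        open my $in, '<', $file or die "$file: $!";
        while (my $line = <$in>) {
            for my $email (split ' ', $line) {
                # keep only the first occurrence of each address
                push @emails, $email unless $seen{$email}++;
            }
        }
        close $in;
    }
    return @emails;
}

# e.g. my @emails = unique_emails_from_files(@ARGV);
```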

Re: Help with removing duplicates in array
by BillKSmith (Monsignor) on Mar 27, 2015 at 13:59 UTC

    You can use the uniq function in List::MoreUtils

    UPDATE: Sorry, I missed the comment that you had already tried this. How did it fail?
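    For completeness, List::MoreUtils's uniq is order-preserving (first occurrence wins), which makes it a drop-in replacement for the hash-keys version without the random ordering. A minimal usage sketch, assuming the module is installed from CPAN:

```perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);

my @emails = qw( aaaa@jjj.net bbbb@kkk.org aaaa@jjj.net );
my @unique = uniq(@emails);    # keeps first-occurrence order
print "$_\n" for @unique;
```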

    Bill

      The fourth line of the OP's code shows that s/he already tried that.