beanscake has asked for the wisdom of the Perl Monks concerning the following question:

Please, I need help with this: I want to avoid duplicates in an array, but I have tried endlessly without good results. Can someone please help me out? Thanks.

use strict;
use warnings;
#use List::MoreUtils qw(uniq);    # i have tried this to handle the duplicated emails
use Data::Dumper qw(Dumper);

my $Directory = $ARGV[0];    # directory to scan for emails
my $Filename  = $ARGV[1];    # where to write out found emails
                             # note: to avoid a forever loop, make sure it's not the same directory

my $success  = "\n [+] $0 is Scanning For E-mails \n\n";
my $tryagain = "\n [?] perl $0 Directory fileto.txt \n\n";

if (@ARGV != 2) {
    print $tryagain;
    exit();
}
else {
    print $success;
}

sub uniq {    # and this to handle the duplicated emails
    return keys %{ { map { $_ => 1 } @_ } };
}

#sub uniq {    # with this to handle the duplicated emails
#    my %seen;
#    grep !$seen{$_}++, @_;
#}

my $total_filesscanned = 0;
my $total_email        = 0;

my @files = grep { -f } <$Directory*.txt*>;    # scanning directory

open(my $fh, '>>', $Filename) or die $!;
foreach my $file (@files) {
    $total_filesscanned++;    # count the files scanned
    open my $open, '<', $file or die $!;
    while (<$open>) {
        chomp;
        my @findemails = split ' ';
        my @filtered   = uniq(@findemails);    # meant to avoid duplicates
        #my @filtered = join(" ", uniq(@findemails));    # also took this approach
        foreach my $emails (@filtered) {
            if ($emails =~ /^\w+\@([\da-zA-Z\-]{1,}\.){1,}[\da-zA-Z-]{2,6}$/) {    # grab the emails
                $total_email++;           # count the emails
                print $fh "$emails\n";    # write the emails to file
            }
        }
    }
    close $open;    # close the input file (was: close $file, which closes nothing)
    print "$file\n";
}
close $fh;    # close the output file

#my $removed = @findemails - @filtered;    # am expecting it to count the removed duplicates but it's not working

print "Files Scanned: $total_filesscanned\n";
print "E-mail Found: $total_email\n";
#print "Filtered Total: $removed\n";
print "done\n";
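For reference, the core of what the script is trying to do can be sketched without any file I/O. This is a minimal illustration, not the OP's exact script: a single %seen hash de-duplicates across the whole run (first occurrence wins, order preserved, unlike the keys-of-a-hash approach, which returns unique items in arbitrary hash order). The sample data here is made up for the demonstration.

```perl
use strict;
use warnings;

# Sample input standing in for the scanned files
my $input = <<'END';
aaaa@jjj.net bbbb@kkk.org aaaa@jjj.net
bbbb@kkk.org cccc@lll.com
END

my %seen;
my @unique;
for my $line (split /\n/, $input) {
    for my $email (split ' ', $line) {
        # keep only e-mail-shaped tokens
        next unless $email =~ /^\w+\@(?:[\da-zA-Z-]+\.)+[\da-zA-Z-]{2,6}$/;
        # push only the first occurrence of each address
        push @unique, $email unless $seen{$email}++;
    }
}
print "$_\n" for @unique;
```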

Replies are listed 'Best First'.
Re: Help with removing duplicates in array
by choroba (Cardinal) on Mar 27, 2015 at 12:07 UTC
    The uniq subroutine works correctly for strings, as you can easily verify:
    sub uniq {    # and this to handle the duplicated emails
        return keys %{ { map { $_ => 1 } @_ } };
    }
    say join ' ', uniq(qw( ABC DEF GHI JKL ABC DEF ABC ));

    Output:

    GHI DEF ABC JKL

    What do you mean by no "good results"? Note that $#filtered doesn't represent the number of duplicates removed, as the variable name suggests, but the number of unique e-mails minus 1. To get the number of removed ones, just use

    my $removed = @findemails - @filtered;
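    This works because an array in numeric context yields its element count. A small self-contained illustration (variable names borrowed from the OP's script, data made up):

```perl
use strict;
use warnings;

my @findemails = qw( aaaa@jjj.net aaaa@jjj.net bbbb@kkk.org );
my %seen;
my @filtered = grep { !$seen{$_}++ } @findemails;   # order-preserving uniq

# Arrays in numeric context give their counts: 3 - 2 = 1
my $removed = @findemails - @filtered;
print "Removed $removed duplicate(s)\n";
```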
      OK, thanks. I mean it's not removing duplicates as I intended it to. Please, if you can, put this in a txt file, run the script, and see that it's not removing these duplicates => 'aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net';
      The beginning of knowledge is the discovery of something we do not understand.
          Frank Herbert (1920 - 1986)

        Are you perhaps confusing a string with a list?

        use strict;
        use warnings;
        use feature 'say';

        sub uniq {    # and this to handle the duplicated emails
            return keys %{ { map { $_ => 1 } @_ } };
        }

        my $string = 'aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net aaaa@jjj.net';
        say "string: ", join ' ', uniq($string);
        say "list:   ", join ' ', uniq(split(/ /, $string));
Re: Help with removing duplicates in array
by hdb (Monsignor) on Mar 27, 2015 at 12:51 UTC

    Can you try with

    use strict; use warnings;

    As a minimum it will tell you that in the line where you use $#filtered, the underlying array @filtered is already out of scope. Probably you will see more problems with your code.

      Good, I have done that:
      use strict;
      use warnings;
      #use List::MoreUtils qw(uniq);    # i have tried this to handle the duplicated emails
      use Data::Dumper qw(Dumper);

      my $Directory = $ARGV[0];    # directory to scan for emails
      my $Filename  = $ARGV[1];    # where to write out found emails
                                   # note: to avoid a forever loop, make sure it's not the same directory

      my $success  = "\n [+] $0 is Scanning For E-mails \n\n";
      my $tryagain = "\n [?] perl $0 Directory fileto.txt \n\n";

      if (@ARGV != 2) {
          print $tryagain;
          exit();
      }
      else {
          print $success;
      }

      sub uniq {    # and this to handle the duplicated emails
          return keys %{ { map { $_ => 1 } @_ } };
      }

      #sub uniq {    # with this to handle the duplicated emails
      #    my %seen;
      #    grep !$seen{$_}++, @_;
      #}

      my $total_filesscanned = 0;
      my $total_email        = 0;

      my @files = grep { -f } <$Directory*.txt*>;    # scanning directory

      open(my $fh, '>>', $Filename) or die $!;
      foreach my $file (@files) {
          $total_filesscanned++;    # count the files scanned
          open my $open, '<', $file or die $!;
          while (<$open>) {
              chomp;
              my @findemails = split ' ';
              my @filtered   = uniq(@findemails);    # meant to avoid duplicates
              #my @filtered = join(" ", uniq(@findemails));    # also took this approach
              foreach my $emails (@filtered) {
                  if ($emails =~ /^\w+\@([\da-zA-Z\-]{1,}\.){1,}[\da-zA-Z-]{2,6}$/) {    # grab the emails
                      $total_email++;           # count the emails
                      print $fh "$emails\n";    # write the emails to file
                  }
              }
          }
          close $open;    # close the input file
          print "$file\n";
      }
      close $fh;    # close the output file

      #my $removed = @findemails - @filtered;    # am expecting it to count the removed duplicates but it's not working

      print "Files Scanned: $total_filesscanned\n";
      print "E-mail Found: $total_email\n";
      #print "Filtered Total: $removed\n";
      print "done\n";

        And what do you get? Does it run without warnings? Do you still get duplicates? Are you aware that you only remove duplicates within the same $file, but that there could be duplicates across files? Change your printing to

        print $fh "$file:$emails\n"; # write the emails to file

        to check which file each email was retrieved from.
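        One way to address the cross-file duplicates mentioned above is to keep a single %seen hash outside the file loop. A sketch, assuming the OP's overall structure (the sub name `unique_emails_from_files` is made up for illustration):

```perl
use strict;
use warnings;

# One %seen hash shared across all files removes duplicates across
# the whole run, not just within a single line or file.
sub unique_emails_from_files {
    my @files = @_;
    my %seen;        # shared across all files
    my @emails;
    for my $file (@files) {
        open my $in, '<', $file or die "$file: $!";
        while (my $line = <$in>) {
            for my $email (split ' ', $line) {
                # keep only the first occurrence of each address
                push @emails, $email unless $seen{$email}++;
            }
        }
        close $in;
    }
    return @emails;
}

# e.g. my @emails = unique_emails_from_files(@ARGV);
```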

Re: Help with removing duplicates in array
by BillKSmith (Monsignor) on Mar 27, 2015 at 13:59 UTC

    You can use the uniq function in List::MoreUtils

    UPDATE: Sorry, I missed the comment that you had already tried this. How did it fail?
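    For completeness, List::MoreUtils's uniq is order-preserving (first occurrence wins), which makes it a drop-in replacement for the hash-keys version without the random ordering. A minimal usage sketch, assuming the module is installed from CPAN:

```perl
use strict;
use warnings;
use List::MoreUtils qw(uniq);

my @emails = qw( aaaa@jjj.net bbbb@kkk.org aaaa@jjj.net );
my @unique = uniq(@emails);    # keeps first-occurrence order
print "$_\n" for @unique;
```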

    Bill

      The fourth line of the OP's code shows that s/he already tried that.