john.tm has asked for the wisdom of the Perl Monks concerning the following question:

I have to run a report from 2 CSV files. I am stuck on the part where I want to keep only unique elements from column B, deleting any duplicates but displaying the count of how many duplicates there were. Then I wish to reference this array in the 2nd CSV file; is that possible? Sample of data:
input   output
james   james 2
dave    dave 2
mike    mike 3
ken     ken 3
jon     jon 5
jon
ken
jon
mike
james
dave
mike
ken
jon
jon
The code I have so far is below. I can keep only unique elements but am stuck on how to show the duplicate count.

#!/usr/bin/perl
use strict;
use warnings;
use Tk;
use Tk::BrowseEntry;
use POSIX 'mktime';
use POSIX 'strftime';

open(STDERR, ">&STDOUT");

######## entry widget to get $yyyy $mmm $dd #######################################
print "\n Select Year = $yyyy\n";
print "\n Select Month = $mmm\n";
print "\n Number of Backup Days = $dd\n";

######## create input and output files #######################################
my $filerror = "\n\n! Cannot open File below, please check it exists or is not open already?\n";
my $OUTFILE = "C:\\Temp\\$yyyy\$mmmAudit.txt";
my $INFILE1 = "c:\\$yyyy\\$mmm\\report.csv";
my $INFILE = "c:\\$yyyy\\$mmm\\names.csv";

#Open input file for reading and Output file for writing
open (INPUT,"$INFILE") or die "\n$filerror\$INFILE",,1;
#open (OUTPUT,">$OUTFILE") or die "\n$filerror\n$OUTFILE",,1;

my $total_names = 0;
$total_names++ while (<INPUT>);
my $Month_total = $total_names * $dd;

######### get total number of rows in files ##################################
print "\n Total number of names is $total_names\n";
print "\n Total number of names is $Month_total\n";
close INPUT;

open (INPUT,"$INFILE") or die "\n$filerror\$INFILE",,1;

######### keep only unique names and display number count of duplicates #########
my %seen;
while (<INPUT>) {
    chomp;
    my $line = $_;
    my @elements = split (",", $line);
    my $col_name = $elements[1];
    print " $col_name \n" if ! $seen{$col_name}++;
}
close INPUT;

Replies are listed 'Best First'.
Re: keep only unique elements in an array displaying number of duplcates.
by GotToBTru (Prior) on Jul 28, 2014 at 04:25 UTC

    Your code could be more efficient, but examining %seen at the end shows you have the data you want. You just need to display it. keys %hashvariable produces a list of the keys of a hash.

    foreach my $name (sort keys %seen) {
        printf "%-6s: %-2d\n", $name, $seen{$name};
    }

    Output:

    dave  : 2
    james : 2
    jon   : 5
    ken   : 3
    mike  : 3

    Or, slightly more 'perlish',

    printf "%-6s: %-2d\n",$_, $seen{$_} for (sort keys %seen);
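
The question also asks whether the counts can be referenced while processing the second CSV file. Since %seen is an ordinary hash, it can be consulted for each row of that file. The following is only a sketch under assumptions: the layout of report.csv is not shown in the thread, so the sample rows (date in column A, name in column B) and the name "alice" are hypothetical, and the hash is filled in by hand with the counts from the output above.

```perl
use strict;
use warnings;

# Counts as they would be after the first file has been read.
my %seen = (james => 2, dave => 2, mike => 3, ken => 3, jon => 5);

# Hypothetical stand-in for report.csv: date in column A, name in column B.
my $report_text = <<'END';
2014-07-01,jon
2014-07-01,alice
2014-07-02,mike
END

# Open an in-memory filehandle so the sketch runs without any files on disk.
open my $report, '<', \$report_text or die "Cannot open sample report: $!";

my @annotated;
while (my $line = <$report>) {
    chomp $line;
    my ($date, $name) = split /,/, $line;
    # Names absent from the first file get a count of 0.
    my $count = exists $seen{$name} ? $seen{$name} : 0;
    push @annotated, "$date $name $count";
}
close $report;

print "$_\n" for @annotated;
```

With a real file, the in-memory string would be replaced by the path to the second CSV file; the lookup itself stays the same.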
    1 Peter 4:10
Re: keep only unique elements in an array displaying number of duplcates.
by 2teez (Vicar) on Jul 28, 2014 at 06:38 UTC

    Hi john.tm,
    While others have pointed out what you could do to get desired output, I think it wouldn't be out of place to also note a few things:

    1. ..I have to run a report from 2 csv files..
      Use tested modules, instead of hand-picking commas. In this case for CSV files use Text::CSV_XS or Text::CSV
    2. Use a lexical variable instead of a bareword for the file handle. Also use the three-argument form of open, like open my $file, '<', $filename or die "...: $!";

      Why use this
      while(<DATA>){ ... my $line = $_; ... }
      when you can do it once:
      while(my $line = <DATA>){ ... }
    3. Why loop through the file(s) twice, once to get the total and then again to get the number of occurrences, when you can do both in one pass, like so:
      use warnings;
      use strict;
      use Data::Dumper;
      use List::Util qw(sum);

      my %seen;
      while (<DATA>) {
          chomp;
          $seen{$_}++;
      }
      print Dumper \%seen;

      my $total = 0;
      $total += $_ for values %seen;
      print $total;
      print sum( values %seen );    # sum from List::Util

      __DATA__
      james
      dave
      mike
      ken
      jon
      jon
      ken
      jon
      mike
      james
      dave
      mike
      ken
      jon
      jon
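
The three points above can be combined into one pass over the data. This is a minimal sketch, not the poster's script: it assumes the name sits in column B, uses a plain split on commas only to stay free of non-core modules (Text::CSV or Text::CSV_XS is the safer choice for real CSV), and reads from an in-memory string with a made-up "x" in column A instead of names.csv. The @order array is an addition that keeps names in first-seen order, matching the sample output in the question.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for names.csv; the name is in column B.
my $csv_text = join '', map("x,$_\n", qw(
    james dave mike ken jon jon ken jon mike james dave mike ken jon jon
));

# Three-argument open with a lexical filehandle (here on an in-memory string).
open my $in, '<', \$csv_text or die "Cannot open sample data: $!";

my (%seen, @order);
my $total = 0;
while (my $line = <$in>) {
    chomp $line;
    my $name = (split /,/, $line)[1];
    next unless defined $name;               # skip rows without a column B
    $total++;                                # row total, counted in the same pass
    push @order, $name unless $seen{$name}++;   # remember first-seen order
}
close $in;

# One "name count" entry per unique name, in first-seen order.
my @report = map("$_ $seen{$_}", @order);
print "$_\n" for @report;
print "Total rows: $total\n";
```

Because the row total is accumulated in the same loop, the file does not need to be opened a second time.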

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: keep only unique elements in an array displaying number of duplcates.
by Anonymous Monk on Jul 28, 2014 at 04:18 UTC
    Seems you're on the right track
    ...
        my $col_name = $elements[1];
        # don't print
        # just count them
        # print " $col_name \n" if ! $seen{$col_name}++;
        $seen{$col_name}++;
    }
    while ( my ( $col_name, $times_seen ) = each %seen ) {
        print "$col_name: $times_seen\n";
    }
    Also
    open (INPUT,"$INFILE") or die "\n$filerror\$INFILE",,1;
    What is that? Why two commas? Why "1"? Is that a Windows thing?
      The ',,1' at the end means the file is opened read-only, so if the user leaves the file open the script still runs for them, without the pop-up dialog box.

      What is that? Why two commas? Why "1"? Is that a Windows thing?

      No, it's not a Windows thing or a Perl thing -- Perl on Windows is pretty much like Perl on Linux -- die does the same thing it's documented to do

      Whatever it is, it's his thing, not demonstrated in the program posted
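
For reference, what die does with that argument list can be checked directly. die concatenates the elements of its list into the message, and Perl treats two commas in a row inside a list as a single separator, so the trailing ,,1 only appends the character 1 to the message; it has no effect on how open opens the file. A small sketch, with a made-up message text, that traps the exception so it can be inspected:

```perl
use strict;
use warnings;

# eval traps the exception so the message can be inspected instead of exiting.
my $message;
eval { die "Cannot open file",,1; 1 } or $message = $@;

# The message ends in "1", followed by the usual " at FILE line N." suffix.
print $message;
```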