rickman1 has asked for the wisdom of the Perl Monks concerning the following question:

I am successfully populating an array with a large number of account values. I then try to populate a different array with only the unique values from the first array, yet when I print it, all of the values appear rather than just the unique ones. I am not sure what I am doing wrong; it is as if 'uniq' is not working for me. Thanks in advance for any help.

use strict;
use warnings;
use String::Util 'trim';
use List::MoreUtils qw(uniq);

open IN, "<$ARGV[0]" or die "Could not open input file '$ARGV[0]' $!";

my $row;
my $val;
my $counter = 0;
my @trnAccounts;
my @unique_accounts;

while ($row = <IN>) {
    $val = substr($row,0,1);
    @trnAccounts = substr($row,1050,50);
    @trnAccounts = trim (@trnAccounts);
    @unique_accounts = uniq (@trnAccounts);
    foreach (@unique_accounts) {
        $counter++;
        print $counter, ":";
        print @unique_accounts, "\n";
        #print $_, "\n"; this also prints all values
    }
}

Re: Trying to print out only unique array values-
by genio (Beadle) on Aug 16, 2016 at 17:35 UTC
    Hi Rickman1! Let me start by cleaning up a little bit and then explaining after.
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use List::Util 1.45 qw(uniq);

    my $char_at=10;
    my $num_chars=50;

    open my $fh, '<:encoding(UTF-8)', $ARGV[0]
        or die "Could not open '$ARGV[0]' $!";

    my @trnAccounts;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line && length($line) >= $char_at;
        push @trnAccounts, substr($line, $char_at, $num_chars);
    }

    my @unique_accounts = uniq(@trnAccounts);
    for my $i (0..$#unique_accounts) {
        print $i, ":", $unique_accounts[$i], "\n";
    }
    print "total: ", scalar(@trnAccounts), "\n";
    print "uniq : ", scalar(@unique_accounts), "\n";
    Some of the ways you were using trim() and putting elements into an array were off. Once cleared up, the code above should hopefully be a little easier to follow. Also, later versions of List::Util contain a uniq function. Use that instead. Hope that helps.
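To make the array-population point concrete, here is a minimal sketch of the difference between assigning to the array on every pass and pushing onto it. The rows, the field offset and width, and the helper name `distinct` are all made up for illustration; `distinct` is only a core-Perl stand-in for `uniq`, not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for uniq(): keep the first occurrence of each value.
sub distinct {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Made-up rows; the account field is assumed to sit at offset 1, width 3.
my @rows = ("DAAA\n", "DBBB\n", "DAAA\n");

# Assignment, as in the original question: the array is replaced on every
# pass, so it never holds more than the current row's value.
my @assigned;
my @printed;
for my $row (@rows) {
    @assigned = substr($row, 1, 3);
    push @printed, distinct(@assigned);   # always exactly one element
}

# push, as in the rewrite: values accumulate, and de-duplication runs once
# after the loop has seen every row.
my @accounts;
push @accounts, substr($_, 1, 3) for @rows;
my @unique = distinct(@accounts);

print "per-row: @printed\n";   # AAA BBB AAA
print "after:   @unique\n";    # AAA BBB
```

With assignment, the de-duplication only ever sees a one-element list, so every row is printed; with push, duplicates can be compared against each other.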

      Dude you rock! For the most part I modified your example so that it almost works as expected. I had to blow away the UTF-8 encoding part due to errors. Then I modified the '$i (1..$#unique_accounts)' so output would not be zero based. Then I included some added conditions so that it would not process header & trailer records. So output is:
      1:#########
      2:#########
      3:#########
      4:#########
      5:#########
      total: 370659
      uniq : 6
      Sorry but I cannot show actual values. Notice there are 5 unique values (which is correct) yet the 'uniq' counter reads 6. Also total lines processed equals 370659, it should only equal 370657. It is obviously processing header & trailer records. Here is your example with my changes:

      use strict;
      use warnings;
      use List::MoreUtils qw(uniq);

      my $char_at=10;
      my $num_chars=50;

      open my $fh, $ARGV[0]
      #open my $fh, '<:encoding(UTF-8)', $ARGV[0]
          or die "Could not open '$ARGV[0]' $!";

      my @trnAccounts;
      while (my $line = <$fh>) {
          my $val = substr($line, $char_at, $num_chars);
          if ($val eq 'H' or $val eq 'T') {
              my $output = $line;
          }else {
              chomp $line;
              next unless $line && length($line) >= $char_at;
              push @trnAccounts, substr($line, $char_at, $num_chars);
          }
      }

      my @unique_accounts = uniq(@trnAccounts);
      for my $i (1..$#unique_accounts) {
          print $i, ":", $unique_accounts[$i], "\n";
      }
      print "total: ", scalar(@trnAccounts), "\n";
      print "uniq : ", scalar(@unique_accounts), "\n";
        Hi Rickman1!
        In general, a great Monk question has some data and actual code that the Monks can run. If you have private data that cannot be disclosed publicly, then "dummy up" something that is an accurate representation of the actual data but is "fake". Use made-up names and account numbers, "Luke Skywalker" or whatever.

        This code has some issues that I see:

        while (my $line = <$fh>) {
            my $val = substr($line, $char_at, $num_chars);
            if ($val eq 'H' or $val eq 'T') {
                my $output = $line;
            }else {
                chomp $line;
                next unless $line && length($line) >= $char_at;
                push @trnAccounts, substr($line, $char_at, $num_chars);
            }
        }
        First, my $output = $line; will never be executed. And even if it is, it will do absolutely nothing. You cannot declare a "my" variable conditionally and use it elsewhere. Use it immediately or not at all. So this whole "if" clause is "nonsense".

        Will the condition if ($val eq 'H' or $val eq 'T') ever be satisfied? I think not. $val looks like it is a string of 50 characters, starting at $char_at. This will never equal a single character comparison. Some regex might work, but single character, I think not. You have not chomped the line endings, and this line ending in $val will prevent the "match".

        In the "else" clause, there appears to be some confusion. The next unless $line test will always pass! Right up front, while (my $line = <$fh>) says that $line is true, otherwise the loop doesn't proceed. Then there is push @trnAccounts, substr($line, $char_at, $num_chars);. Well, that substr is just $val.
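As a rough illustration of why the 50-character comparison cannot succeed, the sketch below builds an invented header record and compares both slices against 'H'. It assumes the record-type flag lives in the first column, which is how the code in the original question read it; the record layout itself is hypothetical.

```perl
use strict;
use warnings;

# Invented header record: type flag in column 0, padded out with zeros.
my $line = 'H' . ('0' x 69) . "\n";

my $field = substr($line, 10, 50);  # 50-character slice starting at offset 10
my $flag  = substr($line, 0, 1);    # single character at the start of the line

my $field_matches = ($field eq 'H') ? 1 : 0;
my $flag_matches  = ($flag  eq 'H') ? 1 : 0;

print "field length: ", length($field), "\n";  # 50
print "field eq 'H': $field_matches\n";        # 0
print "flag  eq 'H': $flag_matches\n";         # 1
```

Only the one-character slice can ever equal a one-character string, so the header and trailer test has to look at that slice.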

        I suggest that you have another "go" at this. Generate say 10 example lines, show your code to process those lines and how it fails. Keep simplifying the example until you cannot reproduce the problem any more. Make it as simple as possible. This process may help you discover your own problem.

        Close. There were a few issues with your updates. So, I think I covered those and commented the code enough to make it more understandable here:
        #!/usr/bin/env perl
        use strict;
        use warnings;
        use List::MoreUtils qw(uniq);

        my $char_at=10;    # Character to start grabbing data in the line
        my $num_chars=50;  # Number of characters to grab in the line
        my @trnAccounts;   # our array to store results

        # open our file
        open my $fh, '<', $ARGV[0]
            or die "Could not open '$ARGV[0]' $!";

        # go through our file, line by line
        while (my $line = <$fh>) {
            chomp $line; # trim off trailing newline character first
            # OR, you could trim the line using regular expressions
            # $line =~ s/\A\s*//; # trim beginning
            # $line =~ s/\s*\z//; # trim end
            # OR, you could trim using String::Util's trim or something
            # $line = trim($line);

            # skip this line completely if it doesn't contain the info we want
            next unless $line && length($line) >= $char_at;

            # grab the uppercased version of the first character in the line
            my $first_char = uc(substr($line,0,1) || '');

            # skip this line if that first character's an H or T
            next if ($first_char eq 'H' or $first_char eq 'T');

            # otherwise, push a portion of the line onto our array
            push @trnAccounts, substr($line, $char_at, $num_chars);
        }

        # get a unique list of info we stored.
        my @unique_accounts = uniq(@trnAccounts);

        # arrays are zero-based.
        for my $i (0..$#unique_accounts) {
            # still zero-based, but display it as 1-based
            print $i+1, ":", $unique_accounts[$i], "\n";
        }

        # print out some totals
        print "total: ", scalar(@trnAccounts), "\n";
        print "uniq : ", scalar(@unique_accounts), "\n";
        I hope that's more clear. Please speak up if something doesn't make sense. I apologize for not documenting the first attempt a bit more clearly.
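For anyone puzzled by five printed rows against a count of six, here is a small sketch of the two loop ranges side by side. The six values are invented; the point is only that starting the range at 1 silently drops element 0, while starting at 0 and printing the index plus one keeps every element.

```perl
use strict;
use warnings;

my @unique = qw(A B C D E F);   # six made-up values

my (@from_one, @from_zero);

# Range starting at 1: element 0 ('A') is never visited.
for my $i (1 .. $#unique) {
    push @from_one, "$i:$unique[$i]";
}

# Range starting at 0, displayed as 1-based: every element is visited.
for my $i (0 .. $#unique) {
    push @from_zero, ($i + 1) . ":$unique[$i]";
}

print scalar(@from_one),  " lines: @from_one\n";   # 5 lines: 1:B 2:C 3:D 4:E 5:F
print scalar(@from_zero), " lines: @from_zero\n";  # 6 lines: 1:A 2:B 3:C 4:D 5:E 6:F
```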
Re: Trying to print out only unique array values-
by toolic (Bishop) on Aug 16, 2016 at 17:22 UTC
    uniq works as I expect it to. It only prints out one "c", not two:
    use strict;
    use warnings;
    use List::MoreUtils qw(uniq);

    my @trnAccounts = qw(a b c c d);
    my @unique_accounts = uniq(@trnAccounts);
    foreach (@unique_accounts) {
        # print @unique_accounts, "\n";
        print $_, "\n";
    }

    __END__
    a
    b
    c
    d

    Basic debugging checklist

    http://sscce.org

      Unfortunately it did not work for me. It'll work on something simple like that for me too, but I am reading in thousands of lines and well, I am sure something was off. Thanks for your help tho!

Re: Trying to print out only unique array values-
by BillKSmith (Monsignor) on Aug 17, 2016 at 13:48 UTC
    Note that the word "unique" is slightly ambiguous. When a value is repeated, should it be included once, or not at all? All answers so far have assumed the former.
    Bill
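The two readings can be told apart with a short sketch in core Perl. It reuses the toy list from the earlier reply; the variable names are illustrative, and the hash-based idioms are a stand-in rather than what any particular module does internally.

```perl
use strict;
use warnings;

my @accounts = qw(a b c c d);   # same toy data as used earlier in the thread

# Count how often each value occurs.
my %count;
$count{$_}++ for @accounts;

# Reading 1: each value once, in first-seen order (what uniq returns).
my %seen;
my @distinct = grep { !$seen{$_}++ } @accounts;

# Reading 2: only values that were never repeated.
my @singletons = grep { $count{$_} == 1 } @accounts;

print "distinct:   @distinct\n";     # a b c d
print "singletons: @singletons\n";   # a b d
```

Under the first reading "c" appears once; under the second it is dropped entirely.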