Re: Extracting DNC issues

OK, so your situation is finally coming into focus. From your description, it sounds like the three columns are ... unrelated by row? Like this? (PerlMonks just uses HTML for tables.)

all	clean	fdnc
1111111111	9999999999	1111111111
2222222222	3333333333	2222222222
3333333333
1010101010
9999999999
8888888888

So each column is one set of numbers, unrelated by row, and your task is to read the first set and exclude the other two sets from it?

If so, then your program will look like this:

my (%all, %clean, %fdnc);
while (my $row = $csv->getline($input)) {
  # skip over whatever needs skipped
  ...;
  $all{$row->[0]}= 1   if $row->[0];
  $clean{$row->[1]}= 1 if $row->[1];
  $fdnc{$row->[2]}= 1  if $row->[2];
}

# say needs `use feature 'say';` (or `use v5.10;`) at the top of the script
for (sort keys %all) {
  say $_ unless $clean{$_} or $fdnc{$_};
}

But let's talk about that file format some more. In most CSV files the rows have meaning: each row is one record, and each column is one attribute of that record. Your file above (if that's really what it looks like and I didn't misunderstand) is really just 3 separate files that happen to be stuffed into the columns of one file.

If you use this structure instead, you would have an easier time processing it:

Number	is_clean	is_fdnc
1111111111		1
2222222222		1
3333333333	1
1010101010
9999999999	1
8888888888

With a file like this, as you read each row you can immediately know which sets it was part of, and easily add an additional column. It also sets you up nicely to be able to load them into a database, which is where these things generally need to end up for use by web apps and whatever else. So, I'd recommend writing out a new file like this if your system isn't bound to the other format.
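As a rough sketch of that conversion, assuming the three hashes have already been filled by a read loop like the one earlier in this post: the sample numbers are just the example data from the tables, `build_rows` is a made-up helper name, and plain tab-joined output stands in for `$csv->print` so the snippet has no module dependencies.

```perl
use strict;
use warnings;

# Assumed sample data matching the example columns; in the real script
# these hashes would be filled while reading the original file.
my %all   = map { $_ => 1 } qw(1111111111 2222222222 3333333333
                               1010101010 9999999999 8888888888);
my %clean = map { $_ => 1 } qw(9999999999 3333333333);
my %fdnc  = map { $_ => 1 } qw(1111111111 2222222222);

# Hypothetical helper: one record per number, [ number, is_clean, is_fdnc ].
# A flag is 1 when the number is in that set and empty otherwise.
sub build_rows {
    my ($all, $clean, $fdnc) = @_;
    return map {
        [ $_, ($clean->{$_} ? 1 : ''), ($fdnc->{$_} ? 1 : '') ]
    } sort keys %$all;
}

my @rows = build_rows(\%all, \%clean, \%fdnc);

# Tab-separated output for illustration; with Text::CSV you would hand
# each array ref to $csv->print instead.
print join("\t", qw(Number is_clean is_fdnc)), "\n";
print join("\t", @$_), "\n" for @rows;

# With one record per row, "in neither list" is a simple filter.
my @dnc_nums = map { $_->[0] } grep { !$_->[1] && !$_->[2] } @rows;
```

Adding another flag later (say, a hypothetical is_dnc column) then only means pushing one more element onto each row, rather than juggling another independent column.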

If you can't change it and really need that 4th column as an independent set, it gets a little awkward, because now you need to iterate over 4 sets simultaneously. The code would look like this:

my @all_nums= sort keys %all;
my @clean_nums= sort keys %clean;
my @fdnc_nums= sort keys %fdnc;
my @dnc_nums= grep !$clean{$_} && !$fdnc{$_}, @all_nums;

use List::Util 'max';
my $n= max($#all_nums, $#clean_nums, $#fdnc_nums, $#dnc_nums);
for (my $i= 0; $i <= $n; $i++) {
   $csv->print($temp_output, [
      $all_nums[$i],
      $clean_nums[$i],
      $fdnc_nums[$i],
      $dnc_nums[$i]
   ]);
}

That is fairly awkward, which is why I recommend changing the file format.
