in reply to Extracting DNC issues
Ok, so your situation is coming into focus finally. It sounds from your description that the three columns are ... unrelated by row? Like this? (PerlMonks just uses HTML for tables)
all | clean | fdnc |
---|---|---|
1111111111 | 9999999999 | 1111111111 |
2222222222 | 3333333333 | 2222222222 |
3333333333 | | |
1010101010 | | |
9999999999 | | |
8888888888 | | |
So, each column is one set of numbers, unrelated by row, and your task is to read the first set, and exclude the other two sets from it?
If so, then your program will look like this:

    my (%all, %clean, %fdnc);
    while (my $row = $csv->getline($input)) {
        # skip over whatever needs skipped
        ...;
        $all{$row->[0]}= 1 if $row->[0];
        $clean{$row->[1]}= 1 if $row->[1];
        $fdnc{$row->[2]}= 1 if $row->[2];
    }
    for (sort keys %all) {
        say $_ unless $clean{$_} or $fdnc{$_};
    }
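To sanity-check that logic against the sample table above, here is a minimal self-contained sketch. It hard-codes the rows instead of reading them through `$csv`, and the `dnc_numbers` helper name is just something I made up for illustration, so treat it as a sketch rather than a drop-in piece of your script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of your script): takes an arrayref of rows,
# where each row is an arrayref of three cells (all, clean, fdnc), and
# returns the sorted numbers that are in the "all" set but in neither of
# the other two sets.
sub dnc_numbers {
    my ($rows) = @_;
    my (%all, %clean, %fdnc);
    for my $row (@$rows) {
        $all{ $row->[0] }   = 1 if $row->[0];
        $clean{ $row->[1] } = 1 if $row->[1];
        $fdnc{ $row->[2] }  = 1 if $row->[2];
    }
    return grep { !$clean{$_} && !$fdnc{$_} } sort keys %all;
}

# Sample data copied from the table above; empty cells are empty strings.
my @rows = (
    [ '1111111111', '9999999999', '1111111111' ],
    [ '2222222222', '3333333333', '2222222222' ],
    [ '3333333333', '',           ''           ],
    [ '1010101010', '',           ''           ],
    [ '9999999999', '',           ''           ],
    [ '8888888888', '',           ''           ],
);

# Prints 1010101010 and 8888888888, one per line.
print "$_\n" for dnc_numbers(\@rows);
```

Those two are the only numbers in the first column that appear in neither of the other columns, regardless of which row they sit on.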
But let's talk about that file format some more. Most CSV files have meaning to the rows, where each row is one record, and each column is one attribute of that record. Your file above (if that's really what it looks like and I didn't misunderstand) is really just 3 separate files that happen to be stuffed into columns of one file.
If you use this structure instead, you would have an easier time processing it:
Number | is_clean | is_fdnc |
---|---|---|
1111111111 | | 1 |
2222222222 | | 1 |
3333333333 | 1 | |
1010101010 | | |
9999999999 | 1 | |
8888888888 | | |
With a file like this, as you read each row you can immediately know which sets it was part of, and easily add an additional column. It also sets you up nicely to be able to load them into a database, which is where these things generally need to end up for use by web apps and whatever else. So, I'd recommend writing out a new file like this if your system isn't bound to the other format.
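If you do go that route, the conversion from the three sets to one record per number could be sketched like this. The `to_records` name is hypothetical, and I'm joining fields by hand only to keep the example free of dependencies (that is safe here because the values are plain digits); in your real script you would keep writing through your CSV object.

```perl
use strict;
use warnings;

# Hypothetical helper: given the three sets as hashrefs, return one record
# per number in the form [ number, is_clean, is_fdnc ]. A flag is 1 if the
# number is in that set and an empty string otherwise.
sub to_records {
    my ($all, $clean, $fdnc) = @_;
    return map {
        [ $_, ($clean->{$_} ? 1 : ''), ($fdnc->{$_} ? 1 : '') ]
    } sort keys %$all;
}

# Same sample sets as the first table.
my %all   = map { $_ => 1 } qw(1111111111 2222222222 3333333333
                               1010101010 9999999999 8888888888);
my %clean = map { $_ => 1 } qw(9999999999 3333333333);
my %fdnc  = map { $_ => 1 } qw(1111111111 2222222222);

# Header row first, then one line per number.
print join(',', qw(Number is_clean is_fdnc)), "\n";
print join(',', @$_), "\n" for to_records(\%all, \%clean, \%fdnc);
```

Once the data is in this shape, the numbers you want are simply the rows where both flags are empty.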
If you can't change it and really need that 4th column as an independent set, it gets a little awkward because now you need to iterate 4 sets simultaneously. The code would look like

    my @all_nums= sort keys %all;
    my @clean_nums= sort keys %clean;
    my @fdnc_nums= sort keys %fdnc;
    my @dnc_nums= grep !$clean{$_} && !$fdnc{$_}, @all_nums;
    use List::Util 'max';
    # last index of the longest list
    my $n= max($#all_nums, $#clean_nums, $#fdnc_nums, $#dnc_nums);
    for (my $i= 0; $i <= $n; $i++) {
        $csv->print($temp_output, [ $all_nums[$i], $clean_nums[$i], $fdnc_nums[$i], $dnc_nums[$i] ]);
    }

which seems fairly awkward, which is why I recommend changing the file format.