Welcome to the Monastery!
You'll find while you're here that you get very good answers when you ask specific questions that provide adequate background. The question you've presented doesn't tell us what exactly you're having difficulty with. If you really don't know where to begin, that's OK too, but you should state that, and we can get you started in the right direction. If you're having trouble with one step, that's fine too, but tell us which step.
It's often helpful to either provide a 10-15 line snippet of code you're having difficulty with, or a description of the basic framework of code you're contemplating so that we can offer suggestions. Again, if you have no code, and no framework in mind, it helps if you let us know that so we can start you down the framework road. Then you can dig into the documentation to figure out how to implement that framework. And undoubtedly you'll have questions along the way when you get to that point, at which time you'll be able to ask answerable questions.
With your current description, I would recommend the following framework (a short, untested sketch follows the list):
1. Open each input CSV file, and begin iterating over its records.
2. Open a results file too.
3. Use Text::CSV to access individual fields from the record you're working on in the current iteration.
4. Perform your calculations on the current record.
5. Output the current record to your results file.
6. Move on to the next record, repeating steps 3, 4, and 5 until no records are left.
7. Close the two input files and the output file.
8. Do whatever else you need to do, and then you're done.
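For instance, here's a minimal, untested sketch of that shape using Text::CSV. The file names and the step-4 calculation are placeholders you'd replace with your own, and for brevity it reads just one input file where your task will read two:

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Untested sketch -- file names and the "calculation" are placeholders.
my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
    or die Text::CSV->error_diag;

open my $in,  '<', 'input.csv'   or die "input.csv: $!";    # step 1
open my $out, '>', 'results.csv' or die "results.csv: $!";  # step 2

while ( my $row = $csv->getline($in) ) {   # step 3: parse one record
    $row->[3] *= 2;                        # step 4: your calculation here
    $csv->print($out, $row);               # step 5: write the record
}                                          # step 6: loop until EOF

close $in;                                 # step 7
close $out;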
Hopefully that gets you going in the right direction, but feel free to ask more specific questions as you dig into solving each of these steps.
Dave,
Thank you for your response. I'm new to Perl. I guess I need more than a nudge in the right direction. I tried to reuse a code example for comparing files, to no avail.
handleArgs();

if ($outfile) {
    open (STDOUT, ">$outfile") || die "Cannot redirect STDOUT\n";
}

open(TEMP, $newFile) || die "Error opening $newFile\n";
open(FILE, $oldFile) || die "Error opening $oldFile\n";

while (<TEMP>) {
    push(@temp, $_);
}
while (<FILE>) {
    push(@file, $_);
}

%temp = sort(@temp);
%file = sort(@file);

OUTER:
foreach $_ ($file) {
    next if /^\s*$/;    # ignore empty lines
    next if /^\s*#/;    # ignore comments
    chomp;
    # $_ = lc;
    while (%temp) {
        $x = shift %temp;
        next if ($x =~ /^\s*$/);
        next if ($x =~ /^\s*#/);
        chomp $x;
        @compX = split /,/, $x;
        @compY = split /,/, $_;
        $comparison = lc($compY[0..2]) cmp lc($compX[0..2]);
        print "$_ - $x = $comparison\n" if $DEBUG;
        if ($comparison == 0) {
            print "Record match...checking pctChange \n";
            $fpPctChange = abs(($compX[3] - $compY[3]) / $compX[3] * 100);
            if ($fpPctChange > 1) {
                print " BUT pct. Change greater than one for $_. Please Investigate.\n";
                print "Here's what was Found $_ AND $x\n";
            }
            next OUTER;
        }
        else {
            print "this record is not a match. $_ , $x\n" if $DEBUG;
            next OUTER;
        }
    }
    print " This oldFile record was not found in newFile file! $_\n";
}

if ($W) {
    foreach $temp (@temp) { print "Warning $temp not expected\n"; }
}

sub handleArgs()
{
    while (@ARGV) {
        print "In Handle args\n" if $DEBUG;
        # Enable warnings
        if ($ARGV[0] =~ /-W/i) {
            $W = 1; shift(@ARGV);
        }
        elsif ($ARGV[0] =~ /-t/i) {
            $oldFile = $ARGV[1]; splice(@ARGV, 0, 2);
        }
        elsif ($ARGV[0] =~ /-d/i) {
            $newFile = $ARGV[1]; splice(@ARGV, 0, 2);
        }
        elsif ($ARGV[0] =~ /-O/i) {
            $outfile = $ARGV[1]; splice(@ARGV, 0, 2);
        }
        else {
            print "Invalid Arg in FileCmp ", shift(@ARGV), "\n";
        }
    }
}
Well, it's really good that you gave that a try. The glaring problems: don't assign the output of "sort" to a hash (the first, third, etc. elements become keys; the second, fourth, etc. become values), and don't ever use "shift" on a hash. You'd better study some more about hashes vs. arrays.
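To see why, here's a tiny demonstration (with made-up data) of what that assignment actually does:

my @lines = ( 'b,2', 'c,3', 'a,1' );
my %h = sort @lines;        # %h becomes ( 'a,1' => 'b,2', 'c,3' => undef ):
                            # the sorted list is consumed as alternating keys
                            # and values, with an "odd number of elements"
                            # warning under warnings
my @sorted = sort @lines;   # an array is what the code actually wanted
# shift %h;                 # not even legal Perl: shift works only on arrays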
But you really need to rethink the algorithm. Since you are creating a union of two sets of records, where some keys might be present in both sets, you want to build the union in a single hash, then when that's done, print the contents of the hash.
Contrary to davido's advice, I would read the old file into the hash first. Use the concatenation of the first three fields as the hash key (i.e. $key = join ",", @fields[0..2];), then use the fourth field as the hash value. (Are there more than four fields per line? If so, the hash value can be an array.)
Then read the new file in the same way: break each record into fields and concatenate the first three to make a hash key; if the hash key already exists, you have to compare field 4 against the existing hash value, and keep or replace the old hash value as appropriate; otherwise, just add the novel key/value set into the hash.
Once you reach the end of the second file, your hash is the complete and correct union, and you just print it.
Based on the code you tried, I'm assuming that you are confident about the distribution of commas in your data -- i.e. that every line of data contains exactly 3 commas (separating the four fields per line). If you really are confident that this is true and will never change, then using split is good enough.
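If that assumption ever breaks -- say, a field gains a quoted, embedded comma -- split will miscount the fields where a real CSV parser won't. A quick illustration with made-up data:

use Text::CSV;

my $csv  = Text::CSV->new({ binary => 1 });
my $line = q{"Smith, John",NY,2004,42.5};   # made-up sample record

my @naive = split /,/, $line;   # 5 pieces: the quoted comma fooled split
$csv->parse($line) or die scalar $csv->error_diag;
my @flds = $csv->fields;        # 4 fields, quoting handled correctly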
Um, your handling of command line args seemed a bit strange; here's an untested sample of how I would approach the task:
#!/usr/bin/perl
use strict;
use warnings;

my $Usage = "Usage: $0 old_file new_file > union_file\n";
die $Usage unless ( @ARGV == 2 and -f $ARGV[0] and -f $ARGV[1] );

my %union;

# Load the old file: first three fields form the key, fourth is the value.
open IN, $ARGV[0] or die "$ARGV[0]: $!";
while (<IN>) {
    chomp;
    my @flds = split /,/;
    my $val  = pop @flds;        # assumes exactly 4 fields in every row
    my $key  = join ',', @flds;
    $union{$key} = $val;
}
close IN;

# Merge in the new file: keep the old value when the new one differs
# from it by more than 1 percent; otherwise take the new value.
open IN, $ARGV[1] or die "$ARGV[1]: $!";
while (<IN>) {
    chomp;
    my @flds = split /,/;
    my $val  = pop @flds;
    my $key  = join ',', @flds;
    next if ( exists( $union{$key} ) and
              abs(($union{$key} - $val)/$union{$key}) * 100 > 1 );
    $union{$key} = $val;
}
close IN;

# union is now complete
print "$_,$union{$_}\n" for ( sort keys %union );
(You should probably check to see that the sense of the value comparison is what you intended. It's so easy to invert the logic when you don't mean to.)
(updated to move the close paren for the "abs()" call).
DBD::CSV will allow you to do all of that with SQL. If SQL isn't your thing, AnyData will let you join and compare CSV files using tied hashes.
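For instance, a rough, untested sketch of the DBD::CSV approach -- the file names, table names, and column names here are all invented, so adjust them to your data (which has no header row, hence the explicit col_names):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:CSV:f_dir=.', undef, undef,
                        { RaiseError => 1 } );

# Map table names to files and name the columns ourselves.
$dbh->{csv_tables}{oldrec} = { file => 'old.csv',
                               col_names => [qw(f1 f2 f3 oldval)] };
$dbh->{csv_tables}{newrec} = { file => 'new.csv',
                               col_names => [qw(f1 f2 f3 newval)] };

# Pair up records that share the first three fields.
my $sth = $dbh->prepare( q{
    SELECT oldrec.f1, oldrec.f2, oldrec.f3, oldval, newval
    FROM   oldrec, newrec
    WHERE  oldrec.f1 = newrec.f1
      AND  oldrec.f2 = newrec.f2
      AND  oldrec.f3 = newrec.f3
} );
$sth->execute;
while ( my @row = $sth->fetchrow_array ) {
    print join( ',', @row ), "\n";   # compare oldval/newval as needed
}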
You should probably do your own homework.