alphavax has asked for the wisdom of the Perl Monks concerning the following question:
    use DBI;

    # %site_peakhr and $date are globals defined elsewhere in the script
    sub dataextractor {
        my $input  = shift;
        my $output = shift;
        my (@colname, @db_result);

        open(OUT, ">" . $output) or die "Can't open the file $output\n";
        open(IN,  "<" . $input)  or die "Can't open the file $input\n";

        # Read the header line: copy it to the output and grab the column names
        while (<IN>) {
            print OUT $_;
            s/\r?\n$//;
            @colname = split(',', $_);
            last;
        }
        close(IN);

        my $array_size = scalar(@colname);
        my $db_query;
        my $dbh = DBI->connect("DBI:CSV:f_dir=.;csv_eol=\n;")
            or die "Can't connect with the DB: $!\n";
        $dbh->{'csv_tables'}->{'csvtable'} = { 'file' => $input };

        # One query per site: select the row for that site's peak hour
        foreach my $key (keys %site_peakhr) {
            $db_query = "SELECT " . join(',', @colname)
                      . " FROM csvtable WHERE $colname[2] LIKE '$key'"
                      . " AND $colname[1] = $site_peakhr{$key}";
            @db_result = $dbh->selectrow_array($db_query);

            # No row found: build a placeholder record filled with "-"
            if (scalar(@db_result) < 1) {
                $db_result[0] = $date;
                $db_result[1] = $site_peakhr{$key};
                $db_result[2] = $key;
                foreach my $count (3 .. $array_size - 1) {
                    $db_result[$count] = "-";
                }
            }
            print OUT join(',', @db_result) . "\n";
        }
        close(OUT);
        return ($output);
    }
The problem with this approach is that the 40MB CSV file is loaded into the DBI engine 5,000 times (once per query), so the script takes hours to run.
Is there a faster way to do this? The CSV file covers the 5,000 sites with hourly data (hours 0 to 23). This is an example of the CSV file:
DATE,HOUR,SITE,SALES,ITEMS,ITEMS2,ITEMS3,ITEMS4,ITEMS5,ITEMS6
11/02/2011,20,NAW6546,51.0,124.0,82.0,50.0,0.0,0.0,366.0
11/02/2011,21,NAW6546,72.0,300.0,82.0,50.0,0.0,0.0,214.0
11/02/2011,22,NAW6546,5.0,254.0,82.0,50.0,0.0,0.0,985.0
11/02/2011,23,NAW6546,47.0,530.0,82.0,50.0,0.0,0.0,517.0
And the hash %site_peakhr holds the hour with the most sales for each site: NAW6546 => 15, NAW1457 => 22, ...
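Spelled out as Perl (entries beyond the two shown are only indicated, as in the original), the hash would look something like this:

    # Site code => hour of the day with the most sales for that site
    my %site_peakhr = (
        NAW6546 => 15,
        NAW1457 => 22,
        # ... one entry per site, about 5,000 in total
    );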
The other approach I was thinking of is to read the CSV line by line and look each line's site up in the hash: if it is there (and the hour matches that site's peak hour), print the line to the output file; if not, move on to the next line. But I don't know whether that would be faster.
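A minimal sketch of that single-pass idea (the sub and parameter names here are mine, not from the original code): stream the CSV once, keep a row only when its HOUR equals the site's peak hour, and fill in placeholder rows afterwards for sites that never matched, mirroring the original's "-" fill.

    use strict;
    use warnings;

    # Hypothetical single-pass rewrite: one sequential scan instead of
    # one SQL query per site.
    sub dataextractor_singlepass {
        my ($input, $output, $site_peakhr) = @_;   # %site_peakhr as a hashref

        open(my $in,  '<', $input)  or die "Can't open the file $input\n";
        open(my $out, '>', $output) or die "Can't open the file $output\n";

        # Copy the header and remember how many columns there are
        my $header = <$in>;
        print $out $header;
        $header =~ s/\r?\n$//;
        my @colname = split /,/, $header;
        my $ncols   = @colname;

        my %unseen    = %$site_peakhr;   # sites still missing a peak-hour row
        my $last_date = '';
        while (my $line = <$in>) {
            $line =~ s/\r?\n$//;
            my ($date, $hour, $site) = (split /,/, $line)[0, 1, 2];
            $last_date = $date;
            if (exists $site_peakhr->{$site} && $hour == $site_peakhr->{$site}) {
                print $out "$line\n";
                delete $unseen{$site};
            }
        }
        close($in);

        # Placeholder rows for sites with no matching line (the original
        # used a global $date here; the last date seen stands in for it)
        for my $site (keys %unseen) {
            print $out join(',', $last_date, $site_peakhr->{$site}, $site,
                            ('-') x ($ncols - 3)), "\n";
        }
        close($out);
        return $output;
    }

Called as dataextractor_singlepass($input, $output, \%site_peakhr), this reads the 40MB file exactly once, so it should finish in seconds rather than hours.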