I'm currently converting CSV files, which are luckily formatted in such a manner that I can take them apart with split(). At the moment I'm running the code like this in a CGI::App:
use autodie;

$c->do_update();

sub do_update {
    my ($c) = @_;
    for my $file ( @files ) {
        $c->process_dump_file( $file );
    }
}

sub process_dump_file {
    my ($c, $file) = @_;
    my @orders;
    open my $csv, "<", $file;
    push @orders, [ split ' , ', $_ ] while ( <$csv> );
    close $csv;
    shift @orders;
    return;
}
The empty return is intentional for now.
The problem is this: the very first file is processed extremely quickly, in maybe 10 seconds. After that, each file takes 1-2 minutes, even when it is the same size. (Each of the later files takes roughly the same time, so the slowdown does not keep growing.)
Each file has about 300_000 - 400_000 lines with 14 fields per line, and is about 50-80 MB in size. The application reaches around 300 MB of RAM usage, which still leaves plenty of RAM free. The HDD is not very active during processing, and the entire load appears to be CPU activity.
For a run of four files, Benchmark gives this result:
Extracted 308272 orders.
CSV time: 4 wallclock secs ( 4.05 usr + 0.28 sys = 4.33 CPU)
Extracted 301468 orders.
CSV time: 127 wallclock secs (123.47 usr + 0.44 sys = 123.91 CPU)
Extracted 316912 orders.
CSV time: 136 wallclock secs (131.77 usr + 0.42 sys = 132.19 CPU)
Extracted 426854 orders.
CSV time: 145 wallclock secs (139.91 usr + 0.66 sys = 140.56 CPU)
Duration: 432 wallclock secs (412.98 usr + 3.31 sys = 416.30 CPU)
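For anyone who wants to reproduce this outside the CGI::App, here is a minimal sketch of how per-file timings like the ones above can be collected with the core Benchmark module. It is not the exact code from the application: count_orders is a hypothetical stand-in for process_dump_file, and the file names are assumed to come from the command line.

```perl
use strict;
use warnings;
use Benchmark qw(timediff timestr);

# Hypothetical stand-in for process_dump_file: reads one CSV file,
# splits every line into fields and returns the number of data rows.
sub count_orders {
    my ($file) = @_;
    my @orders;
    open my $csv, '<', $file or die "Cannot open $file: $!";
    push @orders, [ split ' , ', $_ ] while ( <$csv> );
    close $csv;
    shift @orders;    # discard the first row, as the original code does
    return scalar @orders;
}

# Assumption: the files to process are passed as command-line arguments.
my $start = Benchmark->new;
for my $file (@ARGV) {
    my $t0 = Benchmark->new;
    my $n  = count_orders($file);
    my $t1 = Benchmark->new;
    print "Extracted $n orders.\n";
    print "CSV time: ", timestr( timediff( $t1, $t0 ) ), "\n";
}
print "Duration: ", timestr( timediff( Benchmark->new, $start ) ), "\n";
```

Because @orders is a lexical inside the sub, it goes out of scope after every file, which matches the structure of the code shown earlier.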
I'm looking for any ideas as to what could cause this and what I could do about it.