Hello, Perl Monks - I need to read in a very large amount of data (~100 MB) from a file, process it a bit, and store it in an array of references to hashes. This is what I am doing at the moment (I'm a bit of a Perl newbie, so I doubtless have many inefficiencies here; suggestions appreciated):
    my @cases=();
    my @FILE=<DATA>;
    close(DATA);
    my $num_lines=scalar(@FILE);
    $#cases=$num_lines; #pre-extend array

    foreach my $line (@FILE) {
        if (($dot % 1000) == 0) {
            print STDERR ".";
        }
        $line=~/^(\S*) [0-9.]* (.*)$/o;
        my ($class, $feature_vector) = ($1, $2);
        my %case;
        $case{'class'}=$class;
        foreach my $feature (split /\s+/, $feature_vector) {
            $case{'fv'}{$feature}=1;
        }
        push @cases, \%case;
        $dot++;
    }
This is very fast for the first ~20,000 lines (out of a total of ~300,000), then suddenly slows down dramatically. Lack of memory is not the problem - at the point it slows down I still have upwards of 700 MB free. At first I thought the processing of each line into the case hash, with its attendant splits and regular expressions, was the problem, but if I alter the above code to:
    my @FILE=<DATA>;
    close(DATA);
    my $fred;

    foreach my $line (@FILE) {
        if (($dot % 1000) == 0) {
            print STDERR ".";
        }
        $line=~/^(\S*) [0-9.]* (.*)$/o;
        my ($class, $feature_vector) = ($1, $2);
        my %case;
        $case{'class'}=$class;
        foreach my $feature (split /\s+/, $feature_vector) {
            $case{'fv'}{$feature}=1;
        }
        $fred= \%case;
        $dot++;
    }
then the entire file is processed on the order of 100 times more quickly. I've tried using something like $cases[$dot]=\%case, or even making cases a hash indexed by case number, but both approaches exhibit a similar slow-down. Any ideas on why this slow-down occurs? (This is Perl 5.6.1 running under Windows XP with 1 GB RAM.)

Thanks,
Ryan Gabbard
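For comparison, here is a minimal sketch of the same loading step that reads one line at a time instead of slurping the file into @FILE, and that relies on push alone rather than pre-extending. The helper names parse_case and load_cases are hypothetical, not from the code above. The sketch also drops the /o flag, which has no effect on a pattern with no interpolated variables, and uses split ' ', which differs from split /\s+/ only in ignoring leading whitespace. It is not offered as an explanation of the slow-down. It only shows the data structure being built, plus the fact that pushing onto an array pre-extended with $#cases appends after the empty slots rather than filling them.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one "class weight feature feature ..." line
# into the same { class => ..., fv => { ... } } shape used in the post.
sub parse_case {
    my ($line) = @_;
    # Skip lines that do not match instead of reusing stale $1/$2.
    return unless $line =~ /^(\S*) [0-9.]* (.*)$/;
    my ($class, $feature_vector) = ($1, $2);
    my %fv = map { $_ => 1 } split ' ', $feature_vector;
    return { class => $class, fv => \%fv };
}

# Hypothetical helper: read one line at a time from an open filehandle
# and collect the parsed cases in an array, returned by reference.
sub load_cases {
    my ($fh) = @_;
    my @cases;
    while (my $line = <$fh>) {
        chomp $line;
        my $case = parse_case($line) or next;
        push @cases, $case;
    }
    return \@cases;
}

# Pre-extending with $#array and then pushing appends after the empty slots.
my @demo;
$#demo = 2;                  # three undef slots: indices 0..2
push @demo, 'x';             # lands at index 3, not index 0
print scalar(@demo), "\n";   # prints 4
```

With the original code, the equivalent call would be something like my $cases = load_cases(\*DATA); the progress dots can be added back inside the while loop if wanted.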

In reply to Slowness when inserting into pre-extended array by ryangabbard
