Having finally found a working Devel:: module on CPAN (Devel::Size), I'm trying to figure out why my memory usage goes through the roof and into the swap file when reading large chunks of data.

The data concerned can vary a lot, but in this test case it's a recordset with 119 fields per record and 47039 records in the set.

Performing a simple my $ar_data = $db->selectall_arrayref("SELECT * FROM testtable", { Slice => {} }); yields a recordset that, according to Devel::Size::total_size, is 449337164 bytes. This is about 9552 bytes/record. I can live with that.

However, when writing the same data to a fixed-length file and subsequently reading it into a new variable, the size turns out to be 773605490 bytes, or about 16446 bytes/record. The code used to read the data:

    # ReadData ($filename) returns ar_data
    sub ReadData ($$) {
        my ($self, $filename) = @_;
        my $ar_returnvalue = [];
        if (!-e "$filename") {
            Carp::carp("File [$filename] does not exist");
            return undef;
        }
        open (FLATFILE, '<', $filename) or Carp::croak("Cannot open file [$filename]");
        while (<FLATFILE>) {
            chomp;
            push (@{$ar_returnvalue}, Interfaces::FlatFile::ReadRecord($self, $_));
        }
        close (FLATFILE);
        return $ar_returnvalue;
    } ## end sub ReadData ($$)

    sub ReadRecord ($$) {
        my ($self, $textinput) = @_;
        my $hr_returnvalue = {};
        my $CurrentColumnName;
        for (0 .. $#{$self->columns}) {
            $CurrentColumnName = $self->columns->[$_];
            if (!(defined $self->flatfield_start->[$_] and defined $self->flatfield_length->[$_])) {
                # Field is missing flatfield_start, flatfield_length or both, skip it.
                next;
            }
            $hr_returnvalue->{$CurrentColumnName} = substr ($textinput, $self->flatfield_start->[$_], $self->flatfield_length->[$_]);
            $hr_returnvalue->{$CurrentColumnName} =~ s/^\s*(.*?)\s*$/$1/;    # Trim whitespace
            # Fill empty fields with that field's default value, if such a value is defined.
            if ($hr_returnvalue->{$CurrentColumnName} eq "") {
                if (defined $self->default->[$_]) {
                    if ($self->datatype->[$_] =~ /^(?:CHAR|VARCHAR|DATE|TIME|DATETIME)$/) {
                        $hr_returnvalue->{$CurrentColumnName} = sprintf ("%s", $self->default->[$_]);
                    } else {
                        $hr_returnvalue->{$CurrentColumnName} = $self->default->[$_];
                    }
                } else {
                    # Remove empty field
                    delete $hr_returnvalue->{$CurrentColumnName};
                }
            }
            if ($self->datatype->[$_] =~ /^(?:TINYINT|MEDIUMINT|SMALLINT|INT|INTEGER|BIGINT|FLOAT|DOUBLE)$/) {
                $hr_returnvalue->{$CurrentColumnName} *= 1;    # Multiply by 1 to create a numeric value.
            }
            # Decimal-correction
            if ($self->decimals->[$_] > 0 and defined $hr_returnvalue->{$CurrentColumnName}) {
                $hr_returnvalue->{$CurrentColumnName} /= 10**$self->decimals->[$_];
            }
        } ## end for (0 .. $#{$self->columns...
        return $hr_returnvalue;
    } ## end sub ReadRecord ($$)

The code above comes from a custom-made Interfaces object with an Interfaces::FlatFile role (yes, Moose) that provides fixed-length file interfacing. The object has the following attributes (only the ones used here are shown):

    has 'columns'          => (is => 'rw', isa => 'ArrayRef[Str]',          lazy_build => 1,);
    has 'datatype'         => (is => 'rw', isa => 'ArrayRef[Str]',          lazy_build => 1,);
    has 'decimals'         => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
    has 'default'          => (is => 'rw', isa => 'ArrayRef[Maybe[Value]]', lazy_build => 1,);
    has 'flatfield_start'  => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
    has 'flatfield_length' => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);

These attributes are filled by index, so all the above attributes with index n refer to the same field n.
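
To make the index-aligned layout concrete, here is a minimal standalone sketch that uses plain arrays in place of the Moose attributes. The three-field layout, the sample line and the read_record helper are all made up for illustration; they are not the real interface definition.

```perl
use strict;
use warnings;

# Hypothetical layout: index n in every array describes the same field n.
my @columns          = ('id', 'name', 'amount');
my @decimals         = (0, undef, 2);
my @flatfield_start  = (0, 5, 15);
my @flatfield_length = (5, 10, 8);

# Cut one fixed-length line into a hash keyed by column name.
sub read_record {
    my ($line) = @_;
    my %record;
    for my $i (0 .. $#columns) {
        # Skip fields that have no position information.
        next unless defined $flatfield_start[$i] && defined $flatfield_length[$i];
        my $value = substr($line, $flatfield_start[$i], $flatfield_length[$i]);
        $value =~ s/^\s+//;    # Trim leading whitespace
        $value =~ s/\s+$//;    # Trim trailing whitespace
        next if $value eq '';
        # Decimal correction, only for fields that declare decimals.
        $value /= 10 ** $decimals[$i] if $decimals[$i];
        $record{ $columns[$i] } = $value;
    }
    return \%record;
}

# Build a sample line with the widths used above (5, 10 and 8 characters).
my $line   = sprintf('%5s%-10s%8s', 42, 'Alice', 12345);
my $record = read_record($line);
printf "%s=%s\n", $_, $record->{$_} for sort keys %$record;
```

Unlike ReadRecord, this sketch simply skips empty fields and has no default handling, so it only shows how one index ties the arrays together.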

The question is thus: why does reading from a fixed-length file need so much more memory, and what can I do to fix that? :)
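
One way to narrow this down might be to look at how individual values are stored in each recordset. The sketch below uses the core B module to report a scalar's internal type, the string bytes in use (CUR) and the bytes allocated for its buffer (LEN). The describe_scalar helper and the two-field record are hypothetical, and whether per-value storage actually explains the difference measured above is an open assumption, not something established here.

```perl
use strict;
use warnings;
use B ();

# Report how Perl stores one scalar: internal type, plus used and
# allocated string bytes when the scalar type has a string slot.
sub describe_scalar {
    my ($ref) = @_;
    my $sv   = B::svref_2object($ref);
    my %info = (type => ref $sv);
    if ($sv->can('LEN')) {
        $info{used}      = $sv->CUR;
        $info{allocated} = $sv->LEN;
    }
    return \%info;
}

# Hypothetical record: one value left as a string, one forced numeric
# with "*= 1", the same operation ReadRecord applies to numeric columns.
my %record = (as_string => '42', as_number => '42');
$record{as_number} *= 1;

for my $field (sort keys %record) {
    my $info = describe_scalar(\$record{$field});
    printf "%-10s %-9s used=%s allocated=%s\n",
        $field, $info->{type},
        $info->{used} // '-', $info->{allocated} // '-';
}
```

Running the same helper over a few values from the database recordset and from the file recordset would show whether the two differ in scalar type or in allocated buffer size.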


In reply to Reading (the same) data in different ways & memory usage by Neighbour
