Neighbour has asked for the wisdom of the Perl Monks concerning the following question:
Having finally found a working Devel::-module from CPAN (Devel::Size) I'm trying to figure out why my memory usage goes through the roof and into the swapfile when reading large chunks of data.
The data concerned can vary a lot, but in this test case it's a recordset with 119 fields per record and 47039 records in the set.
Performing a simple my $ar_data = $db->selectall_arrayref("SELECT * FROM testtable", { Slice => {} }); yields a recordset that, according to Devel::Size::total_size, is 449337164 bytes. This is about 9552 bytes/record. I can live with that.
However, when writing the same data to a fixed-length file and subsequently reading it into a new variable, the size turns out to be 773605490 bytes, or 16446 bytes/record. The code used to read the data:
    # ReadData ($filename) returns ar_data
    sub ReadData ($$) {
        my ($self, $filename) = @_;
        my $ar_returnvalue = [];
        if (!-e "$filename") {
            Carp::carp("File [$filename] does not exist");
            return undef;
        }
        open (FLATFILE, '<', $filename) or Carp::croak("Cannot open file [$filename]");
        while (<FLATFILE>) {
            chomp;
            push (@{$ar_returnvalue}, Interfaces::FlatFile::ReadRecord($self, $_));
        }
        close (FLATFILE);
        return $ar_returnvalue;
    } ## end sub ReadData ($$)

    sub ReadRecord ($$) {
        my ($self, $textinput) = @_;
        my $hr_returnvalue = {};
        my $CurrentColumnName;
        for (0 .. $#{$self->columns}) {
            $CurrentColumnName = $self->columns->[$_];
            if (!(defined $self->flatfield_start->[$_] and defined $self->flatfield_length->[$_])) {
                # Field is missing interface_start, interface_length or both, skip it.
                next;
            }
            $hr_returnvalue->{$CurrentColumnName} = substr ($textinput, $self->flatfield_start->[$_], $self->flatfield_length->[$_]);
            $hr_returnvalue->{$CurrentColumnName} =~ s/^\s*(.*?)\s*$/$1/;    # Trim whitespace
            # Fill empty fields with that field's default value, if such a value is defined.
            if ($hr_returnvalue->{$CurrentColumnName} eq "") {
                if (defined $self->standaard->[$_]) {
                    if ($self->datatype->[$_] =~ /^(?:CHAR|VARCHAR|DATE|TIME|DATETIME)$/) {
                        $hr_returnvalue->{$CurrentColumnName} = sprintf ("%s", $self->standaard->[$_]);
                    } else {
                        $hr_returnvalue->{$CurrentColumnName} = $self->standaard->[$_];
                    }
                } else {
                    # Remove empty field
                    delete $hr_returnvalue->{$CurrentColumnName};
                }
            }
            if ($self->datatype->[$_] =~ /^(?:TINYINT|MEDIUMINT|SMALLINT|INT|INTEGER|BIGINT|FLOAT|DOUBLE)$/) {
                $hr_returnvalue->{$CurrentColumnName} *= 1;    # Multiply by 1 to create a numeric value.
            }
            # Decimal-correction
            if ($self->decimals->[$_] > 0 and defined $hr_returnvalue->{$CurrentColumnName}) {
                $hr_returnvalue->{$CurrentColumnName} /= 10**$self->decimals->[$_];
            }
        } ## end for (0 .. $#{$self->columns...
        return $hr_returnvalue;
    } ## end sub ReadRecord ($$)

The above code is from a custom-made Interfaces-object, with an Interfaces::FlatFile role (yes, Moose) that provides fixed-length file interfacing. The object contains the following attributes (only the ones used here are shown):

    has 'columns'          => (is => 'rw', isa => 'ArrayRef[Str]',          lazy_build => 1,);
    has 'datatype'         => (is => 'rw', isa => 'ArrayRef[Str]',          lazy_build => 1,);
    has 'decimals'         => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
    has 'default'          => (is => 'rw', isa => 'ArrayRef[Maybe[Value]]', lazy_build => 1,);
    has 'flatfield_start'  => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
    has 'flatfield_length' => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);

These attributes are filled by index, so all the above attributes with index n refer to the same field n.
The question is thus: Why does reading from a fixed-length file need much more memory, and what can I do to fix that? :)
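One direction that might be worth measuring, offered as a sketch rather than a diagnosis: a value taken with substr starts out with a string buffer sized for the full fixed-width field, and trimming it in place or multiplying it by 1 does not necessarily release that buffer, so each hash value may carry more than it needs. The variant below splits a record with a single unpack template and stores numeric fields as fresh numbers via 0 + $value. The layout, the field names and the read_record function are hypothetical and only illustrate the idea; whether this actually shrinks the total would have to be confirmed with Devel::Size on the real data.

```perl
use strict;
use warnings;

# Hypothetical layout for illustration: [ name, start, length, is_numeric ].
my @layout = (
    [ id    => 0,  6,  1 ],
    [ name  => 6,  10, 0 ],
    [ price => 16, 8,  1 ],
);

# One unpack template for the whole record: "@<start> A<length>" per field.
# "@N" jumps to absolute offset N; "A" strips trailing whitespace.
my $template = join ' ', map { '@' . $_->[1] . ' A' . $_->[2] } @layout;

sub read_record {
    my ($line) = @_;
    my @raw = unpack $template, $line;
    my %record;
    for my $i (0 .. $#layout) {
        my ($name, undef, undef, $is_numeric) = @{ $layout[$i] };
        (my $value = $raw[$i]) =~ s/^\s+//;    # strip leading whitespace only
        next if $value eq '';                  # skip empty fields
        # 0 + $value yields a new numeric scalar instead of converting
        # the string scalar in place.
        $record{$name} = $is_numeric ? 0 + $value : $value;
    }
    return \%record;
}

my $example = read_record('000042' . 'Widget    ' . '   12.50');
print join(', ', map { "$_=$example->{$_}" } sort keys %$example), "\n";
```

Default values and the decimal correction from ReadRecord are left out here to keep the sketch short; they could be applied in the same loop.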