
Having finally found a working Devel::-module from CPAN (Devel::Size) I'm trying to figure out why my memory usage goes through the roof and into the swapfile when reading large chunks of data.

The data concerned can vary a lot, but in this test case it's a recordset with 119 fields per record and 47039 records in the set.

Performing a simple my $ar_data = $db->selectall_arrayref("SELECT * FROM testtable", { Slice => {} }); yields a recordset that, according to Devel::Size::total_size, is 449337164 bytes. This is about 9552 bytes/record. I can live with that.
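For reference, the per-record figure is just the total size divided by the number of records. A minimal sketch of that arithmetic follows; bytes_per_record and $ar_sample are names made up for this illustration, and the Devel::Size part only runs if that module happens to be installed.

```perl
use strict;
use warnings;

# Average size of one record, given a total size in bytes and a record count.
# (bytes_per_record is a hypothetical helper, not part of the Interfaces object.)
sub bytes_per_record {
    my ($total_bytes, $record_count) = @_;
    return $total_bytes / $record_count;
}

# Figures quoted above for the recordset fetched through DBI.
printf "%.1f bytes/record\n", bytes_per_record(449_337_164, 47_039);

# Optional: measure a small, made-up structure if Devel::Size is available.
if (eval { require Devel::Size; 1 }) {
    my $ar_sample = [ map { +{ id => $_, name => "row $_" } } 1 .. 1000 ];
    printf "%.1f bytes/record (sample)\n",
        bytes_per_record(Devel::Size::total_size($ar_sample), scalar @$ar_sample);
}
```

The sample numbers will differ per perl build, so only the ratio between two measurements taken on the same machine is really comparable.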

However, when writing the same data to a fixed-length file and subsequently reading it into a new variable, the size turns out to be 773605490 bytes, or 16446 bytes/record. The code used to read the data:

# ReadData ($filename) returns ar_data
sub ReadData ($$) {
    my ($self, $filename) = @_;
    my $ar_returnvalue = [];
    if (!-e "$filename") {
        Carp::carp("File [$filename] does not exist");
        return undef;
    }
    open (FLATFILE, '<', $filename) or Carp::croak("Cannot open file [$filename]");
    while (<FLATFILE>) {
        chomp;
        push (@{$ar_returnvalue}, Interfaces::FlatFile::ReadRecord($self, $_));
    }
    close (FLATFILE);
    return $ar_returnvalue;
} ## end sub ReadData ($$)

sub ReadRecord ($$) {
    my ($self, $textinput) = @_;
    my $hr_returnvalue = {};
    my $CurrentColumnName;
    for (0 .. $#{$self->columns}) {
        $CurrentColumnName = $self->columns->[$_];
        if (!(defined $self->flatfield_start->[$_] and defined $self->flatfield_length->[$_])) {
            # Field is missing flatfield_start, flatfield_length or both, skip it.
            next;
        }
        $hr_returnvalue->{$CurrentColumnName} = substr ($textinput, $self->flatfield_start->[$_], $self->flatfield_length->[$_]);
        $hr_returnvalue->{$CurrentColumnName} =~ s/^\s*(.*?)\s*$/$1/;    # Trim whitespace
        # Fill empty fields with that field's default value, if such a value is defined.
        if ($hr_returnvalue->{$CurrentColumnName} eq "") {
            if (defined $self->standaard->[$_]) {
                if ($self->datatype->[$_] =~ /^(?:CHAR|VARCHAR|DATE|TIME|DATETIME)$/) {
                    $hr_returnvalue->{$CurrentColumnName} = sprintf ("%s", $self->standaard->[$_]);
                } else {
                    $hr_returnvalue->{$CurrentColumnName} = $self->standaard->[$_];
                }
            } else {
                # Remove empty field
                delete $hr_returnvalue->{$CurrentColumnName};
            }
        }
        if ($self->datatype->[$_] =~ /^(?:TINYINT|MEDIUMINT|SMALLINT|INT|INTEGER|BIGINT|FLOAT|DOUBLE)$/) {
            $hr_returnvalue->{$CurrentColumnName} *= 1;    # Multiply by 1 to create a numeric value.
        }
        # Decimal-correction
        if ($self->decimals->[$_] > 0 and defined $hr_returnvalue->{$CurrentColumnName}) {
            $hr_returnvalue->{$CurrentColumnName} /= 10**$self->decimals->[$_];
        }
    } ## end for (0 .. $#{$self->columns...
    return $hr_returnvalue;
} ## end sub ReadRecord ($$)

The above code is from a custom-made Interfaces-object, with an Interfaces::FlatFile role (yes, Moose) that provides fixed-length file interfacing. The object contains the following attributes (only the ones used here are shown):

has 'columns'          => (is => 'rw', isa => 'ArrayRef[Str]',          lazy_build => 1,);
has 'datatype'         => (is => 'rw', isa => 'ArrayRef[Str]',          lazy_build => 1,);
has 'decimals'         => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
has 'default'          => (is => 'rw', isa => 'ArrayRef[Maybe[Value]]', lazy_build => 1,);
has 'flatfield_start'  => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);
has 'flatfield_length' => (is => 'rw', isa => 'ArrayRef[Maybe[Int]]',   lazy_build => 1,);

These attributes are filled by index, so all the above attributes with index n refer to the same field n.
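To illustrate the by-index layout, here is a stripped-down sketch without Moose. The three-column layout, the sample line and the parse_line helper are invented for this example; only the idea of parallel arrays sharing one index comes from the real object, and default values and datatype handling are left out.

```perl
use strict;
use warnings;

# Hypothetical layout: index n in every array describes the same field n.
my @columns          = qw(id name amount);
my @flatfield_start  = (0, 5, 15);
my @flatfield_length = (5, 10, 8);
my @decimals         = (0, 0, 2);

# Cut one fixed-length line into a hashref keyed by column name.
sub parse_line {
    my ($line) = @_;
    my %record;
    for my $i (0 .. $#columns) {
        my $value = substr($line, $flatfield_start[$i], $flatfield_length[$i]);
        $value =~ s/^\s+|\s+$//g;                            # trim whitespace
        next if $value eq '';                                # skip empty fields
        $value /= 10**$decimals[$i] if $decimals[$i] > 0;    # decimal correction
        $record{ $columns[$i] } = $value;
    }
    return \%record;
}

my $hr_record = parse_line('00042Alice     00001250');
print join(', ', map { "$_=$hr_record->{$_}" } sort keys %$hr_record), "\n";
```

Note that in this sketch id stays a string ("00042") because nothing numifies it, whereas amount becomes a number through the division.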

The question is thus: Why does reading from a fixed-length file need much more memory, and what can I do to fix that? :)

In reply to Reading (the same) data in different ways & memory usage by Neighbour
