in reply to Reading (the same) data in different ways & memory usage

The probable reason is that you are storing numeric values as PVs--their string representation as read from the file--in addition to IVs--their numeric representation--the generation of which you are deliberately forcing with this code:

    if ($self->datatype->[$_] =~ /^(?:TINYINT|MEDIUMINT|SMALLINT|INT|INTEGER|BIGINT|FLOAT|DOUBLE)$/) {
        $hr_returnvalue->{$CurrentColumnName} *= 1; # Multiply by 1 to create a numeric value.
    }

Having initially loaded the value as a string (PV), when you force it to be converted to a numeric value (IV), the string value will be retained so that if you later decide to use it in a string context, the (inverse) conversion does not have to be repeated.

E.g., after the *= 1;, the PV is still there, but you have gained an IV. Essentially, you've increased the size of the SV rather than reducing it (as I assume you intended):

    C:\test>perl -MDevel::Peek -E"my $x = '12345'; Dump $x; $x *=1; Dump $x"
    SV = PV(0x6cf50) at 0xc74e8
      REFCNT = 1
      FLAGS = (PADMY,POK,pPOK)
      PV = 0x67758 "12345"\0
      CUR = 5
      LEN = 8
    SV = PVIV(0xaf018) at 0xc74e8
      REFCNT = 1
      FLAGS = (PADMY,IOK,pIOK)
      IV = 12345
      PV = 0x67758 "12345"\0
      CUR = 5
      LEN = 8
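
If you would rather check this from inside a script than read Dump output, here is a minimal sketch using the core B module. The helper name sv_class is made up for this illustration; it only reports which internal SV type backs a scalar. The in-place case should match the PVIV shown in the Dump output, while the exact class names may vary between perl versions, so treat it as a probe rather than a guarantee.

```perl
use strict;
use warnings;
use B ();    # core module; exposes Perl's internal SV types

# Hypothetical helper: name of the B class behind a scalar,
# e.g. "B::PV", "B::IV" or "B::PVIV".
sub sv_class {
    my ($ref) = @_;
    return ref B::svref_2object($ref);
}

my $text = '12345';

# In-place coercion: the scalar starts out as a string and is upgraded.
my $in_place = $text;
$in_place *= 1;

# Convert first, then store: the target scalar never holds the string.
my $fresh = 0 + $text;

printf "in place: %s\n", sv_class( \$in_place );
printf "fresh:    %s\n", sv_class( \$fresh );
```

The point of the comparison is only that the second scalar is created as a number in the first place, so it has no string slot to carry around.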

A fix would be to perform the string->numeric conversion before storing the value:

    for( 0 .. $#{$self->columns} ) {
        $CurrentColumnName = $self->columns->[$_];
        if( !(defined $self->flatfield_start->[$_] and defined $self->flatfield_length->[$_] ) ) {
            # Field is missing interface_start, interface_length or both, skip it.
            next;
        }
        if ($self->datatype->[$_] =~ /^(?:TINYINT|MEDIUMINT|SMALLINT|INT|INTEGER|BIGINT|FLOAT|DOUBLE)$/ ) {
            $hr_returnvalue->{$CurrentColumnName} = 0 + substr(
                $textinput, $self->flatfield_start->[$_], $self->flatfield_length->[$_]
            );
        }
        else {
            $hr_returnvalue->{$CurrentColumnName} = substr(
                $textinput, $self->flatfield_start->[$_], $self->flatfield_length->[$_]
            );
            $hr_returnvalue->{ $CurrentColumnName } =~ s/^\s*(.*?)\s*$/$1/; # Trim whitespace
        }
        # Fill empty fields with that field's default value, if such a value is defined

That should reduce the size of the final data structure significantly if there are many numeric fields.
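
For anyone who wants to try the idea outside the original module, here is a self-contained sketch of the same pattern. The column layout, the field names and the parse_record function are invented for this illustration and are not part of the OP's code; only the "numify before storing" step is what matters.

```perl
use strict;
use warnings;

# Hypothetical layout: name, zero-based start offset, length, datatype.
my @layout = (
    { name => 'id',    start => 0, length => 5, type => 'INT'  },
    { name => 'label', start => 5, length => 8, type => 'CHAR' },
);

my $numeric = qr/^(?:TINYINT|MEDIUMINT|SMALLINT|INT|INTEGER|BIGINT|FLOAT|DOUBLE)$/;

sub parse_record {
    my ($line) = @_;
    my %record;
    for my $col (@layout) {
        my $raw = substr( $line, $col->{start}, $col->{length} );
        if ( $col->{type} =~ $numeric ) {
            # Numify before storing, so the hash value is created as a number.
            $record{ $col->{name} } = 0 + $raw;
        }
        else {
            $raw =~ s/^\s+|\s+$//g;    # Trim whitespace
            $record{ $col->{name} } = $raw;
        }
    }
    return \%record;
}

my $rec = parse_record('00042  widget');
printf "%d / %s\n", $rec->{id}, $rec->{label};
```

Note that the sketch prints with printf and %d rather than interpolating the number into a string, because using the value in string context would ask perl to cache a string form again.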


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: Reading (the same) data in different ways & memory usage
by Neighbour (Friar) on Apr 19, 2011 at 14:52 UTC
    I implemented your idea with a slight variation (moved the decimal-correction into the numeric data branch and put the check for empty fields in the non-numeric branch):
        if ($self->datatype->[$_] =~ /^(?:TINYINT|MEDIUMINT|SMALLINT|INT|INTEGER|BIGINT|FLOAT|DOUBLE)$/) {
            $hr_returnvalue->{$CurrentColumnName} = 0 + substr ($textinput, $self->flatfield_start->[$_], $self->flatfield_length->[$_]); # create a numeric value.
            # Decimal-correction
            if ($self->decimals->[$_] > 0 and defined $hr_returnvalue->{$CurrentColumnName}) {
                $hr_returnvalue->{$CurrentColumnName} /= 10**$self->decimals->[$_];
            }
        } else {
            $hr_returnvalue->{$CurrentColumnName} = substr ($textinput, $self->flatfield_start->[$_], $self->flatfield_length->[$_]);
            $hr_returnvalue->{$CurrentColumnName} =~ s/^\s*(.*?)\s*$/$1/; # Trim whitespace
            # Fill empty fields with that field's default value, if such a value is defined
            if ($hr_returnvalue->{$CurrentColumnName} eq "") {
                if (defined $self->default->[$_]) {
                    if ($self->datatype->[$_] =~ /^(?:CHAR|VARCHAR|DATE|TIME|DATETIME)$/) {
                        $hr_returnvalue->{$CurrentColumnName} = sprintf ("%s", $self->default->[$_]);
                    } else {
                        $hr_returnvalue->{$CurrentColumnName} = $self->default->[$_];
                    }
                } else {
                    # Remove empty field
                    delete $hr_returnvalue->{$CurrentColumnName};
                }
            }
        }
    but the idea is sound. Devel::Size now reports the returned data-structure to be 385251506 bytes, which, for some reason is smaller than the data-structure retrieved from the db...I'll have to look at things more closely to figure out why that is.
      the returned data-structure to be 385251506 bytes, which, for some reason is smaller than the data-structure retrieved from the db

      Perhaps the DBI code doesn't trim leading/trailing spaces on string fields?



        If the target fields in the database are of type CHAR (not VARCHAR), you can make DBI do so by:

        $dbh->{ChopBlanks} = 1;

        Some DBDs do extend this behavior to VARCHAR fields (when the database is too stupid to do so itself, as the ANSI standard tells it to) or when VARCHAR effectively is a CHAR internally (because the database doesn't support VARCHAR).

        YMMV


        Enjoy, Have FUN! H.Merijn
        Well, one thing I found is that all FLOAT (and probably DOUBLE) values are returned as strings (for example "0.00" for a column defined as FLOAT(7,2)).
        Using DBD::mysql 4.0.18 (in case that matters) with default connection options.