Re^2: Out of Memory

Ok - I am looking at XLSX to see if I can make changes to remove stuff I dont need - since all i am trying to do in the production code is read the cell value and write it out to csv file

Can I eliminate all that excel style processing?

        my $member_styles = $self -> {zip} -> memberNamed ('xl/styles.
+xml');

        my @styles = ();

    my %style_info = ();

        if ($member_styles) {

                foreach my $t ($member_styles -> contents =~ /xf\ numF
+mtId="([^"]*)"(?!.*\/cellStyleXfs)/gsm) { #"
                       # $t = $converter -> convert ($t) if $converter
+;
                        push @styles, $t;

                }
        my $default = $1 || '';

        foreach my $t1 (@styles){
            $member_styles -> contents =~ /numFmtId="$t1" formatCode="
+([^"]*)/;
            my $formatCode = $1 || '';
            if ($formatCode eq $default || not($formatCode)){
                if ($t1 == 9 || $t1==10){ $formatCode="0.00000%";}
                elsif ($t1 == 14){ $formatCode="m-d-yy";}
                else {
                    $formatCode="";
                }
            }
            $style_info{$t1} = $formatCode;
            $default = $1 || '';
        }

        }
[download]

Also, I am considering reducing the cell object size by removing format and type hash keys So change


                }
                my $cell =Spreadsheet::ParseExcel::Cell->new(

                    Val    => $v,
                    Format => $thisstyle,
                    Type => $type
                    
                );
[download]

                }
                my $cell =Spreadsheet::ParseExcel::Cell->new(

                    Val    => $v
                    
                );
[download]

Please let me know what you think

Comment on Re^2: Out of Memory Select or Download Code

Replies are listed 'Best First'.
Re^3: Out of Memory by vkon (Curate) on May 05, 2011 at 19:26 UTC
I think this way. given that all you need is to get XLSX data - probably to get some kind of DB values - then, just do not use this CPAN module, just extract your data yourself, be your-application-centric. But before doing that, drop a letter to CPAN author - it could be that he is responsive and will provide you with a solution soon. Otherwise - just reuse his code of "unzipping" the content, and then use your own regular expression. But better than that - feed you resulting XML string into properly constructed XPATH expression - and feed this XPATH expression to Xml::LibXML - it is very efficient on XPath expressions, but other modules dealing with XPath also will suffice. (I - personally - have good experience with mentioned one, and TIMTOWTDI) This would be best way out of this situation - this is how I feel it. Is 40Mb - a size of ZIPped XLSX, or it is a size after unpacking?	[reply]
Re^4: Out of Memory by ETLTCHFIG (Novice) on May 05, 2011 at 19:45 UTC
40MB is the Zipped XLSX file size - thanks for the guidance vkon! I am trying to use the code in the new method of the XLSX CPAN module and stick my functionality in there so I dont have to store anything in memory	[reply]
Re^4: Out of Memory by ETLTCHFIG (Novice) on May 05, 2011 at 20:31 UTC
OK - I THINK I GOT THE PROBLEM - In XLSX.pm the method new issues the following read foreach ($member_sheet -> contents =~ /(\<.?\/?\>\|.?(?=\<))/g) { This is trying to cache the whole worksheet at a time; one of my worksheets in the XLSX file that is throwing the "Out of Memory" has a million rows Is there a way I can change this to perform line by line reading?	[reply]
Re^5: Out of Memory by vkon (Curate) on May 06, 2011 at 06:17 UTC
optimizing such kind of constructs - is one of the ways in your situation. The answer to your question - 'yes',	[reply]