runrig has asked for the wisdom of the Perl Monks concerning the following question:
```perl
foreach ($member_sheet -> contents =~ /(\<.*?\/?\>|.*?(?=\<))/g) {
    ...
```
In Spreadsheet::XLSX, $member_sheet is the Archive::Zip::Member object for the worksheet XML file, and contents() returns the entire file as one string. Because the match runs in list context, the whole file is split into a list of tags and text chunks before the foreach loop processes any of them. To save memory, I first tried processing the contents one token at a time with a scalar-context while loop instead:
```perl
my $buffer = $member_sheet -> contents();
#while ($buffer =~ /\G(<[^<]*>)/scg or $buffer =~ /\G([^<]*(?=<))/scg) {
while ($buffer =~ /(<[^>]+>|[^<]+(?=<))/sg) {
    $_ = $1;
    ...
```
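The difference between the two loops comes down to context. Below is a minimal sketch, using a small hypothetical XML string rather than a real worksheet, showing that the list-context match builds the full token list up front, while the scalar-context `while` loop resumes at `pos()` and yields the same tokens one at a time. The `@streamed` array exists only so the two results can be compared; a real parser would handle `$1` directly.

```perl
use strict;
use warnings;

# Small hypothetical stand-in for a worksheet XML string.
my $xml = '<row r="1"><c r="A1"><v>42</v></c></row>';

# List context: the match runs to completion and returns every capture,
# so the whole token list exists in memory at once.
my @all_tokens = $xml =~ /(<[^>]+>|[^<]+(?=<))/g;

# Scalar context: each test resumes at pos($xml) and yields one token.
my @streamed;
while ($xml =~ /(<[^>]+>|[^<]+(?=<))/g) {
    push @streamed, $1;    # collected here only for comparison
}

print scalar(@all_tokens), " tokens\n";
print join('|', @streamed), "\n";
```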
The while loop seems to work, but it is about 100 times slower. I don't know why. I tried to write a benchmark, and it suggests the while versions should actually be faster, by 37% to 53% over the foreach version:
```perl
use Benchmark qw(cmpthese);

open(my $fh, "<", 'sheet3.xml') or die "Err: $!";
my $str = do { local $/; <$fh> };

cmpthese(-10, {
    FOR => sub {
        pos($str) = 0;
        for ( $str =~ /(\<.*?\/?\>|.*?(?=\<))/g ) {
            #print "$_\n";
        }
    },
    WHILE_1 => sub {
        pos($str) = 0;
        while ( $str =~ /\G(<.*?>|.*?(?=<))/scg ) {
            $_ = $1;
            #print "$_\n";
        }
    },
    WHILE_2 => sub {
        pos($str) = 0;
        while ( $str =~ /\G(<.*?>)/scg or $str =~ /\G(.*?(?=<))/scg ) {
            $_ = $1;
            #print "$_\n";
        }
    },
    WHILE_3 => sub {
        pos($str) = 0;
        while ( $str =~ /\G(<[^>]*>)/scg or $str =~ /\G([^<]*(?=<))/scg ) {
            $_ = $1;
            #print "$_\n";
        }
    },
});

# Results:
#           s/iter     FOR WHILE_3 WHILE_2 WHILE_1
# FOR         1.66      --    -27%    -31%    -35%
# WHILE_3     1.21     37%      --     -6%    -11%
# WHILE_2     1.14     46%      6%      --     -5%
# WHILE_1     1.08     53%     12%      5%      --
```
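One quick way to check whether the benchmarked variants do the same work is to run them on a small sample and compare their token lists. A minimal sketch, assuming a hypothetical sample string in place of sheet3.xml and using the FOR and WHILE_3 patterns exactly as they appear in the benchmark:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for sheet3.xml; note the newline
# after the XML declaration.
my $str = qq{<?xml version="1.0"?>\n<sheetData><row r="1"><c r="A1"><v>1</v></c></row></sheetData>};

# FOR variant: list-context match (no /s, so "." does not match "\n").
my @for_tokens = $str =~ /(\<.*?\/?\>|.*?(?=\<))/g;

# WHILE_3 variant: scalar-context \G matches with /c so a failed
# first alternative keeps pos() for the second one.
my @while_tokens;
pos($str) = 0;
while ( $str =~ /\G(<[^>]*>)/scg or $str =~ /\G([^<]*(?=<))/scg ) {
    push @while_tokens, $1;
}

printf "FOR: %d tokens, WHILE_3: %d tokens\n",
    scalar @for_tokens, scalar @while_tokens;

# Print both token lists with newlines made visible.
for my $list (\@for_tokens, \@while_tokens) {
    print join('|', map { (my $t = $_) =~ s/\n/\\n/g; $t } @$list), "\n";
}
```

On this sample, WHILE_3 returns the newline between the declaration and `<sheetData>` as its own token, while FOR skips it, so the rows being timed are not doing identical work; how much that matters for a real sheet depends on how much whitespace it contains between tags.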
Is there something wrong with my benchmark (or with trying to benchmark this at all)? Is something else going on in Spreadsheet::XLSX? Does anyone have enough tuits to look into this or comment on it?
TIA for any insights
Re: Spreadsheet::XLSX memory and speed
by jmcnamara (Monsignor) on Jun 08, 2012 at 22:49 UTC