in reply to Re: Writing multiple Excel::Writer::XLSX worksheets in parallel (3rd and final attempts)
in thread Writing multiple Excel::Writer::XLSX worksheets in parallel (3rd and final attempts)

Results are in for 60 million cells. Each worker writes 1 million rows x 20 columns wide. Total 60 million cells.

# Change line 44 inside init_wb $wb->sheets($_)->set_column(0, 4, 15, $center); # To this $wb->sheets($_)->set_column(0, 19, 15, $center);
# Replace the if/elsif block inside mce_loop with the following # Fill worksheet rows/cells if ($region eq 'AMS') { $wb = init_wb(0); $ws = $wb->sheets(0); $ws->write(0, 2, 'foo', $format); for my $row (1..1e6) { for my $cell (0..19) { $ws->write($row, $cell, $row + $cell + 10e6); }} print "AMS ---- DONE.\n"; } elsif ($region eq 'APJ') { $wb = init_wb(1); $ws = $wb->sheets(1); $ws->write(0, 2, 'bar', $format); for my $row (1..1e6) { for my $cell (0..19) { $ws->write($row, $cell, $row + $cell + 20e6); }} print "APJ ---- DONE.\n"; } elsif ($region eq 'EMEA') { $wb = init_wb(2); $ws = $wb->sheets(2); $ws->write(0, 2, 'baz', $format); for my $row (1..1e6) { for my $cell (0..19) { $ws->write($row, $cell, $row + $cell + 30e6); }} print "EMEA ---- DONE.\n"; }

It takes 4 and 1/2 minutes to write 60 million cells. Opening in LibreOffice takes 4 minutes and 22 seconds before the data appears. LibreOffice consumes 1 GB of memory at that point.

$ time perl write_60mil.pl AMS ---- DONE. EMEA ---- DONE. APJ ---- DONE. Node List is Done. real 4m35.239s user 12m 3.641s sys 0m 4.242s

I watched the system while running. The /dev/shm requires minimum 5 GB (consumed 4.8 GB). The workers themselves each consume 21 MB of memory. The final xlsx file is 200 MB. In other words, this will run on a Linux system or VM having 6 GB of memory. I resized /dev/shm as follow before running.

# update /etc/fstab tmpfs /dev/shm tmpfs defaults,size=5g 0 0 # afterwards, remount /dev/shm mount -o remount /dev/shm

This has been rather interesting and enjoyed trying. The write_60mil.pl script breaks 200k cells per second (218k).

Kind regards, Mario

  • Comment on Re^2: Writing multiple Excel::Writer::XLSX worksheets in parallel (3rd and final attempts)
  • Select or Download Code