Final attempt :-)

Writing many millions cells will likely consume more memory than available on the box. Thus, tried one more time with set_optimization. The internal str_table is not used from what I can see when enabling optimization. I removed MCE::Shared and the override from the demonstration. This now works with MCE 1.608.

All one must do is write rows in sequential row order. Excel::Writer::XLSX writes to temporary files behind the scene. One can pass the -use_dev_shm option to MCE::Signal to have the temp directories reside under /dev/shm on Linux which resides in memory. It requires 814 MB for 10 million cells. So, please remove the option if /dev/shm is not big enough.

Performance wise, it takes 5.5 seconds and 53.0 seconds for 1 and 10 million cells respectively all while consuming very little memory.

for ( 1 .. 111_111 ) { ... } # 3 * 3, 1 million for ( 1 .. 1_111_111 ) { ... } # 3 * 3, 10 million

Note: Numbers on the Mac isn't showing cells containing text when optimization is enabled. LibreOffice is working though.

#!/usr/bin/env perl use strict; use warnings; use Archive::Zip (); use File::Copy qw(move); use File::Find (); use File::Temp (); $File::Temp::KEEP_ALL = 1; use Excel::Writer::XLSX; use MCE::Signal qw($tmp_dir -use_dev_shm); use MCE::Loop; my $nodeList = [ [ 'AMS' , 'a' ], [ 'APJ' , 'ap' ], [ 'EMEA', 'e' ] ]; my ($center, $format); sub init_wb { my $wn = shift; # Increment $wn by 1 since worksheet xml files begin at 1 $wn++; mkdir "$tmp_dir/$wn"; my $wb = Excel::Writer::XLSX->new("$tmp_dir/$wn/tmp.xlsx"); $wb->set_tempdir("$tmp_dir/$wn"); $wb->set_optimization(); # Set workbook properties $wb->set_properties( title => 'Node List', author => 'L_WC demo', comments => 'Node List', ); # Define/add formats to the workbook $center = $wb->add_format(align => 'center'); $format = $wb->add_format(align => 'center', bg_color => 44); # Add worksheets, specify formats for columns/rows for (0 .. @{ $nodeList } - 1) { $wb->add_worksheet($nodeList->[$_][0]); $wb->sheets($_)->set_column(0, 4, 15, $center); } return $wb; } sub merge_wb_data { my $wb_file = shift; my ($zip, @pths, @xlsx_files) = (Archive::Zip->new()); local ($@, $!, $^E, $?); # Other files, e.g. table data likely need the same and not done # for this demonstration. Just worksheet files are merged. # I received help by reading _store_workbook inside # Excel::Writer::XLSX::Workbook.pm. # Find worksheet files 2,3,... for my $_num (1 .. @{ $nodeList }) { my $wanted = sub { push @pths, $1 if $File::Find::name =~ /(.*)\/sheet$_num\.xml +/; }; File::Find::find({ wanted => $wanted, untaint => 1, untaint_pattern => qr|^(.+)$ +| }, "$tmp_dir/$_num"); } # Move worksheet files 2,3,... to where worksheet 1 data resides for (0 .. @pths - 1) { unlink $pths[$_]."/../../../tmp.xlsx"; if ($_ > 0) { my $_num = $_ + 1; unlink $pths[0]."/sheet$_num.xml"; move $pths[$_]."/sheet$_num.xml", $pths[0]."/sheet$_num.xml"; } } # Re-zip xlsx files my $wanted = sub { push @xlsx_files, $File::Find::name if -f }; my $temp_dir = $pths[0]."/../../"; my $short_name; File::Find::find({ wanted => $wanted, untaint => 1, untaint_pattern => qr|^(.+)$| }, $temp_dir); for my $file_name (@xlsx_files) { $short_name = $file_name; $short_name =~ s{^\Q$temp_dir\E/?}{}; $zip->addFile($file_name, $short_name); } open my $fh, '>', $wb_file or die "Error opening xlsx file: $!\n"; binmode $fh; $zip->writeToFileHandle($fh); close $fh; } # --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- MCE::Loop::init( max_workers => scalar(@{ $nodeList }), chunk_size => 1, use_threads => 0, ); mce_loop { my ($region, $sql) = ($_->[0], $_->[1]); my ($wb, $ws); # Acquire data from the DB. Each worker must obtain a handle. # The DB logic is similar to running serially. Just the where # clause is likely unique for each region. # Fill worksheet rows/cells if ($region eq 'AMS') { $wb = init_wb(0); $ws = $wb->sheets(0); $ws->write(0, 2, 'foo', $format); for ( 1 .. 111_111 ) { $ws->write($_, 0, 1000 + $_); $ws->write($_, 2, 2000 + $_); $ws->write($_, 4, 3000 + $_); } print "AMS ---- DONE.\n"; } elsif ($region eq 'APJ') { $wb = init_wb(1); $ws = $wb->sheets(1); $ws->write(0, 2, 'bar', $format); for ( 1 .. 111_111 ) { $ws->write($_, 0, 4000 + $_); $ws->write($_, 2, 5000 + $_); $ws->write($_, 4, 6000 + $_); } print "APJ ---- DONE.\n"; } elsif ($region eq 'EMEA') { $wb = init_wb(2); $ws = $wb->sheets(2); $ws->write(0, 2, 'baz', $format); for ( 1 .. 111_111 ) { $ws->write($_, 0, 7000 + $_); $ws->write($_, 2, 8000 + $_); $ws->write($_, 4, 9000 + $_); } print "EMEA ---- DONE.\n"; } $wb->close() if $wb; } $nodeList; # Shutdown MCE MCE::Loop::finish(); # Merge data into one workbook merge_wb_data('Node_List.xlsx'); print "Node List is Done.\n";

Kind regards, Mario


In reply to Re: Writing multiple Excel::Writer::XLSX worksheets in parallel (3rd and final attempts) by marioroy
in thread Writing multiple Excel::Writer::XLSX worksheets in parallel (3rd and final attempts) by marioroy

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.