ndts has asked for the wisdom of the Perl Monks concerning the following question:

Hello PerlMonks,

I have inherited some Perl scripts for logfile manipulation from another person. The idea is simple: read data from a text file, decode it, and write the results to an Excel file. The script works, but it takes a huge amount of time to finish, even though the log files are not that big (30MB max). Moreover, the script runs 10x faster on my colleague's machine. I use Win32::OLE for the Excel manipulation because I also need to do some formatting on the output.

To find out which parts of the code are slow, I profiled my application with Devel::NYTProf and found some interesting results. A snippet of the code follows:

    $Excel = Win32::OLE->GetActiveObject('Excel.Application')
          || Win32::OLE->new('Excel.Application', 'Quit');
    $Book = $Excel->Workbooks->Open( $templateFile );
    #
    #decoding code here
    #
    $Book->Worksheets($sheetNumber)->Range("A$Row:E$Row")->{Value} = [[$TimeAbs,$TimeRel,$TimeLog,$Info,$SigValue]];

The NYTProf output for the last line of code shows something like:

  "# spent 35.5s making 44688 calls to Win32::OLE::AUTOLOAD, avg 794µs/call"

and a few lines below:

  "# spent 454s making 22344 calls to Win32::OLE::DESTROY, avg 20.3ms/call"

What I want to understand is why the AUTOLOAD function is called here, and why it is called twice every time the line is executed (the line is executed 22344 times). Could this be related to the fact that I use Win32::OLE on a 64-bit configuration (both Windows and Excel)? Also, why is the destructor so expensive, and why is it called every time the line is executed? I suppose that some temporary objects are created there, but I do not have enough experience with Win32::OLE to figure it out.

Any suggestions, hints, tutorials are welcome. Thanks.

Sebi

Replies are listed 'Best First'.
Re: Win32::OLE Excel temporary objects destruction
by swl (Prior) on Nov 22, 2017 at 22:42 UTC

    Just checking to be sure, but did you run nytprofhtml? The files it generates contain a wealth of information, all hyperlinked for easy perusal.

    If you generated the flame graph then you can also readily see which call stacks are taking all the time. It's one way to see if the AUTOLOAD and DESTROY stacks are due to one code path, or are split across many.

    There should also be a set of links immediately below the "spent" lines you are quoting which will allow you to trace which ones are taking all the time or are being called the most.

Re: Win32::OLE Excel temporary objects destruction
by kikuchiyo (Hermit) on Nov 22, 2017 at 23:29 UTC

    Do you have to update an existing Excel file, or is it enough to generate a new one for every batch of input?

    In the latter case it may be feasible to rewrite your program to use Spreadsheet::WriteExcel (or Excel::Writer::XLSX). These modules build XLS (or XLSX) files natively, so they can work even without MS Excel being installed, and in my experience they are reasonably fast.
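
    As a rough sketch of that approach, assuming Excel::Writer::XLSX is installed from CPAN: the file name, the column headers and the sample rows below are made up for illustration and do not come from the original script.

```perl
use strict;
use warnings;
use Excel::Writer::XLSX;

# Hypothetical output file name.
my $file = 'decoded_log.xlsx';

my $workbook = Excel::Writer::XLSX->new($file)
    or die "Cannot create $file: $!";
my $worksheet = $workbook->add_worksheet();

# Formatting is applied through format objects; no Excel process is involved.
my $bold = $workbook->add_format( bold => 1 );
$worksheet->write_row( 0, 0, [qw(TimeAbs TimeRel TimeLog Info SigValue)], $bold );

# Stand-in for the decoded log records (made-up values).
my @records = (
    [ '12:00:00.000', 0.000, 1, 'start', 0 ],
    [ '12:00:00.010', 0.010, 2, 'tick',  1 ],
);

# Rows and columns are zero-based; row 0 holds the header.
my $row = 1;
$worksheet->write_row( $row++, 0, $_ ) for @records;

$workbook->close() or die "Error closing $file: $!";
```

    Note that this always writes a new file. It cannot open and fill in an existing template, so it only fits if the template's formatting can be recreated in code.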

Re: Win32::OLE Excel temporary objects destruction
by davies (Monsignor) on Nov 23, 2017 at 11:22 UTC

    I've never felt the need to use Devel::NYTProf and would not claim to understand the technicalities of AUTOLOAD. But the first line of code gets lots of alarm bells ringing with me.

    $Excel = Win32::OLE->GetActiveObject('Excel.Application') || Win32::OLE->new('Excel.Application', 'Quit');

    Even more bells ring when you say that the last line is called tens of thousands of times. Your control structures don't appear, making it look as though this line will be called the same number of times. If so, I guarantee that the code will be painfully slow. This would explain why there is no my in the line I have quoted and might have something to do with your 2:1 ratio.

    The line I have quoted says (rough, incomplete translation): "however many instances of Excel the user has running, take control of one of them at random, regardless of what it is doing and when it will become available to me". There are times when this is what is meant. They are rare. More usually, my $Excel = Win32::OLE->new('Excel.Application'); will give you what you want - your own, virgin, predictable instance of Excel, without the risk of treading on someone else's work.

    This is where your code will (not may, WILL) be inefficient. Does an active instance of Excel exist? If so, that instance will need all sorts of tests run on it to determine when control might be passed to Perl. If it's one of my spreadsheets with code that will run for several hours, try coming back tomorrow. If no instance exists, the search through the active processes for one to use is wasted. You can't win. Since there are two possible routes to an active instance of Excel, this may be where your 2:1 ratio is generated.

    You will notice that I used my in the line I suggested. It will always be more efficient to create a single instance of Excel and hack that yourself than to create a separate instance for each spreadsheet or whatever. There can be good reasons for using multiple instances. I do it all the time, for example when I want manual recalculation on some files and automatic on others. But I've never needed multiple instances in Perl, so the likelihood is that you will be better with a single instance that you open at the start of your code and close at the end.

    I don't know much about the internal workings of Win32::OLE, but in VBA, the docs state that it is faster to specify the target object as directly as possible. So in your line $Book->Worksheets($sheetNumber)->Range("A$Row:E$Row")->{Value} = [[$TimeAbs,$TimeRel,$TimeLog,$Info,$SigValue]];, it MAY (depending on Win32::OLE) be better to create your own array of sheets and then use something like $sht[$sheetNumber]->Range("A$Row:E$Row")->{Value} = ...; instead. And if the row you use is static (which I doubt), you can do even better by assigning the range to your own variable & bypassing the sheet call.
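
    A minimal sketch of that idea, taken one step further by buffering rows so that the Range call happens once per batch instead of once per row. As far as I know, Win32::OLE routes OLE method calls such as Worksheets(...) and Range(...) through its AUTOLOAD, and each call hands back a temporary object that later has to be released, so fewer calls should mean fewer AUTOLOAD and DESTROY hits. The helper name flush_rows and the batching scheme are hypothetical, not part of Win32::OLE or of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Win32::OLE: write a batch of decoded rows
# with a single Range call instead of one call per row.
# $sheet is a worksheet object fetched once, outside the loop, e.g.
#   my $sheet = $Book->Worksheets($sheetNumber);
sub flush_rows {
    my ( $sheet, $first_row, $rows ) = @_;
    return $first_row unless @$rows;

    my $last_row = $first_row + @$rows - 1;

    # $rows is an array of array refs, one inner array per spreadsheet row.
    $sheet->Range("A$first_row:E$last_row")->{Value} = $rows;

    return $last_row + 1;    # next free row
}
```

    In the decoding loop you would push [$TimeAbs,$TimeRel,$TimeLog,$Info,$SigValue] onto a buffer, call flush_rows every few thousand rows and once more at the end, and keep the returned row number for the next batch.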

    Finally, if you aren't using strict and warnings, they can be very helpful.

    Regards,

    John Davies

Re: Win32::OLE Excel temporary objects destruction
by locked_user sundialsvc4 (Abbot) on Nov 22, 2017 at 21:14 UTC

    OLE can be extremely expensive. Avoid anything which has the possibility of creating more objects on the fly. For instance, in this piece of code:

    $Book->Worksheets($sheetNumber) ...
    ...put that object into a local variable and remember what sheet-number it uses. Re-create the object only if the sheet-number changes, and arrange things so that the logic does not bounce between sheets. Similar strategies should be applied at every reasonable opportunity.

    I am pondering why this code makes exactly twice as many calls to AUTOLOAD as it does to DESTROY. This almost implies to me that, somehow, OLE objects might be accumulating. But you have not posted enough code to indicate if this is a problem, or a bug.

    You neglect to say how much time “a huge amount of time” actually is, nor do you give a hardware comparison (particularly, memory ...) between your machine and your colleague’s. Run a process-monitor on both machines as it runs on each one, and try to see what’s slowing it down, from that monitor’s high-level perspective.

    OLE drives the application much like an interactive user session would. Anything that an interactive user might choose to do, in order to save time, applies to you also. For instance, when loading data into an Excel sheet, you should suspend re-calculation until all of the data has been loaded. OLE is fundamentally an application-level API, and everything that you do has the potential for time-consuming side effects which your application will be forced to wait for. (Per contra, the Perl-side object-twiddling actually takes negligible time by comparison.) It is these side-effects that will bite you in the asterisk. Depending on the implementation of the application that you are controlling, seemingly-minor changes can have dramatic effect ... for good, or for ill.
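
    Suspending re-calculation during the load could look roughly like the sketch below. The wrapper name with_excel_quiet is hypothetical. The numeric values are Excel's xlCalculationManual and xlCalculationAutomatic, which would normally be imported with use Win32::OLE::Const 'Microsoft Excel'; they are hard-coded here only to keep the sketch self-contained. Switching off ScreenUpdating as well is a common companion setting and an assumption on my part, not something measured on the original script.

```perl
use strict;
use warnings;

use constant {
    XL_CALCULATION_MANUAL    => -4135,    # Excel's xlCalculationManual
    XL_CALCULATION_AUTOMATIC => -4105,    # Excel's xlCalculationAutomatic
};

# Hypothetical wrapper: run $work with recalculation and screen updates
# switched off, then restore the previous settings even if $work dies.
# $excel is the Excel.Application object obtained from Win32::OLE.
sub with_excel_quiet {
    my ( $excel, $work ) = @_;

    my @props = qw(ScreenUpdating Calculation);
    my %saved = map { $_ => $excel->{$_} } @props;

    $excel->{ScreenUpdating} = 0;
    $excel->{Calculation}    = XL_CALCULATION_MANUAL;

    my $ok  = eval { $work->(); 1 };
    my $err = $@;

    # Put everything back the way the user had it.
    $excel->{$_} = $saved{$_} for @props;

    die $err unless $ok;
    return;
}
```

    The whole decoding loop would go inside the code reference, for example with_excel_quiet($Excel, sub { ... }), after the workbook has been opened, because Excel expects an open workbook when the calculation mode is changed.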

Re: Win32::OLE Excel temporary objects destruction
by locked_user sundialsvc4 (Abbot) on Nov 22, 2017 at 21:18 UTC

    One more thought: could you write the data to a temporary file, then, by means of OLE, tell the Excel application to import the data in that file, into a specified spreadsheet range? It seems to me that this would be well worth an experiment ...

      I would like to reinforce the suggestion from sundialsvc4... you can easily write the data to an SQLite database and have the Excel file read from that DB as soon as you open it.

      That is easier to do and probably performs much better, and you would also have some added flexibility in selecting what to read from the DB!

      Alceu Rodrigues de Freitas Junior
      ---------------------------------
      "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill