Muoyo has asked for the wisdom of the Perl Monks concerning the following question:

I am using Perl to take the output of a PDE solver ... (the output is basically numerical arrays) and write it to an executable Matlab script for visualization.

I've written my logging routine as a separate module, but currently I'm opening and closing the log file after every iteration of my numeric scheme, so I have a couple questions.


1) How costly computationally are file I/O operations?
2) Any clever ideas on how to implement an intelligent modular logging scheme which will avoid multiple file open close operations, AND avoid storing the whole of the output in a multi-d array?

Any comments or ideas would be greatly appreciated. thanks

-muoyo

Replies are listed 'Best First'.
Re: Intelligent logging
by dragonchild (Archbishop) on Oct 29, 2004 at 01:28 UTC
    File I/O is (almost) completely dependent on the speed of your hard-drives. A 7200rpm disk is going to be roughly half as fast as a 15000rpm disk. Now, depending on the setup of your disk(s), that can be altered. For example, certain RAID configurations can significantly speed up your reads but slow down your writes. Additionally, the filesystem you use can have an impact. For example, ReiserFS is much better at many small files than ext2 or ext3. Journalling filesystems are generally slower at writes than non-journalling filesystems; for example, ext2 is faster than ext3. (If you need journalling, you're not going to complain.)

    Overall, however, opening and closing files is generally O(1), which means they generally are rather fast. Now, reading and writing data can be more costly. But the cost is usually more in the memory structures people read the data into than in the actual cost of reading. Writing, especially appending as in a logfile, is almost O(0); in other words, it's so negligible as to be irrelevant.

    Now, my question is why are you closing the file after every iteration? It sounds like you really only want to flush the buffers after every iteration. So, you could do something like:

    use IO::File;

    my $fh = IO::File->new( ">>$filename" )
        or die "Cannot open '$filename' for appending: $!\n";

    # Do your stuff here. At the end of every iteration, call
    $fh->flush;

    # When you are completely finished with the file:
    $fh->close;

    Now, an alternative would be to turn autoflushing on, using $|++; at the top of your script, preferably in a BEGIN block. There are many developers, including merlyn and Abigail-II, who do that reflexively. I didn't suggest that at first for a few reasons:

    1. That turns autoflushing on for EVERY filehandle. You might not want that.
    2. Buffering is the default for a reason - the act of writing generally costs as much for 30 bytes as for 30K. So, you might as well write as few times as possible.
    3. This turns off buffering for reads as well. You might not want that.
    4. It sounds like you want to control when you flush your buffers. Autoflushing takes that control away from you.

    You just have to figure out what's best for your specific need.
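
    If you want flushing on just the log handle without touching $| at all, IO::Handle (which IO::File inherits from) provides an autoflush method. A minimal sketch, reusing the handle from the example above:

    use IO::File;

    my $fh = IO::File->new( ">>$filename" )
        or die "Cannot open '$filename' for appending: $!\n";

    # Flush after every print on this one handle, leaving all
    # other handles untouched.
    $fh->autoflush(1);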

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      I agree that disabling buffering reflexively is silly, but I disagree with points 1 and 3. According to perlvar:

      If set to nonzero, forces a flush right away and after every write or print on the currently selected output channel.

      ... and ...

      This has no effect on input buffering.

      generally O(1), which means they generally are rather fast
      O(1) != fast. Roughly speaking, big-O notation says how well a given operation scales based on the size of the input. So, it speaks to the relative speed of, say, an input of size 1000 vs an input of size 10000. That having been said, I'd say that opening a file is not O(1), but probably O(n); opening n files takes n times as long as opening one file. Granted, each one is really fast, but it takes a finite amount of time. There can be operations that are O(1) that are slow. But such operations will be equally slow regardless of whether they're operating on one item or one million. I'll end my pedantry now.

      thor

      Feel the white light, the light within
      Be your own disciple, fan the sparks of will
      For all of us waiting, your kingdom will come

Re: Intelligent logging
by BrowserUk (Patriarch) on Oct 29, 2004 at 02:57 UTC

    The questions are "Why are you opening and closing the file after every iteration?" and "What do you mean by 'intelligent' logging?". Or, more generally, "What problem are you trying to solve?".

    Turning off buffering (or turning on auto-flush which amounts to the same thing), for a particular filehandle, can be done easily using select:

    select $fh;
    $| = 1;
    select STDOUT;

    but turning autoflush on/buffering off will slow your IO down in most cases. You'd be better off increasing the buffering if you are writing large volumes to the same file.
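
    One simple way to get larger effective writes is to batch them yourself. This is only a sketch--the 64KB threshold is arbitrary and $fh is assumed to be an already-open lexical handle:

    my $pending = '';

    sub buffered_log {
        my ($line) = @_;
        $pending .= "$line\n";
        if ( length($pending) >= 64 * 1024 ) {    # write in ~64KB chunks
            print {$fh} $pending;
            $pending = '';
        }
    }

    # At the very end of the run, push out whatever is left.
    print {$fh} $pending if length $pending;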

    If IO is really slowing your processing substantially--and you need to verify that this is the case, otherwise you're wasting your time--then, once you have stopped constantly reopening the file (which, if it's the same file, is totally unnecessary), there are some things you can do to reduce the Perl-internal overhead that might make a little difference.

    If that is still not sufficient for your needs, then you can implement a form of asynchronous IO using a thread and a memory-file filehandle to provide additional buffering, but for that to be effective, you need to be generating very large volumes of data at very high speeds.
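
    For illustration only, here is a rough sketch of that idea using a worker thread fed from a queue (a plain Thread::Queue rather than the memory-file variant, and assuming a threads-enabled perl and a $logfile of your choosing):

    use threads;
    use Thread::Queue;

    my $q = Thread::Queue->new;

    # The writer thread owns the filehandle and drains the queue.
    my $writer = threads->create( sub {
        open my $fh, '>>', $logfile or die "Cannot open '$logfile': $!";
        while ( defined( my $line = $q->dequeue ) ) {
            print {$fh} $line;
        }
        close $fh;
    } );

    # Inside the numeric loop: hand the line off and keep computing.
    $q->enqueue( "iteration results...\n" );

    # At shutdown: send the end-of-stream marker and wait for the writer.
    $q->enqueue( undef );
    $writer->join;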

    If you're running on Win32, you could use real asynchronous IO by dropping to the system API level.

    But for any of these things to be worth the effort, you have to know that IO is your bottleneck.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
      Thanks for your reply.

      I want to create a stand-alone logging module which I can use with multiple, possibly computationally intensive numeric schemes.

      While I could open my filehandles in each of the main scripts and then simply call the logging routine with the appropriate filehandle, I was hoping to isolate the logging and the files completely within my module.

      Is it possible to declare multiple file handles globally within an external module and use them on multiple function calls to that module?

      -muoyo
        Is it possible to declare multiple file handles globally within an external module and use them on multiple function calls to that module?

        Yes. There are several ways to do this. Or rather, there are several ways to hold open filehandles inside a module such that you don't have to open them in the calling scripts and pass them in, which is what I think you're really asking. Correct me if I am wrong.

        The essence of all the different ways is to not use globals, or at least not named globals.

        You can use a scalar, including a lexical (a my'd one), in place of a GLOB as a filehandle.

        open my $fh, '>', '/logs/mylogs' or die $!;
        print $fh 'Stuff to end up in my log';

        See perlopentut for more information.

        There are other methods, like creating an anonymous glob for each caller or using one of the standard modules like IO::File or IO::Handle.
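
        To make that concrete for the "keep the handles inside the module" question, here is a minimal sketch of such a module; the package name, log path, and interface are made up for illustration. The handles live in a file-scoped lexical hash, so callers never see them:

        package My::Logger;
        use strict;
        use warnings;

        # One open handle per log name, private to this file.
        my %handles;

        sub log_line {
            my ( $name, $line ) = @_;
            unless ( $handles{$name} ) {
                open $handles{$name}, '>>', "/logs/$name.log"
                    or die "Cannot open '/logs/$name.log': $!";
            }
            print { $handles{$name} } "$line\n";
        }

        sub close_all {
            close $_ for values %handles;
            %handles = ();
        }

        1;

        A calling script would then just do My::Logger::log_line( 'run1', $data_line ) on each iteration and My::Logger::close_all() at the end, without ever touching a filehandle itself.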

        I have a feeling that your question may be more to do with how to differentiate between multiple simultaneous callers, but that's a guess, and we would need a clearer explanation of what problem, conceptual or actual, you are having before we would know how to properly respond.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon