Dirk80 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have the following function calls:

main_func opens a file and writes one line to it. Then main_func calls sub_func1 and after that sub_func2, which live in other modules. Afterwards main_func writes one more line into the file and then closes the file.

You see, at the moment I open the file once in main_func and also close it in main_func. The functions sub_func1 and sub_func2 do not open the file; they just get the file_handle via a parameter and write to the file. Let's call this solution 1.

The other solution, let's call it solution 2, would be to open the file for writing in main_func, write one line, and close it. Then sub_func1 is called: it opens the file for appending, writes some stuff into it, and closes it. Then sub_func2 is called: it opens the file for appending, writes some stuff, and closes the file. Then main_func opens the file for appending, writes one line into it, and closes it.

What is the better solution? 1 or 2?

Now I have to add something to the current behaviour. After the call to sub_func1 I have to modify the file by moving one line to another position in the file. And then sub_func2 is called and everything goes on as before. I think that this kills solution 1. What would you suggest?

I'm just interested in your opinions on this.

Thank you

Dirk


Re: Design/Style question about writing to a file from different functions
by ikegami (Patriarch) on Jul 20, 2010 at 16:25 UTC

    Given the information I have, I'd go with (1).

    After the call to sub_func1 I have to modify the file by moving one line to another position in the file.

    Nothing's stopping you from closing the file before you do this and reopening it afterwards.

    But moving a line in a file you've just built? That sounds very fishy.

Re: Design/Style question about writing to a file from different functions
by jethro (Monsignor) on Jul 20, 2010 at 16:30 UTC

    In your first scenario, solution 1 looks much better for the simple reason that you don't duplicate code (like opening the file 4 times). Duplicate code means more room for bugs if you ever have to change something.

    In your second scenario I would desperately look for a different solution. Changing a line in an already written file is always a hassle. If the contents of the file fit into memory, why not postpone writing to disk and juggle the lines in an @array instead? Reordering lines in an array with splice is absolutely trivial.
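
    For example, a minimal sketch of such a move with splice (the line contents and indices are made up for illustration):

        my @lines = ( "header\n", "data 1\n", "data 2\n", "trailer\n" );

        # move the line at index 2 to the front of the array
        my ($line) = splice @lines, 2, 1;   # remove it ...
        splice @lines, 0, 0, $line;         # ... and reinsert it at index 0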

Re: Design/Style question about writing to a file from different functions
by roboticus (Chancellor) on Jul 20, 2010 at 16:49 UTC

    Dirk80:

    I prefer solution 1. Since it's a simple output stream, it would be fine; I think the simplicity of plain writes and passing around a file handle wins.

    When you get to the further complication, however, I think you should consider wrapping the file I/O into a simple API that lets you concentrate on the business at hand. For that, you'd want a function to write the data, and one to perform the data move. Once you go to the trouble of having several operations you want to perform on your file, then it's worth the little extra work to wrap it up.

    For example, you might wrap it up as simply as something like:

    use Fcntl qw( SEEK_SET SEEK_END );   # needed for the seek constants

    {
        my $FName = 'default.log';
        my $fh;

        sub _open {
            open $fh, '+>', $FName or die "Can't open $FName: $!";
        }

        # note: 'write' shadows Perl's builtin of the same name,
        # so call it as &write(...) or pick another name
        sub write {
            my $data = shift;
            _open() if !defined $fh;
            syswrite($fh, $data) or die "write() error: $FName: $!";
        }

        sub move {
            my ($src_pos, $dest_pos, $bytes) = @_;
            sysseek($fh, $src_pos, SEEK_SET);
            my $data;
            sysread($fh, $data, $bytes);
            sysseek($fh, $dest_pos, SEEK_SET);
            syswrite($fh, $data);
            sysseek($fh, 0, SEEK_END);
        }

        # other stuff as needed (setting name, ...)
    }

    Of course, that's untested, and you should take the extra step to turn it into a class...

    Well, that's my opinion. It's worth about what you paid for it... ;^)

    ...roboticus

Re: Design/Style question about writing to a file from different functions
by afoken (Chancellor) on Jul 20, 2010 at 16:29 UTC

    Define "Better".

    Solution one has an advantage when you use file locking, because the file can be locked until you are done writing it.

    Solution two has no real advantages: You still need to pass a parameter specifying the file name. Use a file handle instead and you have solution one.

    Solution two is not needed, and the new behaviour does not break solution one: Open the file in read/write mode and you can modify it while still holding a lock.
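
    A minimal sketch of that (the file name and the appended line are only placeholders):

        use Fcntl qw( :flock SEEK_SET SEEK_END );

        open my $fh, '+<', 'out.txt' or die "Can't open out.txt: $!";
        flock $fh, LOCK_EX or die "Can't lock out.txt: $!";

        seek $fh, 0, SEEK_END;           # go to the end ...
        print {$fh} "one more line\n";   # ... and append

        seek $fh, 0, SEEK_SET;           # jump back and modify the file,
                                         # still holding the lock
        close $fh;                       # closing releases the lock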

    Also, rethink your concept: Why do you think you need to modify the file on disk? Modify the way you write the data into the file so it is written in the correct order. Think about reducing the complexity of the routines writing the file. Split them into smaller parts that can be called in any order needed. If you have the memory, think about creating an array of lines (or any other useful data structure, like a tree) first, processing it as needed, and finally writing the array to the file in one run.
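
    A minimal sketch of that structure (the sub names follow the original post, everything else is made up):

        my @lines = ("first line\n");
        push @lines, sub_func1();   # each sub returns its lines ...
        push @lines, sub_func2();   # ... instead of writing them itself

        # process @lines as needed (reorder, insert, delete), then:
        open my $fh, '>', 'out.txt' or die "Can't open out.txt: $!";
        print {$fh} @lines;         # one open, one write, one close
        close $fh;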

    Show us the relevant parts of the code, perhaps you are making your own life harder than really needed.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re: Design/Style question about writing to a file from different functions
by ww (Archbishop) on Jul 20, 2010 at 18:43 UTC
    "... main_func calls sub_func1 and after that sub_func2, which are in other modules"

    Why?

    If you consider the parameters of "better" to include 'how many times' the solution opens and closes your data file and 'how many modules' you have to load, I submit that you should refactor all the scripts using main_func, sub_func1, sub_func2 and whatever the "add something" may be, so that you end up with a single module... at which point your other alternatives become moot.

      Thank you very much for your answers. Now I'm sure that solution 1 is the better solution.

      As the author of a Perl script, my goal is that it runs both on computers with a lot of memory and on computers with little memory, so I do not know how much memory I will have. But to give you numbers: sub_func1 writes about 50 MB to the output file and sub_func2 writes about 150 MB to the output file.

      Do you think that holding about 200 MB in memory is OK, and then writing everything to the output file in one swoop? The other alternative would be to write the first 50 MB to the file and then the 150 MB. How many MB do you usually collect in memory if the script has to run on different computers?

      Now to the reason why I want to move a line directly after building a part of the file: I'm computing a checksum which I only know once sub_func1 has finished, but the line with the checksum has to be at the beginning of the file and NOT at the end. My problem was that I thought it would be better to write everything directly to the file and that it was not OK to collect everything in memory. That's why I had the problem of moving a line afterwards. If the in-memory solution is OK, then everything gets easier: only main_func will write to the file, and the helper functions collect everything in memory.

        I'm computing a checksum which I know when sub_func1 finished. But the line with the checksum has to be at the beginning of the file and NOT at the end.

        Most checksums are (or can be) fixed-length strings.

        So, why not write a dummy checksum at the beginning. Write the rest of the file as you generate it. When you've finished writing, seek the file back to the beginning and overwrite the dummy checksum. Close and done.
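
        A minimal sketch of that technique (Digest::MD5 provides the fixed-length checksum; the file name and data are placeholders):

            use Digest::MD5 qw( md5_hex );

            open my $fh, '+>', 'out.dat' or die "Can't open out.dat: $!";
            print {$fh} md5_hex(''), "\n";   # 32-character dummy checksum

            my $md5 = Digest::MD5->new;
            for my $line ( "some data\n", "more data\n" ) {
                print {$fh} $line;
                $md5->add($line);
            }

            seek $fh, 0, 0;                  # rewind ...
            print {$fh} $md5->hexdigest;     # ... and overwrite the dummy
            close $fh;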


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        I would probably write the bulk of the data to a temp file, then append it to a file that starts with the checksum once everything is done. That way the data doesn't have to be held in memory while processing, and the OS handles the buffering, which would most likely be the most efficient. But without specifics it is hard to tell; I'm just poking in the dark here.
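
        A minimal sketch of that approach (all names are only placeholders):

            use File::Temp qw( tempfile );

            my ($tmp, $tmpname) = tempfile();
            print {$tmp} "bulk data line $_\n" for 1 .. 3;   # heavy output goes here
            close $tmp;

            open my $out, '>', 'final.dat' or die "Can't open final.dat: $!";
            print {$out} "Checksum: abc123\n";               # header line first
            open my $in, '<', $tmpname or die "Can't open $tmpname: $!";
            print {$out} $_ while <$in>;                     # then append the bulk data
            close $in;
            close $out;
            unlink $tmpname;
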
Re: Design/Style question about writing to a file from different functions
by wfsp (Abbot) on Jul 20, 2010 at 16:23 UTC
    I think that this kills solution 1.
    Why do you think that?

    Could you show us a simple code example of what you have in mind?

Re: Design/Style question about writing to a file from different functions
by JavaFan (Canon) on Jul 20, 2010 at 17:42 UTC
    Now I have to add something to the current behaviour. After the call to sub_func1 I have to modify the file by moving one line to another position in the file. And then sub_func2 is called and everything goes on as before. I think that this kills solution 1.
    Hmmm, so what magical properties does opening a file have that allow you to move a single line to another position - something you can't seem to do with an already open file handle?

Re: Design/Style question about writing to a file from different functions
by pemungkah (Priest) on Jul 20, 2010 at 21:42 UTC
    I agree with the other posters who are suggesting more abstraction. If you disentangle the operations you need to do to perform the file manipulations from the actual things to be accomplished, I think you'll find that it makes the program as a whole decidedly simpler, and wrapping it up in an object seems like a foregone conclusion. Something like this:
    sub func1 {
        my $file_manager = File::Manipulator->new($filename);
        $file_manager->write_line(...);
        # maybe other writes ...
        $file_manager->move_line($from_position, $to_position);
        func2($file_manager);
        func3($file_manager);
        $file_manager->move_line($from_position2, $to_position2);
    }
    Now you have a clear separation of roles: File::Manipulator just diddles with the file in whatever way makes the most sense, while the functions tell File::Manipulator what they want done.

    This means that the low-level solution in File::Manipulator can be swapped around to anything that makes the process work, without impacting the logic of what the main program wants done. Plus, you have the advantage that you can test the two parts separately: the File::Manipulator code can be functionally tested - I wrote a line, did it get there? I moved a line that doesn't exist, did the right thing happen? - and the mainline code can talk to a dummy version of File::Manipulator that just puts lines into an array, so you can check whether the algorithm in the main program is doing what it ought to do.
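
    A minimal sketch of such a dummy (the class and method names follow the hypothetical File::Manipulator above):

        package File::Manipulator::Dummy;

        sub new        { my ($class) = @_; bless { lines => [] }, $class }
        sub write_line { my ($self, $line) = @_; push @{ $self->{lines} }, $line }

        sub move_line {
            my ($self, $from, $to) = @_;
            my ($line) = splice @{ $self->{lines} }, $from, 1;
            splice @{ $self->{lines} }, $to, 0, $line;
        }

        sub lines { @{ $_[0]{lines} } }   # lets tests inspect the result

        1;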

Re: Design/Style question about writing to a file from different functions
by bluescreen (Friar) on Jul 21, 2010 at 13:32 UTC

    I agree with some of the monks here that having to modify files on-the-fly is fishy. The simplest solution here is to write the checksum at the end (following the chronological order in which the data is generated); if I owned the process that consumes the file, I'd definitely modify the parser to read the checksum at the end instead of the beginning.

    Sometimes that's just not possible and you have to follow a specification; in that case I'd create a module with the following API:

    package File::MyFileType;

    sub open {
        my ($class, $filename) = @_;
        my $self = bless { filename => $filename }, $class;   # bless added; the original elided the constructor details
        ...
        $self->_initialize_headers;
        # _initialize_headers would write something like
        # "Checksum: XXXXXXXXXXXXXX" as the first line,
        # reserving space for the file's header
        return $self;
    }

    sub write {
        my ($self, $data) = @_;
        $self->_update_checksum($data);
        # ... write $data to the file here ...
    }

    sub close {
        my $self = shift;
        $self->_update_headers;
        # _update_headers rewinds the file and replaces the
        # XXXXXXXXXX with the accumulated checksum
        ...
    }

    1;

    Then your app would be abstracted: it would use the File::MyFileType class, and the class can also be extended to support reading, so the consuming process can share the same interface.

Re: Design/Style question about writing to a file from different functions
by aquarium (Curate) on Jul 20, 2010 at 23:53 UTC
    If you're after the "best" implementation for you, and not just wanting to choose solution A or B, a couple of other alternatives are Tie::File, or creating and maintaining an array or hash and only writing it (atomically) at the end of the program.
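
    For instance, with Tie::File the file can be handled like an ordinary array (the file name and indices are placeholders):

        use Tie::File;

        tie my @lines, 'Tie::File', 'out.txt' or die "Can't tie out.txt: $!";
        push @lines, 'a new line';          # appends a line to the file
        my ($line) = splice @lines, 2, 1;   # moving a line works ...
        splice @lines, 0, 0, $line;         # ... just like in any array
        untie @lines;
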
    the hardest line to type correctly is: stty erase ^H
      If you're after the "best" implementation ... Tie::File

      There's a couple of problems with that.

      1. Performance:

        Writing a file with Tie::File, even with memory allocated to easily accommodate the whole file, is orders of magnitude slower than direct writing.

        c:\test>junk7 -N=10e3   ### 1/2 MB
        Took 0.098 seconds
        Took 34.255 seconds

        c:\test>junk7 -N=20e3   ### 1 MB
        Took 0.197 seconds
        Took 137.506 seconds

        c:\test>junk7 -N=1e6    ### 50 MB
        Took 9.449 seconds
        ^C

        By the time you get to 50 MB I estimate it will take hours instead of 10 seconds.

      2. There doesn't seem to be any simple way to binmode a Tie::File tied file. Which means that on some systems, the data in the file will be different to that checksummed:
        21/07/2010  01:26           510,033 junk.dat
        21/07/2010  01:26           520,034 junk2.dat

      Test code:

      #! perl -slw
      use strict;
      use Time::HiRes qw[ time ];
      use Tie::File;
      use Digest::MD5 qw[ md5_hex ];

      our $N //= 1e6;

      my $start = time;
      open OUT, '+>:raw', 'junk.dat';
      print OUT md5_hex( 0 );

      my $data = 'x' x 50;
      my $md5 = new Digest::MD5;
      for ( 1 .. $N ) {
          print OUT $data;
          $md5->add( "$data\n" );
      }

      seek OUT, 0, 0;
      print OUT $md5->hexdigest;
      close OUT;
      printf "Took %.3f seconds\n", time - $start;

      $start = time;
      tie my @lines, 'Tie::File', 'junk2.dat', memory => 52 * $N;
      $md5 = new Digest::MD5;

      push @lines, md5_hex( 0 );
      for ( 1 .. $N ) {
          push @lines, $data;
          $md5->add( "$data\n" );
      }
      $lines[ 0 ] = $md5->hexdigest;
      untie @lines;
      printf "Took %.3f seconds\n", time - $start;

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.