Dirk80 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have the following function calls:

main_func opens a file and writes one line to it. Then main_func calls sub_func1 and after that sub_func2, which live in other modules. Afterwards main_func writes one more line into the file and then closes the file.

You see, at the moment I open the file once in main_func and also close it in main_func. The functions sub_func1 and sub_func2 do not open the file; they just get the file_handle via a parameter and write to the file. Let's call this solution 1.

The other solution, let's call it solution 2, would be to open the file for writing in main_func, write one line, and close it. Then sub_func1 is called: it opens the file for appending, writes some stuff into it, and closes it. Then sub_func2 is called: it opens the file for appending, writes some stuff, and closes the file. Then main_func opens the file for appending, writes one line into it, and closes it.

What is the better solution? 1 or 2?

Now I have to add something to the current behaviour. After the call to sub_func1 I have to modify the file by moving one line to another position in the file. And then sub_func2 is called and everything goes on as before. I think that this kills solution 1. What would you suggest?

I'm just interested in your opinions on this.

Thank you

Dirk


Re: Design/Style question about writing to a file from different functions
by ikegami (Patriarch) on Jul 20, 2010 at 16:25 UTC

    Given the information I have, I'd go with (1).

    After the call to sub_func1 I have to modify the file by moving one line to another position in the file.

    Nothing's stopping you from closing the file before you do this and reopening it afterwards.

    But moving a line in a file you've just built? That sounds very fishy.

Re: Design/Style question about writing to a file from different functions
by jethro (Monsignor) on Jul 20, 2010 at 16:30 UTC

    In your first scenario, solution 1 looks much better for the simple reason that you don't duplicate code (like opening the file 4 times). Duplicate code means more room for bugs if you ever have to change something.

    In your second scenario I would desperately look for a different solution. Changing a line in an already written file is always a hassle. If the contents of the file fit into memory, why not postpone writing to disk and juggle the lines in an @array instead? Reordering lines in an array with splice is absolutely trivial.
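
    For example, a minimal sketch of such a move with splice (the line contents and indices are made up for illustration):

        my @lines = ( "header\n", "data 1\n", "data 2\n", "trailer\n" );

        # move the line at index 2 to the front of the array
        my ($line) = splice @lines, 2, 1;   # remove it ...
        splice @lines, 0, 0, $line;         # ... and reinsert it at index 0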

Re: Design/Style question about writing to a file from different functions
by roboticus (Chancellor) on Jul 20, 2010 at 16:49 UTC

    Dirk80:

    I prefer solution 1. Since it's a simple output stream, it would be fine; I think the simplicity of plain writes and passing around a file handle wins.

    When you get to the further complication, however, I think you should consider wrapping the file I/O into a simple API that lets you concentrate on the business at hand. For that, you'd want a function to write the data, and one to perform the data move. Once you go to the trouble of having several operations you want to perform on your file, then it's worth the little extra work to wrap it up.

    For example, you might wrap it up as simply as something like:

    use Fcntl qw( SEEK_SET SEEK_END );   # needed for the seek constants

    {
        my $FName = 'default.log';
        my $fh;

        sub _open {
            open $fh, '+>', $FName or die "Can't open $FName: $!";
        }

        # note: 'write' shadows Perl's builtin of the same name,
        # so call it as &write(...) or pick another name
        sub write {
            my $data = shift;
            _open() if !defined $fh;
            syswrite($fh, $data) or die "write() error: $FName: $!";
        }

        sub move {
            my ($src_pos, $dest_pos, $bytes) = @_;
            sysseek($fh, $src_pos, SEEK_SET);
            my $data;
            sysread($fh, $data, $bytes);
            sysseek($fh, $dest_pos, SEEK_SET);
            syswrite($fh, $data);
            sysseek($fh, 0, SEEK_END);
        }

        # other stuff as needed (setting name, ...)
    }

    Of course, that's untested, and you should take the extra step to turn it into a class...

    Well, that's my opinion. It's worth about what you paid for it... ;^)

    ...roboticus

Re: Design/Style question about writing to a file from different functions
by afoken (Chancellor) on Jul 20, 2010 at 16:29 UTC

    Define "Better".

    Solution one has an advantage when you use file locking, because the file can be locked until you are done writing it.

    Solution two has no real advantages: You still need to pass a parameter specifying the file name. Use a file handle instead and you have solution one.

    Solution two is not needed, and the new behaviour does not break solution one: Open the file in read/write mode and you can modify it while still holding a lock.
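
    A minimal sketch of that (the file name and the appended line are only placeholders):

        use Fcntl qw( :flock SEEK_SET SEEK_END );

        open my $fh, '+<', 'out.txt' or die "Can't open out.txt: $!";
        flock $fh, LOCK_EX or die "Can't lock out.txt: $!";

        seek $fh, 0, SEEK_END;           # go to the end ...
        print {$fh} "one more line\n";   # ... and append

        seek $fh, 0, SEEK_SET;           # jump back and modify the file,
                                         # still holding the lock
        close $fh;                       # closing releases the lock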

    Also, rethink your concept: Why do you think you need to modify the file on disk? Modify the way you write the data into the file so it is written in the correct order. Think about reducing the complexity of the routines writing the file. Split them into smaller parts that can be called in any order needed. If you have the memory, think about creating an array of lines (or any other useful data structure, like a tree) first, processing it as needed, and finally writing the array to the file in one run.
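
    A minimal sketch of that structure (the sub names follow the original post, everything else is made up):

        my @lines = ("first line\n");
        push @lines, sub_func1();   # each sub returns its lines ...
        push @lines, sub_func2();   # ... instead of writing them itself

        # process @lines as needed (reorder, insert, delete), then:
        open my $fh, '>', 'out.txt' or die "Can't open out.txt: $!";
        print {$fh} @lines;         # one open, one write, one close
        close $fh;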

    Show us the relevant parts of the code, perhaps you are making your own life harder than really needed.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re: Design/Style question about writing to a file from different functions
by ww (Archbishop) on Jul 20, 2010 at 18:43 UTC
    "... main_func calls sub_func1 and after that sub_func2, which are in other modules"

    Why?

    If you consider the parameters of "better" to include 'how many times' the solution opens and closes your data file and 'how many modules' you have to load, I submit that you should refactor all the scripts using main_func, sub_func1, sub_func2 and whatever the "add something" may be, so that you end up with a single module... at which point your other alternatives become moot.

      Thank you very much for your answers. Now I'm sure that solution 1 is the better solution.

      As the author of a Perl script, my goal is that it runs both on computers with a lot of memory and on computers with little memory, so I do not know how much memory I will have. But to give you numbers: sub_func1 writes about 50 MB to the output file and sub_func2 writes about 150 MB to the output file.

      Do you think that holding about 200 MB in memory is OK, and then writing everything to the output file in one swoop? The other alternative would be to write the first 50 MB to the file and then the 150 MB. How many MB do you usually collect in memory if the script has to run on different computers?

      Now to the reason why I want to move a line directly after building a part of the file: I'm computing a checksum which I only know once sub_func1 has finished, but the line with the checksum has to be at the beginning of the file and NOT at the end. My problem was that I thought it would be better to write everything directly to the file and that it was not OK to collect everything in memory. That's why I had the problem of moving a line afterwards. If the in-memory solution is OK, then everything gets easier: only main_func will write to the file, and the helper functions collect everything in memory.

        I'm computing a checksum which I know when sub_func1 finished. But the line with the checksum has to be at the beginning of the file and NOT at the end.

        Most checksums are (or can be) fixed-length strings.

        So, why not write a dummy checksum at the beginning. Write the rest of the file as you generate it. When you've finished writing, seek the file back to the beginning and overwrite the dummy checksum. Close and done.
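
        A minimal sketch of that technique (Digest::MD5 provides the fixed-length checksum; the file name and data are placeholders):

            use Digest::MD5 qw( md5_hex );

            open my $fh, '+>', 'out.dat' or die "Can't open out.dat: $!";
            print {$fh} md5_hex(''), "\n";   # 32-character dummy checksum

            my $md5 = Digest::MD5->new;
            for my $line ( "some data\n", "more data\n" ) {
                print {$fh} $line;
                $md5->add($line);
            }

            seek $fh, 0, 0;                  # rewind ...
            print {$fh} $md5->hexdigest;     # ... and overwrite the dummy
            close $fh;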


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        I would probably write the bulk of the data to a temp file, then append it to a file that starts with the checksum once everything is done. That way the data doesn't have to be held in memory while processing, and the OS handles the buffering, which would most likely be the most efficient. But without specifics it is hard to tell; I'm just poking in the dark here.
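
        A minimal sketch of that approach (all names are only placeholders):

            use File::Temp qw( tempfile );

            my ($tmp, $tmpname) = tempfile();
            print {$tmp} "bulk data line $_\n" for 1 .. 3;   # heavy output goes here
            close $tmp;

            open my $out, '>', 'final.dat' or die "Can't open final.dat: $!";
            print {$out} "Checksum: abc123\n";               # header line first
            open my $in, '<', $tmpname or die "Can't open $tmpname: $!";
            print {$out} $_ while <$in>;                     # then append the bulk data
            close $in;
            close $out;
            unlink $tmpname;
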
Re: Design/Style question about writing to a file from different functions
by wfsp (Abbot) on Jul 20, 2010 at 16:23 UTC
    I think that this kills solution 1.
    Why do you think that?

    Could you show us a simple code example of what you have in mind?

Re: Design/Style question about writing to a file from different functions
by JavaFan (Canon) on Jul 20, 2010 at 17:42 UTC
    Now I have to add something to the current behaviour. After the call to sub_func1 I have to modify the file by moving one line to another position in the file. And then sub_func2 is called and everything goes on as before. I think that this kills solution 1.
    Hmmm, so what magical properties does opening a file have that allow you to move a single line to another position - something you can't seem to do with an already open file handle?

Re: Design/Style question about writing to a file from different functions
by pemungkah (Priest) on Jul 20, 2010 at 21:42 UTC
    I agree with the other posters who are suggesting more abstraction. If you disentangle the operations you need to do to perform the file manipulations from the actual things to be accomplished, I think you'll find that it makes the program as a whole decidedly simpler, and wrapping it up in an object seems like a foregone conclusion. Something like this:
    sub func1 {
        my $file_manager = File::Manipulator->new($filename);
        $file_manager->write_line(...);
        # maybe other writes ...
        $file_manager->move_line($from_position, $to_position);
        func2($file_manager);
        func3($file_manager);
        $file_manager->move_line($from_position2, $to_position2);
    }
    Now you have a clear separation of roles: File::Manipulator just diddles with the file in whatever way makes the most sense, while the functions tell File::Manipulator what they want done.

    This means that the low-level solution in File::Manipulator can be swapped around to anything that makes the process work, without impacting the logic of what the main program wants done. Plus, you have the advantage that you can test the two parts separately: the File::Manipulator code can be functionally tested - I wrote a line, did it get there? I moved a line that doesn't exist, did the right thing happen? - and the mainline code can talk to a dummy version of File::Manipulator that just puts lines into an array, so you can check whether the algorithm in the main program is doing what it ought to do.
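
    A minimal sketch of such a dummy (the class and method names follow the hypothetical File::Manipulator above):

        package File::Manipulator::Dummy;

        sub new        { my ($class) = @_; bless { lines => [] }, $class }
        sub write_line { my ($self, $line) = @_; push @{ $self->{lines} }, $line }

        sub move_line {
            my ($self, $from, $to) = @_;
            my ($line) = splice @{ $self->{lines} }, $from, 1;
            splice @{ $self->{lines} }, $to, 0, $line;
        }

        sub lines { @{ $_[0]{lines} } }   # lets tests inspect the result

        1;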

Re: Design/Style question about writing to a file from different functions
by bluescreen (Friar) on Jul 21, 2010 at 13:32 UTC

    I agree with some of the monks here that having to modify files on-the-fly is fishy. The simplest solution here is to write the checksum at the end (following the chronological order in which the data is generated); if I owned the process that consumes the file, I'd definitely modify the parser to read the checksum at the end instead of the beginning.

    Sometimes that's just not possible and you have to follow a specification; in that case I'd create a module with the following API:

    package File::MyFileType;

    sub open {
        my ($class, $filename) = @_;
        my $self = bless { filename => $filename }, $class;   # bless added; the original elided the constructor details
        ...
        $self->_initialize_headers;
        # _initialize_headers would write something like
        # "Checksum: XXXXXXXXXXXXXX" as the first line,
        # reserving space for the file's header
        return $self;
    }

    sub write {
        my ($self, $data) = @_;
        $self->_update_checksum($data);
        # ... write $data to the file here ...
    }

    sub close {
        my $self = shift;
        $self->_update_headers;
        # _update_headers rewinds the file and replaces the
        # XXXXXXXXXX with the accumulated checksum
        ...
    }

    1;

    Then your app would be abstracted: it would use the File::MyFileType class, and the class can also be extended to support reading, so the consuming process can share the same interface.

Re: Design/Style question about writing to a file from different functions
by aquarium (Curate) on Jul 20, 2010 at 23:53 UTC
    If you're after the "best" implementation for you, and not just wanting to choose solution A or B, a couple of other alternatives are Tie::File, or creating and maintaining an array or hash and only writing it (atomically) at the end of the program.
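
    For instance, with Tie::File the file can be handled like an ordinary array (the file name and indices are placeholders):

        use Tie::File;

        tie my @lines, 'Tie::File', 'out.txt' or die "Can't tie out.txt: $!";
        push @lines, 'a new line';          # appends a line to the file
        my ($line) = splice @lines, 2, 1;   # moving a line works ...
        splice @lines, 0, 0, $line;         # ... just like in any array
        untie @lines;
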
    the hardest line to type correctly is: stty erase ^H
      If you're after the "best" implementation ... Tie::File

      There's a couple of problems with that.

      1. Performance:

        Writing a file with Tie::File, even with memory allocated to easily accommodate the whole file, is orders of magnitude slower than direct writing.

        c:\test>junk7 -N=10e3   ### 1/2 MB
        Took 0.098 seconds
        Took 34.255 seconds

        c:\test>junk7 -N=20e3   ### 1 MB
        Took 0.197 seconds
        Took 137.506 seconds

        c:\test>junk7 -N=1e6    ### 50 MB
        Took 9.449 seconds
        ^C

        By the time you get to 50 MB I estimate it will take hours instead of 10 seconds.

      2. There doesn't seem to be any simple way to binmode a Tie::File tied file. Which means that on some systems, the data in the file will be different to that checksummed:
        21/07/2010  01:26           510,033 junk.dat
        21/07/2010  01:26           520,034 junk2.dat

      Test code:

      #! perl -slw
      use strict;
      use Time::HiRes qw[ time ];
      use Tie::File;
      use Digest::MD5 qw[ md5_hex ];

      our $N //= 1e6;

      my $start = time;
      open OUT, '+>:raw', 'junk.dat';
      print OUT md5_hex( 0 );

      my $data = 'x' x 50;
      my $md5 = new Digest::MD5;
      for ( 1 .. $N ) {
          print OUT $data;
          $md5->add( "$data\n" );
      }

      seek OUT, 0, 0;
      print OUT $md5->hexdigest;
      close OUT;
      printf "Took %.3f seconds\n", time - $start;

      $start = time;
      tie my @lines, 'Tie::File', 'junk2.dat', memory => 52 * $N;
      $md5 = new Digest::MD5;

      push @lines, md5_hex( 0 );
      for ( 1 .. $N ) {
          push @lines, $data;
          $md5->add( "$data\n" );
      }
      $lines[ 0 ] = $md5->hexdigest;
      untie @lines;
      printf "Took %.3f seconds\n", time - $start;

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.