
Only other instances of the same script have access to the file, so I think there will be no problem.

It wasn't clear to me who the "writer" was that put the data into this LOG file in the first place. It appeared that you were processing/modifying a file that some other process was appending to (without further info, that's what I figured a LOG file was; the term "LOG" just conjures up that image). Perhaps a better name would be "SHARED_CONFIG" or something similar? What you name things within a program matters a lot! The name "LOG" triggered some reflexive brain activity and a lot of assumptions about what this file meant.

From your further description, it appears that you are using this file as a kind of IPC (inter-process communication) between instances of your own program: you read the file in, process it, and write it back out.
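That read/process/write-back cycle can also be done on one handle that holds a single exclusive lock from start to finish, so the lock is never dropped in between. This is a minimal sketch, assuming the file already exists and fits comfortably in memory; the subroutine name update_in_place and its per-line callback are hypothetical, not taken from your code.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);   # LOCK_EX, SEEK_SET

# Hypothetical helper: read the whole shared file, pass each line through
# $edit, and write the result back, all under ONE exclusive lock.
# Assumes $file exists and is small enough to hold in memory.
sub update_in_place {
    my ($file, $edit) = @_;

    # '+<' opens for reading AND writing without truncating the file.
    open(my $fh, '+<', $file) or die "cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "cannot lock $file: $!";  # blocks until granted

    my @lines = <$fh>;                       # list context: slurp every line
    my @out   = map { $edit->($_) } @lines;  # process each line

    seek($fh, 0, SEEK_SET) or die "cannot seek $file: $!";     # back to the start
    truncate($fh, 0)       or die "cannot truncate $file: $!"; # drop old contents
    print {$fh} @out       or die "cannot write $file: $!";

    close($fh) or die "cannot close $file: $!";  # flushes, then releases the lock
    return;
}
```

Because the same handle stays open throughout, no other cooperating process can slip in between the read and the write.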

flock mode 2 is LOCK_EX (an exclusive, or "write", lock). flock $fh, 2 is a blocking call: it waits until the lock can be granted. If you don't want to wait, Perl also has a non-blocking form, LOCK_EX | LOCK_NB, which returns false immediately when another process already holds the lock. So your version #1 is OK given your additional description. You are doing this right: wait for exclusive access, do something, and then release the lock.
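A minimal sketch of the blocking and non-blocking forms side by side. The helper name get_exclusive_lock is hypothetical; LOCK_EX and LOCK_NB come from the core Fcntl module. The file is opened with '>>' so the handle has write intent and the file is created if it is missing.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);   # LOCK_EX, LOCK_NB

# Hypothetical helper. Returns an open handle holding an exclusive lock on
# $file. If $nonblocking is false, flock() waits until the lock is granted.
# If $nonblocking is true and the lock is busy, it returns undef right away
# (flock() leaves EWOULDBLOCK in $! in that case).
sub get_exclusive_lock {
    my ($file, $nonblocking) = @_;
    open(my $fh, '>>', $file) or die "cannot open $file: $!";
    my $mode = $nonblocking ? (LOCK_EX | LOCK_NB) : LOCK_EX;
    return $fh if flock($fh, $mode);
    close $fh;
    return;
}
```

A caller that gets undef from the non-blocking form can do other work and retry later instead of sitting in flock(). Closing the returned handle releases the lock, which is the same reason closing LOG gives its lock up.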

Yes! Closing LOG will release the lock.
You have to have a file open to hold a lock on it!
This looks like it could be a problem in Version #2.

As I understand it, to reduce memory usage you want to process the LOG file line by line instead of making an in-memory copy of it, à la: my @logfile = <LOG>;
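The difference in memory use comes down to context: <LOG> in list context reads every line at once, while in a while loop (scalar context) it returns one line per iteration. A minimal sketch contrasting the two; both subroutines are hypothetical and just count lines, standing in for real processing.

```perl
use strict;
use warnings;

# Slurping: the whole file is held in @lines at once.
sub count_lines_slurp {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "cannot open $file: $!";
    my @lines = <$fh>;            # list context reads ALL lines
    close $fh;
    return scalar @lines;
}

# Streaming: only the current line is held in memory.
sub count_lines_streaming {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "cannot open $file: $!";
    my $n = 0;
    while (my $line = <$fh>) {    # scalar context reads ONE line per pass
        $n++;                     # real per-line processing would go here
    }
    close $fh;
    return $n;
}
```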

OK, there is going to be a tradeoff between performance and memory usage. It sounds like performance is secondary to memory usage (and of course both are less important than file integrity). However, I'm not sure that either of the first two matters: how big is this LOG file?
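If you're not sure how big the file gets, Perl's -s file test returns its size in bytes, so a script can check before choosing between slurping and streaming. A minimal sketch; the name should_slurp and the 50 MB cut-off are placeholders of mine, not recommendations.

```perl
use strict;
use warnings;

# Placeholder threshold (assumption): files smaller than this are slurped.
my $SLURP_LIMIT = 50 * 1024 * 1024;   # 50 MB

sub should_slurp {
    my ($file) = @_;
    die "$file does not exist\n" unless -e $file;
    my $size = (-s _) || 0;    # reuse the stat from -e; -s is false for an empty file
    return $size < $SLURP_LIMIT;
}
```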

If I hold a lock on the LOG file, closing the LOG file will release that lock. That is a problem if the lock is being used for IPC coordination and you are replacing LOG with another file (the temp file).

We want to: open the LOG file, process it line by line, write the results to a temp file, then replace LOG with that temp file. To replace LOG with the temp file (rename), you will have to close LOG, but closing that file would release the IPC coordination lock.

One solution is a separate "FLAG" (coordination) file; its contents don't matter, so it can be one byte long or, I think, even empty. Do a blocking wait for an exclusive lock on that file, then do whatever you want with LOG. In this case there only needs to be one temp file, and its name doesn't matter much.

Untested, but it shows the general idea. Make sure "flagfile" exists before trying to lock it; its contents are meaningless.

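use Fcntl qw(:flock);   # provides LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN

# '>>' creates "flagfile" if it doesn't exist and never truncates it.
# Write intent also matters where Perl emulates flock() with fcntl():
# there, LOCK_EX only works on a handle opened for writing.
open (FLAG, '>>', "flagfile") or die "cannot open flagfile: $!";
flock (FLAG, LOCK_EX) or die "cannot lock flagfile: $!"; # mode 2, blocks until granted

# You have an exclusive lock on FLAG here.
# By cooperative convention, that means I have exclusive
# access to both the LOG file and the TEMP file;
# other processes don't update or use either.

open (LOG, '<', $file) or die "cannot open $file: $!";
open (TEMP, '>', "tempName") or die "cannot open tempName: $!";

# There will only be one temp file at a time, so the name
# doesn't matter much. Put your program name into it so that it
# doesn't collide with files from other programs; I think that's
# all you need, i.e. no rand() required. Keep it in the same
# directory as $file, because rename() usually fails across filesystems.

while (<LOG>)
{
   # process each line in LOG; this placeholder copies it unchanged
   print TEMP $_;
}

close LOG;      # this would release a lock held on LOG,
                # but we are using a different lock
                # for IPC coordination.

close TEMP or die "cannot close tempName: $!";  # flush before renaming

# No unlink needed: rename() clobbers an existing file named $file
# (see perldoc -f rename), so "LOG" is replaced by the edited version.
rename ("tempName", $file) or die "cannot rename tempName to $file: $!";

close FLAG;  # releases the lock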
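Here is the same sequence packaged as a subroutine with lexical filehandles and a per-line callback, which makes it easier to test on its own. It is a sketch under my own assumptions: the name rewrite_under_flag, the callback argument, and the temp-file naming scheme ("$file.tmp." plus the process ID) are hypothetical. Creating the temp file next to $file keeps rename() on the same filesystem.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical subroutine for the flag-file approach.
#   $flag - coordination file (created if missing; contents ignored)
#   $file - the shared data file (must already exist)
#   $edit - code ref called once per line; its return value is written out
sub rewrite_under_flag {
    my ($flag, $file, $edit) = @_;
    my $temp = "$file.tmp.$$";   # assumption: temp name = data file + ".tmp." + PID

    open(my $flag_fh, '>>', $flag) or die "cannot open $flag: $!";
    flock($flag_fh, LOCK_EX)       or die "cannot lock $flag: $!";  # wait for our turn

    open(my $in,  '<', $file) or die "cannot open $file: $!";
    open(my $out, '>', $temp) or die "cannot open $temp: $!";
    while (my $line = <$in>) {
        print {$out} $edit->($line) or die "cannot write $temp: $!";
    }
    close $in;
    close $out or die "cannot close $temp: $!";          # flush before renaming
    rename($temp, $file) or die "cannot rename $temp to $file: $!";

    close $flag_fh;   # release the lock only after the new file is in place
    return;
}
```

Usage: rewrite_under_flag("flagfile", $file, sub { my $line = shift; ...; return $line }) with your own per-line processing in the callback.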

Update: Now that I understand the application better, I would definitely be thinking in terms of GrandFather's suggestion to use a DB. I've become quite enamored with SQLite because it doesn't have all of the admin baggage that a "real SQL server" has (and I do have an SQL daemon running on my machine, so I know at least something about the hassle this involves).

However, the above is fairly simple, will work well, and fits in with the OP's current code.

In reply to Re^3: Trying to optimize reading/writing of large text files. by Marshall
in thread Trying to optimize reading/writing of large text files. by nikkimouse
