handling multiple file handles for code generation

What follows doesn't involve a lot of code, but demonstrates some nice things you can do with Perl's powerful, flexible, concise syntax. It is something I used in a project of mine where I am generating a lot of C++ code.

The code generates a large number of source and header files. In its first incarnation, each file was just opened and closed as needed, like this:

open SRC, "> file" or die "could not open file: $!";
# print lots of stuff
close SRC;

This was fine for a while, but as time went on the thing got slower and slower, as the number of files grew. Also the sounds coming out of the hard disc suggested that some optimisation was in order. Hmm.

So here's a wish list for some improvements:

Keep a number of files open, and hand the filehandles back to the program as needed through a simple mechanism.
Minimise opening and closing of file handles.
Provide a simple way to print header and footer text in the file (e.g. the include guards which are necessary in C++ header files).
Clean up nicely when done. This means closing file handles, and calling methods that handle version management software as needed.

Here's how it was done:

{
    my %fhs;

    sub END
    {
        foreach (keys %fhs)
        {
            my $fh = $fhs{$_};

            if ($_ =~ /\.h\.new$/)
            {
                print $fh "#endif\n";
            }
            close $fh;
            if ($_ =~ /\.h\.new$/)
            {
                Clearcase::handleClearCaseElement($_);
            }
        }
    }

    sub openFile
    {
        my ($file, $header) = @_;

        unless (defined $fhs{$file})
        {
            my $f;
            open ($f, "> $file") || die "could not open $file for output $!";
            $fhs{$file} = $f;
            print $f $header;
        }
        return $fhs{$file};
    }
}

It's pretty simple. The END block handles cleanup: it adds the closing #endif where necessary, closes each handle, and calls some Clearcase handling routines which take care of version control. The %fhs hash is private to the END block and openFile, since both sit inside the same enclosing block. openFile takes a second argument holding the header text for the file, which is what my system needs. You could change this around to fit your requirements.

The other code has an easy time now. Whenever a certain file is wanted for output, just call my $fh = openFile('myFilename.cc', $header) and start printing to $fh. Job done.

The script is back down to a few seconds to run as a result of this (from over 30 seconds) and the disc is a lot happier. The code is much more readable too.

It would be wise, if the number of files gets very high, to extend this code to check that the number of open filehandles does not get too close to the system limit. It's possible to see some smart mechanism where least used filehandles are closed and only reopened when requested again ... as I don't need this right now, I'll leave this as an exercise for some future monk ...
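
For what it's worth, here is one way that exercise might go. This is only a minimal sketch, not something from the original project: the names openFileLimited, closeAllFiles and openCount, and the cap of 50, are all made up for illustration. It assumes that a file which gets evicted can simply be reopened in append mode later, so the header is written only the first time the file is requested.

```perl
use strict;
use warnings;

{
    my $MAX_OPEN = 50;   # hypothetical cap; keep it well below the system limit
    my %fhs;             # file name => currently open handle
    my %last_used;       # file name => tick of the most recent request
    my %seen;            # files already created during this run
    my $tick = 0;

    sub openFileLimited
    {
        my ($file, $header) = @_;

        unless (defined $fhs{$file})
        {
            # At the cap: close the handle that was requested longest ago.
            if (keys(%fhs) >= $MAX_OPEN)
            {
                my ($oldest) = sort { $last_used{$a} <=> $last_used{$b} } keys %fhs;
                close $fhs{$oldest} or die "could not close $oldest: $!";
                delete $fhs{$oldest};
            }

            # First request truncates the file and writes the header;
            # later requests append so that earlier output survives.
            my $mode = $seen{$file} ? '>>' : '>';
            open(my $f, $mode, $file) or die "could not open $file for output: $!";
            print $f $header unless $seen{$file}++;
            $fhs{$file} = $f;
        }
        $last_used{$file} = ++$tick;
        return $fhs{$file};
    }

    # Close everything that is still open (e.g. from an END block).
    sub closeAllFiles
    {
        for my $file (keys %fhs)
        {
            close $fhs{$file} or die "could not close $file: $!";
        }
        %fhs = ();
    }

    # Number of handles currently held open.
    sub openCount { return scalar keys %fhs }
}
```

One consequence of this design: callers should ask openFileLimited for the handle each time they want to print, rather than hanging on to it, because a handle that has been evicted is closed. Footers such as the closing #endif would also have to be written after reopening in append mode, which this sketch leaves out.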

Replies are listed 'Best First'.
Re: handling multiple file handles for code generation by blazar (Canon) on Sep 05, 2005 at 12:05 UTC
foreach (keys %fhs)
{
    my $fh = $fhs{$_};

    if ($_ =~ /\.h\.new$/)
    {
        print $fh "#endif\n";
    }

The whole point of $_ is that of being the topicalizer: you either want

for my $file (keys %fhs) {
    # ...
    print $fh whatever if $file =~ /\.h\.new$/;
}

or

for (keys %fhs) {
    # ...
    print $fh whatever if /\.h\.new$/;
}

but your mixed form doesn't add to code readability, although -of course- it is not illegal. (I also took the liberty of rewriting the if condition as a statement modifier, as IMHO it is clearer that way.)
Re^2: handling multiple file handles for code generation by danmcb (Monk) on Sep 05, 2005 at 12:16 UTC
I don't understand your point. AFAIK, $_ is just a variable that happens to have a certain well-known value when used inside a loop like this. Of course, using an explicitly named variable will always aid readability (if it is well named) but there is a balance between brevity and "names that mean something". So please clarify - what does "topicalizer" mean, and why does it make what I wrote wrong? (But thanks for at least making a comment after downvoting ... ;-)
Re^3: handling multiple file handles for code generation by blazar (Canon) on Sep 05, 2005 at 13:12 UTC
> So please clarify - what does "topicalizer" mean, and why does it make what I wrote wrong?

It means it acts much like the pronoun "it", that is, it is the implicit argument of many operators and functions. Thus

$_ =~ /$regex/;

is always equivalent to just

/$regex/;

but the latter is more concise, and is typically idiomatic of Perl. So it is most often clearer. If (you think) it is not, then an explicit variable name may be in order. And in that case you have to do

$var =~ /$regex/;

But then, if you have a bunch of matches (or substitutions or ...) to do, people at times even use a for loop just for its aliasing effect, e.g.:

s/$rx1/foo/, s/$rx2/bar/, s/$rx3/baz/ for $var;