mgolini has asked for the wisdom of the Perl Monks concerning the following question:

Brothers, I have written an application to:
1. Take a directory full of files
2. Create a temp file with 2 lines to prepend
3. Read the original file and extract first and last line timestamps
4. Copy the timestamps to the tmp file (2nd prepended line)
5. Append the lines from the original to the tmp file
And finally
6. Rename the tmp to the original filename.
The code is below
My question is: There is a great deal of redundancy in the first subroutine. How can I compact this? It seems like in order to get the first and last line timestamps, I have to do a series of opening and closing file handles. There must be a better way to accomplish this. I am looking for a way to make it cleaner/neater. The actually works, but its not pretty. I will clean this up by using strict & warnings (I forgot to use those..but hey, like I said, it does the job). Thanks

#! perl.exe -w # Perl Script to change format of Unigen .csv files from version 17 to + version 24. # This script will add two lines to the beginning of each file to acco +modate ARAMS Access Database # and copy the file to a final directory destination. use File::Copy; $indir = "c:/public/conversion"; $outdir = "c:/public/final"; opendir DH,$indir or die "Cannot open $indir: $!"; ## has all the files and directories in the given ## directory. You'll want to screen it to make sure you're ## opening a file. foreach $file(readdir DH) { $name = $file; next if $name =~ /^\./; # skip over dot files &addheader($_); } close DH; sub addheader { open FH, "< $indir/$file" or printf "can't open %s\n",$file; $line = <FH>; $_ = $line; close FH; $field1 = m/^Clock/; # Check for C +lock at beginning of file printf "\n$file"; if ( $field1 == "1"){ printf "\tv17\tFile Needs Appending\n"; open FH1, "< $indir/$file" or printf "can't open %s\n", $file; + # Open source file 1 while <FH1>; my $count = $.; # Gets the tota +l number of lines in file close FH1; open FH1, "< $indir/$file" or printf "can't open %s\n", $file; + my $line=<FH1>; $.= 1; my $number = 2; # Gets beginnin +g of file timestamp - fourth line from the beginning of the file print "$file\n"; do {$line =<FH1>} until $. == ($number); my @fields = parse_csv($line); print "$fields[0]\n"; my $start = $fields[0]; do {$line =<FH1>} until $. == ($count); # Get +end of file timestamp - last line @fields1 = parse_csv($line); my $end = $fields1[0]; close FH1; open FH1, "< $indir/$file" or printf "can't open %s\n", $file; + open FH2, '> c:/public/final/tmp.csv' or die "can't append: $! +" ; # Open target file - create if not there already print FH2 "PREPEND LINE1\n"; # Prepend + text print FH2 "blank,"."blank,"."blank,"."$start,"."$end\n"; + # Prepend first & last timestamp my @lines = <FH1>; foreach $line ( @lines ) # Append tmp f +ile with original files { print FH2 $line; } close FH1; close FH2; $oldfile = 'c:/public/final/tmp.csv'; $newfile = "$outdir/$file"; rename $oldfile,$newfile or die"can't rename files\n"; + # Rename tmp file to original filename print "$oldfile\n"; } else { printf "\tv24\tFile Does Not Need Appending\n"; copy "$indir/$file", "$outdir/$file"; } return ; } sub parse_csv { my $text = shift; my @new = (); push(@new, $+) while $text =~ m{"([^\"\\]*(?:\\.[^\"\\]*)*)",? | ( +[^,]+),?| ,}gx; push(@new, undef) if substr ($text, -1,1) eq ','; return @new; #list of values that were comma-seperated }

janitored by ybiC: balanced <readmore> tages around code

Replies are listed 'Best First'.
Re: Style Comments
by BrowserUk (Patriarch) on Sep 04, 2003 at 23:47 UTC

    Take a look at Tie::File. Your program becomes something like

    #! perl use strict; use Tie::File; my @lines; tie @lines, 'Tie::File', 'filename'; open OUT, '>', 'outputfile' or die $!; print OUT @lines[0, -1, 0 .. $#lines]; close OUT;

    That just prepends the first and last lines to the whole file without doing all the timestamp stuff, but it gives an the idea.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      I once got first and last bits form a file by reading it into a string, grabbing the first bit, reversing the string, grabbing the first bit again (which was the reveresed last bit) then reversing what I grabbed back to normal :-)

      I notice you use Tie::File and Im sure I read in the pod that Tie was redundant and DBI should be used.

      Tie::File looked like so much fun but I was reluctant to try it so now I just read my file into @lines - which I guess has the drawback of taking up more memory space.

      My reluctance is becuase the scripts I write may end up on any version of perl on any server so I dont like relying on specific modules being installed.

      Is Tie::File in common use then in the general perl community?

      ___ /\__\ "What is the world coming to?" \/__/ www.wolispace.com
        I notice you use Tie::File and Im sure I read in the pod that Tie was redundant and DBI should be used.
        I don't see that anywhere in its pod, but Tie::File and DBI do completely different things. Use Tie::File when you need to access a file (any type of text file) through an array interface. Use DBI when you want to access a datasource (could be a file, or a RDBMS) through statements in a data-querying language. I don't believe either one could make the other redundant.

        Tie::File is included in the core Perl distribution since version 5.8.0, which is a pretty good indication of the robustness of its code. Apart from that, it's a pure Perl module, so you can just put the .pm file in the same directory as your script and it will work fine.

        blokhead

        I'm not certain, but maybe the bit of the pod you remember reading is this bit

        dbmopen HASH,DBNAME,MASK [This function has been largely superseded by the tie function.]

        Perhaps that is the source of your confusion?

        I think blokhead answered most of your concerns. All I would add is that (IMO) tie is Perls secret weapon.

        Despite some concerns about the runtime overhead (which is no worse (I think) than calling OO-methods?), it provides a really easy-to-use mechanism with a simple, consistant interface for encapsulation of almost anything behind perls built-in data structures. This gives many of the benefits that OO-ists hold so dear without requiring each new module to invent and document a completely new interface--the bit that even experienced OO-ists get wrong so often. My only wish is that self-typing and nested-ties worked more intuatively.

        I do hope that this simple and infinitely flexible mechanism for abstraction makes it into P6. Hopefully, the parrot engine will make reduce the runtime impact of magic.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
        If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Style Comments
by blokhead (Monsignor) on Sep 04, 2003 at 23:58 UTC
    This code could stand a lot of improving. I suggest you start from scratch and do things the "right" way (to name a few: don't open, read, reopen, then reread files just to get the Nth line, don't assign to $., don't pass data to subs using global variables, don't clobber $_, etc)... Tie::File will save you a lot of time. I suggest something like this:
    use strict; use Tie::File; foreach my $file (readdir DH) { ## skip hidden files next if $file =~ /^\./; my @lines; tie @lines, 'Tie::File', $file or die "Couldn't open $file: $!"; ## skip files where the first line contains "Clock" next unless $lines[0] =~ /^Clock/; ## grab CSV timestamp info from 4th line and last line my $start = ( parse_csv( $lines[3] ) )[0]; my $end = ( parse_csv( $lines[-1] ) )[0]; ## prepend lines to the file: modifying the array modifies the fil +e unshift @lines, "PREPEND LINE1", "blank,blank,blank,$start,$end"; }
    You should also consider using Text::CSV or Text::CSV_XS for parsing CSV data, so you don't have to worry about maintaining that CSV regex.

    blokhead

Re: Style Comments
by leriksen (Curate) on Sep 05, 2003 at 04:45 UTC
    Just a few suggestions - I only have a few minutes... A different take on the specific problems in your code, rather than the Tie::File approach - hopefully you'll get some good tips. open FH, "< $indir/$file" or printf "can't open %s\n",$file; This printf will go to STDOUT (insert favourite OS's equivalent here), you really want this to go to STDERR instead, and nicely formatted etc...try
    open FH, "< $indir/$file" or warn "can't open $file :: $!" and return;
    warn goes to STDERR and $! has the OS specific message about why the open failed - we then return to the caller for the next file.

    $line = <FH>; $_ = $line; close FH; $field1 = m/^Clock/; # Check for C +lock at beginning of file printf "\n$file"; if ( $field1 == "1"){
    there is no need to assign to $_, and unless you really have no choice you should consider attempts to as a sign you dont understand the operator about to work in $_ by default. this bit is better rewritten as
    $line = <FH>; close FH; printf "\n$file"; if ( $line =~ m/^Clock/){


    Next, you should take the repeated opening and closing of the same file as a clue that things could be better done - if you need to go to the start again, look at seek(), if you need the current offset look at tell(). I/O is a relatively heavyweight operation, and your process will (on a *nix system) probably get swapped out while this happens. The more you can defer I/O the better - I like to use Tie::Handle, which ties my output to an array, but make all my code look like I am doing I/O. I have experienced up to two orders of magnitude improvement in perform from this simple approach .

    Getting the last line of the file - try File::ReadBackwards, or, if you know the data well enough, grab the last n bytes of the file via operations like seek and read, and scan for the last line separator (dont just look for '\n', use $/.)

    hard-coded paths - try for better independence from OS spcifics - try using File::Spec to build up your paths - if portability is important to you (it may not be now, but later...)

    passing and returning arrays - try not to pass or return an array - Perl ends up making copies and this is inefficient. Use references instead -
    sub ret_a_ref { my @arr; ... return \@arr; } my $a_ref = ret_a_ref(); my $first = $a_ref->[0]; ...
      The more you can defer I/O the better - I like to use Tie::Handle, which ties my output to an array, but make all my code look like I am doing I/O. I have experienced up to two orders of magnitude improvement in perform from this simple approach .
      Wow, what a great tip. Two orders of magnitude! I'll try that next chance I get.

      Cheers!

Re: Style Comments
by mgolini (Novice) on Sep 05, 2003 at 18:30 UTC
    Thanks for all your inputs, they will be implemented as soon as I can. A lot of great tips, Tie::File looks especially interesting.

    Thank you brothers!