Nesh has asked for the wisdom of the Perl Monks concerning the following question:

Hello... In this script I am trying to match a pattern across various files and write the output to one CSV file. I write a header line to the CSV file, but it ends up being written more than once. How can I use a regular expression so that the headings are written just once?
open ERROR, ">>$exceptionoutput"
    or $log->logdie("Error opening $exceptionoutput file\n");
# Print the headings for the CSV file
while (ERROR) {
    if ($_ !~ /STATUS CODE([0-9A-Za-z].*)VIP CODE/) {
        printf ERROR "STATUS CODE,TIMESTAMP,USERID,TRANSACTION TYPE,POLICY NUMBER,CREATION TYPE,VIP CODE\n";
        last;
    }
}
close ERROR;
Despite this, the heading is written 7 times if I run the same script 7 times from a batch file. Thanks

Re: CSV and regular expressions
by Tanktalus (Canon) on Jun 30, 2005 at 00:55 UTC

    To be honest, I'd use something like DBD::CSV for this.

    One reason is to let someone else handle it for me; the other is that anything in a CSV file today is a perfect candidate for moving to an RDBMS later. It may be a bit slow, but I'll take slow and correct over fast and 7 headers any day of the week :-)

Re: CSV and regular expressions
by graff (Chancellor) on Jun 30, 2005 at 02:45 UTC
    Let me see if I can rephrase the problem:
    • You want to create a single output csv file, based on a set of distinct csv input files.
    • For each input file, you are going to run your script once, and this will append data to the common output file.
    • (Or, for each input file, you will do one iteration of a loop in your script to append to the output file? This seems like a reasonable approach.)
    • You need to make sure that the line containing the "csv header" (listing of column names) is written only once, at the very beginning of the output file.

    If that's a fair description of the problem, then you could use the following logic (expressed here in pseudocode):

    open a given input file
    open the output file
    read first line of input (this will be the csv header)
    look at the size of the output file
    if output file size is zero, write the csv header to output
    loop through remaining lines of input file
        write lines to output when appropriate
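For illustration, here is one way the pseudocode might look in Perl. This is a sketch, not graff's code: the sub name `append_csv` and the command-line layout are hypothetical, and "when appropriate" is left as a pass-through (every data line is copied). The header decision rests on Perl's `-s` file test, which is false when the output file is missing or empty, so running the script 7 times still writes the header only once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append the data lines of $input to $output.
# The header (first line of $input) is written only when $output
# does not exist yet or has zero size.
sub append_csv {
    my ($input, $output) = @_;

    open my $in, '<', $input or die "Cannot read $input: $!\n";
    my $header = <$in>;                 # first line of input = csv header
    return unless defined $header;      # empty input file: nothing to do

    # -s is false for a missing or empty file, true (the size) otherwise
    my $needs_header = !-s $output;

    open my $out, '>>', $output or die "Cannot append to $output: $!\n";
    print {$out} $header if $needs_header;

    while (my $line = <$in>) {
        # "write lines to output when appropriate": add a filter here
        print {$out} $line;
    }
    close $out or die "Cannot close $output: $!\n";
    close $in;
}

# Hypothetical usage: perl append_csv.pl output.csv input1.csv input2.csv ...
if (@ARGV >= 2) {
    my ($output, @inputs) = @ARGV;
    append_csv($_, $output) for @inputs;
}
```

Because the size test is repeated on every call, repeated runs from a batch file keep appending data rows without repeating the header.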
    But if I've drawn the wrong conclusion about what you're trying to do, please clarify.
Re: CSV and regular expressions
by CountZero (Bishop) on Jun 30, 2005 at 06:02 UTC
    What are you trying to do with while(ERROR){if($_ !~/STATUS CODE([0-9A-Za-z].*)VIP CODE/){... }}? There is nothing in $_ (at least not what you are expecting), since you are not reading from the ERROR filehandle. To do that you must use while(<ERROR>), which reads the next line from the ERROR filehandle into $_ so that you can test it. As it stands, your regex fails every time (and the test succeeds because you use !~), so you print the header line again and again.
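A minimal sketch of the point above, assuming the heading, if present, is always the first line of the output file. Note also that the original opens the file with `>>`, which is write-only, so even `<ERROR>` would return nothing there (Perl warns "Filehandle ERROR opened only for output"). The sketch therefore checks for the header through a separate read-only handle before appending. The sub names `has_header` and `append_record` are hypothetical, and plain `die` stands in for the poster's `$log->logdie`.

```perl
use strict;
use warnings;

# The heading line from the original post.
my $HEADER = "STATUS CODE,TIMESTAMP,USERID,TRANSACTION TYPE,"
           . "POLICY NUMBER,CREATION TYPE,VIP CODE\n";

# Hypothetical helper: true if the file already starts with the heading.
# Opened with '<' because a '>>' handle cannot be read from.
sub has_header {
    my ($path) = @_;
    return 0 unless -e $path;
    open my $fh, '<', $path or die "Cannot read $path: $!\n";
    my $first = <$fh>;                  # actually reads one line
    close $fh;
    return defined $first && $first =~ /^STATUS CODE,.*VIP CODE$/;
}

# Hypothetical helper: append one record, writing the heading first
# only when the file does not already start with it.
sub append_record {
    my ($path, $record) = @_;
    my $need_header = !has_header($path);
    open my $fh, '>>', $path or die "Cannot open $path: $!\n";
    print {$fh} $HEADER if $need_header;
    print {$fh} $record;
    close $fh or die "Cannot close $path: $!\n";
}
```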

    The easiest way to deal with your program is to create the ERROR file by hand (just type the header line in a text editor or similar) and never, ever write the header to it from the script; write only the lines you extracted from the other files.

    Sometimes automation just gets in the way.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: CSV and regular expressions
by sk (Curate) on Jun 30, 2005 at 04:21 UTC
    If it is something like what graff pointed out, then here is a simple solution. I keep track of a line count instead of the file size that graff suggested.

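    #!/usr/bin/perl -w
    # filename: catskiphead
    # Merge files, skipping the first line from the second file onwards
    my $line = 1;
    while (<>) {
        next if ($line++ == 0);   # skip the first line of every file after the first
        $line = 0 if eof;         # eof without parens: end of the current file
        print ($_);
    }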
    Input

    file1:
    header
    1
    2
    3

    file2:
    header
    4
    5
    6

    file3:
    header
    7
    8
    9

    Usage: catskiphead file1 file2 file3

    Output

    header
    1
    2
    3
    4
    5
    6
    7
    8
    9

    cheers

    SK

    PS: Note that you can change the print to write to a filehandle, and also do any checking on the input records before you write them out.
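Following the PS, a sketch of the same eof-based idea that writes to a filehandle and filters records. The sub name `merge_skip_headers` and the `$keep` callback are hypothetical, and the filter in the example call is only an illustration. It uses `local @ARGV` so that `<>` iterates over the given files, and `eof` without parentheses, which is true at the end of each file (whereas `eof()` is true only at the end of the last one).

```perl
use strict;
use warnings;

# Hypothetical helper: merge @files into filehandle $out, keeping the
# first file's header and skipping line 1 of every later file.
# $keep is a code ref that decides whether a data line is written.
sub merge_skip_headers {
    my ($out, $keep, @files) = @_;
    return unless @files;               # empty @ARGV would make <> read STDIN
    local @ARGV = @files;               # let <> walk through @files
    my $first_file    = 1;
    my $at_file_start = 1;
    while (my $line = <>) {
        if ($at_file_start) {
            $at_file_start = 0;
            print {$out} $line if $first_file;   # header from file 1 only
            $first_file = 0;
        }
        elsif ($keep->($line)) {
            print {$out} $line;
        }
        # checked on every line, including a skipped header
        $at_file_start = 1 if eof;
    }
}
```

For example, `merge_skip_headers(\*STDOUT, sub { $_[0] =~ /,/ }, @ARGV);` would print the merged files, keeping only data lines that contain a comma. Unlike the loop above, the eof test here also runs when a header line is skipped, so an input file that contains only a header does not throw off the count.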

Re: CSV and regular expressions
by anonymized user 468275 (Curate) on Jun 30, 2005 at 14:43 UTC
    I feel that the step of designing how the given data is stored in Perl is being taken for granted, so I would want to see how the data is being stored before moving on to look at the output.

    One world, one people