CSV and regular expressions

Nesh has asked for the wisdom of the Perl Monks concerning the following question:

Re: CSV and regular expressions by Tanktalus (Canon) on Jun 30, 2005 at 00:55 UTC
To be honest, I'd use something like DBD::CSV for this. One reason is to let someone else handle the details for me; the other is that anything in a CSV file today is a perfect candidate for moving to an RDBMS later. It may be a bit slow, but I'll take slow and correct over fast and 7 headers any day of the week :-)
Re: CSV and regular expressions by graff (Chancellor) on Jun 30, 2005 at 02:45 UTC
Let me see if I can rephrase the problem: You want to create a single output CSV file from a set of distinct CSV input files. For each input file, you are going to run your script once, and it will append data to the common output file. (Or, for each input file, will you do one iteration of a loop in your script that appends to the output file? That seems like a reasonable approach.) You need to make sure that the line containing the CSV header (the list of column names) is written only once, at the very beginning of the output file.
If that's a fair description of the problem, then you could use the following logic (expressed here in pseudocode):
```
open a given input file
open the output file
read first line of input (this will be the csv header)
look at the size of the output file
if output file size is zero, write the csv header to output
loop through remaining lines of input file
    write lines to output when appropriate
```
But if I've drawn the wrong conclusion about what you're trying to do, please clarify.
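graff's pseudocode translates fairly directly into Perl. The following is a minimal sketch under a few assumptions: the subroutine name `append_csv`, the output name `merged.csv` and the optional `$wanted` filter (standing in for "when appropriate") are hypothetical, and each line is treated as an opaque record rather than parsed as CSV.

```perl
use strict;
use warnings;

# Append the data rows of $in_path to $out_path, writing the header line
# only while the output file is still empty (graff's file-size check).
sub append_csv {
    my ($in_path, $out_path, $wanted) = @_;
    $wanted //= sub { 1 };    # hypothetical filter: keep every row by default

    open my $in, '<', $in_path or die "Can't read $in_path: $!";
    my $header = <$in>;                  # first line is the CSV header
    return unless defined $header;       # empty input file: nothing to append

    open my $out, '>>', $out_path or die "Can't append to $out_path: $!";
    print {$out} $header unless -s $out_path;   # -s is false for a zero-size file

    while (my $line = <$in>) {
        print {$out} $line if $wanted->($line);
    }
    close $out or die "Can't close $out_path: $!";
    close $in;
}

# Usage: perl merge.pl in1.csv in2.csv ...   ('merged.csv' is a placeholder)
append_csv($_, 'merged.csv') for @ARGV;
```

Because the output is opened in append mode and closed after each input file, the size check on the next call sees the header that was just written, so running the script once per file or looping over all files in one run gives the same result.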
Re: CSV and regular expressions by CountZero (Bishop) on Jun 30, 2005 at 06:02 UTC
What are you trying to do with `while(ERROR){if($_ !~/STATUS CODE([0-9A-Za-z].)VIP CODE/){... }}`? There is nothing in `$_` (at least not what you are expecting), since you are not reading from the ERROR filehandle. To do that you must use `<ERROR>`: then you read from the ERROR filehandle, get the next line in `$_`, and can test it. As it stands, your regex fails every time (so the test succeeds, because you use `!~`) and you print the header line again and again.
The easiest way to deal with your program is to create the ERROR file by hand (just type the header line in a text editor or similar) and never, ever write the header to it from the script; write only the lines you extracted from the other files. Sometimes automation just gets in the way.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
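If you do want the script itself to check for the header, the fix CountZero describes is to actually read the file with the readline operator and test each line. A minimal sketch follows; the subroutine name `has_header`, the file name and the header pattern are hypothetical, and the regex is only loosely based on the one in the question.

```perl
use strict;
use warnings;

# Hypothetical header pattern, loosely modelled on the regex in the question.
my $HEADER_RE = qr/STATUS CODE.*VIP CODE/;

# Return true if some line of $path matches $HEADER_RE.
# Lines are read with the readline operator <$fh>; writing while(FH)
# with a bare handle name never reads anything into $_.
sub has_header {
    my ($path) = @_;
    return 0 unless -e $path;            # no file yet, so no header yet
    open my $fh, '<', $path or die "Can't read $path: $!";
    while (my $line = <$fh>) {
        return 1 if $line =~ $HEADER_RE;
    }
    return 0;
}

# Usage sketch (file name and header text are placeholders):
# my $path = 'errors.csv';
# my $needs_header = !has_header($path);
# open my $out, '>>', $path or die $!;
# print {$out} "STATUS CODE,VIP CODE\n" if $needs_header;
```

The check runs before the file is opened for appending, so the header is written at most once no matter how many times the script runs.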
Re: CSV and regular expressions by sk (Curate) on Jun 30, 2005 at 04:21 UTC
If it is something like what graff pointed out, then here is a simple solution. I keep track of a per-file line count instead of checking the output file size as graff suggested.
```perl
#!/usr/bin/perl -w
# filename: catskiphead
# Merge files, skipping the first (header) line from the second file onwards
my $line = 1;    # 1 while reading file 1; reset to 0 at the end of each file
while (<>) {
    my $repeat_header = ($line++ == 0);  # true only on line 1 of files 2..n
    $line = 0 if eof;                     # eof (no parens): end of current file
    next if $repeat_header;
    print $_;
}
```
Input
```
file1:
header
1
2
3

file2:
header
4
5
6

file3:
header
7
8
9
```
Usage: `catskiphead file1 file2 file3`
Output
```
header
1
2
3
4
5
6
7
8
9
```
cheers
SK
PS: Note that you can change the `print` to write to a filehandle, and also do any checking on the input records before you write them out.
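For comparison, here is a sketch of the same merge that relies on Perl's built-in `$.` line counter instead of a hand-maintained one. It uses the `close ARGV if eof` idiom documented under `eof` in perlfunc, which resets `$.` for each input file; the subroutine name `cat_skip_headers` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print the first file's header once and skip line 1 of every later file.
# $. is reset per file because ARGV is closed at the end of each one.
sub cat_skip_headers {
    my ($out, @files) = @_;
    local @ARGV = @files;        # <> reads these files in order
    my $header_printed = 0;
    while (my $line = <>) {
        # Line 1 of the first file passes (post-increment returns 0);
        # line 1 of every later file is skipped.
        next if $. == 1 && $header_printed++;
        print {$out} $line;
    } continue {
        close ARGV if eof;       # eof without parens: end of the current file
    }
}

cat_skip_headers(\*STDOUT, @ARGV) if @ARGV;
```

Since `next` still runs the `continue` block, a file that contains only a header line still resets `$.` correctly. Invoked with `file1 file2 file3`, it writes the merged output to STDOUT, like sk's script.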
Re: CSV and regular expressions by anonymized user 468275 (Curate) on Jun 30, 2005 at 14:43 UTC
I feel the step of designing the storage in Perl for the given data is being taken for granted, so I would want to see how the data is being stored before moving on to look at the output.
One world, one people
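To make the storage question concrete, here is one possible in-memory layout, sketched under loud assumptions: the file is simple comma-separated text with no quoted fields, embedded commas or embedded newlines (a real CSV parser such as Text::CSV handles those), and `read_records` is a hypothetical name. Each row becomes a hash keyed by the header's column names.

```perl
use strict;
use warnings;

# Read a simple CSV file into (column names, array of row hashes).
# Assumption: naive split on commas; no quoting or embedded separators.
sub read_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't read $path: $!";
    chomp(my $header = <$fh> // '');
    my @cols = split /,/, $header, -1;   # -1 keeps trailing empty fields
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        my %row;
        @row{@cols} = split /,/, $line, -1;
        push @records, \%row;
    }
    return (\@cols, \@records);
}
```

With the header kept separately in the column list, writing a merged file comes down to printing the header once and then the selected records in column order, and any validation can look at fields by name.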