in reply to parsing files and modifying format

Perhaps you haven't explained your goal very well, but in any case I'm sure your code as posted does not do what you want. If I understand it correctly:

If that's the basic plan, I wouldn't bother with writing modified versions of every original file. Just output a single stream of data (store it all in a single file), having one line of text for each input file. Something like this:

#!/usr/bin/perl
use strict;    # you really should!

( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
my $dirname = shift;
opendir( DIR, $dirname ) or die "$dirname: $!";
my @files = grep /.+\.txt$/, readdir DIR;

for my $file ( @files ) {
    my ( $basename ) = ( $file =~ /(.+)\.txt$/ );
    if ( open( F, "$dirname/$file" )) {    # readdir gives bare names, so prepend the directory
        my ( $date, @lines ) = <F>;
        chomp $date;
        chomp @lines;
        print join( ",", "sourcefile", $basename, "date", $date, @lines ), "\n";
        close F;
    }
    else {
        warn "$file: $!\n";
    }
}
(I'm assuming you want to keep track of the original file names where all of the output rows come from, as well as all the time-stamps.)

Of course, if it turns out that all the files have the same number of lines, and the same set of names in their various "name,value" pairs, and these names are always in the same order, a better input for excel would have a single line at the top with the names, and then each following line would just have the values in the proper order:

#!/usr/bin/perl
use strict;

( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 dir_name > out.csv\n";
my $dirname = shift;
opendir( DIR, $dirname ) or die "$dirname: $!";
my ( $firstfile, @files ) = grep /.+\.txt$/, readdir DIR;

open( F, "$dirname/$firstfile" ) or die "$firstfile: $!";
my ( @names, @values );
my ( $basename ) = ( $firstfile =~ /(.+)\.txt$/ );
my $date = <F>;
chomp $date;
while (<F>) {
    chomp;
    my ( $n, $v ) = split /,/, $_, 2;
    push @names, $n;
    push @values, $v;
}
close F;
print join( ",", "sourcefile", "date", @names ), "\n";
print join( ",", $basename, $date, @values ), "\n";

for my $file ( @files ) {
    ( $basename ) = ( $file =~ /(.+)\.txt$/ );
    if ( open( F, "$dirname/$file" )) {
        ( $date, @values ) = <F>;
        chomp $date;
        chomp @values;
        s/.+?,// for ( @values );    # delete the "name," parts
        print join( ",", $basename, $date, @values ), "\n";
        close F;
    }
    else {
        warn "$file: $!\n";
    }
}
(Not tested)

If you're not sure whether all the files have the same set of names in the same order, it would be easy enough to check for that before you try to create the single data stream for importing to excel.
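If it helps, a consistency check along those lines might look like this. It's only a sketch, not tested against your data; the directory handling follows the scripts above, and name_signature is just a helper name I made up:

```perl
#!/usr/bin/perl
# Sketch: before merging everything into one CSV, verify that every
# .txt file has the same "name" fields in the same order.
use strict;
use warnings;

# Given one file's lines (time-stamp first, then "name,value" lines),
# return the comma-joined list of names.
sub name_signature {
    my ( $date, @lines ) = @_;
    my @names = map { ( split /,/, $_, 2 )[0] } @lines;
    return join ',', @names;
}

if ( @ARGV == 1 and -d $ARGV[0] ) {
    my $dirname = shift;
    opendir( my $dh, $dirname ) or die "$dirname: $!";
    my @files = grep /\.txt$/, readdir $dh;
    closedir $dh;

    my $reference;    # signature of the first readable file
    for my $file ( @files ) {
        open( my $fh, '<', "$dirname/$file" )
            or do { warn "$file: $!\n"; next };
        my @lines = <$fh>;
        close $fh;
        chomp @lines;
        my $sig = name_signature( @lines );
        defined $reference or $reference = $sig;
        warn "$file: name list differs from first file\n"
            if $sig ne $reference;
    }
}
```

If that prints no warnings, all the files agree and the single-header-line layout is safe.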

Re^2: parsing files and modifying format
by grashoper (Monk) on Jul 18, 2008 at 06:17 UTC
    graf you are correct, sample 2 is what I needed, except each file has multiple records; the example I provided was merely one of them. It looks like the +? isn't working quite the way I thought it would in this one line, though. Not sure why, but I expected the names would not get printed; instead they do appear in my csv. I was thinking I would do an ole call to excel and just dump the values right on in there without the names, but it doesn't. Is that due to the join prints above this loop?
    for my $file ( @files ) {
        ( $basename ) = ( $file =~ /(.+)\.txt$/ );
        if ( open( F, $file )) {
            ( $date, @values ) = <F>;
            chomp $date;
            chomp @values;
            s/.+?,// for ( @values );    # delete the "name," parts
            print join( ",", $basename, $date, @values ), "\n";
            close F;
        }
        else {
            warn "$file: $!\n";
        }
    }
      I'm trying to work out where your sentence boundaries are -- if you add periods and sentence-initial capitalization in the paragraph text, that would help...

      each file has multiple records, example I provided was merely one of them

      If you have trouble figuring out how to parse the input files, start a new thread on that. As I mentioned in my first reply, it should be easy to work out how the file contents are arranged, and to treat them accordingly.

      it looks like the +? isn't working quite the way I though it would in this one line though..

      I have no clue what you're talking about there -- maybe if you show a specific input data set and the resulting output, and explain what's wrong with that output, it will be clear what's wrong.

      I was thinking I would do an ole call to excel and just dump the values right on in there without the names but it doesnt..

      I know nothing about ole, and I see no reason to go there, since I do know that a simple comma-delimited file, with a single line at the top for column headings, works fine for importing to excel.

      Of course, if the "values" contain commas (and quotes), then it is not simple, because fields containing commas as data need to be quoted, and quotes occurring as data in any field need to be escaped.

      As a rule, it's fairly common for commas and quotes to appear in text data, but it's probably a lot less likely that your data files contain tabs, so you might consider writing tab-delimited output instead of comma delimited -- that works equally well for importing to excel.
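      To illustrate, a minimal quoting helper in the usual CSV style might look like the following (csv_field is just an illustrative name; for real work, a CPAN module like Text::CSV takes care of all this):

```perl
use strict;
use warnings;

# Minimal CSV field quoting: double any embedded quotes, then wrap the
# field in quotes if it contains a comma, a quote, or a newline.
sub csv_field {
    my ( $f ) = @_;
    if ( $f =~ /[",\n]/ ) {
        $f =~ s/"/""/g;
        $f = qq{"$f"};
    }
    return $f;
}

print join( ",", map { csv_field($_) } 'plain', 'has,comma', 'has"quote' ), "\n";
# prints: plain,"has,comma","has""quote"
```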

Re^2: parsing files and modifying format
by grashoper (Monk) on Jul 18, 2008 at 05:27 UTC
    actually I missed an important piece of information here. I need the labels (login, searchload, etc.) once, and the subsequent values are to become rows of data. The goal is to put this data into a database, and a single line of csv is going to give me a row that exceeds limits in excel. The data files are fairly small, 11kb or so, but there are a lot of them. The timestamp acts as a row separator of sorts: it's the time the test was run, and it should start a new row in excel. This was a rush job I received late in the day, and I am finding it much more challenging than I expected. I really appreciate the help.
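    Given that description, something along these lines might serve as a starting point. It's only a sketch: since the time-stamp format isn't shown, it assumes a time-stamp line is any line without a comma, and is_timestamp / records_to_rows are made-up helper names -- adjust the test to match your real data:

```perl
#!/usr/bin/perl
# Sketch for multi-record files: each time-stamp starts a new row, and
# the "name" parts of the first record become a single header row.
# ASSUMPTION: a time-stamp line is any line with no comma in it.
use strict;
use warnings;

sub is_timestamp { my ( $line ) = @_; return $line !~ /,/ }

# Turn one file's lines into a header row plus one row per record.
sub records_to_rows {
    my ( @lines ) = @_;
    my ( @names, @rows, $row );
    for my $line ( @lines ) {
        if ( is_timestamp($line) ) {
            push @rows, $row if $row;
            $row = [ $line ];    # new record, starting with its time-stamp
        }
        else {
            my ( $n, $v ) = split /,/, $line, 2;
            push @names, $n unless @rows;    # names from the first record only
            push @$row, $v;
        }
    }
    push @rows, $row if $row;
    return ( [ 'date', @names ], @rows );
}

my @sample = ( '10:00', 'login,5', 'searchload,7',
               '11:00', 'login,6', 'searchload,8' );
print join( ",", @$_ ), "\n" for records_to_rows( @sample );
# prints:
# date,login,searchload
# 10:00,5,7
# 11:00,6,8
```

    The same rows could just as easily be fed to a database insert instead of printed as csv.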
Re^2: parsing files and modifying format
by Anonymous Monk on Mar 17, 2009 at 18:02 UTC
    Actually, I don't see how I delete the name parts. When I run the above, I get both names and values; I really just want one row of names and the rest values.
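    For what it's worth, the substitution itself does delete everything up to and including the first comma on each line, as a quick test shows; if names still appear in your output, the likelier culprit is the header line printed before the loop, or input lines arranged differently than this assumes:

```perl
# Quick check that s/.+?,// really removes the "name," prefix.
use strict;
use warnings;

my @values = ( 'login,5', 'searchload,7' );
s/.+?,// for @values;    # non-greedy: delete up to the FIRST comma only
print "@values\n";       # prints "5 7"
```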