
G'day pelp,
Assuming you want a CSV without the pretty spacing that your HTML table produced, and assuming that your current records are separated by two newlines (a blank line), i.e.:

Author                  :  tom jones
Number                  :  abc123
Version Number          :  17
Feature                 :  nothing was changed
File Name               :  house.doc
Modification Date       :  05/16/2002
Paragraph Number           Requirement Number   Last Modified
BCBLUE-BC-191.a            SMAPSFS-VPU-1232          17
BCBLUE-BC-232.g            SMAPSFS-VPU-2342          17

Author                  :  fred jones
Number                  :  abc124
Version Number          :  18
Feature                 :  nothing much was changed
File Name               :  house.doc
Modification Date       :  05/18/2002
Paragraph Number           Requirement Number   Last Modified
BCBLUE-BC-191.a            SMAPSFS-VPU-1232          18
BCBLUE-BC-232.g            SMAPSFS-VPU-2342          18

and your input is reasonably well formed, then the following code:

use strict;
$/ = "";                # paragraph mode.

print "File Name,Author,Date (MM/DD/Year),TIME (H:M:S),Version No.,".
      "Number,Feature Name,Paragraph Number,Requirement Number\n";

while(<>)
{
        # $_ =~ Author  : foo\nNumber : abc....

        # These regexps may need changing if you allow
        # other characters in them.  You may find something 
        # more general such as what I use for Feature
        # best for all fields...

        my ($author) = m/^Author\s+:\s+([\w ]+)$/m;
        my ($number) = m/^Number\s+:\s+([\w ]+)$/m;
        my ($version) = m/Version Number\s+:\s+([\w ]+)$/m;
        my ($feature) = m/Feature\s+:\s+([^\s].*)$/m;
        my ($filename) = m/File Name\s+:\s+([\w._-]+)$/m;
        my ($mod_date) = m!Modification Date\s+:\s+(\d{2}/\d{2}/\d{4})!m;

        # Hope that Paragraph Number etc occurs at the end of the
        # record.
        my ($otherjunk) = m/Paragraph(.*)$/s;
        my @paragraphs = (split /\n/, $otherjunk);
        shift @paragraphs;      # don't need headings;

        foreach my $line (@paragraphs)
        {
                my ($para, $requirement) = split(/\s+/, $line);
                print qq{"$filename","$author","$mod_date","","$version",}.
                      qq{"$number","$feature","$para","$requirement"\n};
        }
}

will produce:

File Name,Author,Date (MM/DD/Year),TIME (H:M:S),Version No.,Number,Feature Name,Paragraph Number,Requirement Number
"house.doc","tom jones","05/16/2002","","17","abc123","nothing was changed","BCBLUE-BC-191.a","SMAPSFS-VPU-1232"
"house.doc","tom jones","05/16/2002","","17","abc123","nothing was changed","BCBLUE-BC-232.g","SMAPSFS-VPU-2342"
"house.doc","fred jones","05/18/2002","","18","abc124","nothing much was changed","BCBLUE-BC-191.a","SMAPSFS-VPU-1232"
"house.doc","fred jones","05/18/2002","","18","abc124","nothing much was changed","BCBLUE-BC-232.g","SMAPSFS-VPU-2342"
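
The comment in the code suggests that a more general pattern might suit all the fields, and the print statement assumes no field ever contains a double quote. Here is a rough sketch of both ideas. The helper names get_field and csv_row are mine, not anything from a module, and the "save" button feature text is made up to show the quoting. For real-world data a CPAN module such as Text::CSV is probably the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the value of one "Label   :  value" line
# out of a record, whatever characters the value contains.
sub get_field {
    my ($record, $label) = @_;
    my ($value) = $record =~ /^\Q$label\E[ \t]+:[ \t]+(\S.*?)[ \t]*$/m;
    return defined $value ? $value : '';
}

# Hypothetical helper: quote every field and double any embedded
# double quotes, which is the usual CSV convention.
sub csv_row {
    my @fields = @_;            # work on a copy of the arguments
    for my $f (@fields) {
        $f = '' unless defined $f;
        $f =~ s/"/""/g;
    }
    return join(',', map { qq{"$_"} } @fields) . "\n";
}

# Made-up record, just to exercise the two helpers.
my $record = <<'END';
Author                  :  tom jones
Number                  :  abc123
Feature                 :  fixed the "save" button
END

print csv_row(map { get_field($record, $_) } 'Author', 'Number', 'Feature');
```

Because the label is anchored with ^, asking for 'Number' will not accidentally pick up the 'Version Number' line.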

If your input is reasonably well formed, i.e. you can rely on "Author" being the first field of every record, but records are not separated by two newlines, run something like the following over your data file first:

while(<>)
{
      if(/^Author\s+:\s+/)
      {
             print "\n";
      }
      print;
}

The resulting output will be fine for my program above.
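
If you would rather not make a separate pass over the file, the same grouping can be done in memory. This is only a sketch under the same assumption (every record starts with an "Author" line), and split_records is a name I made up for it:

```perl
use strict;
use warnings;

# Hypothetical helper: group lines into records, starting a new record
# at every line that begins with "Author   :".
sub split_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;       # ignore blank lines entirely
        if ($line =~ /^Author\s+:\s+/ || !@records) {
            push @records, '';          # open a new record
        }
        $records[-1] .= $line;          # append the line to the current record
    }
    return @records;
}

# Tiny made-up input with no blank line between the records.
my @records = split_records(
    "Author   :  tom jones\n",
    "Number   :  abc123\n",
    "Author   :  fred jones\n",
    "Number   :  abc124\n",
);
print scalar(@records), " records\n";
```

To use it with the program above you would leave $/ at its default and loop with something like for (split_records(<>)) { ... } in place of while(<>), so that $_ still holds one whole record at a time.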

I hope this will prove helpful to you.

jarich

In reply to Re: Converting logs to CSV format (desperate help) by jarich
in thread Converting logs to CSV format (desperate help) by pelp
