G'day pelp,
Assuming you want a CSV without the pretty spacing that your HTML table produced, and assuming that your current records are separated by two newlines (a blank line between records), i.e.:
Author : tom jones
Number : abc123
Version Number : 17
Feature : nothing was changed
File Name : house.doc
Modification Date : 05/16/2002
Paragraph Number Requirement Number Last Modified
BCBLUE-BC-191.a SMAPSFS-VPU-1232 17
BCBLUE-BC-232.g SMAPSFS-VPU-2342 17

Author : fred jones
Number : abc124
Version Number : 18
Feature : nothing much was changed
File Name : house.doc
Modification Date : 05/18/2002
Paragraph Number Requirement Number Last Modified
BCBLUE-BC-191.a SMAPSFS-VPU-1232 18
BCBLUE-BC-232.g SMAPSFS-VPU-2342 18
and your input is reasonably well formed, then the following code:
use strict;

$/ = "";    # paragraph mode: each read returns one blank-line-separated record

print "File Name,Author,Date (MM/DD/Year),TIME (H:M:S),Version No.,".
      "Number,Feature Name,Paragraph Number,Requirement Number\n";

while (<>)
{
    # $_ =~ Author : foo\nNumber : abc....
    # These regexps may need changing if you allow
    # other characters in them. You may find something
    # more general, such as what I use for Feature,
    # best for all fields...
    my ($author)   = m/^Author\s+:\s+([\w ]+)$/m;
    my ($number)   = m/^Number\s+:\s+([\w ]+)$/m;
    my ($version)  = m/Version Number\s+:\s+([\w ]+)$/m;
    my ($feature)  = m/Feature\s+:\s+([^\s].*)$/m;
    my ($filename) = m/File Name\s+:\s+([\w._-]+)$/m;
    my ($mod_date) = m!Modification Date\s+:\s+(\d{2}/\d{2}/\d{4})!m;

    # Hope that Paragraph Number etc occurs at the end of the record.
    my ($otherjunk) = m/Paragraph(.*)$/s;
    my @paragraphs = split /\n/, $otherjunk;
    shift @paragraphs;    # don't need headings

    foreach my $line (@paragraphs)
    {
        # split ' ' also copes with leading whitespace on the line
        my ($para, $requirement) = split ' ', $line;
        print qq{"$filename","$author","$mod_date","","$version",}.
              qq{"$number","$feature","$para","$requirement"\n};
    }
}
will produce:
File Name,Author,Date (MM/DD/Year),TIME (H:M:S),Version No.,Number,Feature Name,Paragraph Number,Requirement Number
"house.doc","tom jones","05/16/2002","","17","abc123","nothing was changed","BCBLUE-BC-191.a","SMAPSFS-VPU-1232"
"house.doc","tom jones","05/16/2002","","17","abc123","nothing was changed","BCBLUE-BC-232.g","SMAPSFS-VPU-2342"
"house.doc","fred jones","05/18/2002","","18","abc124","nothing much was changed","BCBLUE-BC-191.a","SMAPSFS-VPU-1232"
"house.doc","fred jones","05/18/2002","","18","abc124","nothing much was changed","BCBLUE-BC-232.g","SMAPSFS-VPU-2342"
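If your fields can contain characters the regexps above don't allow, or double quotes that would break the CSV, something along these lines may help. This is only a sketch: the helper names field, csv_field and csv_line are my own inventions, not part of the program above, and the sample record is made up to show the quoting.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the value of a "Label : value" line from one
# record. Returns undef if the label is not present.
sub field {
    my ($record, $label) = @_;
    my ($value) = $record =~ m/^\Q$label\E\s+:\s+(\S.*?)\s*$/m;
    return $value;
}

# Hypothetical helper: quote one CSV field, doubling embedded double quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;    # missing fields become empty
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Hypothetical helper: join a list of values into one CSV line.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

# Made-up record, only to exercise the helpers.
my $sample = "Author : tom jones\n"
           . "Number : abc123\n"
           . "Version Number : 17\n"
           . "Feature : renamed \"house\" section\n";

print csv_line(map { field($sample, $_) }
               'Author', 'Number', 'Version Number', 'Feature');
```

Anchoring each label at the start of a line keeps "Number" from matching the "Version Number" line, and the general pattern means you don't need a separate regexp per field.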
If your input is reasonably well formed, i.e. you can rely on "Author" being the first field of every record, but records are not separated by two newlines, run something like the following over your data file first:
while (<>)
{
    # start a new paragraph before each record
    if (/^Author\s+:\s+/)
    {
        print "\n";
    }
    print;
}
The resulting output will be fine for my program above.
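If you'd rather not run two passes, the same idea can be tried on a string held in memory. The following is a rough sketch, assuming the whole file fits in memory; add_record_breaks and read_records are names I made up for illustration, and the sample text is cut down to two fields per record.

```perl
use strict;
use warnings;

# Hypothetical helper: put a blank line before every "Author :" line,
# the same transformation as the filter above, but applied to a string.
sub add_record_breaks {
    my ($text) = @_;
    $text =~ s/^(?=Author\s+:\s+)/\n/mg;
    return $text;
}

# Hypothetical helper: read the text back in paragraph mode through an
# in-memory filehandle, returning one string per record.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "";    # paragraph mode, restored when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Cut-down sample with no blank lines between records.
my $sample = "Author : tom jones\nNumber : abc123\n"
           . "Author : fred jones\nNumber : abc124\n";

my @records = read_records(add_record_breaks($sample));
print scalar(@records), " record(s) found\n";
```

Using local on $/ keeps paragraph mode from leaking into any other reads your program does.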
I hope this will prove helpful to you.
jarich