Rather than using regexes for this job, I'd recommend using Text::CSV or Text::CSV_XS to parse each line of the file into a list, then simply adding the tags element by element.
IMHO this is a cleaner approach from a software engineering point of view, but more importantly you can be sure that CSV is handled correctly in all cases, including quoted fields that contain the separator.
Hope this helps, -gjb-
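To illustrate the correctness point, here is a small sketch comparing a naive split with a quote-aware parser on a line whose second field is quoted and contains the separator. It uses the core module Text::ParseWords (its `parse_line` function) only as a stand-in because it ships with Perl; it is not a full CSV parser (it also treats single quotes as quoting characters and does not understand doubled `""` escapes), so Text::CSV or Text::CSV_XS remains the better choice. The sub names `naive_fields` and `quoted_fields` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module, used here as a stand-in for Text::CSV

# Naive approach: split on every semicolon, even inside quotes.
sub naive_fields {
    my ($line) = @_;
    return split /;/, $line;
}

# Quote-aware approach: parse_line() does not split inside quotes and,
# with $keep = 0, strips the quotes from the returned fields.
sub quoted_fields {
    my ($line) = @_;
    return parse_line(';', 0, $line);
}

my $line   = 'a;"b;c";d';
my @naive  = naive_fields($line);
my @quoted = quoted_fields($line);
print scalar(@naive),  " fields: ", join('|', @naive),  "\n";
print scalar(@quoted), " fields: ", join('|', @quoted), "\n";
```

The naive version reports 4 fields (`a|"b|c"|d`), while the quote-aware one reports the intended 3 (`a|b;c|d`).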
Just to make a minor point (plus get my first post in for 2003! ;)), the original poster's data has semicolons as the separating character instead of the default comma, which can easily be specified in Text::CSV_XS's sep_char attribute in the new() method. Hope I'm not doing anyone's homework:
#!/usr/bin/perl -w
use strict;
use Text::CSV_XS;
my ( $csv, $xml );
$csv = Text::CSV_XS->new( {
'sep_char' => ';'
} );
$xml = '';
while( <DATA> ) {
chomp;
if ( $csv->parse( $_ ) ) {
my ( $line, $n, @fields, $field );
$line = '<row>';
$n = 1;
@fields = $csv->fields();
foreach $field ( @fields ) {
$line .= "<col$n>$field</col$n>";
$n++;
}
$xml .= $line . "</row>\n";
} else {
print "parse() failed on this line: " . $csv->error_input() . "\n";
# die?
}
}
print $xml;
__DATA__
a;b;c;d;e
f;g;h;i
j;k
l
m;n;o
Output:
$ ./main.pl
<row><col1>a</col1><col2>b</col2><col3>c</col3><col4>d</col4><col5>e</col5></row>
<row><col1>f</col1><col2>g</col2><col3>h</col3><col4>i</col4></row>
<row><col1>j</col1><col2>k</col2></row>
<row><col1>l</col1></row>
<row><col1>m</col1><col2>n</col2><col3>o</col3></row>
If your required output becomes more complex, or you want a standardized, well-tested solution, there is probably an XML module on CPAN that fits.
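One gap in the loop above: field contents are not escaped, so a field such as `b & c` or `<d>` would produce malformed XML. A minimal sketch in plain Perl (no CPAN modules) of escaping each field before wrapping it; `xml_escape` and `row_xml` are hypothetical helper names, and an XML module from CPAN would normally take care of this for you.

```perl
use strict;
use warnings;

# Escape the five characters that are special in XML text and attributes.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so the entities below are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Build one <row> with numbered <colN> children from a list of field values.
sub row_xml {
    my @fields = @_;
    my $n = 0;
    return '<row>'
        . join('', map { my $tag = 'col' . ++$n; "<$tag>" . xml_escape($_) . "</$tag>" } @fields)
        . '</row>';
}

print row_xml('a', 'b & c', '<d>'), "\n";
```

This prints `<row><col1>a</col1><col2>b &amp; c</col2><col3>&lt;d&gt;</col3></row>`.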
Peace,
Purdy
use strict;
use warnings;
use Text::CSV_XS;
use CGI::Pretty qw(-any);
my $i;
my $csv = Text::CSV_XS->new({sep_char => ';'});
while (<DATA>) {
$i = 0;
warn and next unless $csv->parse($_);
print CGI::row(map eval "CGI::col@{[++$i]}('$_')", $csv->fields);
}
__DATA__
colContents;nextColContents;lastColContents
colContents2;nextColContents2;lastColContents2
colContents3;nextColContents3;lastColContents3
a;b;c;d;e
f;g;h;i
j;k
l
m;n;o
The -any pragma in CGI.pm is pretty nifty: you can actually use it to create XML. No guarantees on valid XML, of course. Just be sure to either prefix the calls with the CGI package name or use the OO interface (otherwise Perl will complain that main has no such subroutine). I was able to increment the <colN> tags by using an evil eval trick; that was the toughest part of this code. Drop that requirement and use CGI.pm's distributive shortcuts feature instead:
use Text::CSV_XS;
use CGI::Pretty qw(-any);
my $csv = Text::CSV_XS->new({sep_char => ';'});
$csv->parse($_) and print CGI::row(CGI::col([$csv->fields]))
while <DATA>;
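The eval trick pastes each field into Perl source inside single quotes, so a field containing an apostrophe (e.g. `it's`) breaks the generated code. A sketch of the same `<colN>` numbering done without string eval and without CGI.pm; `numbered_cols` is a hypothetical name, and, like the eval version, it does not escape field contents.

```perl
use strict;
use warnings;

# Wrap each field in <colN>...</colN>, numbering from 1, without string eval.
sub numbered_cols {
    my @fields = @_;
    return join '', map {
        my $n = $_ + 1;                      # tag numbers start at 1
        "<col$n>$fields[$_]</col$n>"
    } 0 .. $#fields;
}

print '<row>', numbered_cols(split /;/, "it's;a;test"), "</row>\n";
```

Because the field value is only ever data here, never code, an apostrophe in it is harmless.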
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
use strict;
my $str;
while (<DATA>) {
chomp;
my @cols = split(/;/);
for (my $i=1;$i<=@cols;$i++) {
$cols[$i-1]="<col$i>$cols[$i-1]</col$i>";
}
$str .= '<row>' . join('',@cols) . "</row>\n";
}
print $str;
exit;
__DATA__
a;b;c;d;e
f;g;h;i
j;k
l
m;n;o
prints:
<row><col1>a</col1><col2>b</col2><col3>c</col3><col4>d</col4><col5>e</col5></row>
<row><col1>f</col1><col2>g</col2><col3>h</col3><col4>i</col4></row>
<row><col1>j</col1><col2>k</col2></row>
<row><col1>l</col1></row>
<row><col1>m</col1><col2>n</col2><col3>o</col3></row>
UPDATE: Doh! Of course gjb is right. Text::CSV is a great module. While my code is better than using regexps, Text::CSV is better than using my code.
-pete
"Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
This got me thinking... I often create XML, but always with the tag itself numbered for each element; where this thread uses <row> with nested columns, I normally only need:
<row>a</row>
<row2>b</row2>
So here is my stab at doing this with XML::Simple. Please comment on how I can refine this... but I do know it outputs what was asked for... ;-D
#! /usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my (%h);
my $xs = XML::Simple->new( keeproot => 1,
noattr => 1,
noescape => 1);
open (FILE, ">>./Test.xml") or die "Cannot open Test.xml for appending: $!\n";
while (<DATA>) {
delete $h{row};
my $count = 0;
chomp;
my @line = split(/;/);
foreach my $var (@line) {
$count++;
my $tag = "col" . $count;
$h{'row'}{$tag} = $var;
}
print Dumper(\%h);
print FILE $xs->XMLout(\%h);
}
close(FILE);
__DATA__
a;b;c;d;e
f;g;h
i;
j
-----
Of all the things I've lost in my life, it's my mind I miss the most.
Personally, i think you shouldn't concern yourself with
incrementing XML tags (even though i played along and
posted a solution myself). The first
<foo> tag encountered is the first, and the
second <foo> tag encountered is the second, and
so on - as long as that's how you intend to read the data.
XML::Simple handles this with its forcearray option, which
reads the repeated elements into an array. Since order will be preserved, why
bother with incrementing the tags? (And besides, wouldn't
that id number be better as an attribute instead?)
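Following the attribute suggestion above, a sketch in plain Perl that emits one repeated `<col>` element per field and carries the position in an attribute instead of the tag name. The attribute name `n` and the sub name `row_with_attrs` are arbitrary, hypothetical choices, and field contents are not escaped.

```perl
use strict;
use warnings;

# Emit <row><col n="1">a</col><col n="2">b</col>...</row> for one input line.
sub row_with_attrs {
    my ($line, $sep) = @_;
    my @fields = split /\Q$sep\E/, $line;   # \Q...\E: treat the separator literally
    my $n = 0;
    return '<row>'
        . join('', map { my $i = ++$n; qq{<col n="$i">$_</col>} } @fields)
        . '</row>';
}

print row_with_attrs('a;b;c', ';'), "\n";
```

A reader such as XML::Simple with forcearray then sees a single `col` array per row, in document order, with the number available as data if it is needed at all.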
I posted a few solutions to converting CSV to XML over at
CSV to XML (the quick and dirty way). Ultimately, CSV::XML looks the
easiest, but i still dig using XML::Generator::DBI.
One option i didn't try at the time, however, was
DBD::AnyData. Here's one that follows the
<colN> naming convention and uses the
previous two modules ... but, caveats:
- commas are used instead of semi-colons -
DBD::AnyData does not handle them, but could be
written to do so, if desired. However, note that CSV stands
for comma. ;)
- ughh ... i have to specify the maximum number of columns
to be expected. This is not something i advocate, but then
again, this is all because the <colN>
have to be incremented.
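A possible way around the second caveat: scan the data once to find the widest row and build the `col1,...,colN` header from that, instead of hard-coding `$max`. This is only a sketch of that one step, using plain Perl and the core List::Util module; `header_for` is a hypothetical name, and it splits naively on commas, so it would miscount rows that contain quoted, comma-containing fields.

```perl
use strict;
use warnings;
use List::Util qw(max);   # core module

# Return a header line "col1,col2,...,colN\n" where N is the number of fields
# in the widest line of $text, so the column count need not be hard-coded.
# Hypothetical helper; splits naively on commas (no quoted-field support).
sub header_for {
    my ($text) = @_;
    my @widths = map { my @f = split /,/, $_, -1; scalar @f } split /\n/, $text;
    my $widest = max(@widths) || 0;
    return join(',', map "col$_", 1 .. $widest) . "\n";
}

print header_for("a,b,c,d,e\nf,g,h,i\nj,k\n");
```

The returned string would take the place of the hard-coded header when building `$data`.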
Enough, here's the code:
use strict;
use warnings;
use DBI;
use XML::Generator::DBI;
use XML::Handler::YAWriter;
my $max = 5;
# header row "col1,...,col$max" plus a newline, followed by the CSV data
my $data = join(',',map"col$_",1..$max) . "\n" . do {local $/;<DATA>};
my $dbh = DBI->connect('dbi:AnyData(RaiseError=>1):');
$dbh->func('test', 'CSV', [$data], 'ad_import');
my $generator = XML::Generator::DBI->new(
Handler => XML::Handler::YAWriter->new(AsFile => '-'),
dbh => $dbh,
Indent => 1,
);
$generator->execute('select * from test');
__DATA__
colContents,nextColContents,lastColContents
colContents2,nextColContents2,lastColContents2
colContents3,nextColContents3,lastColContents3
a,b,c,d,e
f,g,h,i
j,k
l
m,n,o
jeffa
my $row = "hello;this;is;a;test";
my $i = 1;
print "row: $row\n";
$row =~ s/;/"<\/col". $i++ ."><col$i>"/eg;
$row = "<col1>" . $row . "</col$i>";
print "row: $row\n";
That works. Not the prettiest thing in the world though.
I have to agree, though, that a regex is probably not the best solution for this problem.
This one is kinda prettier :-)
my $row = "hello;this;is;a;test";
my $i = 1;
print "row: $row\n";
my @cols = split(/;/,$row);
my $output = '';
foreach my $col (@cols) { $output .= "<col$i>$col</col$i>"; $i++ }
print "row: $output\n";
Note that these examples only handle a single line, though they are easily extended to cover multiple lines.
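Extending the split version to several lines mostly comes down to resetting the counter for each line, otherwise the numbering keeps climbing across rows. A sketch under that assumption, reading from a string rather than a file so it is self-contained; `lines_to_cols` is a hypothetical name.

```perl
use strict;
use warnings;

# Convert several semicolon-separated lines to <colN> markup, one line each.
# The counter is declared inside the loop so numbering restarts at 1 per line.
sub lines_to_cols {
    my ($text) = @_;
    my $output = '';
    for my $row (split /\n/, $text) {
        my $i = 1;
        foreach my $col (split /;/, $row) {
            $output .= "<col$i>$col</col$i>";
            $i++;
        }
        $output .= "\n";
    }
    return $output;
}

print lines_to_cols("hello;this;is;a;test\nthis;is;another\n");
```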
Eric Hodges
while(<DATA>) {
my $i;
chomp;
print join('', map { $i++; "<col$i>$_</col$i>" } split /;/)."\n";
}
__DATA__
hello;this;is;a;test
this;is;another;simple;test
Makeshifts last the longest.