#!/usr/bin/perl
use XML::Simple;
my @file;
my (@labels) = split /\s+/, <DATA>;
while (<DATA>) {
    chomp;
    my %line;
    @line{@labels} = split /\t/;
    push @file, \%line;
}
print XMLout(\@file);
__DATA__
First Last
Fred Flintstone
Barney Rubble
Betty Rubble
Wilma Flintstone
generates
<opt>
  <anon First="Fred" Last="Flintstone" />
  <anon First="Barney" Last="Rubble" />
  <anon First="Betty" Last="Rubble" />
  <anon First="Wilma" Last="Flintstone" />
</opt>
Season to taste.
-- Randal L. Schwartz, Perl hacker
I hate to point out something in Randal's code, but shouldn't the first split be on tabs and not on one or more spaces? A label such as "First Name" could exist.
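A minimal sketch of the difference, assuming a hypothetical header line with a "First Name" column (not part of the original data):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical header line: two tab-separated labels, one containing a space.
my $header = "First Name\tLast\n";
chomp $header;

my @by_space = split /\s+/, $header;   # breaks "First Name" into two labels
my @by_tab   = split /\t/,  $header;   # keeps each label intact

print scalar(@by_space), " labels when splitting on whitespace\n";
print scalar(@by_tab),   " labels when splitting on tabs\n";
```

Note that a label containing a space is still not a valid XML name, so it would need cleaning up before being used as an attribute or element name.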
AnyData can read data in various formats, including tab-delimited, treat them as relational tables, and export them in other formats, including XML.
Something like adConvert('Tab','foo.tab','XML','foo.xml'); should work.
Of course you can also do it simply by reading the tab-delimited file, using split to extract the individual fields, and then printing them as you see fit. The requirement to export each row to a separate file is unlikely to be directly supported by any module.
You might want to take a look at XML::SAXDriver::CSV; it's a package that's meant precisely for this purpose. It has the added advantage that it is SAX based. This means that not only can it interact easily with other XML modules, but making it write each row to a different file is as easy as writing a SAX handler that does just that. It's pretty trivial, and a very powerful approach.
-- darobin -- knowscape 2 coming soon --
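To give an idea of what such a handler might look like, here is a rough sketch. It is not taken from the XML::SAXDriver::CSV documentation: the package name RowPerFile, the row_tag and prefix parameters, and the element names (record, First, Last) are all assumptions made for illustration. The handler is driven by hand here with the kind of events a Perl SAX2 driver sends, so it runs without any XML module installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical handler: buffers each row element and writes it to its own file.
package RowPerFile;

sub new {
    my ($class, %args) = @_;
    my $self = {
        row_tag => $args{row_tag} || 'record',  # assumed name of the row element
        prefix  => $args{prefix}  || 'row',     # assumed output file name prefix
        count   => 0,
        buffer  => undef,                       # undef while outside a row
    };
    return bless $self, $class;
}

sub start_element {
    my ($self, $el) = @_;
    $self->{buffer} = '' if $el->{Name} eq $self->{row_tag};
    $self->{buffer} .= "<$el->{Name}>" if defined $self->{buffer};
}

sub characters {
    my ($self, $chars) = @_;
    return unless defined $self->{buffer};
    my $text = $chars->{Data};
    $text =~ s/&/&amp;/g;                       # escape the minimum: & and <
    $text =~ s/</&lt;/g;
    $self->{buffer} .= $text;
}

sub end_element {
    my ($self, $el) = @_;
    return unless defined $self->{buffer};
    $self->{buffer} .= "</$el->{Name}>";
    return unless $el->{Name} eq $self->{row_tag};

    # End of a row: flush the buffer to a numbered file.
    my $file = sprintf "%s-%03d.xml", $self->{prefix}, ++$self->{count};
    open my $out, '>', $file or die "cannot open $file: $!";
    print {$out} $self->{buffer}, "\n";
    close $out or die "cannot close $file: $!";
    $self->{buffer} = undef;
}

package main;

# Send the events for a single row by hand, as a SAX driver would.
my $handler = RowPerFile->new( prefix => 'sax-row' );
$handler->start_element( { Name => 'record' } );
$handler->start_element( { Name => 'First' } );
$handler->characters(    { Data => 'Barney' } );
$handler->end_element(   { Name => 'First' } );
$handler->start_element( { Name => 'Last' } );
$handler->characters(    { Data => 'Rubble & all' } );
$handler->end_element(   { Name => 'Last' } );
$handler->end_element(   { Name => 'record' } );
```

In real use you would probably subclass XML::SAX::Base so that events the handler does not care about are ignored, and check which element names the driver actually emits for rows and columns.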
OK, so here is a complete answer, that saves each line in a separate file.
First a couple of remarks:
- when I want to write XML I rarely use any module. I know that XML::Writer is available, and that most transformation modules can be used too, but frankly I don't think they save much energy if you know what you are doing. I especially don't like XML::Simple for this kind of use as it makes it quite difficult to control the structure of the XML output. So I just used good ole print statements.
- one thing that might create bugs if you are not careful is special XML characters: you need to escape at least & and < or you risk your XML not being well-formed. If you create attributes, which is not the case here, you also need to escape either " or ' depending on which one you use as a delimiter.
- lastly I don't know in which encoding the input data comes but I'd be willing to bet that it is not UTF-8, or at least that if some day accented characters creep in they will not be in UTF-8, so I stuck an XML declaration specifying ISO-8859-1 as the encoding on top of each file (I know it augments the size of each one but it should not be too bad once the whole thing is tar.gz'd).
So here it is:
#!/usr/bin/perl
use strict;
use warnings;

my $file_nb = "000";

# write labels
my @labels = map { sanitize_label( $_) } split /\t/, <DATA>;
my $file = "data-$file_nb.xml";
open( LABELS, ">$file") or die "cannot open $file: $!";
print LABELS qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n},
             "<labels>",
             map( { "<col>" . $_ . "</col>"} @labels),
             "</labels>\n";
close LABELS;

# write data
while (<DATA>)
  { my %line;
    chomp;
    @line{@labels} = split /\t/;
    $file_nb++;                        # string increment: "000" becomes "001"
    my $file = "data-$file_nb.xml";
    open( XML, ">$file") or die "cannot open $file: $!";
    print XML qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n},
              qq{<data record_no="$file_nb">},
              map( { "<$_>" . xml_escape( $line{$_}), "</$_>"} @labels),
              "</data>\n";
    close XML;
  }

# dumb way to make labels valid XML names: remove all non-word characters
# (this also strips the trailing newline from the last label)
sub sanitize_label
  { my $label = shift;
    $label =~ s/\W//g;
    return $label;
  }

# just escape the minimum: & and <
sub xml_escape
  { my $text = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    return $text;
  }
__DATA__
First Last
Fred Flintstone
Barney Rubble & all
Betty Rubble
Wilma Flintstone
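The remark above about attributes can be sketched as follows. This is only an illustration: escape_text and escape_attr are hypothetical helper names, and the sample value is made up. The & substitution has to come first, otherwise the ampersands introduced by the other entities would be escaped a second time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape text for use as element content: & first, then <.
sub escape_text {
    my $text = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    return $text;
}

# Escape text for use inside a double-quoted attribute value:
# same as element content, plus the delimiter itself.
sub escape_attr {
    my $text = escape_text( shift @_ );
    $text =~ s/"/&quot;/g;
    return $text;
}

# Made-up sample value containing both a quote and an ampersand.
my $name = 'Barney "Barn" Rubble & all';
print '<person name="', escape_attr($name), '">',
      escape_text($name),
      "</person>\n";
```

If you delimit attribute values with ' instead, escape that character (as &apos;) rather than ".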