Re: How to create XML tree from non-XML source
by dHarry (Abbot) on Sep 08, 2008 at 15:17 UTC
There are many Perl modules available for generating XML: Any::Renderer::XML (generates "element only" XML), XML::Generator, or XML::Writer, to name a few. Which one fits depends a bit on what you really need in terms of XML features and how far you want to push it.
For XPath you can use XML::XPath.
I am using XML::Twig a lot lately. It is turning into a one-solution-for-all-XML-problems for me :-)
Hope this helps
Thank you for your suggestions. XML::Generator looks good. But how can I manipulate a tree created with XML::Generator?
XML::Generator is really intended for converting existing data structures to XML. If you want to manipulate them a bit before outputting them, I'm going to second the recommendation for XML::Twig, which is easy to use and fairly well documented.
Here's a quickie example for you, though there are better ones at the link above:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $twig;
my $root = "<nodetag />";
my $element;
my $firstelem;
my $childcnt;

$twig = XML::Twig->new(
    output_encoding => 'utf8',
    pretty_print    => 'record');

# $root is a string containing the starting tag
$twig->parse($root);
$root = $twig->root;

# $root is now the root twig element, and we can modify it
$root->set_gi('nodetag_root');

# We can add children to it
foreach $childcnt (0 .. 10)
{
    $element = XML::Twig::Elt->new('childtag' => 'child text');
    $element->set_att('index',$childcnt);
    $element->paste('last_child',$root);
}

# We can modify an arbitrary child
$element = $root->first_child('childtag[@index="5"]');
$element->set_text('Number Five, alive!');

# And we can print it, to a filehandle if necessary
$twig->print;
It outputs:
<?xml version="1.0" encoding="utf8"?>
<nodetag_root>
<childtag index="0">child text</childtag>
<childtag index="1">child text</childtag>
<childtag index="2">child text</childtag>
<childtag index="3">child text</childtag>
<childtag index="4">child text</childtag>
<childtag index="5">Number Five, alive!</childtag>
<childtag index="6">child text</childtag>
<childtag index="7">child text</childtag>
<childtag index="8">child text</childtag>
<childtag index="9">child text</childtag>
<childtag index="10">child text</childtag>
</nodetag_root>
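Building on the example above, here is a minimal sketch of the other operations that come up when reworking a tree: locating an element with an XPath-like expression, changing its text, renaming it, moving it up a level, and creating a new subnode. The sample document and the NOTE element are made up for illustration. XML::Twig only supports a subset of XPath, so more complex expressions may not work.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new(pretty_print => 'indented');
$twig->parse(
    '<INDI><NAME>Bob /Cox/</NAME><CHAN><DATE>11 FEB 2006</DATE></CHAN></INDI>');
my $root = $twig->root;

# Locate an element with an XPath-like expression (made-up sample path)
my ($date) = $twig->get_xpath('//CHAN/DATE');
die "No DATE element found\n" unless $date;

# Change its text
$date->set_text('12 FEB 2006');

# Rename it
$date->set_gi('CHANGED');

# Move it up one level: it becomes the last child of the root element
$date->move(last_child => $root);

# Create a new subnode under the root (NOTE is a made-up tag)
$root->insert_new_elt(last_child => 'NOTE', 'checked by hand');

print $twig->sprint;
```

After these calls the CHAN element is empty, and CHANGED and NOTE are direct children of INDI.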
Re: How to create XML tree from non-XML source
by themage (Friar) on Sep 08, 2008 at 14:16 UTC
Hi H4,
You can try XML::Simple's XMLout, as long as you have a Perl hash representing the data you want to write to XML.
use XML::Simple qw(XMLout);
my $data={book=>[{name=>"test",author=>"H3"},{name=>"test2",author=>"H4"}]};
print XMLout($data,NoAttr=>1,RootName=>"books");
Thanks for your input. Unfortunately, XML::Simple does not preserve the ordering of subnodes because it uses hashes rather than lists. In your example, there is no way of telling whether <name> or <author> should appear first in the resulting XML. Sorry I forgot to mention that, in my case, order does matter.
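One way to keep the ordering is to hold child elements in arrays of tag/content pairs instead of hashes. The following is only a sketch in core Perl: the data layout and the helper names xml_escape and to_xml are assumptions made for illustration, not part of XML::Simple or any other module.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Serialize a node of the form [ $tag, $content ], where $content is either
# a plain string or an array reference of child nodes in document order.
sub to_xml {
    my ($node, $depth) = @_;
    $depth = 0 unless defined $depth;
    my ($tag, $content) = @$node;
    my $indent = '  ' x $depth;

    return "$indent<$tag>" . xml_escape($content) . "</$tag>\n"
        unless ref $content;

    return "$indent<$tag>\n"
        . join('', map { to_xml($_, $depth + 1) } @$content)
        . "$indent</$tag>\n";
}

# Arrays keep <name> before <author>; a hash would not guarantee that.
my $data = [ books => [
    [ book => [ [ name => 'test'  ], [ author => 'H3' ] ] ],
    [ book => [ [ name => 'test2' ], [ author => 'H4' ] ] ],
] ];

print to_xml($data);
```

Because every level is an array, the children come out in exactly the order they were stored.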
Re: How to create XML tree from non-XML source
by GrandFather (Saint) on Sep 08, 2008 at 22:22 UTC
This looks somewhat like the wrong question. XML is a file-based representation of some data. XPath is a query language for locating information in an XML document. Neither implies any particular internal representation during processing.
So, what is the bigger picture? What input data have you and what do you want to achieve with it?
Perl reduces RSI - it saves typing
My original data is genealogical data in GEDCOM format. GEDCOM is a well-documented standard, yet every GEDCOM-capable program creates files that, in one way or another, violate that standard. My idea is to create an intermediate form which can be converted to and from all the third-party GEDCOM styles involved. I chose XML because GEDCOM is a tree structure, and I thought it would be better to use existing tools for manipulating trees than to reinvent them.
Yes, I know there is a Gedcom package on CPAN, but it cannot read 5 out of 9 test files, and does not handle character sets correctly.
I want to use XPath expressions to locate the nodes which must be modified, then modify them as required, then save the tree to an XML file. I don't mind saving the unmodified XML tree to an intermediate file if I must. But then, using XPath to locate a node, how do I do my modification? This may include renaming the node's type, changing the text, moving the node up in the tree, or creating subnodes. Are XML and XPath the wrong tools? Maybe I'll have to create my own code to locate nodes, rather than using XPath?
XML is in essence a file format. It is not generally used as an in-memory representation of data from some other file format. Unless you want to store an intermediate form of the data on disk in some non-GEDCOM format, XML is not appropriate. Even then, you would probably be better off storing any intermediate form of the data on disk as a clean GEDCOM file (although, see below).
There are many ways to handle trees in Perl (see tree), but you are probably better off writing a GEDCOM object hierarchy that directly addresses the structure you need to manipulate.
I note that GEDCOM 6.0 will be an XML-based file format, but that needn't alter how you represent the data internally. In fact, whatever internal representation you choose now should be completely independent of the external representation and should be chosen to make creating and manipulating the data easy. Then it becomes fairly easy to handle different input file formats and generate different output file formats.
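As a rough illustration of that separation, here is a sketch in core Perl of a small node class with one reader and one writer kept apart from it. Everything named here (GedNode, find, as_gedcom, parse_gedcom, write_gedcom) is hypothetical, and the line parsing is simplified: it ignores continuation lines, character sets, and the vendor quirks mentioned earlier in the thread.

```perl
use strict;
use warnings;

package GedNode;    # hypothetical class name

sub new {
    my ($class, %args) = @_;
    return bless { %args, children => [] }, $class;
}

sub add_child {
    my ($self, $child) = @_;
    push @{ $self->{children} }, $child;
    return $child;
}

# Follow a tag path such as 'INDI/NAME' and return every matching node.
sub find {
    my ($self, $path) = @_;
    my @nodes = ($self);
    for my $tag (split m{/}, $path) {
        @nodes = grep { $_->{tag} eq $tag } map { @{ $_->{children} } } @nodes;
    }
    return @nodes;
}

# Writer: emit this node and its descendants as GEDCOM lines.
sub as_gedcom {
    my ($self, $level) = @_;
    my $line = join ' ', grep { defined($_) && length($_) }
        $level, @$self{qw(xref tag value)};
    return join '', "$line\n",
        map { $_->as_gedcom($level + 1) } @{ $self->{children} };
}

package main;

# Reader: build the tree from GEDCOM lines.
sub parse_gedcom {
    my (@lines) = @_;
    my $root  = GedNode->new(tag => 'ROOT');
    my @stack = ($root);    # $stack[$n] is the parent for a line at level $n
    for my $line (@lines) {
        my ($level, $rest) = $line =~ /^\s*(\d+)\s+(.*?)\s*$/ or next;
        my ($xref, $tag, $value) =
            $rest =~ /^(?:(\@[^\@\s]+\@)\s+)?(\S+)(?:\s+(.*))?$/;
        die "Bad nesting at: $line\n" if $level > $#stack;
        $#stack = $level;   # drop nodes deeper than this line's parent
        push @stack, $stack[$level]->add_child(
            GedNode->new(xref => $xref, tag => $tag, value => $value));
    }
    return $root;
}

sub write_gedcom {
    my ($root) = @_;
    return join '', map { $_->as_gedcom(0) } @{ $root->{children} };
}

my $tree = parse_gedcom(split /\n/, <<'END');
0 @I1@ INDI
1 NAME Bob /Cox/
1 CHAN
2 DATE 11 FEB 2006
END

# Locate nodes by tag path, then rename them and change their text.
for my $date ($tree->find('INDI/CHAN/DATE')) {
    $date->{tag}   = 'CHANGED';
    $date->{value} = '12 FEB 2006';
}

print write_gedcom($tree);
```

An XML writer, or a reader for a particular vendor's dialect, could be added alongside parse_gedcom and write_gedcom without touching the node class.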
Re: How to create XML tree from non-XML source
by GrandFather (Saint) on Sep 10, 2008 at 02:22 UTC
use strict;
use warnings;
use Tree::DAG_Node;

my $root = Tree::DAG_Node->new ();
my $level = 0;
my $currMother = $root;

while (<DATA>) {
    chomp;
    s/^\s+//;
    my ($lineLevel, $tag, $tail) = split ' ', $_, 3;
    my $newDaughter;

    while ($lineLevel < $level) {
        $currMother = $currMother->mother ();
        --$level;
    }

    if ($lineLevel > $level) {
        $newDaughter = $currMother = $currMother->new_daughter ();
        die "Adjacent lines differ by more than one level at line $."
            if ++$level != $lineLevel;
    }

    $newDaughter = $currMother->new_daughter () unless $newDaughter;
    $newDaughter->name ($tag);
    $newDaughter->attribute ()->{data} = $tail;
}

print "<root>\n";
$root->walk_down ({callback => \&enterNode, callbackback => \&exitNode, _depth => 0});
print "</root>\n";

sub enterNode {
    my ($node, $options) = @_;

    return 1 if ! defined $node->{name};
    print ' ' x ($options->{_depth} * 3);
    print "<$node->{name}>";
    print $node->attribute ()->{data} if defined $node->attribute ()->{data};
    print "\n";
    return 1;
}

sub exitNode {
    my ($node, $options) = @_;

    return if ! defined $node->name ();
    print ' ' x ($options->{_depth} * 3);
    print "</$node->{name}>\n";
}
__DATA__
0 HEAD
1 SOUR Reunion
2 VERS V8.0
2 CORP Leister Productions
1 DEST Reunion
1 DATE 11 FEB 2006
1 FILE test
1 GEDC
2 VERS 5.5
1 CHAR MACINTOSH
0 @I1@ INDI
1 NAME Bob /Cox/
1 SEX M
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I2@ INDI
1 NAME Joann /Para/
1 SEX F
1 FAMS @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @I3@ INDI
1 NAME Bobby Jo /Cox/
1 SEX M
1 FAMC @F1@
1 CHAN
2 DATE 11 FEB 2006
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 MARR
1 CHIL @I3@
0 TRLR
Prints:
<root>
<HEAD>
</HEAD>
<SOUR>Reunion
<VERS>V8.0
<CORP>Leister Productions
</CORP>
</VERS>
<DEST>Reunion
</DEST>
<DATE>11 FEB 2006
</DATE>
<FILE>test
</FILE>
<GEDC>
</GEDC>
<VERS>5.5
</VERS>
<CHAR>MACINTOSH
</CHAR>
</SOUR>
<@I1@>INDI
</@I1@>
<NAME>Bob /Cox/
<SEX>M
</SEX>
<FAMS>@F1@
</FAMS>
<CHAN>
</CHAN>
<DATE>11 FEB 2006
</DATE>
</NAME>
<@I2@>INDI
</@I2@>
<NAME>Joann /Para/
<SEX>F
</SEX>
<FAMS>@F1@
</FAMS>
<CHAN>
</CHAN>
<DATE>11 FEB 2006
</DATE>
</NAME>
<@I3@>INDI
</@I3@>
<NAME>Bobby Jo /Cox/
<SEX>M
</SEX>
<FAMC>@F1@
</FAMC>
<CHAN>
</CHAN>
<DATE>11 FEB 2006
</DATE>
</NAME>
<@F1@>FAM
</@F1@>
<HUSB>@I1@
<WIFE>@I2@
</WIFE>
<MARR>
</MARR>
<CHIL>@I3@
</CHIL>
</HUSB>
<TRLR>
</TRLR>
</root>
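In the output above the nesting does not follow the GEDCOM levels: SOUR comes after the closed HEAD element instead of inside it, and CORP ends up inside VERS. When the level increases, the program calls new_daughter on the current mother rather than descending into the node created by the previous line. Also, a cross-reference id such as @I1@ is not a valid XML element name. The following is a sketch of one possible way to handle both points, not a tested replacement: it keeps a stack with the latest node per level and stores the id in an attribute. The helper names build_tree, to_xml and xml_escape, and the id attribute, are assumptions made for illustration.

```perl
use strict;
use warnings;
use Tree::DAG_Node;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Build a Tree::DAG_Node tree from GEDCOM lines (hypothetical helper).
sub build_tree {
    my (@lines) = @_;
    my $root = Tree::DAG_Node->new ();
    $root->name ('root');
    my @stack = ($root);    # $stack[$n] is the mother for a line at level $n

    for my $line (@lines) {
        my ($level, $tag, $tail) = split ' ', $line, 3;
        next unless defined $tag;
        die "Adjacent lines differ by more than one level at: $line\n"
            if $level > $#stack;

        my $xref;
        if ($tag =~ m{^\@.+\@\z}) {
            # "0 @I1@ INDI": the cross-reference id comes before the tag
            $xref = $tag;
            ($tag, $tail) = split(' ', $tail, 2);
        }

        $#stack = $level;   # forget nodes deeper than this line's mother
        my $node = $stack[$level]->new_daughter ();
        $node->name ($tag);
        $node->attributes->{data} = $tail;
        $node->attributes->{xref} = $xref;
        push @stack, $node;
    }
    return $root;
}

# Serialize the tree; walk_down tracks the depth in $options->{_depth}.
sub to_xml {
    my ($root) = @_;
    my $xml = '';
    $root->walk_down ({
        _depth   => 0,
        callback => sub {
            my ($node, $options) = @_;
            my ($data, $xref) = @{ $node->attributes }{qw(data xref)};
            $xml .= '   ' x $options->{_depth} . '<' . $node->name;
            $xml .= qq{ id="$xref"} if defined $xref;
            $xml .= '>';
            $xml .= xml_escape ($data) if defined $data;
            $xml .= "\n";
            return 1;
        },
        callbackback => sub {
            my ($node, $options) = @_;
            $xml .= '   ' x $options->{_depth} . '</' . $node->name . ">\n";
        },
    });
    return $xml;
}

my @gedcom = split /\n/, <<'END';
0 HEAD
1 SOUR Reunion
2 VERS V8.0
2 CORP Leister Productions
1 DEST Reunion
0 @I1@ INDI
1 NAME Bob /Cox/
0 TRLR
END

my $tree = build_tree (@gedcom);
print to_xml ($tree);
```

With the stack in place, VERS and CORP both become daughters of SOUR, and SOUR and DEST become daughters of HEAD.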