Re: Predefining complex data structures?
by broquaint (Abbot) on Jul 12, 2002 at 14:46 UTC
|
You don't need to predefine your data structure, as it will be created as you insert the data, which you could do like so:
my %tagstack;
# ... parsing code here
push @{$tagstack{requirements}}, {
# %tag = wherever the data is coming from
text => $tag{cdata},
attributes => $tag{attribs}
};
If you're just using simple nested hashes then XML::Simple may be just the module for you, and if you continue to get muddled by references and reference syntax, check out the perlreftut and perlref manpages.
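A minimal, self-contained sketch of that autovivification, in case it helps. The @parsed list is a hypothetical stand-in for whatever your parser hands back; only the push line mirrors the snippet above.

```perl
use strict;
use warnings;

my %tagstack;    # starts out empty; nothing is predefined

# Hypothetical stand-ins for the data the parser would supply.
my @parsed = (
    { cdata => 'A power cord.', attribs => { contactname => 'Joe Average' } },
    { cdata => 'A node name',   attribs => { contactname => 'Jane Smith' } },
);

for my $tag (@parsed) {
    # The first push creates $tagstack{requirements} as an array reference.
    push @{ $tagstack{requirements} }, {
        text       => $tag->{cdata},
        attributes => $tag->{attribs},
    };
}

printf "%d requirements stored\n", scalar @{ $tagstack{requirements} };
print "$tagstack{requirements}[1]{attributes}{contactname}\n";
```

The hash never has to be told that the requirements key holds an array; using it as one in the push is enough.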
HTH
_________ broquaint
|
|
Thanks! This was extremely helpful.
Re: Predefining complex data structures?
by dragonchild (Archbishop) on Jul 12, 2002 at 14:50 UTC
|
First, don't pre-define. Let Perl do the auto-vivification for you. That's what it's there for. Especially because you don't know for certain what will be there, just what structure it will be in.
Secondly, you want to do something along the lines of:
# Caveat Lector - this is untested!
foreach my $tag (@tags) {
push @{$tagstack{$tag->name}}, {
text => $tag->value,
attributes => {
split /[\s=]+/, $tag->attributes
},
};
}
Obviously, I'm assuming that $tag is some object with three methods - name(), value(), and attributes(). These will have to
return the appropriate values from the data source. Also, I'm assuming that attributes() will return everything within the tag
definition other than the tag's name. If all you get is the text, you could do something like:
foreach my $tag (@tags) {
my ($name, $attributes, $value) = $tag =~
m#^<(\w+)\s*([^>]*?)\s*>(.*?)</\1>$#s;
# Use $name, $attributes, and $value as per above
}
The regex could be tightened, but this is off-the-cuff.
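If the attributes arrive as one raw string, something along these lines could turn them into a hash. This is only a sketch: $raw is an assumed sample input, and the pattern handles double-quoted values only (no single quotes, no entities).

```perl
use strict;
use warnings;

# Assumed input: the raw attribute text from inside a start tag.
my $raw = 'contactname="Jane Smith" contactnumber="555-1212"';

# In list context, /g returns every (name, value) capture pair in order,
# which is the flat list a hash assignment expects.
my %attributes = $raw =~ /(\w+)\s*=\s*"([^"]*)"/g;

print "$_ => $attributes{$_}\n" for sort keys %attributes;
```

Unlike a plain split on whitespace and equals signs, this keeps values that contain spaces in one piece and drops the surrounding quotes.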
------ We are the carpenters and bricklayers of the Information Age. Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.
Re: Predefining complex data structures?
by thraxil (Prior) on Jul 12, 2002 at 14:54 UTC
|
first, i would change the data structure to:
$tagstack{requirements}->[1] = {
text => "A node name",
attributes => {contactname => "Jane Smith",
contactnumber => "555-1212"
}
};
note that it is now a hash containing a reference to an array which contains hashrefs. also note the curly braces around the attributes hash instead of square brackets.
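a small sketch of how a structure like that is indexed, using the same sample values as above. the variable names holding the three lookups are made up for illustration; the point is that the arrow between adjacent subscripts is optional.

```perl
use strict;
use warnings;

my %tagstack;
$tagstack{requirements}->[1] = {
    text       => 'A node name',
    attributes => { contactname => 'Jane Smith', contactnumber => '555-1212' },
};

# Three spellings of the same lookup.
my $explicit = $tagstack{requirements}->[1]->{text};
my $implied  = $tagstack{requirements}[1]{text};
my $braced   = ${ $tagstack{requirements} }[1]{text};

print "same\n" if $explicit eq $implied and $implied eq $braced;

# Assigning straight to index 1 leaves index 0 undefined.
print "slot 0 is empty\n" unless defined $tagstack{requirements}[0];
```

the arrow dereferences whatever is on its left; between two subscripts perl will supply it for you.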
if XML::Simple is up to the task of parsing your XML, this is fairly straightforward. here's a little script showing how you would go about it:
#!/usr/bin/perl -wT
use strict;
use XML::Simple;
use Data::Dumper;
my $data = XML::Simple::XMLin('./test.xml');
# show what we start with
print Data::Dumper::Dumper($data);
my %tagstack;
my @temp;
foreach my $h (@{$data->{requirement}}) {
my %t;
$t{text} = $h->{content};
delete $h->{content};
$t{attributes} = $h;
push @temp, \%t;
}
$tagstack{requirement} = \@temp;
# show the finished product
print Data::Dumper::Dumper(\%tagstack);
with test.xml being:
<root>
<requirement contactname="Joe Average">A power cord.</requirement>
<requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</root>
it gives the following output:
$VAR1 = {
'requirement' => [
{
'contactname' => 'Joe Average',
'content' => 'A power cord.'
},
{
'contactnumber' => '555-1212',
'contactname' => 'Jane Smith',
'content' => 'A node name'
}
]
};
$VAR1 = {
'requirement' => [
{
'text' => 'A power cord.',
'attributes' => {
'contactname' => 'Joe Average'
}
},
{
'text' => 'A node name',
'attributes' => {
'contactnumber' => '555-1212',
'contactname' => 'Jane Smith'
}
}
]
};
personally, i think the data structure that XML::Simple produces is more intuitive, but you've probably got a reason for wanting it in the format you do.
anders pearson
|
|
Okay, I've looked over the changes you suggested to the data structure. The only thing I wasn't quite clear on was what the -> arrow operator at the beginning does. I haven't seen it used in that particular way in Perl before. Admittedly I'm still less than 200 pages into the Camel book.
Unfortunately, though it is more intuitive, XML::Simple isn't quite enough for what I need: reassembling the data in the <method> structure I provided in another part of this thread would be more complicated than just sticking with XML::Parser. With XML::Parser I can use an if or a case to fire off different code for an <object> or <input> element so that I can apply formatting (that's all the object and input tags are for) without having to reassemble the strings.
Thanks!
(jeffa) Re: Predefining complex data structures?
by jeffa (Bishop) on Jul 12, 2002 at 14:54 UTC
|
I can't think of a single good reason to do this, as the
data structure returned by XML::Simple should work for
just about any need you have:
$VAR1 = {
'requirement' => [
{
'contactname' => 'Joe Average',
'content' => 'A power cord.'
},
{
'contactnumber' => '555-1212',
'contactname' => 'Jane Smith',
'content' => 'A node name'
}
]
};
But, since you asked, how about this:
use strict;
use XML::Simple;
use Data::Dumper;
my $data = do {local $/;<DATA>};
my $xml = XMLin($data,forcearray=>1);
my $new;
for my $req (@{$xml->{requirement}}) {
my %temp;
$temp{text} = delete $req->{content};
$temp{attributes} = [%$req];
push @{$new->{requirements}}, {%temp};
}
# dump the rebuilt structure
print Dumper($new);
__DATA__
<xml>
<requirement contactname="Joe Average">A power cord.</requirement>
<requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</xml>
This produced the following data structure for me:
$VAR1 = {
'requirements' => [
{
'text' => 'A power cord.',
'attributes' => [
'contactname',
'Joe Average'
]
},
{
'text' => 'A node name',
'attributes' => [
'contactnumber',
'555-1212',
'contactname',
'Jane Smith'
]
}
]
};
UPDATE:
Woah! Sorry Ionizor, i read your post and parsed XML::Parser as XML::Simple. Forgive me, that's what i get
for trying to answer questions in the morning without the
prerequisite cup o' joe first. :) But yes, XML::Simple will
parse that XML snippet you provided:
$VAR1 = {
'method' => [
{
'object' => [
'Properties',
'Do not use option Foo',
'Server Name',
'OK'
],
'content' => [
'Open up the ',
' page. Then uncheck the ',
' checkbox. Under ',
' enter ',
' and then hit '
],
'input' => [
'www.example.com'
]
}
]
};
But that probably will not work for you. :(
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
|
|
At the moment I'm using XML::Parser because I wasn't sure if XML::Simple would correctly handle things like:
<method>Open up the <object>Properties</object> page. Then uncheck the <object>Do not use option Foo</object> checkbox. Under <object>Server Name</object> enter <input>www.example.com</input> and then hit <object>OK</object></method>
Which I will be processing a little later on in the script.
Re: Predefining complex data structures?
by demerphq (Chancellor) on Jul 12, 2002 at 15:00 UTC
|
my %struct;
foreach my $elem (@elements) { # loop over all the elements
# check to make sure our hash key has an array we can push onto.
$struct{$elem->name}=[] unless $struct{$elem->name};
# now create a new sub hash to push onto the array later
my %hash=(text=>$elem->text, attributes=>[]); #initialize it
# loop over each attribute in the element
foreach my $attrib ($elem->attribs) {
# push the elements onto the attributes array
push @{$hash{attributes}},$attrib->name,$attrib->value;
}
# push a reference to the newly created hash on the array stored for this element type.
push @{$struct{$elem->name}},\%hash;
}
Now of course you will have to figure out how to convert the pseudo-methods I've used here into the real thing. Also, IIRC XML does not allow two attributes of the same name in one tag, so instead of using an array to store them, just use a hash (unless order is important).
HTH
UPDATE The line where I put an array in explicitly is not needed in this scenario, but if we code like
push @{$hash{$key}},'var' unless @{$hash{$key}}>5;
we would, because autovivification doesn't happen in that context. Sorry. And just now in the CB, chip pointed out that changing the condition to
push @{$hash{$key}},'var' unless @{$hash{$key}||[]}>5;
would also do the trick, and is probably more elegant, if a touch obfu'd. Thanks chip.
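Here is a small sketch of that guarded form in a loop, assuming strict is on. The key name and the loop bounds are made up for illustration, and as far as I can tell the 5 is just an arbitrary cap picked for the example.

```perl
use strict;
use warnings;

my %hash;
my $key = 'requirements';    # hypothetical key

# Reading through the "|| []" fallback never dereferences undef,
# and a plain read does not create the key either.
my $count = @{ $hash{$key} || [] };
print "count before: $count\n";
print "key not created by the read\n" unless exists $hash{$key};

# push is an lvalue context, so the first push autovivifies the array.
for my $i (1 .. 8) {
    push @{ $hash{$key} }, "var$i" unless @{ $hash{$key} || [] } > 5;
}

# Pushes stop once the array holds more than five elements.
print scalar(@{ $hash{$key} }), "\n";
```

Because the test is "> 5", the array ends up with six elements before the guard starts refusing pushes.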
Yves / DeMerphq
---
Writing a good benchmark isn't as easy as it might look.
|
|
Thanks for the nod, demerphq, but I think I can do my hack one better:
for ($hash{$key})
{ push @$_, 'var' unless @$_ > 5 }
-- Chip Salzenberg, Free-Floating Agent of Chaos
|
|
I understand the rest of the snippet but I'm missing the significance of the 5. Why 5?
|
|
Heh... I put square brackets instead of braces in my example. I did mean for the attributes to be in a hash rather than an array. Oops!
Re: Predefining complex data structures?
by djantzen (Priest) on Jul 12, 2002 at 14:50 UTC
|
Why do you feel that this needs to be predefined? Or a better question is, how would you go about doing it given that the structure in its nature is open ended? That is to say, the array reference pointed to by the "requirements" key has no built-in limit, nor does there appear to be a maximum number of attributes in the embedded hash. Thus, if you were to predefine it in some way, you'd have to choose an arbitrary depth to which to do so.
As far as push()ing and shift()ing and what not is concerned, this should do it:
push(@{$tagstack{requirements}}, { text => 'foo', attributes => [ 'whatever' ] } );
|
|
What I had meant by predefining is just to predefine the structure, not the data itself. If I know the structure of the data, defining an appropriate perl structure to hold it shouldn't be that hard.
I was having difficulty with the push because it kept trying to push into a hash (d'oh!) and I didn't know the correct syntax to fix it. Thanks!
(dkubb) Re: (2) XML parsing and SAX event handlers
by dkubb (Deacon) on Jul 13, 2002 at 10:13 UTC
|
Many of the approaches in this thread centered around using XML::Simple. Why not try using XML::SAX and building your own SAX event handler? I believe it can satisfy your requirements while at the same time providing more flexibility than XML::Parser's interface.
A good introduction to creating SAX event handlers can be found at XML::SAX::Intro in the XML::SAX distribution on CPAN.
To address your question, here's a working example:
#!/usr/bin/perl -wT
use strict;
use XML::SAX;
use Data::Dumper qw(DumperX);
my $handler = My::SAXParser->new;
my $parser = XML::SAX::ParserFactory->parser(Handler => $handler);
#pass the XML document at the bottom __DATA__ tag to the parser
$parser->parse_string(do { local $/; <DATA> });
print DumperX($handler->nodes);
{
#this class keeps track of the processed nodes
package My::SAXParser;
use strict;
use base qw(XML::SAX::Base);
use Class::MethodMaker
get_set => ['nodes'],
list => ['element_stack'];
use constant SKIP_NODE => 'xml';
sub start_document { shift->nodes({}) }
sub start_element {
my $self = shift;
my $el = shift;
return if $el->{Name} eq SKIP_NODE;
#make note of which element we are processing - in the stack
$self->element_stack_push(\my %element);
foreach my $attribute (values %{$el->{Attributes}}) {
push @{$element{attributes}}, @$attribute{qw(Name Value)};
}
#keep track of all interesting element nodes
push @{ $self->nodes->{$el->{Name}} }, \%element;
return $self->SUPER::start_element($el);
}
sub characters {
my $self = shift;
return unless $self->element_stack_count; #are there any pending element nodes to process?
return $self->SUPER::characters($self->element_stack->[-1]->{text} .= shift->{Data});
}
sub end_element {
my $self = shift;
$self->element_stack_pop; #element has been processed, pop it off the stack
return $self->SUPER::end_element(shift);
}
}
__DATA__
<xml>
<requirement contactname="Joe Average">A power cord.</requirement>
<requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</xml>
This should produce the following output:
$VAR1 = {
'requirement' => [
{
'text' => 'A power cord.',
'attributes' => [
'contactname',
'Joe Average'
]
},
{
'text' => 'A node name',
'attributes' => [
'contactnumber',
'555-1212',
'contactname',
'Jane Smith'
]
}
]
};
I tested this code with the other XML document example you posted in this thread. It can parse it and I believe it produces a pretty reasonable output.
Also, if performance is an issue, it's possible to gain further speed increases using XML::LibXML::SAX::Parser or XML::SAX::Expat. Either of these modules can pretty much just be dropped into the above script by modifying two lines of the script's code: the use and new constructor statements.
|
|
"Either of these modules can pretty much just be dropped into the above script by modifying two lines of the script's code"
Actually, it shouldn't be necessary to modify the code at
all. Your sample code uses XML::SAX::ParserFactory which will
use the system default SAX parser (as defined in
lib/XML/SAX/ParserDetails.ini). So if you install
XML::SAX::Expat, your script will immediately make use of it.
|
|
I found the SAX documentation rather confusing the first time I read it over so I put it down for a while. Now I've picked it back up and with a little help from O'Reilly's Perl and XML I'm recoding into XML::SAX.
On a related note, I highly recommend O'Reilly's Safari service. Online books! It's very cool.