Predefining complex data structures?

Ionizor has asked for the wisdom of the Perl Monks concerning the following question:

Re: Predefining complex data structures? by broquaint (Abbot) on Jul 12, 2002 at 14:46 UTC
You don't need to predefine your data structure; it will be created as you insert the data, which you could do like so:
`my %tagstack;
# ... parsing code here
push @{ $tagstack{requirements} }, {
    # $tag = wherever the data is coming from
    text       => $tag{cdata},
    attributes => $tag{attribs},
};`
If you're just using simple nested hashes then `XML::Simple` may be just the module for you. If you continue to get muddled by references and reference syntax, check out the `perlreftut` and `perlref` manpages.
HTH
broquaint
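A minimal, self-contained sketch of the point above, that the first `push` creates the nested array on demand. The `@parsed` list and its `cdata`/`attribs` keys are placeholders standing in for whatever the real parser hands back; they are assumptions for illustration only.

```perl
use strict;
use warnings;

my %tagstack;    # starts empty: no 'requirements' key, no array yet

# Hypothetical parsed tags; in real code these would come from the parser.
my @parsed = (
    { cdata => 'A power cord.', attribs => { contactname => 'Joe Average' } },
    { cdata => 'A node name',
      attribs => { contactname => 'Jane Smith', contactnumber => '555-1212' } },
);

for my $tag (@parsed) {
    # The first push creates $tagstack{requirements} as an array reference.
    push @{ $tagstack{requirements} }, {
        text       => $tag->{cdata},
        attributes => $tag->{attribs},
    };
}

print scalar @{ $tagstack{requirements} }, " requirements stored\n";
print "Second contact: $tagstack{requirements}[1]{attributes}{contactname}\n";
```

Nothing was declared for `requirements` beforehand; the structure simply takes the shape of the data pushed into it.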
Re: Predefining complex data structures? by Ionizor (Pilgrim) on Jul 12, 2002 at 15:14 UTC
Thanks! This was extremely helpful.
Re: Predefining complex data structures? by dragonchild (Archbishop) on Jul 12, 2002 at 14:50 UTC
First, don't predefine. Let Perl do the autovivification for you; that's what it's there for, especially because you don't know for certain what will be there, just what structure it will be in.
Secondly, you want to do something along the lines of:
`# Caveat Lector - this is untested!
foreach my $tag (@tags) {
    push @{ $tagstack{ $tag->name } }, {
        text       => $tag->value,
        attributes => { split /[\s=]+/, $tag->attributes },
    };
}`
Obviously, I'm assuming that `$tag` is some object with three methods: `name()`, `value()`, and `attributes()`. These will have to return the appropriate values from the data source. Also, I'm assuming that `attributes()` will return everything within the tag definition other than the tag's name.
If all you get is the text, you could do something like:
`foreach my $tag (@tags) {
    my ($name, $attributes, $value)
        = $tag =~ m#^<(\w+)\s+(\w+)?\s*>(.*?)</\1>$#;
    # Use $name, $attributes, and $value as per above
}`
The regex could be tightened, but this is off-the-cuff.
------
We are the carpenters and bricklayers of the Information Age.
Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.
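Splitting the attribute text on whitespace and `=` breaks a value such as `"Jane Smith"` into two pieces and leaves the quotes in place. The sketch below shows one possible alternative for simple double-quoted `name="value"` pairs. The helper name `parse_attributes` is made up for this example, and the pattern is not a substitute for a real XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the raw attribute text of a tag into a hash ref.
sub parse_attributes {
    my ($raw) = @_;
    my %attr;
    # Match name="value" pairs; the value may contain spaces.
    while ($raw =~ /(\w+)\s*=\s*"([^"]*)"/g) {
        $attr{$1} = $2;
    }
    return \%attr;
}

my $attrs = parse_attributes('contactname="Jane Smith" contactnumber="555-1212"');
print "$_ => $attrs->{$_}\n" for sort keys %$attrs;
```

The returned hash reference can be stored directly under the `attributes` key of each pushed record.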
Re: Predefining complex data structures? by thraxil (Prior) on Jul 12, 2002 at 14:54 UTC
first, i would change the data structure to:
`$tagstack{requirements}->[1] = {
    text       => "A node name",
    attributes => {
        contactname   => "Jane Smith",
        contactnumber => "555-1212",
    },
};`
note that it is now a hash containing a reference to an array which contains hashrefs. also note the curly braces around the attributes hash instead of square brackets.
if XML::Simple is up to the task of parsing your XML, this is fairly straightforward. here's a little script showing how you would go about it:
`#!/usr/bin/perl -wT
use strict;
use XML::Simple;
use Data::Dumper;
my $data = XML::Simple::XMLin('./test.xml');
# show what we start with
print Data::Dumper::Dumper($data);
my %tagstack;
my @temp;
foreach my $h (@{ $data->{requirement} }) {
    my %t;
    $t{text} = $h->{content};
    delete $h->{content};
    $t{attributes} = $h;
    push @temp, \%t;
}
$tagstack{requirement} = \@temp;
# show the finished product
print Data::Dumper::Dumper(\%tagstack);`
with test.xml being:
`<root>
  <requirement contactname="Joe Average">A power cord.</requirement>
  <requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</root>`
it gives the following output:
`$VAR1 = {
  'requirement' => [
    {
      'contactname' => 'Joe Average',
      'content' => 'A power cord.'
    },
    {
      'contactnumber' => '555-1212',
      'contactname' => 'Jane Smith',
      'content' => 'A node name'
    }
  ]
};
$VAR1 = {
  'requirement' => [
    {
      'text' => 'A power cord.',
      'attributes' => {
        'contactname' => 'Joe Average'
      }
    },
    {
      'text' => 'A node name',
      'attributes' => {
        'contactnumber' => '555-1212',
        'contactname' => 'Jane Smith'
      }
    }
  ]
};`
personally, i think the data structure that XML::Simple produces is more intuitive, but you've probably got a reason for wanting it in the format you do.
anders pearson
Re: Predefining complex data structures? by Ionizor (Pilgrim) on Jul 12, 2002 at 16:32 UTC
Okay, I've looked over the changes you suggested to the data structure. The only thing I wasn't quite clear on was what the `->` arrow operator at the beginning does. I haven't seen it used in that particular way in Perl before. Admittedly, I'm still less than 200 pages into the Camel book.
Unfortunately, though it is more intuitive, XML::Simple isn't quite enough for what I need to do: it would be more complicated to reassemble the data in the `<method>` structure I provided in another part of this thread than it would be to just stick with XML::Parser. With XML::Parser I can use an `if` or a `case` to fire off different code for an `<object>` or `<input>` element, so that I can apply formatting (that's all the object and input tags are for) without having to reassemble the strings. Thanks!
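On the `->` question: in `$tagstack{requirements}->[1]` the arrow dereferences the array reference stored under the `requirements` key, and Perl allows it to be omitted between adjacent subscripts. A small sketch (the sample data is invented for the illustration) comparing the equivalent spellings:

```perl
use strict;
use warnings;

my %tagstack = (
    requirements => [
        { text => 'A power cord.' },
        { text => 'A node name' },
    ],
);

# Three spellings of the same element access:
my $explicit = $tagstack{requirements}->[1];       # arrow dereferences the array ref
my $implicit = $tagstack{requirements}[1];         # arrow is optional between subscripts
my $block    = ${ $tagstack{requirements} }[1];    # block-dereference form

# Comparing references numerically compares their addresses.
print "same element\n" if $explicit == $implicit && $implicit == $block;
print $explicit->{text}, "\n";
```

The arrow only becomes mandatory when the thing on its left is a plain scalar holding a reference, as in `$explicit->{text}`.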
(jeffa) Re: Predefining complex data structures? by jeffa (Bishop) on Jul 12, 2002 at 14:54 UTC
I can't think of a single good reason to do this, as the data structure returned by XML::Simple should work for just about any need you have:
`$VAR1 = {
  'requirement' => [
    {
      'contactname' => 'Joe Average',
      'content' => 'A power cord.'
    },
    {
      'contactnumber' => '555-1212',
      'contactname' => 'Jane Smith',
      'content' => 'A node name'
    }
  ]
};`
But, since you asked, how about this:
`use strict;
use XML::Simple;
use Data::Dumper;
my $data = do { local $/; <DATA> };
my $xml  = XMLin($data, forcearray => 1);
my $new;
for my $req (@{ $xml->{requirement} }) {
    my %temp;
    $temp{text}       = delete $req->{content};
    $temp{attributes} = [%$req];
    push @{ $new->{requirements} }, {%temp};
}
# dump the result (this print was not in the code as first posted)
print Dumper($new);
__DATA__
<xml>
  <requirement contactname="Joe Average">A power cord.</requirement>
  <requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</xml>`
This produced the following data structure for me:
`$VAR1 = {
  'requirements' => [
    {
      'text' => 'A power cord.',
      'attributes' => [
        'contactname',
        'Joe Average'
      ]
    },
    {
      'text' => 'A node name',
      'attributes' => [
        'contactnumber',
        '555-1212',
        'contactname',
        'Jane Smith'
      ]
    }
  ]
};`
UPDATE: Whoa! Sorry Ionizor, i read your post and parsed XML::Parser as XML::Simple. Forgive me, that's what i get for trying to answer questions in the morning without the prerequisite cup o' joe first. :)
But yes, XML::Simple will parse that XML snippet you provided:
`$VAR1 = {
  'method' => [
    {
      'object' => [
        'Properties',
        'Do not use option Foo',
        'Server Name',
        'OK'
      ],
      'content' => [
        'Open up the ',
        ' page. Then uncheck the ',
        ' checkbox. Under ',
        ' enter ',
        ' and then hit '
      ],
      'input' => [
        'www.example.com'
      ]
    }
  ]
};`
But that probably will not work for you. :(
jeffa
`L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---`
(the triplet paradiddle with high-hat)
(Ionizor) Re: Predefining complex data structures? by Ionizor (Pilgrim) on Jul 12, 2002 at 15:19 UTC
At the moment I'm using XML::Parser because I wasn't sure if XML::Simple would correctly handle things like the following, which I will be processing a little later on in the script:
`<method>Open up the <object>Properties</object> page. Then uncheck the <object>Do not use option Foo</object> checkbox. Under <object>Server Name</object> enter <input>www.example.com</input> and then hit <object>OK</object></method>`
Re: Predefining complex data structures? by demerphq (Chancellor) on Jul 12, 2002 at 15:00 UTC
First off, you can't "predefine" your data structure in Perl. There is no means by which you can explicitly specify your data structure. Instead, Perl provides easy ways to define your data structure implicitly, as well as to interact with it and redefine it on the fly. So, here is how to do some of the things you want to do:
`my %struct;
foreach my $elem (@elements) {    # loop over all the elements
    # check to make sure our hash key has an array we can push onto.
    $struct{ $elem->name } = [] unless $struct{ $elem->name };
    # now create a new sub hash to push onto the array later
    my %hash = (text => $elem->text, attributes => []);    # initialize it
    # loop over each attribute in the element
    foreach my $attrib ($elem->attribs) {
        # push the elements onto the attributes array
        push @{ $hash{attributes} }, $attrib->name, $attrib->value;
    }
    # push a reference to the newly created hash on the array stored for this element type.
    push @{ $struct{ $elem->name } }, \%hash;
}`
Now of course you will have to figure out how to convert the pseudo-methods I've used here into the real thing. Also, IIRC XML does not allow two attributes of the same name in one tag, so instead of using an array to store them just use a hash (unless order is important).
HTH
UPDATE: The line where I put an array in explicitly is not needed in this scenario, but if we coded something like
`push @{$hash{$key}},'var' unless @{$hash{$key}}>5;`
we would need it, because autovivification doesn't happen in that context. Sorry. And just now in the CB chip pointed out that changing the condition to
`push @{$hash{$key}},'var' unless @{$hash{$key}||[]}>5;`
would also do the trick, and is probably more elegant, if a touch obfu'd. Thanks chip.
Yves / DeMerphq
---
Writing a good benchmark isn't as easy as it might look.
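The update's point about context can be checked with a short experiment. This is only a sketch with made-up variable names: it shows that reading through a missing entry as an array fails under `use strict`, that the `|| []` guard avoids the failure, and that `push` (a modifying context) still creates the array on demand.

```perl
use strict;
use warnings;

my %hash;
my $key = 'requirements';

# Reading through a missing entry: under "use strict" this dies instead of
# autovivifying, so the eval returns undef.
my $count = eval { my $n = @{ $hash{$key} }; $n };
print "unguarded read failed: $@" unless defined $count;

# Guarded read: fall back to an empty anonymous array when the entry is missing.
my $guarded = scalar @{ $hash{$key} || [] };
print "guarded read sees $guarded elements\n";

# push modifies its target, so here the array is created on demand.
push @{ $hash{$key} }, 'var' unless @{ $hash{$key} || [] } > 5;
print "after push: ", scalar @{ $hash{$key} }, " element(s)\n";
```

The guard costs one throwaway anonymous array per check, which is usually a fair price for not having to pre-initialize every key.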
Re: Re: Predefining complex data structures? by chip (Curate) on Jul 12, 2002 at 15:43 UTC
Thanks for the nod, demerphq, but I think I can do my hack one better:
`for ($hash{$key}) { push @$_, 'var' unless @$_ > 5 }`
-- Chip Salzenberg, Free-Floating Agent of Chaos
Re: Predefining complex data structures? by Ionizor (Pilgrim) on Jul 12, 2002 at 16:39 UTC
I understand the rest of the snippet but I'm missing the significance of the 5. Why 5?
Re: Predefining complex data structures? by chip (Curate) on Jul 13, 2002 at 05:51 UTC
Re: Re: Predefining complex data structures? by demerphq (Chancellor) on Jul 15, 2002 at 08:04 UTC
Re: Predefining complex data structures? by Ionizor (Pilgrim) on Jul 15, 2002 at 17:12 UTC
Re: Re: Predefining complex data structures? by Ionizor (Pilgrim) on Jul 12, 2002 at 15:24 UTC
Heh... I put square brackets instead of braces in my example. I did mean for the attributes to be in a hash rather than an array. Oops!
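For anyone else who trips over the same thing, a brief sketch of the difference (the sample values are borrowed from the thread; the variable names are illustrative): square brackets build an anonymous array, curly braces build an anonymous hash, and a flat name/value list converts to a hash by plain list assignment.

```perl
use strict;
use warnings;

# Square brackets: an anonymous array of alternating names and values.
my $as_array = [ contactname => 'Jane Smith', contactnumber => '555-1212' ];

# Curly braces: an anonymous hash, so each value can be looked up by name.
my $as_hash = { contactname => 'Jane Smith', contactnumber => '555-1212' };

# A flat name/value list converts to a hash by list assignment.
my %converted = @$as_array;

print "array form, element 1: $as_array->[1]\n";
print "hash form, by name:    $as_hash->{contactnumber}\n";
print "converted, by name:    $converted{contactname}\n";
```

The array form preserves attribute order; the hash form gives direct lookup by attribute name.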
Re: Predefining complex data structures? by djantzen (Priest) on Jul 12, 2002 at 14:50 UTC
Why do you feel that this needs to be predefined? Or, a better question: how would you go about doing it, given that the structure is by nature open-ended? That is to say, the array reference pointed to by the "requirements" key has no built-in limit, nor does there appear to be a maximum number of attributes in the embedded hash. Thus, if you were to predefine it in some way, you'd have to choose an arbitrary depth to which to do so.
As far as `push()`ing and `shift()`ing and whatnot is concerned, this should do it:
`push(@{$tagstack{requirements}}, { text => 'foo', attributes => [ 'whatever' ] } );`
Re: Predefining complex data structures? by Ionizor (Pilgrim) on Jul 12, 2002 at 15:00 UTC
What I had meant by predefining is just to predefine the structure, not the data itself. If I know the structure of the data, defining an appropriate Perl structure to hold it shouldn't be that hard. I was having difficulty with the push because it kept trying to push into a hash (d'oh!) and I didn't know the correct syntax to fix it. Thanks!
(dkubb) Re: (2) XML parsing and SAX event handlers by dkubb (Deacon) on Jul 13, 2002 at 10:13 UTC
Many of the approaches in this thread centered around using XML::Simple. Why not try using XML::SAX and building your own SAX event handler? I believe it can satisfy your requirements while at the same time providing more flexibility than XML::Parser's interface. A good introduction to creating SAX event handlers can be found at XML::SAX::Intro in the XML::SAX distribution on CPAN.
To address your question, here's a working example:
`#!/usr/bin/perl -wT
use strict;
use XML::SAX;
use Data::Dumper qw(DumperX);
my $handler = My::SAXParser->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
# pass the XML document at the bottom __DATA__ tag to the parser
$parser->parse_string(do { local $/; <DATA> });
print DumperX($handler->nodes);
{
    # this class keeps track of the processed nodes
    package My::SAXParser;
    use strict;
    use base qw(XML::SAX::Base);
    use Class::MethodMaker
        get_set => ['nodes'],
        list    => ['element_stack'];
    use constant SKIP_NODE => 'xml';
    sub start_document { shift->nodes({}) }
    sub start_element {
        my $self = shift;
        my $el   = shift;
        return if $el->{Name} eq SKIP_NODE;
        # make note of which element we are processing - in the stack
        $self->element_stack_push(\my %element);
        foreach my $attribute (values %{ $el->{Attributes} }) {
            push @{ $element{attributes} }, @$attribute{qw(Name Value)};
        }
        # keep track of all interesting element nodes
        push @{ $self->nodes->{ $el->{Name} } }, \%element;
        return $self->SUPER::start_element($el);
    }
    sub characters {
        my $self = shift;
        # are there any pending element nodes to process?
        return unless $self->element_stack_count;
        return $self->SUPER::characters($self->element_stack->[-1]->{text} .= shift->{Data});
    }
    sub end_element {
        my $self = shift;
        # element has been processed, pop it off the stack
        $self->element_stack_pop;
        return $self->SUPER::end_element(shift);
    }
}
__DATA__
<xml>
  <requirement contactname="Joe Average">A power cord.</requirement>
  <requirement contactname="Jane Smith" contactnumber="555-1212">A node name</requirement>
</xml>`
This should produce the following output:
`$VAR1 = {
  'requirement' => [
    {
      'text' => 'A power cord.',
      'attributes' => [
        'contactname',
        'Joe Average'
      ]
    },
    {
      'text' => 'A node name',
      'attributes' => [
        'contactnumber',
        '555-1212',
        'contactname',
        'Jane Smith'
      ]
    }
  ]
};`
I tested this code with the other XML document example you posted in this thread. It can parse it and I believe it produces a pretty reasonable output.
Also, if performance is an issue, it's possible to gain further speed using XML::LibXML::SAX::Parser or XML::SAX::Expat. Either of these modules can pretty much just be dropped into the above script by modifying two lines of the script's code: the use and new constructor statements.
Re: (4) XML parsing and SAX event handlers by grantm (Parson) on Jul 15, 2002 at 08:16 UTC
"Either of these modules can pretty much just be dropped into the above script by modifying two lines of the script's code"
Actually, it shouldn't be necessary to modify the code at all. Your sample code uses XML::SAX::ParserFactory, which will use the system default SAX parser (as defined in lib/XML/SAX/ParserDetails.ini). So if you install XML::SAX::Expat, your script will immediately make use of it.
Re: XML parsing and SAX event handlers by Ionizor (Pilgrim) on Aug 02, 2002 at 20:13 UTC
I found the SAX documentation rather confusing the first time I read it over, so I put it down for a while. Now I've picked it back up, and with a little help from O'Reilly's Perl and XML I'm recoding into XML::SAX. On a related note, I highly recommend O'Reilly's Safari service. Online books! It's very cool.