This is a difficult question to ask since I'm not sure of the terminology. Basically I am looking for a way to parse what I would call a "loose" XML grammar. By that I mean the data is contained between nested tags just as in XML, but without any requirement on the sequence of subtags.
I'm a novice with regard to XML, but it seems that what I'm looking for is a more generalized grammar parser?
For example, this would be allowed:
<toptag>
<subtag1>element #1</subtag1>
<subtag2>element #2</subtag2>
<subtag3>element #3</subtag3>
</toptag>
<toptag>
<subtag1>element #3</subtag1>
<subtag2>element #2</subtag2>
<subtag1>element #1</subtag1>
</toptag>
<toptag>
<subtag2>element #2</subtag2>
<subtag2>element #2</subtag2>
<subtag2>element #2</subtag2>
</toptag>
The trouble is that the subtags could occur in any order and in any number from 0 to unbounded.
Essentially, I want to build a hash of these tag elements and then walk the hash to build XML-compliant output.
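To make the goal concrete, here is a rough sketch of the idea. It is written in Python with the standard-library `xml.etree.ElementTree` purely for illustration, not in any particular Perl module. The function names `group_subtags` and `to_ordered_xml` and the fixed output order are my own assumptions, and it only works if each block is well-formed, i.e. closed with `</toptag>`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_subtags(xml_text):
    """Parse one <toptag> block and group child text by tag name.

    Subtags may appear in any order and any number of times;
    a tag that never appears is simply absent from the result.
    """
    root = ET.fromstring(xml_text.strip())
    grouped = defaultdict(list)
    for child in root:  # direct children, in document order
        grouped[child.tag].append((child.text or "").strip())
    return dict(grouped)


def to_ordered_xml(grouped, order=("subtag1", "subtag2", "subtag3")):
    """Rebuild a <toptag> block with subtags in a fixed (assumed) order."""
    top = ET.Element("toptag")
    for tag in order:
        for text in grouped.get(tag, []):
            ET.SubElement(top, tag).text = text
    return ET.tostring(top, encoding="unicode")


sample = """
<toptag>
<subtag1>element #3</subtag1>
<subtag2>element #2</subtag2>
<subtag1>element #1</subtag1>
</toptag>
"""

grouped = group_subtags(sample)
print(grouped)
# {'subtag1': ['element #3', 'element #1'], 'subtag2': ['element #2']}
print(to_ordered_xml(grouped))
```

The hash keeps a list per tag name, so repeated subtags are preserved and the original interleaving is the only thing that is lost.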
This is kind of out of my area and I'm not sure that I'm asking the right questions when I research this. Any suggestions would be appreciated.
Further clarification:
Maybe this will help clarify. Consider it this way. A person is writing a text document. They will tag various words or phrases of that document using a predefined set of tags. Different parts of the document may contain related tags. For example,
<statement>
This is the statement of <person id="001"><name>Joe Smith</name></person>. His mother's name is
<parent><name>Betty</name></parent>. Joe is <person id="001"><age>15</age></person> years old.
</statement>
The person {name/age} sub-elements could occur in any order. In fact, the parent/person elements could occur in any order. There might also be multiple person tag sets.
Ultimately, I want to parse the final document, build a hash from the tags and then process the hash to combine all the elements associated with person id="001" into a single data structure.
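As a sketch of that last step, something along these lines is what I have in mind. Again this is Python's standard-library `xml.etree.ElementTree` used only for illustration; `merge_people` is a hypothetical name, and the rule that a later value overwrites an earlier one for the same field is my assumption, not a requirement.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

statement = """<statement>
This is the statement of <person id="001"><name>Joe Smith</name></person>. His mother's name is
<parent><name>Betty</name></parent>. Joe is <person id="001"><age>15</age></person> years old.
</statement>"""


def merge_people(xml_text):
    """Collect the sub-elements of every <person> into one dict per id."""
    root = ET.fromstring(xml_text)
    people = defaultdict(dict)
    # iter() walks the whole tree, so <person> may sit anywhere in the text
    for person in root.iter("person"):
        pid = person.get("id")  # None if the id attribute is missing
        for field in person:
            # Assumption: a later occurrence of a field replaces an earlier one
            people[pid][field.tag] = (field.text or "").strip()
    return dict(people)


print(merge_people(statement))
# {'001': {'name': 'Joe Smith', 'age': '15'}}
```

The surrounding free text and the `<parent>` element are ignored here; only the tagged `<person>` pieces end up in the data structure.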
Update:
I've received several good suggestions and some good advice. XML::Simple seems the most promising at the moment. Of course, I'm open to more suggestions and I'd love to hear from someone who has tackled this problem before.
Well, I've got some exploration to do ...
PJ
use strict; use warnings; use diagnostics;