in reply to Text to XML

Here's a solution with XML::Smart:
use XML::Smart ;

# Parse the source text with XML::Smart's HTML parser.
my $xml = XML::Smart->new(q`
<nl>
  number list 1
  number list 2
  <ul>
    unnumbered list1
    unnumbered list 2
    <pl>
      plain list 1
      plain list 2
      <nl>
        numbered list 1
        numbered list 2
      </nl>
    </pl>
  </ul>
</nl>
` , 'html');

my $new_xml = XML::Smart->new() ;

process($xml , $new_xml) ;

print $new_xml->data ;

sub process {
  my $xml = shift ;
  my $new_xml = shift ;

  foreach my $node_i ( $xml->nodes ) {
    # One list item per line of text in this node.
    my @lines = split(/\s*\n\s*/ , $node_i) ;

    my $type ;
    if    ( $node_i->key eq 'nl' ) { $type = 'numbered' ;}
    elsif ( $node_i->key eq 'ul' ) { $type = 'unnumbered' ;}
    elsif ( $node_i->key eq 'pl' ) { $type = 'plain' ;}

    # On the first call the output is still empty, so <list> becomes the root.
    # (Written without "my ... if", whose behavior is undefined in Perl.)
    my $set_root = $new_xml->base->null ? 1 : 0 ;

    $new_xml->{list}{type} = $type ;
    $new_xml = $new_xml->{list} if $set_root ;

    push( @{$new_xml->{'list-item'}} , @lines) ;

    # Recurse into the nested lists of this node.
    process($node_i , $new_xml->{listitem} ) ;
  }
}
The output is:
<?xml version="1.0" encoding="iso-8859-1" ?>
<?meta name="GENERATOR" content="XML::Smart/1.5.9 Perl/5.006001 [MSWin32]" ?>
<list type="numbered">
  <list-item>number list 1</list-item>
  <list-item>number list 2</list-item>
  <listitem>
    <list type="unnumbered"/>
    <list-item>unnumbered list1</list-item>
    <list-item>unnumbered list 2</list-item>
    <listitem>
      <list type="plain"/>
      <list-item>plain list 1</list-item>
      <list-item>plain list 2</list-item>
      <listitem>
        <list type="numbered"/>
        <list-item>numbered list 1</list-item>
        <list-item>numbered list 2</list-item>
      </listitem>
    </listitem>
  </listitem>
</list>
Note that your XML structure is very strange! It would be better to make it more conventional, since the point of XML is not just to declare a tree but to declare a document that other programs can read, and a crazy DTD will make that impossible in some languages. If you don't intend to share this type of document, maybe XML is not the best choice for storing your tree.

Another crazy thing is the <list> tag. The first one, the root, is used as a node with the list-item elements inside it:

<list type="numbered">
  <list-item>number list 1</list-item>
  <list-item>number list 2</list-item>
</list>

Everywhere else you use it as an empty tag placed next to the list-item elements:

<listitem>
  <list type="numbered"/>
  <list-item>numbered list 1</list-item>
  <list-item>numbered list 2</list-item>
</listitem>
You also have two different tags with very similar names, <listitem> and <list-item>. It would be better to use clearly distinct names, like <subitem> and <listitem>.
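
To check a document for this kind of near-duplicate name, something like the following could help. It is only a rough sketch: confusable_names is a made-up helper, not part of XML::Smart or any other module, and it finds tag names with a crude regex instead of a real parser, so comments or CDATA sections containing "<" would confuse it.

```perl
use strict;
use warnings;

# Hypothetical helper: return groups of element names that differ only
# by letter case, hyphens, or underscores (e.g. list-item vs listitem).
sub confusable_names {
    my ($xml) = @_;
    my %groups;

    # Crude scan: the name right after "<" in start tags and empty tags.
    # End tags, "<?...?>" and "<!...>" do not start with a letter, so
    # they are skipped.
    while ( $xml =~ /<([A-Za-z_][\w.-]*)/g ) {
        my $name = $1;
        ( my $key = lc $name ) =~ tr/-_//d;    # normalized form
        $groups{$key}{$name} = 1;
    }

    # Keep only the normalized forms that map to more than one real name.
    return map  { [ sort keys %{ $groups{$_} } ] }
           grep { scalar( keys %{ $groups{$_} } ) > 1 }
           sort keys %groups;
}

my $doc = '<list type="numbered"><list-item>a</list-item>'
        . '<listitem><list type="plain"/></listitem></list>';

print join( ', ', @$_ ), "\n" for confusable_names($doc);
```

For the sample string above it reports list-item and listitem as one confusable pair, and says nothing about <list>.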

I also don't understand why <list> and <listitem> are needed as an extra level for the items! So my suggestion for your XML is:

<list type="numbered">
  <item>number list 1</item>
  <item>number list 2</item>
  <list type="unnumbered">
    <item>unnumbered list 1</item>
    <item>unnumbered list 2</item>
    <list type="plain">
      <item>plain list 1</item>
      <item>plain list 2</item>
      <list type="numbered">
        <item>numbered list 1</item>
        <item>numbered list 2</item>
      </list>
    </list>
  </list>
</list>
It is smaller and represents a similar tree with the same information. So I say again: if you can, please change this crazy DTD!

Graciliano M. P.
"Creativity is the expression of the liberty".

Re: Re: Text to XML
by mirod (Canon) on Apr 13, 2004 at 08:20 UTC

    The XML submitted by the OP is indeed strange, but I think it's just typos. list-item and listitem should be just one, and, like you suggested, I would call it item.

    Once this is fixed, the original XML is perfectly reasonable. In any case I would certainly not call it "crazy". It is standard practice to have a list contain only items, not a mixture of items and lists as you suggest at the end of your post. That's how XHTML, DocBook, and just about any other DTD out there works.

    If anything the XML you propose is harder to handle with most tools. It might be easier to process with XML::Smart, but that's a (minor) gripe I have with both XML::Smart and XML::Simple: they sometimes lead to XML that is designed with the tool in mind, instead of following standard practices and proper XML design.

      XML::Smart and XML::Simple don't use any DTD to read XML!

      What I call crazy is the use of <list> in two different ways, which I don't think can be defined well with a DTD.

      Also, you really need to take care with typos. In XML, foo-bar is very different from foobar, which is different from FOOBAR! So when I saw list-item and listitem, to me, as XML tags, they were different things that just happen to have similar names. The structure I suggest at the end is therefore based on the same tree structure sent in the main post, where yes, a list has items and sub-lists inside it. I'm not judging that structure; I'm only judging the use of similar names for different tags and the use of the same name, <list>, in different ways.

      And don't forget that without "following standard practices and proper XML design" you don't have real XML, in the sense of XML's real purpose, which is to be a standard format. And without real XML you just don't need XML; you can use better things to declare a tree.

      Good luck!

      Graciliano M. P.
      "Creativity is the expression of the liberty".

        What I call crazy is the use of <list> in two different ways, which I don't think can be defined well with a DTD.

        Uh? What about this:

        <!ELEMENT list (listitem+)>
        <!ATTLIST list type (numbered|unnumbered|plain) #REQUIRED>
        <!ELEMENT listitem (#PCDATA|list)*>

        This describes exactly the target XML.
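
        To make that concrete, here is a minimal sketch of emitting XML in the shape this first DTD allows (a list holds only listitem elements, and a listitem holds text plus an optional nested list). The nested-hash input format and the names list_to_xml and esc are assumptions made up for the illustration; they do not come from the OP's code or from any module.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

# Hypothetical input shape: a list is { type => ..., items => [ ... ] },
# and each item is { text => ..., sublist => <optional nested list> }.
sub list_to_xml {
    my ( $list, $depth ) = @_;
    $depth ||= 0;
    my $pad = '  ' x $depth;

    my $out = qq{$pad<list type="$list->{type}">\n};
    for my $item ( @{ $list->{items} } ) {
        my $text = esc( $item->{text} );
        if ( $item->{sublist} ) {
            # A nested list always lives inside its parent listitem.
            $out .= "$pad  <listitem>$text\n";
            $out .= list_to_xml( $item->{sublist}, $depth + 2 );
            $out .= "$pad  </listitem>\n";
        }
        else {
            $out .= "$pad  <listitem>$text</listitem>\n";
        }
    }
    return $out . "$pad</list>\n";
}

print list_to_xml( {
    type  => 'numbered',
    items => [
        { text => 'number list 1' },
        {
            text    => 'number list 2',
            sublist => {
                type  => 'unnumbered',
                items => [ { text => 'unnumbered list 1' } ],
            },
        },
    ],
} );
```

        Because a sub-list is only ever written inside a listitem, a list never ends up with a mixture of items and lists as direct children.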

        What you proposed would be:

        <!ELEMENT list (listitem|list)+>
        <!ATTLIST list type (numbered|unnumbered|plain) #REQUIRED>
        <!ELEMENT listitem (#PCDATA)>

        Once again this is not the usual, and recommended, way of structuring lists.

        And if you read my previous posts, I think I agree with you that the typos need to be fixed.

        Oh, and this debate is probably moot anyway, as Murugesan mentioned a DTD that seems to be out of his control.