First a complaint: if you really want help, you have to help me, so please check your data before posting. I nearly gave up on writing the code because your input and your output are full of typos: the numbers in the list items don't match, the tag names are inconsistent, and so is the indenting. This made it really difficult for me to get the test code to work. So please, next time, be more considerate.

With this out of my system ;--( here is the code:

#!/usr/bin/perl -w
use strict;

use XML::Twig;
use Test::More tests => 1;

my( $input, $expected);
{ local $/="\n\n";
  $input= <DATA>;
  ($expected= <DATA>)=~ s{^\s*}{};
}

my $t= XML::Twig->new( twig_handlers => { nl => sub { process_list( numbered   => @_); },
                                          pl => sub { process_list( plain      => @_); },
                                          ul => sub { process_list( unnumbered => @_); },
                                        },
                       pretty_print  => 'indented',
                     )
                ->parse( $input);
$t->set_indent( ' ' x 4); # if you really want 4 space indents

my $result= $t->sprint;
is( $result, $expected, "test lists");

sub process_list
  { my( $type, $t, $list)= @_;
    $list->set_tag( 'list')
         ->set_att( type => $type);
    foreach my $child ( $list->children)
      { if( $child->is_text)
          { $child->mark( qr/^(.+?)\s*$/m, 'listitem'); }
        else
          { $child->wrap_in( 'listitem'); }
      }
    # you need this for the pretty printing to work, or the
    # empty text elements left by mark will mess up XML::Twig
    # this is a bug, I will see how best to fix it in the next version
    foreach my $child ( $list->children)
      { $child->delete if( $child->text=~ m{^\s*$}); }
  }

__DATA__
<nl>number list 1
number list 2
<ul>unnumbered list1
unnumbered list 2
<pl>plain list 1
plain list 2
<nl>numbered list 1
numbered list 2</nl>
</pl>
</ul>
</nl>

<list type="numbered">
    <listitem>number list 1</listitem>
    <listitem>number list 2</listitem>
    <listitem>
        <list type="unnumbered">
            <listitem>unnumbered list1</listitem>
            <listitem>unnumbered list 2</listitem>
            <listitem>
                <list type="plain">
                    <listitem>plain list 1</listitem>
                    <listitem>plain list 2</listitem>
                    <listitem>
                        <list type="numbered">
                            <listitem>numbered list 1</listitem>
                            <listitem>numbered list 2</listitem>
                        </list>
                    </listitem>
                </list>
            </listitem>
        </list>
    </listitem>
</list>
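To see why the second loop in process_list is needed, here is a rough sketch in plain core Perl (no XML::Twig involved) of what the mark call does to a text child. It assumes mark behaves like Perl's split, which is how the XML::Twig docs describe it: each captured group becomes a new listitem element, and whatever sits between two matches is left behind as a text node. mark_lines is a hypothetical helper written only for this illustration; it is not an XML::Twig method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of XML::Twig: approximates what
# $text_child->mark( qr/^(.+?)\s*$/m, 'listitem') does, assuming
# split-like semantics. Each captured group ($1) becomes a 'listitem'
# part, the text between two matches is kept as a plain text part,
# and the trailing whitespace matched by \s* (but not captured) is dropped.
sub mark_lines {
    my ($text) = @_;
    my @parts;
    my $pos = 0;    # end offset of the previous match
    while ( $text =~ m{^(.+?)\s*$}mg ) {
        my ( $start, $end, $item ) = ( $-[0], $+[0], $1 );
        # leftover text between the previous match and this one
        push @parts, { text => substr( $text, $pos, $start - $pos ) }
            if $start > $pos;
        push @parts, { tag => 'listitem', text => $item };
        $pos = $end;
    }
    # leftover text after the last match
    push @parts, { text => substr( $text, $pos ) } if $pos < length $text;
    return @parts;
}

# The first text child of the outer <nl> in the sample data
my $text  = "number list 1\nnumber list 2\n";
my @parts = mark_lines($text);

# Same test as the cleanup loop in process_list: drop any part
# whose text is whitespace only (here, the "\n" between the two items)
my @kept = grep { $_->{text} !~ m{^\s*$} } @parts;

printf "%d parts before cleanup, %d after\n", scalar @parts, scalar @kept;
print "<$_->{tag}>$_->{text}</$_->{tag}>\n" for @kept;
```

On that first text child, the only leftover is the single newline between the two items. That is exactly the kind of whitespace-only node that the cleanup loop deletes before pretty printing.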

In reply to Re: Re: Re: Text to XML by mirod
in thread Text to XML by murugu