in reply to Re: Text to XML
in thread Text to XML

Thanks a lot for your reply.

Here is the input file:

<nl>number list 1
number list 2
<ul>unnumbered list1
unnumbered list 2
<pl>plain list 1
plain list 2
<nl>numbered list 1
numbered list 2</nl>
</pl>
</ul>
</nl>

The output should be like:

<list type="numbered">
    <list-item>numbered list 1</listitem>
    <list-item>numbered list 1</listitem>
    <listitem>
        <list type="unnumbered">
            <listitem>unnumbered list1</listitem>
            <listitem>unnumbered list 2</listitem>
            <listitem>
                <list type="plain">
                    <listitem>plain list 1</listitem>
                    <listitem>plain list 2</listitem>
                    <listitem>
                        <list type="numbered">
                            <list-item>numbered list 1</listitem>
                            <list-item>numbered list 1</listitem>
                        </list>
                    </listitem>
                </list>
            </listitem>
        </list>
    </listitem>
</list>

The above output is what I need. I have given you just a small part of my text file. Since the other parts are very similar to this one, please give me suggestions on how to do the conversion.

What should I do for this kind of structure?

--Murugesan

Re: Re: Re: Text to XML
by mirod (Canon) on Apr 12, 2004 at 16:20 UTC

    First a complaint: if you really want help, you have to help me: please check your data before posting. I nearly gave up on writing the code, because your input and your output are full of typos: the numbers in the list items don't match, the tag names are inconsistent, so is the indenting... this made it really difficult for me to get the test code to work. So please, next time be more considerate.

    With this out of my system ;--( here is the code:

    #!/usr/bin/perl -w
    use strict;
    use XML::Twig;
    use Test::More tests => 1;

    my( $input, $expected);
    { local $/="\n\n";    # read __DATA__ one blank-line-separated record at a time
      $input= <DATA>;
      ($expected= <DATA>)=~ s{^\s*}{};
    }

    my $t= XML::Twig->new( twig_handlers => { nl => sub { process_list( numbered   => @_); },
                                              pl => sub { process_list( plain      => @_); },
                                              ul => sub { process_list( unnumbered => @_); },
                                            },
                           pretty_print => 'indented',
                         )
                    ->parse( $input);

    $t->set_indent( ' ' x 4); # if you really want 4 space indents
    my $result= $t->sprint;

    is( $result, $expected, "test lists");

    sub process_list
      { my( $type, $t, $list)= @_;
        $list->set_tag( 'list')
             ->set_att( type => $type);
        foreach my $child ( $list->children)
          { if( $child->is_text) { $child->mark( qr/^(.+?)\s*$/m, 'listitem'); }
            else                 { $child->wrap_in( 'listitem');               }
          }
        # you need this for the pretty printing to work, or the
        # empty text elements left by mark will mess up XML::Twig
        # this is a bug, I will see how best to fix it in the next version
        foreach my $child ( $list->children)
          { $child->delete if( $child->text=~ m{^\s*$}); }
      }

    __DATA__
    <nl>number list 1
    number list 2
    <ul>unnumbered list1
    unnumbered list 2
    <pl>plain list 1
    plain list 2
    <nl>numbered list 1
    numbered list 2</nl>
    </pl>
    </ul>
    </nl>

    <list type="numbered">
        <listitem>number list 1</listitem>
        <listitem>number list 2</listitem>
        <listitem>
            <list type="unnumbered">
                <listitem>unnumbered list1</listitem>
                <listitem>unnumbered list 2</listitem>
                <listitem>
                    <list type="plain">
                        <listitem>plain list 1</listitem>
                        <listitem>plain list 2</listitem>
                        <listitem>
                            <list type="numbered">
                                <listitem>numbered list 1</listitem>
                                <listitem>numbered list 2</listitem>
                            </list>
                        </listitem>
                    </list>
                </listitem>
            </list>
        </listitem>
    </list>
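
To see what the regexp passed to mark picks out of a text child, here is a minimal sketch in plain Perl, without XML::Twig. It only illustrates how `^(.+?)\s*$` with the `/m` modifier splits a block of text into one item per line; `split_items` is a hypothetical helper made up for this example, and the printing loop merely imitates the `listitem` wrapping rather than reproducing what mark does internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Twig: returns one entry per
# non-empty line, with trailing whitespace removed from each entry.
sub split_items {
    my( $text)= @_;
    # /m lets ^ and $ match at every line boundary,
    # /g in list context returns every capture of (.+?)
    return $text =~ m/^(.+?)\s*$/mg;
}

my @items= split_items( "number list 1\nnumber list 2  \n");

# imitate the wrapping step by printing each item inside a listitem element
print "<listitem>$_</listitem>\n" foreach @items;
```

The lazy `(.+?)` followed by `\s*$` is what keeps trailing spaces out of each item, so the text inside each element comes out trimmed.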