in reply to Text to XML

As you do not give an example of the original text, or of the result you expect from it, it is very difficult to answer you.

A few ideas though:

And finally, because no post of mine is complete without a shameless XML::Twig plug: if all you want is to wrap the lines of a list in the appropriate tags, you can use something like this:

#!/usr/bin/perl -w
use strict;
use XML::Twig;

XML::Twig->new( # process just list elements
                twig_roots => { list => \&process_list },
                # output the rest as is
                twig_print_outside_roots => 1,
              )
         ->parse( \*DATA);

sub process_list
  { my( $t, $list)= @_;
    # wrap (non-empty) lines in a listitem element
    my @listitems= $list->split( qr/(^.+)\n/m => 'listitem');
    # add the p and extract tags within each listitem
    foreach my $listitem (@listitems)
      { $listitem->insert( 'p', 'extract'); }
    $list->print;
  }

__DATA__
<extract>
  <line>
    <p>
      <extract>
        <show>
          <list>
first item
second item
third item
          </list>
        </show>
      </extract>
    </p>
  </line>
</extract>
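To see which text the `split` call above turns into list items, here is a rough core-Perl sketch (no XML::Twig) that applies the same regex to a plain string and wraps each captured line the way `split` followed by `insert( 'p', 'extract')` would nest it. The helper name `wrap_items` is hypothetical, and unlike the real `split` it simply drops the leftover newline text between items instead of keeping it in the tree.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: mimics $list->split( qr/(^.+)\n/m => 'listitem')
# followed by $listitem->insert( 'p', 'extract'), but on a plain string.
sub wrap_items {
    my ($text) = @_;
    # same regex as in the script; in list context /g returns every $1,
    # i.e. every non-empty line that is followed by a newline
    my @lines = $text =~ m/(^.+)\n/mg;
    # insert( 'p', 'extract') nests p, then extract, inside each listitem
    return map { "<listitem><p><extract>$_</extract></p></listitem>" } @lines;
}

my $list_text = "\nfirst item\nsecond item\nthird item\n";
print "$_\n" for wrap_items($list_text);
```

Empty lines are skipped because `.+` needs at least one character, which is why the comment in the script says "(non-empty) lines".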

Re: Re: Text to XML
by murugu (Curate) on Apr 12, 2004 at 15:34 UTC

    Thanks a lot for your reply.

    Here is the input file:

    <nl>number list 1
    number list 2
    <ul>unnumbered list1
    unnumbered list 2
    <pl>plain list 1
    plain list 2
    <nl>numbered list 1
    numbered list 2</nl>
    </pl>
    </ul>
    </nl>

    The output should look like this:

    <list type="numbered">
        <list-item>numbered list 1</listitem>
        <list-item>numbered list 1</listitem>
        <listitem>
            <list type="unnumbered">
                <listitem>unnumbered list1</listitem>
                <listitem>unnumbered list 2</listitem>
                <listitem>
                    <list type="plain">
                        <listitem>plain list 1</listitem>
                        <listitem>plain list 2</listitem>
                        <listitem>
                            <list type="numbered">
                                <list-item>numbered list 1</listitem>
                                <list-item>numbered list 1</listitem>
                            </list>
                        </listitem>
                    </list>
                </listitem>
            </list>
        </listitem>
    </list>

    The above output is what I need. I have given you just a small part of my text file; since the other parts are very similar to this one, please give me suggestions on how to do the conversion.

    What should I do for this kind of structure?

    --Murugesan

      First a complaint: if you really want help, you have to help me, so please check your data before posting. I nearly gave up on writing the code, because your input and your output are full of typos: the numbers in the list items don't match, the tag names are inconsistent, and so is the indenting. This made it really difficult for me to get the test code to work. So please, next time be more considerate.

      With this out of my system ;--( here is the code:

#!/usr/bin/perl -w
use strict;

use XML::Twig;
use Test::More tests => 1;

my( $input, $expected);
{ local $/="\n\n";
  $input= <DATA>;
  ($expected= <DATA>)=~ s{^\s*}{};
}

my $t= XML::Twig->new( twig_handlers => { nl => sub { process_list( numbered   => @_); },
                                          pl => sub { process_list( plain      => @_); },
                                          ul => sub { process_list( unnumbered => @_); },
                                        },
                       pretty_print => 'indented',
                     )
                ->parse( $input);

$t->set_indent( ' ' x 4); # if you really want 4 space indents

my $result= $t->sprint;

is( $result, $expected, "test lists");

sub process_list
  { my( $type, $t, $list)= @_;
    $list->set_tag( 'list')
         ->set_att( type => $type);
    foreach my $child ( $list->children)
      { if( $child->is_text)
          { $child->mark( qr/^(.+?)\s*$/m, 'listitem'); }
        else
          { $child->wrap_in( 'listitem'); }
      }
    # you need this for the pretty printing to work, or the
    # empty text elements left by mark will mess up XML::Twig
    # this is a bug, I will see how best to fix it in the next version
    foreach my $child ( $list->children)
      { $child->delete if( $child->text=~ m{^\s*$}); }
  }

__DATA__
<nl>number list 1
number list 2
<ul>unnumbered list1
unnumbered list 2
<pl>plain list 1
plain list 2
<nl>numbered list 1
numbered list 2</nl>
</pl>
</ul>
</nl>

<list type="numbered">
    <listitem>number list 1</listitem>
    <listitem>number list 2</listitem>
    <listitem>
        <list type="unnumbered">
            <listitem>unnumbered list1</listitem>
            <listitem>unnumbered list 2</listitem>
            <listitem>
                <list type="plain">
                    <listitem>plain list 1</listitem>
                    <listitem>plain list 2</listitem>
                    <listitem>
                        <list type="numbered">
                            <listitem>numbered list 1</listitem>
                            <listitem>numbered list 2</listitem>
                        </list>
                    </listitem>
                </list>
            </listitem>
        </list>
    </listitem>
</list>
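If installing XML::Twig is not an option, the same conversion can be roughed out in core Perl with a small tokenizer and a stack. This is only a sketch, assuming the input uses exactly the three tags `nl`, `ul` and `pl`, has one item per line and contains no other markup; `convert_lists` and `%TYPE` are hypothetical names, and unlike XML::Twig it does not parse or validate the result as XML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping from the source tags to the list type attribute
my %TYPE = ( nl => 'numbered', ul => 'unnumbered', pl => 'plain' );

# Convert nested <nl>/<ul>/<pl> blocks (one item per line) into
# <list type="..."> / <listitem> markup, indented 4 spaces per level.
sub convert_lists {
    my ($src) = @_;
    my @out;          # output lines
    my @stack;        # source tags of the lists currently open
    my $depth = 0;    # current indentation level
    my $emit  = sub { push @out, ( '    ' x $depth ) . $_[0] };

    # Walk the input token by token: an nl/ul/pl tag, or a run of text
    while ( $src =~ m{\G(?:<(/?)(nl|ul|pl)>|([^<]+))}gc ) {
        my ( $close, $tag, $text ) = ( $1, $2, $3 );
        if ( defined $text ) {
            # every non-blank line of text becomes one listitem
            for my $line ( split /\n/, $text ) {
                ( my $item = $line ) =~ s/^\s+|\s+$//g;
                next unless length $item;
                $item =~ s/&/&amp;/g;    # keep the output well-formed
                $item =~ s/>/&gt;/g;
                $emit->("<listitem>$item</listitem>");
            }
        }
        elsif ( !$close ) {
            # a nested list goes inside a listitem of its parent list
            if (@stack) { $emit->('<listitem>'); $depth++; }
            $emit->(qq(<list type="$TYPE{$tag}">));
            $depth++;
            push @stack, $tag;
        }
        else {
            my $open = pop @stack;
            die "unexpected </$tag>\n" unless defined $open && $open eq $tag;
            $depth--;
            $emit->('</list>');
            if (@stack) { $depth--; $emit->('</listitem>'); }
        }
    }
    # anything left over is markup this sketch does not understand
    die "cannot parse input at offset ", pos($src) // 0, "\n"
        if ( pos($src) // 0 ) < length($src);
    die "unclosed <$stack[-1]>\n" if @stack;
    return join( "\n", @out ) . "\n";
}

my $input = <<'END';
<nl>number list 1
number list 2
<ul>unnumbered list1
unnumbered list 2
<pl>plain list 1
plain list 2
<nl>numbered list 1
numbered list 2</nl>
</pl>
</ul>
</nl>
END

print convert_lists($input);
```

On this input it produces the same 4-space indented structure as the expected output in the test above. The stack is what handles the nesting: each nested list is opened inside a `listitem` of its parent and that `listitem` is closed together with it, while mismatched or unclosed tags make the sketch die instead of emitting broken markup.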