Your logic goes astray because you try to parse XML with regexps.

You shouldn't.

Please read On XML parsing for an (incomplete) list of reasons not to.
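
To make the point concrete, here is a small core-Perl sketch that needs no modules. naive_level is a hypothetical helper built on the kind of regexp that tends to get written first. All three strings are well-formed XML carrying the same level="2", but the regexp finds only the first one. XML allows either quote style, whitespace around '=', and attributes in any order, and each of those breaks the pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a deliberately naive regexp that assumes exactly one
# space, double quotes, and 'level' as the first attribute.
sub naive_level {
    my ($xml) = @_;
    return $xml =~ /<heading level="(\d+)"/ ? $1 : undef;
}

# All three are well-formed XML with the same level information.
my @variants = (
    q{<heading level="2">Intro</heading>},
    q{<heading level='2'>Intro</heading>},            # single quotes are legal
    q{<heading id="h1" level = "2">Intro</heading>},  # other attribute first, spaces around '='
);

for my $xml (@variants) {
    my $level = naive_level($xml);
    printf "%-7s %s\n", (defined $level ? "found" : "missed"), $xml;
}
```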

Actually, the problem seems even worse than that: what you are calling XML is not XML at all. level=2 is not a valid attribute in XML; attribute values must be quoted, as in level="2" (single quotes, level='2', are fine too).
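
A real XML parser will refuse the unquoted version outright. A minimal sketch, assuming XML::Twig (the module used in this post) is installed from CPAN: its safe_parse method wraps parse in an eval, returning a false value on failure and leaving the error message in $@. xml_error is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;   # CPAN module, assumed installed

# Hypothetical helper: returns undef if $xml is well-formed,
# otherwise the parser's error message.
sub xml_error {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    my $ok   = $twig->safe_parse($xml);   # parse wrapped in an eval
    my $err  = $@;                        # copy before anything can reset $@
    return $ok ? undef : $err;
}

for my $xml (q{<heading level=2>Intro</heading>},      # unquoted value: not XML
             q{<heading level="2">Intro</heading>}) {  # quoted value: well-formed
    my $err = xml_error($xml);
    if (defined $err) {
        # the parser's message may start with a newline; keep the first real line
        my ($first_line) = grep { /\S/ } split /\n/, $err;
        print "rejected:    $xml ($first_line)\n";
    }
    else {
        print "well-formed: $xml\n";
    }
}
```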

I can't really test much right now due to Antonio keeping me busy, but something like this using XML::Twig would certainly work:

#!/usr/bin/perl -w
use strict;
use XML::Twig;

# only heading elements are built in memory, everything else is printed as-is
my $t= XML::Twig->new( twig_roots               => { heading => \&heading },
                       twig_print_outside_roots => 1,
                       pretty_print             => 'indented',
                     );
$t->parse( \*DATA);

sub heading
  { my( $t, $heading)= @_;
    if( defined $heading->att( 'level'))
      { # proper XML: <heading level="2"> becomes <heading2>
        $heading->set_gi( 'heading' . $heading->att( 'level'));
        $heading->del_att( 'level');
      }
    elsif( my $pcdata= $heading->first_child( '#PCDATA'))
      { # broken input: the level is in the text, as in "level=3, ..."
        my $text= $pcdata->pcdata;
        if( $text=~ s/^\s*level\s*=\s*(\d+),//s)
          { $pcdata->set_pcdata( $text);
            $heading->set_gi( 'heading' . $1);
          }
      }
    $heading->print;
  }

__DATA__
<doc>
  <heading>
    <index/>
    <index/>
    level=3, Specifying Rest Arguments in a Procedure Definition</heading>
  <heading level="2"> Introduction to Arguments</heading>
</doc>

And finally, if you really want to use regexps, you can have a look at XML::Regexp or at XML::Parser::Lite.
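
For the XML::Parser::Lite route, a hedged sketch: that module is itself a pure-Perl parser based on regexps, with Start/Char/End handlers in the style of XML::Parser. It assumes the module is installed from CPAN; %level_of and the other variables are hypothetical bookkeeping for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser::Lite;   # regexp-based CPAN parser, assumed installed

my %level_of;                      # heading text => value of its level attribute
my ($in_heading, $level, $text);   # state shared by the handlers

my $p = XML::Parser::Lite->new;
$p->setHandlers(
    Start => sub {
        my (undef, $tag, %attr) = @_;   # first argument is the parser itself
        return unless $tag eq 'heading';
        ($in_heading, $level, $text) = (1, $attr{level}, '');
    },
    Char => sub {
        my (undef, $chars) = @_;
        $text .= $chars if $in_heading;
    },
    End => sub {
        my (undef, $tag) = @_;
        return unless $tag eq 'heading';
        $in_heading = 0;
        $text =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
        $level_of{$text} = $level if defined $level;
    },
);

$p->parse(q{<doc><heading level="2"> Introduction to Arguments</heading></doc>});
print "$_ => level $level_of{$_}\n" for sort keys %level_of;
```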


In reply to Re: Regex for XML attributes... by mirod
in thread Regex for XML attributes... by tshabet
