Yeah, they're right that you're probably best off with a parser. However, if you really want to stick with regexes, you can do the job more slowly and less elegantly with a line-by-line solution like this (make sure to improve it before actual use):
while (<INPUT>) {
    if (m#<level1 id="([^"]*)"#) {
        $id = $1;
        print;
    }
    elsif (m#(\s*)<level2>#) {
        print "$1<level2 id=\"$id\">\n";
    }
    else {
        print;
    }
}
Your regex doesn't work because by the time it reaches the second level2 tag under a level1 tag, it no longer captures the id from that level1 tag. The loop above works by saving the most recent level1 id in a persistent variable and rewriting each level2 tag as it finds one.
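
For anyone who wants to try the idea without an input file, here is a minimal self-contained sketch of the same "remember the last level1 id" approach. The sub name add_level2_ids and the sample data are made up for illustration, and it assumes one tag per line and bare <level2> tags with no attributes. Unlike the loop above, it substitutes inside the line rather than reprinting it, so anything else on the line is kept.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): copy the most recent
# level1 id onto every bare <level2> tag that follows it.
sub add_level2_ids {
    my ($text) = @_;
    my $id = '';                      # last level1 id seen so far
    my @lines = split /^/, $text;     # split into lines, keeping newlines
    for my $line (@lines) {
        if ($line =~ /<level1\s+id="([^"]*)"/) {
            $id = $1;                 # remember it for later level2 tags
        }
        elsif ($line =~ /<level2>/) {
            $line =~ s/<level2>/<level2 id="$id">/;
        }
    }
    return join '', @lines;
}

# Made-up sample data, assumed to resemble the input from the thread.
my $sample = <<'XML';
<level1 id="a">
  <level2>
  <level2>
</level1>
<level1 id="b">
  <level2>
</level1>
XML

print add_level2_ids($sample);
```

Both level2 tags under the first level1 should come out with id="a", and the one under the second with id="b". It is still not a parser: tags split across lines, or several tags on one line, will trip it up.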

Hays


In reply to Re^3: Regex problem while parsing tagged, hierarchical data by hgolden
in thread Regex problem while parsing tagged, hierarchical data by sivaramanm
