in reply to Re: Regex problem while parsing tagged, hierarchical data
in thread Regex problem while parsing tagged, hierarchical data

This looks more like some bizarre SGML variant than XML: well-formed XML can't omit closing tags, and the sample data does that everywhere. But yes, this is a job for a real parser, not regexen.

Re^3: Regex problem while parsing tagged, hierarchical data
by hgolden (Pilgrim) on Sep 12, 2006 at 14:55 UTC
    Yeah, they're right that you're probably best off with a parser. However, if you really wanted to stick with regex, you could do the job more slowly and less elegantly with a line-by-line solution like this (make sure to improve before actual use):
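    my $id;    # most recent level1 id seen so far
    while (<INPUT>) {
        if (m#<level1 id="([^"]*)"#) {
            $id = $1;    # remember it for the level2 tags that follow
            print;
        }
        elsif (m#(\s*)<level2>#) {
            # re-emit the level2 tag with the saved id, keeping its indentation
            print "$1<level2 id=\"$id\">\n";
        }
        else {
            print;
        }
    }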
    Your regex doesn't work because, by the time it reaches the second level2 tag under a level1 tag, it no longer has that level1's id captured. The loop above works by saving the most recent level1 id in a persistent variable and rewriting each level2 tag it finds with that id.
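    To make the "persistent variable" idea concrete, here is a minimal self-contained sketch of the same approach. The helper name tag_level2 and the sample lines are made up for illustration (they are not the original poster's data), and it uses a substitution instead of reprinting the whole line, so anything after the tag on the same line is kept:

```perl
use strict;
use warnings;

# Hypothetical helper: copy the most recent level1 id onto each bare <level2> tag.
sub tag_level2 {
    my @lines = @_;    # work on a copy so the caller's data is untouched
    my $id;            # last level1 id seen; persists across lines
    my @out;
    for my $line (@lines) {
        if ($line =~ m#<level1 id="([^"]*)"#) {
            $id = $1;  # remember it for the level2 tags that follow
        }
        elsif (defined $id) {
            # rewrite only bare <level2> tags, leaving the rest of the line alone
            $line =~ s#<level2>#<level2 id="$id">#;
        }
        push @out, $line;
    }
    return @out;
}

# Made-up sample input, assumed to resemble the tagged data in the thread.
my @sample = (
    qq{<level1 id="a1">\n},
    qq{  <level2>\n},
    qq{  <level2>\n},
    qq{<level1 id="b2">\n},
    qq{  <level2>\n},
);
print tag_level2(@sample);
```

    With that sample, both level2 lines under a1 should come out as <level2 id="a1"> and the one under b2 as <level2 id="b2">. It is still line-by-line and will miss tags that span lines, so the advice about a real parser stands.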

    Hays