in reply to Regex problem while parsing tagged, hierarchical data

It kinda looks like you're parsing XML, so why not use an XML parser such as XML::Simple? That way you get a Perl data structure you can traverse, rather than having to craft a regex that is quite possibly slower and has far more potential for errors.

jdtoronto

Re^2: Regex problem while parsing tagged, hierarchical data
by Fletch (Bishop) on Sep 12, 2006 at 14:43 UTC

    More like some bizarre SGML variant than XML. Valid XML can't omit closing tags; the sample data does that everywhere. But yes, this is a job for a real parser not regexen.

      Yeah, they're right that you're probably best off with a parser. However, if you really wanted to stick with regex, you could do the job more slowly and less elegantly with a line-by-line solution like this (make sure to improve before actual use):
      while (<INPUT>) {
          if (m#<level1 id="([^"]*)"#) {   # remember the latest level1 id
              $id = $1;
              print;
          }
          elsif (m#(\s*)<level2>#) {       # re-emit level2 with that id
              print "$1<level2 id=\"$id\">\n";
          }
          else {
              print;
          }
      }
      Your regex doesn't work because by the time it reaches the second level2 tag under a level1 tag, it no longer has that level1 id captured. The loop above gets around this by saving the most recent level1 id in a variable that persists across lines, and rewriting each level2 tag with that id as it comes up.
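      For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch. The sub name add_level2_ids and the sample lines are made up for illustration (the original data isn't reproduced here), and it swaps the print-a-new-line approach for an in-place substitution so that anything else on a level2 line survives. Treat it as a starting point, not a finished solution.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the id of the most recent <level1> tag onto
# every bare <level2> tag that follows it.
sub add_level2_ids {
    my @lines = @_;
    my $id    = '';    # id of the last <level1> seen so far
    my @out;
    for my $line (@lines) {
        if ($line =~ m#<level1 id="([^"]*)"#) {
            $id = $1;
        }
        else {
            # Only the bare tag is rewritten; the rest of the line is kept.
            $line =~ s#<level2>#<level2 id="$id">#;
        }
        push @out, $line;
    }
    return @out;
}

# Made-up sample in the spirit of the thread (closing tags omitted).
my @sample = (
    qq{<level1 id="a1">\n},
    qq{  <level2>\n},
    qq{  <level2>\n},
    qq{<level1 id="b2">\n},
    qq{  <level2>\n},
);

print add_level2_ids(@sample);
```

      With the sample above, both level2 lines under a1 should come out as <level2 id="a1"> and the one under b2 as <level2 id="b2">. A level2 tag that appears before any level1 would get an empty id, which you'd probably want to treat as an error in real use.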

      Hays