mboudreau has asked for the wisdom of the Perl Monks concerning the following question:
I have been staring at this problem all day, and my error refuses to get up and wave its arms at me.
I'm writing a Perl script to munge an XML DTD (for reasons we needn't go into). One of the script's tasks is to find parameter entity definitions that are empty and remove the references to those entities from any element definitions.
    # $entity = the name of the entity (already initialized)
    # $text = the full DTD (already initialized)

    # comment out the empty entity definition
    # e.g., <!ENTITY % foo " " >
    # so far this section works fine, wrapping the empty entity definition
    # in comment tags
    if ( $text =~ /(<!ENTITY\s+%\s+$entity\s+\"\s+\"\s+>)/ ) {
        print "Commenting out empty entity '$entity'\n";
        my $entity_def = $1;
        $text =~ s/$entity_def/<!--\n$entity_def\n-->/;
    }

    # here's the problem: I want to find
    # <!ELEMENT bar (#PCDATA %foo;)* >
    # and remove the "%foo;"
    if ( $text =~ /(<!ELEMENT\s+(\S+)\s+\(.*?$entity.+?\).+?>)/ ) {
        my $element_def = $1;
        my $element_name = $2;
        print "Found empty entity '$entity' in $element_name "
            . "content model: $element_def\n";
        my $new_element_def = $element_def;
        $new_element_def =~ s/\|?\s*%$entity;//;
        print "Looking for $element_def\n";
        # this NEVER PRINTS--WHY?
        print "Found it!\n" if $text =~ /$element_def/;
    }
I always manage to capture the element definition in $element_def, but my final regex never works, and I can't figure out why.
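A likely cause, sketched here rather than confirmed against the real DTD: the captured `$element_def` contains `(`, `)` and `*`, which become regex metacharacters again when the string is interpolated into `/$element_def/`, so the pattern no longer describes the literal text. Wrapping the variable in `\Q...\E` (Perl's built-in `quotemeta` escape) makes it match literally. The sample DTD text and the helper name `strip_entity_ref` below are assumptions made for illustration, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data, modelled on the examples in the post.
my $entity = 'foo';
my $text   = qq{<!ENTITY % foo " " >\n<!ELEMENT bar (#PCDATA %foo;)* >\n};

# Returns the DTD text with "%$entity;" removed from the first element
# definition that mentions it (strip_entity_ref is an illustrative name).
sub strip_entity_ref {
    my ($text, $entity) = @_;
    if ( $text =~ /(<!ELEMENT\s+\S+\s+\(.*?\Q$entity\E.+?\).+?>)/ ) {
        my $element_def = $1;
        (my $new_element_def = $element_def) =~ s/\|?\s*%\Q$entity\E;//;
        # \Q...\E makes "(", ")" and "*" in $element_def match literally.
        $text =~ s/\Q$element_def\E/$new_element_def/;
    }
    return $text;
}

# Compare the interpolated pattern with and without quoting.
my $element_def = '<!ELEMENT bar (#PCDATA %foo;)* >';
print "unquoted: ", ( $text =~ /$element_def/     ? "match" : "no match" ), "\n";
print "quoted:   ", ( $text =~ /\Q$element_def\E/ ? "match" : "no match" ), "\n";
print strip_entity_ref($text, $entity);
```

The same quoting would apply to `$entity` and `$entity_def` wherever they are interpolated into a pattern; the entity-commenting step presumably works only because `<!ENTITY % foo " " >` happens to contain no metacharacters.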
Replies are listed 'Best First'.
Re: Regex problem (XML)
by toolic (Bishop) on Jun 10, 2010 at 23:17 UTC
by choroba (Cardinal) on Jun 10, 2010 at 23:29 UTC
by Jenda (Abbot) on Jun 14, 2010 at 06:50 UTC

Re: Regex problem
by choroba (Cardinal) on Jun 10, 2010 at 21:50 UTC
by mboudreau (Acolyte) on Jun 11, 2010 at 15:24 UTC