nglenn has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use the following code to change some XML:

$xml = '<funcCall> <funcName>write</funcName> <rhsValue>crlf</rhsValue> <symConstant>loading</symConstant> <symConstant>| |</symConstant> <symConstant>lemmas</symConstant> <symConstant>| |</symConstant> <symConstant>for</symConstant> <symConstant>| |</symConstant> <variable>&lt;lx&gt;</variable> <symConstant>| |</symConstant> <symConstant>onto</symConstant> <symConstant>| |</symConstant> <variable>&lt;lms&gt;</variable> </funcCall>';
$xml =~ s&
    <funcCall>.
    <funcName>write</funcName>.
    <rhsValue>crlf</rhsValue>
    (.<(?:symConstant|variable)>[^<]*</(?:symConstant|variable)>)+ +.
    </funcCall>
&
    my $text;
    foreach my $expr (1..$#-) {
        $text .= ${$expr};
    }
    $text =~ s#<rhsValue>crlf</rhsValue>#\\n#;
    $text =~ s#</?symConstant>##sg;
    $text =~ s#</?variable>##sg;
    $text =~ s#.\| \|.# #sg;
    "<write string=\"$text\"/>";
&xse;
print $xml;

I want

<write string="loading lemmas for <lx> onto <lms>"/>

but I am getting

<write string=" <lms>"/>

I don't understand why, but the @- only has one element. I know the match is fine because when I print $& I can see that the ()+ matched several items. Any suggestions?
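The behaviour described in the question can be reproduced without any XML. This is a minimal sketch: the string 'a1 b2 c3' is a made-up stand-in for the element list, and the pattern copies the question's "(...)+" shape. The //g list-context match at the end is a different technique that nobody in the thread proposed. It is included only for contrast.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the XML: three items separated by spaces.
my $s = 'a1 b2 c3';

# Same shape as the question's "(.<...>...)+": a capture group with a
# quantifier on it. It is still ONE group, so each repetition overwrites $1.
$s =~ /(\s*\w\d)+/ or die "no match";
my $whole   = $&;   # 'a1 b2 c3': the match itself spans every repetition
my $last    = $1;   # ' c3': only the final repetition survives in $1
my $n_group = $#-;  # 1: @- holds $-[0] (whole match) and $-[1] (group 1)

print "whole match: $whole\n";
print "group 1:     $last\n";
print "last index of \@-: $n_group\n";

# For contrast (not from the thread): a //g match in list context
# returns group 1 from every successive match as a separate element.
my @all = $s =~ /(\w\d)/g;
print "every item:  @all\n";
```

This is why printing $& shows every repetition while the loop over 1..$#- sees only one capture.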

Re: regex replace using position loop
by wind (Priest) on Mar 25, 2011 at 20:01 UTC

    If you're working with XML, you should consider using a CPAN module such as XML::Simple instead of hacking together a regex.

    use strict;
    use warnings;
    use Data::Dumper;
    use XML::Simple;

    my $xml = '<funcCall> <funcName>write</funcName> <rhsValue>crlf</rhsValue> <symConstant>loading</symConstant> <symConstant>| |</symConstant> <symConstant>lemmas</symConstant> <symConstant>| |</symConstant> <symConstant>for</symConstant> <symConstant>| |</symConstant> <variable>&lt;lx&gt;</variable> <symConstant>| |</symConstant> <symConstant>onto</symConstant> <symConstant>| |</symConstant> <variable>&lt;lms&gt;</variable> </funcCall>';

    # XMLin groups repeated child elements by name (symConstant => [...],
    # variable => [...]), so the interleaved order of symConstant and
    # variable elements is not preserved in $ref.
    my $ref = XMLin($xml);
    print Dumper($ref);
Re: regex replace using position loop
by Eliya (Vicar) on Mar 25, 2011 at 19:49 UTC
    I don't understand why, but the @- only has one element

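    That's how capturing works when the "+" is outside the capturing parentheses. The group is still a single group, so each repetition overwrites the previous capture and only the last one is kept. @- therefore gets one entry for that group, not one per repetition. Try putting the parentheses around the entire repeated subpattern: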

    ((?:.<(?:symConstant|variable)>[^<]*</(?:symConstant|variable)>)+)

    This way you also don't need the foreach loop to assemble $text. Just set $text = $1.
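Eliya's suggestion can be combined with the original substitution roughly as follows. This is a sketch, not code from the thread. Braces replace the "&" delimiters, "\s*" replaces the single-character "." between elements, and the result is trimmed. Those choices are assumptions of this sketch. The &lt; and &gt; entities are left as they are, which is what XML requires inside an attribute value anyway. As in the original code, the crlf element lies outside the capture, so it is not turned into "\n".

```perl
use strict;
use warnings;

# The input string from the question.
my $xml = '<funcCall> <funcName>write</funcName> <rhsValue>crlf</rhsValue> <symConstant>loading</symConstant> <symConstant>| |</symConstant> <symConstant>lemmas</symConstant> <symConstant>| |</symConstant> <symConstant>for</symConstant> <symConstant>| |</symConstant> <variable>&lt;lx&gt;</variable> <symConstant>| |</symConstant> <symConstant>onto</symConstant> <symConstant>| |</symConstant> <variable>&lt;lms&gt;</variable> </funcCall>';

# One capture group wraps the whole repeated subpattern (Eliya's fix),
# so the first capture holds every symConstant/variable element, not just the last.
$xml =~ s{
    <funcCall>\s*
    <funcName>write</funcName>\s*
    <rhsValue>crlf</rhsValue>
    ((?:\s*<(?:symConstant|variable)>[^<]*</(?:symConstant|variable)>)+)
    \s*</funcCall>
}{
    my $text = $1;
    $text =~ s{</?(?:symConstant|variable)>}{}g;  # drop the element tags
    $text =~ s{\s*\|\s\|\s*}{ }g;                 # each "| |" separator becomes one space
    $text =~ s{^\s+|\s+$}{}g;                     # trim leading and trailing whitespace
    qq{<write string="$text"/>};
}xe;

print "$xml\n";
```

Run against the question's input, this prints <write string="loading lemmas for &lt;lx&gt; onto &lt;lms&gt;"/>.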

Re: regex replace using position loop
by choroba (Cardinal) on Mar 25, 2011 at 20:03 UTC
    Using XML::XSH2:
    open 895570.xml ;
    my $fullout ;
    for funcCall/*[name()!='funcName'] {
        if (name()='rhsValue' and .='crlf') {
            $out = "" ;
        } elsif (name()='symConstant' and .='| |') {
            $out = " " ;
        } else {
            $out = (.) ;
        } ;
        $fullout .= $out ;
    }
    echo :s '<' (funcCall/funcName) ' string="' $fullout '"/>' ;