the_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am a beginner with regexp and am trying to substitute specific occurrences of a value inside a recurring tag (<rstate>), which can be found in different places and at different depths in the file, hence the different if-statements below. The match statements correctly find what I want; the problem is how to substitute the $1 match with a new value. How can I do that? I've googled a lot but haven't really been able to find the information I need (or didn't understand it). If you have another way of achieving the result I need, I'd be happy to learn about it. My only concern is that the code should be readable.

if ($data =~ m/<section1>.*?<parameter>.*?<rstate>(.*?)<\/rstate>/s) {
    print "found first match! '$1'\n";
    #$data=~ s/<section1>.*?<parameter>.*?<rstate>(.*?)<\/rstate>/$newrstate/s;
    $data=~ s/$rstate/$newrstate/s;
}
if ($data =~ m/<section2>.*?<subsection>.*?<parameter>.*?<rstate>(.*?)<\/rstate>/s) {
    print "found second match! '$1'\n";
    $data=~ s/$rstate/$newrstate/s;
}
if ($data =~ m/<section2>.*?<subsection>.*?<subsection>.*?<parameter>.*?<rstate>(.*?)<\/rstate>/s) {
    print "found third match! '$1'\n";
    $data=~ s/$rstate/$newrstate/;
}
# Print to output file
print OUT $data;

The data file looks like this:

<map>
  <section1>
    <parameter>Some text</parameter>
    <rstate>CHANGE_THIS</rstate>
  </section1>
  <section2>
    <subsection>
      <parameter>dont change this</parameter>
      <rstate>CHANGE_THIS</rstate>
    </subsection>
    <subsection>
      <parameter>dont change this</parameter>
      <rstate>DONT CHANGE THIS</rstate>
    </subsection>
    <subsection>
      <parameter>dont change this</parameter>
      <rstate>CHANGE_THIS</rstate>
    </subsection>
  </section2>
  <section3>
    <parameter>dont change this</parameter>
    <rstate>DONT CHANGE THIS</rstate>
  </section3>
</map>

thanks, Pontus

Replies are listed 'Best First'.
Re: regex: Need help substituting
by toolic (Bishop) on Apr 07, 2012 at 13:25 UTC
    If you have another way of achieving the result I need, I'd be happy to learn about it.
    In general, using an XML parser is preferable to using regular expressions. All parsers require an investment in time to learn, and XML::Twig is a good choice:

    use warnings;
    use strict;
    use XML::Twig;

    my $x = '<map>
      <section1>
        <parameter>Some text</parameter>
        <rstate>CHANGE_THIS</rstate>
      </section1>
      <section2>
        <subsection>
          <parameter>dont change this</parameter>
          <rstate>CHANGE_THIS</rstate>
        </subsection>
        <subsection>
          <parameter>dont change this</parameter>
          <rstate>DONT CHANGE THIS</rstate>
        </subsection>
        <subsection>
          <parameter>dont change this</parameter>
          <rstate>CHANGE_THIS</rstate>
        </subsection>
      </section2>
      <section3>
        <parameter>dont change this</parameter>
        <rstate>DONT CHANGE THIS</rstate>
      </section3>
    </map>
    ';

    my $t = XML::Twig->new(
        twig_handlers => { rstate => \&rstate },
        pretty_print  => 'indented',
    );
    $t->parse($x);
    $t->print();

    sub rstate {
        my ($t, $rstate) = @_;
        my $text = $rstate->text();
        $text =~ s/CHANGE_THIS/CHANGED/;
        $rstate->set_text($text);
    }

    __END__

    <map>
      <section1>
        <parameter>Some text</parameter>
        <rstate>CHANGED</rstate>
      </section1>
      <section2>
        <subsection>
          <parameter>dont change this</parameter>
          <rstate>CHANGED</rstate>
        </subsection>
        <subsection>
          <parameter>dont change this</parameter>
          <rstate>DONT CHANGE THIS</rstate>
        </subsection>
        <subsection>
          <parameter>dont change this</parameter>
          <rstate>CHANGED</rstate>
        </subsection>
      </section2>
      <section3>
        <parameter>dont change this</parameter>
        <rstate>DONT CHANGE THIS</rstate>
      </section3>
    </map>

      Thanks for your reply! That would work with the parser, provided all the values that need to be changed are the same? If they differ, would I need several instances of the parser? If I'm not mistaken, the parser approach is built around the idea that I already know the value, which is not my case. I need to read the old value, do some calculations, and reinsert the new value. I realize my example was not optimal, sorry for that. In reality the values are not necessarily all the same. I would still like to know how this could be achieved using regexp. Thanks again, Pontus

        You can do whatever calculations you like in the XML::Twig tag handler. Trying to do this with regexen is like hammering in a screw: you can do it, but it's neither the easiest nor the most suitable way, so just don't.

        I realize my example was not optimal
        OK, so show us a better example of what you are trying to do, what you have tried, and how it doesn't work for you.
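
        To illustrate the point about doing calculations in the handler, here is a minimal sketch. The sample document, the "R<number>" value format, and the increment rule are all assumptions made up for this example; the handler simply reads the old text, derives a new value from it, and writes it back.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: the "R<number>" value format is an assumption.
my $xml = '<map><section1><rstate>R1</rstate></section1>'
        . '<section3><rstate>KEEP</rstate></section3></map>';

my $twig = XML::Twig->new(
    twig_handlers => { rstate => \&bump_rstate },
);
$twig->parse($xml);

# sprint returns the (modified) document as a string.
my $result = $twig->sprint;
print "$result\n";

# Called once for every <rstate> element after it has been parsed.
sub bump_rstate {
    my ($t, $elt) = @_;
    my $old = $elt->text;

    # Example calculation: increment the number; leave other values alone.
    if ($old =~ /^R(\d+)$/) {
        $elt->set_text('R' . ($1 + 1));
    }
}
```

        One parser instance is enough here: the handler runs separately for each <rstate> element, so each old value can be inspected and replaced individually.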
Re: regex: Need help substituting
by aaron_baugher (Curate) on Apr 07, 2012 at 14:32 UTC

    As toolic said, use an XML parser. But, in general terms, when I want to use a regex to change an unknown chunk of text that sits between two known chunks of text, I do it backwards from what you tried to do here. Instead of capturing the part I want to replace, I capture the parts I want to keep, and keep them. For instance:

    $string = 'blah blah blah <some_sort_of_tag>CHANGE ME</some_sort_of_tag> blah blah blah';
    $newtext = 'I AM NEW';
    $string =~ s|^(.*<some_sort_of_tag>).+(</some_sort_of_tag>.*)$|$1$newtext$2|;

    Aaron B.
    My Woefully Neglected Blog, where I occasionally mention Perl.
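
    If the new value has to be computed from the old one, a variation on the same idea may help: capture the old value as well, and use the /e modifier so the replacement is evaluated as Perl code. This is only a sketch; the sample string and the "add one" calculation are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text and calculation, for illustration only.
my $string = 'blah <some_sort_of_tag>41</some_sort_of_tag> blah';

# $1 = opening tag (kept), $2 = old value, $3 = closing tag (kept).
# With /e the replacement side is run as code, so the new value
# can be derived from the old one before the pieces are glued back.
$string =~ s{(<some_sort_of_tag>)(.*?)(</some_sort_of_tag>)}{ $1 . ($2 + 1) . $3 }e;

print "$string\n";
```

    The non-greedy (.*?) stops at the first closing tag, so only the first element's value is touched.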

      I understand that regex is not the preferred method for this problem at all. However, my intention was to learn more about regexp, so for now, with your help, I have come up with the following (using the backwards strategy):

      # First match!
      $data =~ s|^(.*<section1>.*<rstate>).+(</rstate>.*</section1>.*)$|$1$newrstate$2|s;
      # Second match!
      $data =~ s|^(.*<section2>.*<rstate>).+(</rstate>.*</subsection>.*</subsection>.*</subsection>.*</section2>.*)$|$1$newrstate$2|s;
      # Third match!
      $data =~ s|^(.*<section3>.*<rstate>).+(</rstate>.*</section3>.*)$|$1$newrstate$2|s;

      Curious as I am, I will read up on the XML parser and try it as well to see how it compares. Thanks for the help!

        I'm just curious as to why you're using the many if statements.
        I suppose the following should work:

        $data=~ s{<rstate>[^<\n]+</rstate>}{<rstate>$new_text</rstate>}g
        Note the use of the 'g' modifier.
        If you want to learn more about regular expressions in general, then Friedl's book (Mastering Regular Expressions) is a must. It's one of the most well-written books I've ever read on any subject, and I highly recommend it.
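
        Since a plain /g substitution rewrites every <rstate> element, including the ones that should stay as they are, one possible refinement is to combine /g with /e and decide per match. The sketch below assumes, purely for illustration, that only values equal to CHANGE_THIS should be replaced; the sample data is made up.

```perl
use strict;
use warnings;

# Hypothetical data and rule: only values equal to CHANGE_THIS are replaced.
my $data = join "\n",
    '<rstate>CHANGE_THIS</rstate>',
    '<rstate>DONT CHANGE THIS</rstate>',
    '<rstate>CHANGE_THIS</rstate>';
my $new_text = 'CHANGED';

# /g visits every <rstate> element; /e runs the replacement as code, so each
# match can be inspected and either rewritten or put back unchanged.
my $count = ($data =~ s{<rstate>([^<\n]+)</rstate>}{
    '<rstate>' . ($1 eq 'CHANGE_THIS' ? $new_text : $1) . '</rstate>'
}ge);

print "$count elements visited\n$data\n";
```

        Note that $count is the number of matches visited, not the number of values that actually changed, because unchanged elements are still substituted with themselves.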