texuser74 has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

This is my present script:

while(<DATA>) {
    s!<a>(.*?),(.*?)</a>!<a>$1A$2</a>!g;
    s!<b>(.*?),(.*?)</b>!<b>$1B$2</b>!g;
    print;
}
__DATA__
This is to test.
<a>a,
b</a>
<b>a,b</b>

and the output is:

This is to test.
<a>a,
b</a>
<b>aBb</b>

Here the present script works only within ... since there are no line breaks in between them.

How do I match text within a range that contains line breaks?

Please advise. Thanks in advance.

Replies are listed 'Best First'.
Re: trying to do a simple search
by ikegami (Patriarch) on Jan 22, 2009 at 06:22 UTC
    "." doesn't match newlines unless the "s" modifier is used.
Re: trying to do a simple search
by davido (Cardinal) on Jan 22, 2009 at 06:55 UTC

    In addition to the /s modifier that ikegami mentioned, your script could never match across multiple lines since you're only reading in and working on one line at a time. Unless you set your input record separator ($/, documented in perlvar) to slurp mode, chunk mode, or some alternate record separator, your script will only read in one line at a time.
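As a rough illustration of why both points matter together, the sketch below runs the same pattern with and without /s, against line-by-line records and against a slurped record. The sample string and the helper name count_matches are assumptions made for this example, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample: one <a> element split across two lines.
my $sample = "<a>a,\nb</a>\n";

my $plain  = qr!<a>(.*?),(.*?)</a>!;    # "." stops at "\n"
my $dotall = qr!<a>(.*?),(.*?)</a>!s;   # "." also matches "\n"

# Count the records read from $text (using $separator as $/) that match $pattern.
sub count_matches {
    my ($text, $separator, $pattern) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;              # undef means slurp mode
    my $count = grep { $_ =~ $pattern } <$fh>;
    close $fh;
    return $count;
}

my $line_plain   = count_matches($sample, "\n",  $plain);
my $line_dotall  = count_matches($sample, "\n",  $dotall);
my $slurp_plain  = count_matches($sample, undef, $plain);
my $slurp_dotall = count_matches($sample, undef, $dotall);

print "line by line, no /s: $line_plain\n";
print "line by line, /s:    $line_dotall\n";
print "slurped, no /s:      $slurp_plain\n";
print "slurped, /s:         $slurp_dotall\n";
```

Only the last combination finds a match: the whole element has to be in one record, and "." has to be allowed to cross the newline.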


    Dave

      Thanks davido and ikegami,

      I tried something like this:

      while(<DATA>) {
          {local $/='<a>'; s!<a>(.*?),(.*?)</a>!<a>$1A$2</a>!gs;}
          {local $/='<b>'; s!<b>(.*?),(.*?)</b>!<b>$1B$2</b>!sg;}
          print;
      }
      __DATA__
      This is to test.
      <a>a,
      b</a>
      <b>a,b</b>
      <b>a,b
      </b>

      but the output is still wrong.

      Did I miss anything else? Thanks in advance.

        It's probably not such a wonderful idea to change the input record separator in the middle of the loop, twice. You're confusing yourself doing that. Plus the changes are falling out of scope before the next iteration anyway.

        Here's what you're reading into $_ on each iteration: First iteration: $_ = <a>a,\n. Then partway into the iteration, you change the input record separator, but it doesn't matter because you've already read your first line of data. Then a little further into the loop you change the input record separator again, and again it makes no difference since you already read the line of data. Then your localization of $/ falls out of scope, and the input record separator reverts back to "\n".

        Now you read in b</a>\n. And so on.
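A minimal sketch of that sequence, using an in-memory filehandle in place of DATA (the sample string and the @seen array are assumptions for illustration). The read happens in the while condition, so a localized $/ inside the body never influences it:

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the DATA section.
my $text = "<a>a,\nb</a>\n<b>a,b</b>\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @seen;
while (my $line = <$fh>) {        # the read happens here, with $/ still "\n"
    {
        local $/ = '<a>';         # too late: $line has already been read
        $line =~ s!<a>(.*?),(.*?)</a>!<a>$1A$2</a>!gs;
    }                             # $/ reverts to "\n" at this closing brace
    push @seen, $line;
}
close $fh;

# Show each record with its newline made visible.
for my $record (@seen) {
    (my $shown = $record) =~ s/\n/\\n/g;
    print "record: $shown\n";
}
```

Each record is still a single line, so the substitution never sees an opening tag and its closing tag in the same string.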


        Dave

        You are still reading your data one line at a time so your pattern matches are matching against one line at a time.

        To see what is being read, you might experiment with the following program:

        #!/usr/bin/perl -w
        use strict;
        use warnings;

        while (<DATA>) {
            print "start of loop\n";
            print "\$_ = \"$_\"\n";
        }
        __DATA__
        <a>a,
        b</a>
        <b>a,b</b>
        <b>a,b
        </b>

        To make <DATA> read all your DATA instead of just one line, you can set $/ before your while loop, as follows. Note the use of a block ({ }) and local so that $/ isn't affected elsewhere in the program.

        #!/usr/bin/perl -w
        use strict;
        use warnings;

        {
            local $/;
            while (<DATA>) {
                print "start of loop\n";
                print "\$_ = \"$_\"\n";
            }
        }
        __DATA__
        <a>a,
        b</a>
        <b>a,b</b>
        <b>a,b
        </b>

        With $/ set to undef (which is what local $/ does), there is no need for the while loop - all the data is read on the first iteration. You can read all the data into a single variable as follows:

        #!/usr/bin/perl -w
        use strict;
        use warnings;

        my $var = do { local $/; <DATA> };
        print "\$var = \"$var\"";
        __DATA__
        <a>a,
        b</a>
        <b>a,b</b>
        <b>a,b
        </b>

        Then you can substitute against this variable as follows:

        $var =~ s!<a>(.*?),(.*?)</a>!<a>$1A$2</a>!gs;
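Putting the pieces together, a sketch of both substitutions applied to slurped data might look like the following. To keep it self-contained, the data is assigned directly to $var rather than read from DATA; the line breaks in the sample are an assumption about the poster's input.

```perl
use strict;
use warnings;

# Stand-in for: my $var = do { local $/; <DATA> };
my $var = "This is to test.\n"
        . "<a>a,\nb</a>\n"
        . "<b>a,b</b>\n"
        . "<b>a,b\n</b>\n";

# /s lets "." cross newlines; /g repeats the substitution for every element.
$var =~ s!<a>(.*?),(.*?)</a>!<a>$1A$2</a>!gs;
$var =~ s!<b>(.*?),(.*?)</b>!<b>$1B$2</b>!gs;

print $var;
```

With this input, the comma inside each `<a>` element becomes "A" and the comma inside each `<b>` element becomes "B", including the elements that span two lines.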

        When you get that working, you might want to try with the following data:

        __DATA__
        This is to test.
        <a>a,
        b</a>
        <b>a,b</b>
        <b>a,b
        </b>
        <a>ab</a>
        <c>a,b</c>
        <a>a,b</a>
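One thing this data appears designed to expose: with /s, the non-greedy `.*?` can run past `</a>` while looking for a comma. The sketch below compares the pattern from the thread with one possible alternative, a negated character class `[^<]*`. That alternative is a swapped-in technique, not something proposed in the thread, and it assumes the elements contain no nested tags. The line breaks in the sample are also an assumption.

```perl
use strict;
use warnings;

my $data = "This is to test.\n"
         . "<a>a,\nb</a>\n"
         . "<b>a,b</b>\n"
         . "<b>a,b\n</b>\n"
         . "<a>ab</a>\n"
         . "<c>a,b</c>\n"
         . "<a>a,b</a>\n";

# Pattern from the thread: "." may cross newlines AND tag boundaries.
(my $lazy = $data) =~ s!<a>(.*?),(.*?)</a>!<a>$1A$2</a>!gs;

# Assumed alternative: [^<] matches newlines but can never leave the element.
(my $strict = $data) =~ s!<a>([^<]*),([^<]*)</a>!<a>$1A$2</a>!g;

print "--- lazy dot with /s ---\n$lazy";
print "--- negated class ---\n$strict";
```

In the first result, the match that starts at `<a>ab</a>` keeps going until it finds the comma inside `<c>a,b</c>`, so the `<c>` element is altered and the final `<a>a,b</a>` is left untouched. The negated class keeps each match inside a single `<a>` element.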