Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
while (<DATA>) {
    if (/(<\w+>)/) {
        $flag = $1;
    }
    if ($flag =~ m/<Content>/) {
        s/\^[A-Z]/&lt;br\/&gt;/g;
    }
    else {
        s/\^[A-Z]//g;
    }
    print;
}
__DATA__
<ID>^M
<Number>124^M</Number>^M
<Content>
Some of the week's top stories:^M
EASTON AREA^M
BETHLEHEM AREA^M
CARBON, SCHUYLKILL AND REGION^M^M
have charged his mother's boyfriend, Paul Hoffman, with third-degree murder. Officials said the charges were filed after an exhaustive investigation that showed Miller, who died Sept. 25, had the mental capacity of a 12-year-old.^M
</Content>
^M
^M
^M
^M
^M
^M
</ID>
The code above should replace ^M with &lt;br/&gt; only within the <Content> portion, and consecutive &lt;br/&gt; should be collapsed into a single &lt;br/&gt;. But the code above is also replacing every ^M after </Content> and before </ID>.

Replies are listed 'Best First'.
Re: substitution issue
by Marshall (Canon) on Nov 02, 2009 at 14:01 UTC
    I find your indenting style confusing, as well as the problem statement. However, this appears to meet your requirements.

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        # the range (flip-flop) operator is true from the <Content> line
        # through the </Content> line
        if (m|^\s*\<Content\>| ... m|^\s*\<\/Content\>|) {
            s/(\^M)+/\^M/g;           # compress multiple ^M to one
            s/(\^M)/&lt;br\/&gt;/g;   # replace ^M with an escaped <br/>
            print $_ unless m|\<.?Content\>|;   # don't print <Content>
                                                # or </Content> lines
        }
    }

    =pod
    THE ABOVE CODE PRINTS:
    Some of the week's top stories:&lt;br/&gt;
    EASTON AREA&lt;br/&gt;
    BETHLEHEM AREA&lt;br/&gt;
    CARBON, SCHUYLKILL AND REGION&lt;br/&gt;
    have charged his mother's boyfriend, Paul Hoffman, with third-degree murder. Officials said the charges were filed after an exhaustive investigation that showed Miller, who died Sept. 25, had the mental capacity of a 12-year-old.&lt;br/&gt;
    =cut

    __DATA__
    <ID>^M
    <Number>124^M</Number>^M
    <Content>
    Some of the week's top stories:^M
    EASTON AREA^M
    BETHLEHEM AREA^M
    CARBON, SCHUYLKILL AND REGION^M^M
    have charged his mother's boyfriend, Paul Hoffman, with third-degree murder. Officials said the charges were filed after an exhaustive investigation that showed Miller, who died Sept. 25, had the mental capacity of a 12-year-old.^M
    </Content>
    ^M
    ^M
    ^M
    ^M
    ^M
    ^M
Re: substitution issue
by JavaFan (Canon) on Nov 02, 2009 at 11:58 UTC
    $flag is set to a string consisting of \w characters. Then it's matched against /<Content>/. Considering that it's impossible for $flag to contain a < or a > character*, it will never match.

    * Unless you have a very weird locale in effect.

      $flag is set to a string consisting of \w characters.

      No.
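
      For what it's worth, a quick check seems to bear this out. The sketch below reuses the pattern from the original post; the helper name first_tag is made up for illustration. Because the parentheses in /(<\w+>)/ enclose the angle brackets, $1 includes them, and a closing tag never matches because "/" is not a \w character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first opening tag found in a line,
# using the same pattern as the original post, or undef if none matches.
sub first_tag {
    my ($line) = @_;
    return $line =~ /(<\w+>)/ ? $1 : undef;
}

# Show what the capture holds for an opening tag, a mixed line,
# and a closing tag.
for my $line ('<Content>', '<Number>124^M</Number>^M', '</Content>') {
    my $tag = first_tag($line);
    printf "%-26s => %s\n", $line, defined $tag ? $tag : '(no match)';
}
```

      So $flag can hold "<Content>" and the match against /<Content>/ can succeed; the closing tag simply never updates $flag.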

Re: substitution issue
by vitoco (Hermit) on Nov 02, 2009 at 12:50 UTC

    Is ^M two characters (i.e. ^ and M) or just a visual representation of the \r byte (i.e. CR)? From the first part of the substitution, /\^[A-Z]/, I see you think they are two bytes, but if it's always an M, why are you searching for the whole alphabet?
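
    If you are not sure which form your data uses, one option is to accept both. This is only a sketch: the helper name markers_to_br is hypothetical, and it assumes a marker is either the two literal characters ^ and M or a real carriage return. It also collapses a run of consecutive markers on the same line into a single &lt;br/&gt;.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace each run of line-end markers with one
# escaped <br/>. Assumption: a marker is either the two literal
# characters "^M" or a real carriage return ("\r").
sub markers_to_br {
    my ($text) = @_;
    $text =~ s{(?:\^M|\r)+}{&lt;br/&gt;}g;
    return $text;
}

print markers_to_br("EASTON AREA^M"), "\n";                     # literal ^M
print markers_to_br("CARBON, SCHUYLKILL AND REGION^M^M"), "\n"; # run of two
print markers_to_br("BETHLEHEM AREA\r"), "\n";                  # real CR
```

    Matching a literal M instead of [A-Z] also avoids accidentally touching sequences such as ^A that may mean something else.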

    BTW, your code is replacing ^M after </Content> because your flag is not reset by this closing tag.
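
    One way to reset it is sketched below. This is not the only approach: the helper name convert_lines is hypothetical, and it assumes ^M is the two literal characters and that the tags sit on their own lines. The flag is switched on at <Content> and off again once the </Content> line has been handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert "^M" to an escaped <br/> inside
# <Content>...</Content> and strip it everywhere else.
# Assumption: "^M" is two literal characters, not a carriage return.
sub convert_lines {
    my @lines = @_;
    my $in_content = 0;
    my @out;
    for my $line (@lines) {
        $in_content = 1 if $line =~ /<Content>/;      # opening tag sets the flag
        if ($in_content) {
            $line =~ s{(?:\^M)+}{&lt;br/&gt;}g;       # collapse runs on one line
        }
        else {
            $line =~ s{\^M}{}g;                       # outside: just remove
        }
        $in_content = 0 if $line =~ m{</Content>};    # closing tag resets it
        push @out, $line;
    }
    return @out;
}

print "$_\n" for convert_lines(
    '<ID>^M',
    '<Content>',
    'CARBON, SCHUYLKILL AND REGION^M^M',
    '</Content>',
    '^M',
    '</ID>',
);
```

    Note that this only collapses consecutive ^M on the same line; runs spread over several lines would need the input to be read as one string first.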