in reply to Some portion of the text missing

I don't see any "missing text". What do you mean by that?

Perhaps what the regex is doing is surprising you?

To debug something like this, capture and print the other terms in the regex.

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    if ($_ =~ /(<.+>)(<.+>)(.*)/) {
        print "$1\n\n";
        print "$2\n\n";
        print "$3\n\n";
    }
}

$1 is:
<TITLE><![CDATA[<p>Dogs may not smarter than 6-year-olds, but
researchers suggest canines might be on par with 2-year-olds.<
Psychologist Stanley Coren says, "We do know that dogs understand far
more than we credit them with, from about 165 words to 250 words." Even
better than understanding our words, dogs know our hand gestures and
body postures. Dogs may, in fact, far exceed 2-year-olds when it comes
to reading emotions.<BODY>

$2 is:
<![CDATA[<p>Developmentally, 2-year-olds are generally more interested
in themselves, while dogs do care how their people feel, and instantly
recognize a change in emotion.< "While your dog can't comprehend that
you just received a traffic violation, he can tell that you're upset
the second you walk through the door," Coren says. "In fact, dogs can
detect some subtle changes which even adults can't," adds Coren. "We
can't smell cancer or predict seizures, as dogs can."< When I posted
this story on my Facebook Fan page recently (<a href="
http://www.new.facebook.com/pages/ Steve-Dale/50057343596?ref=ts">

$3 is:
www.new.f acebook.com/pages/Steve-Dale/50057343596?ref=ts, or simply
type Steve Dale into the Facebook search), I received some interesting
responses:< Kelle: "Heck, my Italian Greyhound is smarter than most
college students."< Karen: "Depends on how you define smart.

What I called $3 is what you called $1. Remember that regex quantifiers are "greedy" by default, meaning that an expression will match the longest possible text while still allowing the rest of the regex to match. So each of these <.+> terms matches as much stuff as possible between angle brackets. The first term runs from the first < up to the last > that is immediately followed by another < (it has to leave something for the second term to match). The second term then runs from that < to the last > on the line, and the 3rd term in the regex gets what is left after the 2nd term.
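
The same greedy behaviour is easier to see on a short line. This is only a sketch: the sample string and the sub name greedy_captures are made up for illustration and are not your DATA.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical one-line sample, much shorter than the posted DATA.
my $sample = '<TITLE><p>Dogs<BODY><p>Cats <a href="x">link';

# Return the three captures of the greedy pattern (empty list on no match).
sub greedy_captures {
    my ($text) = @_;
    return $text =~ /(<.+>)(<.+>)(.*)/;
}

my @got = greedy_captures($sample);
printf "\$%d is: %s\n", $_ + 1, $got[$_] for 0 .. $#got;
# $1 is: <TITLE><p>Dogs<BODY>
# $2 is: <p>Cats <a href="x">
# $3 is: link
```

Note how $1 swallows everything up to <BODY>, because that is the last > directly followed by a <.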

Update: Try:

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    if ($_ =~ /(<.+>)(.*)/) {
        print "$1\n\n";
        print "$2\n\n";
    }
}

You are still going to get the same result for the (.*) term, which is what I called $3 above.

Basically every char of text is "accounted for", nothing is "missing". We know what you called $1 matches. What are you trying to match?

Another Update with a minimal match example:

The regex below puts the ? modifier on .+ to say: match the shortest thing possible between the angle brackets. That gives you the first two angle bracket things in your DATA. $3 would be everything else following.

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    if ($_ =~ /(<.+?>)(<.+?>)(.*)/) {
        print "$1\n\n";   # prints <TITLE>
        print "$2\n\n";   # prints <![CDATA[<p>
    }
}
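
Here is a small sketch of the minimal match on two short lines. The sample strings and the sub name lazy_captures are made up for illustration. The second sample is there to show that "shortest" still means "shortest that lets the whole regex match": when the first > is not directly followed by a <, the first term stretches until it finds one that is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the three captures of the non-greedy pattern (empty list on no match).
sub lazy_captures {
    my ($text) = @_;
    return $text =~ /(<.+?>)(<.+?>)(.*)/;
}

# Hypothetical samples, not the posted DATA.
for my $sample ('<TITLE><p>Dogs<BODY><p>Cats <a href="x">link',
                '<TITLE>Dogs<BODY><p>Cats') {
    my @got = lazy_captures($sample);
    print join(' | ', @got), "\n";
}
# <TITLE> | <p> | Dogs<BODY><p>Cats <a href="x">link
# <TITLE>Dogs<BODY> | <p> | Cats
```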

Re^2: Some portion of the text missing
by Anonymous Monk on Oct 15, 2009 at 05:53 UTC
    This code works fine: if($_ =~ /<.+><.+>(.*)/){ unless <a some characters> exists in the text portion. How can I check whether the character '>' is present in the <TITLE> or <BODY> portion, and only then use this code?
    if($_ =~ /(<.+?>)(<.+?>)(.*)/)
    This rule breaks if '>' is not present in the text.
      I am still not quite "getting it" as far as what you want to do. The only information that I have available is what you have given me, which is ONE test case and, by the way, a lot longer than it needed to be. It's not appropriate to tell me: hey, this works in a lot of test cases that I haven't shown you.

      Let's concentrate on the question at hand. I think you should be telling me exactly what you want in terms of output! I can only answer questions based on the info that I have!

      What I am supposing is that you want to get the <TITLE> and the <BODY>. The following code does that.

      #!/usr/bin/perl -w
      use strict;

      while (<DATA>) {
          if ( my ($title, $body) =
                   ($_ =~ /<TITLE>.+?<p>(.+?)<BODY>.*?<p>(.*)/)[0,1] ) {
              print "<TITLE>\n$title\n\n",
                    "<BODY>\n$body\n";
          }
      }
      __END__
      Prints:(I did re-format lines to 72 chars in my editor).

      <TITLE>
      Dogs may not smarter than 6-year-olds, but researchers suggest
      canines might be on par with 2-year-olds.< Psychologist Stanley
      Coren says, "We do know that dogs understand far more than we
      credit them with, from about 165 words to 250 words." Even better
      than understanding our words, dogs know our hand gestures and body
      postures. Dogs may, in fact, far exceed 2-year-olds when it comes
      to reading emotions.

      <BODY>
      Developmentally, 2-year-olds are generally more interested in
      themselves, while dogs do care how their people feel, and
      instantly recognize a change in emotion.< "While your dog can't
      comprehend that you just received a traffic violation, he can tell
      that you're upset the second you walk through the door," Coren
      says. "In fact, dogs can detect some subtle changes which even
      adults can't," adds Coren. "We can't smell cancer or predict
      seizures, as dogs can."< When I posted this story on my Facebook
      Fan page recently (<a href=" http://www.new.facebook.com/pages/
      Steve-Dale/50057343596?ref=ts"> www.new.f
      acebook.com/pages/Steve-Dale/50057343596?ref=ts, or simply type
      Steve Dale into the Facebook search), I received some interesting
      responses:< Kelle: "Heck, my Italian Greyhound is smarter than
      most college students."< Karen: "Depends on how you define smart.