Perhaps what the regex is doing is surprising you?
To debug something like this, capture and print the other terms in the regex.
#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    if ($_ =~ /(<.+>)(<.+>)(.*)/) {
        print "$1\n\n";
        print "$2\n\n";
        print "$3\n\n";
    }
}
What I called $3 is what you called $1. Remember that regex quantifiers are "greedy" by default: each one matches as much as it can while still allowing the rest of the regex to match. So each <.+> term matches as much as possible between an opening and a closing angle bracket. The second term ends up with the last angle-bracket chunk (update: the last one that still lets the first term match), the first term gets all the angle-bracket text before that, and the third term gets whatever is left after the second.

$1 is:
<TITLE><![CDATA[<p>Dogs may not smarter than 6-year-olds, but researchers suggest canines might be on par with 2-year-olds.< Psychologist Stanley Coren says, "We do know that dogs understand far more than we credit them with, from about 165 words to 250 words." Even better than understanding our words, dogs know our hand gestures and body postures. Dogs may, in fact, far exceed 2-year-olds when it comes to reading emotions.<BODY>

$2 is:
<![CDATA[<p>Developmentally, 2-year-olds are generally more interested in themselves, while dogs do care how their people feel, and instantly recognize a change in emotion.< "While your dog can't comprehend that you just received a traffic violation, he can tell that you're upset the second you walk through the door," Coren says. "In fact, dogs can detect some subtle changes which even adults can't," adds Coren. "We can't smell cancer or predict seizures, as dogs can."< When I posted this story on my Facebook Fan page recently (<a href=" http://www.new.facebook.com/pages/ Steve-Dale/50057343596?ref=ts">

$3 is:
www.new.facebook.com/pages/Steve-Dale/50057343596?ref=ts, or simply type Steve Dale into the Facebook search), I received some interesting responses:< Kelle: "Heck, my Italian Greyhound is smarter than most college students."< Karen: "Depends on how you define smart.
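The greedy behavior is easier to see on a short string than on the full article text. Here is a minimal sketch; the input '<a><b><c>tail' is made up for illustration and is not from your DATA.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical short input standing in for one line of the real DATA.
my $greedy_line = '<a><b><c>tail';

# Same greedy regex as above: each <.+> grabs as much as it can,
# then backs off only far enough for the rest of the regex to match.
my ($g_first, $g_second, $g_rest) = $greedy_line =~ /(<.+>)(<.+>)(.*)/;

print "first:  $g_first\n";     # <a><b>
print "second: $g_second\n";    # <c>
print "rest:   $g_rest\n";      # tail
```

The first term swallows two bracket pairs because it only gives back enough for the second term to find one pair of its own.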
Update: Try the following. You will still get the same result for the (.*) term, which is what I called $3 above.

#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    if ($_ =~ /(<.+>)(.*)/) {
        print "$1\n\n";
        print "$2\n\n";
    }
}
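To check the claim that the trailing (.*) term captures the same text either way, here is a small sketch that runs both regexes on one made-up string (again, '<a><b><c>tail' is only an illustration, not your DATA).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical short input, not taken from the real DATA.
my $cmp_line = '<a><b><c>tail';

# Three-term regex from the first example and two-term regex from the update.
my @three = $cmp_line =~ /(<.+>)(<.+>)(.*)/;
my @two   = $cmp_line =~ /(<.+>)(.*)/;

print "three-term tail: $three[2]\n";    # tail
print "two-term tail:   $two[1]\n";      # tail
print "two-term head:   $two[0]\n";      # <a><b><c>
```

Dropping the middle term only changes how the bracketed text is split up; the text after the last closing bracket lands in the final term in both cases.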
Basically every character of the text is "accounted for"; nothing is "missing". We know what the term you called $1 matches. What are you trying to match?
Another Update with a minimal match example:
The regex below uses the ? modifier to say: match the shortest thing possible between the angle brackets. That gives you the first two angle-bracket chunks in your DATA; $3 would be everything else that follows.
#!/usr/bin/perl -w
use strict;

while (<DATA>) {
    if ($_ =~ /(<.+?>)(<.+?>)(.*)/) {
        print "$1\n\n";    # prints <TITLE>
        print "$2\n\n";    # prints <![CDATA[<p>
    }
}
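For a quick look at what all three terms hold with the minimal match, here is the same regex on a short made-up string ('<a><b><c>tail' is an assumption for illustration only).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical short input, not taken from the real DATA.
my $lazy_line = '<a><b><c>tail';

# .+? stops at the first closing bracket that lets the regex succeed.
my ($l_first, $l_second, $l_rest) = $lazy_line =~ /(<.+?>)(<.+?>)(.*)/;

print "first:  $l_first\n";     # <a>
print "second: $l_second\n";    # <b>
print "rest:   $l_rest\n";      # <c>tail
```

Here the third bracket pair falls into the (.*) term, because the first two terms each stop as early as they can.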
In reply to Re: Some portion of the text missing
by Marshall
in thread Some portion of the text missing
by Anonymous Monk