Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
while (<DATA>) {
    if ($_ =~ /<.+><.+>(.*)/) {
        $output = $1;
        print $output;
    }
}
__DATA__
<TITLE><![CDATA[<p>Dogs may not smarter than 6-year-olds, but researchers suggest canines might be on par with 2-year-olds.< Psychologist Stanley Coren says, "We do know that dogs understand far more than we credit them with, from about 165 words to 250 words." Even better than understanding our words, dogs know our hand gestures and body postures. Dogs may, in fact, far exceed 2-year-olds when it comes to reading emotions.<BODY><![CDATA[<p>Developmentally, 2-year-olds are generally more interested in themselves, while dogs do care how their people feel, and instantly recognize a change in emotion.< "While your dog can't comprehend that you just received a traffic violation, he can tell that you're upset the second you walk through the door," Coren says. "In fact, dogs can detect some subtle changes which even adults can't," adds Coren. "We can't smell cancer or predict seizures, as dogs can."< When I posted this story on my Facebook Fan page recently (<a href="http://www.new.facebook.com/pages/ Steve-Dale/50057343596?ref=ts">www.new.f acebook.com/pages/Steve-Dale/50057343596?ref=ts, or simply type Steve Dale into the Facebook search), I received some interesting responses:< Kelle: "Heck, my Italian Greyhound is smarter than most college students."< Karen: "Depends on how you define smart.

Replies are listed 'Best First'.
Re: Some portion of the text missing
by GrandFather (Saint) on Oct 15, 2009 at 07:46 UTC

    Without any explanation or example of what you expect to see, it's a bit hard to be sure just what it is you want. However, the usual mantra "don't use regexen to parse markup" seems applicable. Instead, use a suitable module; HTML::TreeBuilder is a good starting point. Consider:

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $html = <<'HTML';
    <TITLE><![CDATA[<p>Dogs may not smarter than 6-year-olds, but researchers suggest canines might be on par with 2-year-olds.< Psychologist Stanley Coren says, "We do know that dogs understand far more than we credit them with, from about 165 words to 250 words." Even better than understanding our words, dogs know our hand gestures and body postures. Dogs may, in fact, far exceed 2-year-olds when it comes to reading emotions.<BODY><![CDATA[<p>Developmentally, 2-year-olds are generally more interested in themselves, while dogs do care how their people feel, and instantly recognize a change in emotion.< "While your dog can't comprehend that you just received a traffic violation, he can tell that you're upset the second you walk through the door," Coren says. "In fact, dogs can detect some subtle changes which even adults can't," adds Coren. "We can't smell cancer or predict seizures, as dogs can."< When I posted this story on my Facebook Fan page recently (<a href="http://www.new.facebook.com/pages/ Steve-Dale/50057343596?ref=ts">www.new.f acebook.com/pages/Steve-Dale/50057343596?ref=ts, or simply type Steve Dale into the Facebook search), I received some interesting responses:< Kelle: "Heck, my Italian Greyhound is smarter than most college students."< Karen: "Depends on how you define smart.
    HTML

    my $tree = HTML::TreeBuilder->new;    # empty tree
    $tree->parse ($html);
    $tree->eof ();

    for my $element ($tree->content_list()) {
        print $element->as_text (), "\n\n";
    }

    Prints:

    Dogs may not smarter than 6-year-olds, but researchers suggest canines might be on par with 2-year-olds.< Psychologist Stanley Coren says, "We do know that dogs understand far more than we credit them with, from about 165 words to 250 words." Even better than understanding our words, dogs know our hand gestures and body postures. Dogs may, in fact, far exceed 2-year-olds when it comes to reading emotions.

    Developmentally, 2-year-olds are generally more interested in themselves, while dogs do care how their people feel, and instantly recognize a change in emotion.< "While your dog can't comprehend that you just received a traffic violation, he can tell that you're upset the second you walk through the door," Coren says. "In fact, dogs can detect some subtle changes which even adults can't," adds Coren. "We can't smell cancer or predict seizures, as dogs can."< When I posted this story on my Facebook Fan page recently (www.new.f acebook.com/pages/Steve-Dale/50057343596?ref=ts, or simply type Steve Dale into the Facebook search), I received some interesting responses:< Kelle: "Heck, my Italian Greyhound is smarter than most college students."< Karen: "Depends on how you define smart.

    True laziness is hard work
Re: Some portion of the text missing
by Marshall (Canon) on Oct 15, 2009 at 05:18 UTC
    I don't see any "missing text". What do you mean by that?

    Perhaps what the regex is doing is surprising you?

    To debug something like this, capture and print the other terms in the regex.

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        if ($_ =~ /(<.+>)(<.+>)(.*)/) {
            print "$1\n\n";
            print "$2\n\n";
            print "$3\n\n";
        }
    }

    $1 is:
    <TITLE><![CDATA[<p>Dogs may not smarter than 6-year-olds, but
    researchers suggest canines might be on par with 2-year-olds.<
    Psychologist Stanley Coren says, "We do know that dogs understand far
    more than we credit them with, from about 165 words to 250 words."
    Even better than understanding our words, dogs know our hand gestures
    and body postures. Dogs may, in fact, far exceed 2-year-olds when it
    comes to reading emotions.<BODY>

    $2 is:
    <![CDATA[<p>Developmentally, 2-year-olds are generally more interested
    in themselves, while dogs do care how their people feel, and instantly
    recognize a change in emotion.< "While your dog can't comprehend that
    you just received a traffic violation, he can tell that you're upset
    the second you walk through the door," Coren says. "In fact, dogs can
    detect some subtle changes which even adults can't," adds Coren. "We
    can't smell cancer or predict seizures, as dogs can."< When I posted
    this story on my Facebook Fan page recently (<a
    href="http://www.new.facebook.com/pages/ Steve-Dale/50057343596?ref=ts">

    $3 is:
    www.new.f acebook.com/pages/Steve-Dale/50057343596?ref=ts, or simply
    type Steve Dale into the Facebook search), I received some interesting
    responses:< Kelle: "Heck, my Italian Greyhound is smarter than most
    college students."< Karen: "Depends on how you define smart.

    What I called $3 is what you called $1. Remember that regex quantifiers are "greedy" by default, meaning that each one matches the longest possible string that still allows the rest of the regex to match. So each <.+> term matches as much as possible between a '<' and a '>'. The first term runs from the first '<' to the last '>' that is immediately followed by a '<' (here, the '>' that closes <BODY>). The second term (update: while still allowing the first term to match) runs from there to the last '>' on the line, which closes the <a href=...> tag. The third term gets whatever is left after the second.
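
The greedy behaviour is easier to see on a short line. What follows is only a minimal sketch: the sample string and the variable names are made up for illustration and are not the original DATA. It applies the same three-group regex to a line with two leading tags and a stray <a ...> tag inside the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical short sample (not the original DATA): two tag pairs,
# some text, and a stray <a ...> tag inside the text portion.
my $line = '<T><![X[<p>one<B><![X[<p>two <a href="u">three';

# Same pattern as above: both <.+> terms are greedy.
my ($first, $second, $rest) = $line =~ /(<.+>)(<.+>)(.*)/;

print "1: $first\n";     # everything up to and including <B>
print "2: $second\n";    # from <![X[ up to the last '>' on the line
print "3: $rest\n";      # what is left after the last '>'
```

With this sample the third group holds only "three", because the second greedy term has already consumed everything up to the '>' of the <a ...> tag.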

    Update: Try:

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        if ($_ =~ /(<.+>)(.*)/) {
            print "$1\n\n";
            print "$2\n\n";
        }
    }

    You are still going to get the same result for the (.*) term, which is what I called $3 above.

    Basically, every character of the text is "accounted for"; nothing is "missing". We know what the term you called $1 matches. What are you trying to match?

    Another Update with a minimal match example:

    The regex below puts the ? modifier after each + to say: match the shortest thing possible between the angle brackets. The two terms therefore match the first two angle-bracket things in your DATA. $3 would be everything else following.

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>) {
        if ($_ =~ /(<.+?>)(<.+?>)(.*)/) {
            print "$1\n\n";    # prints <TITLE>
            print "$2\n\n";    # prints <![CDATA[<p>
        }
    }
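
For comparison, here is a small sketch of the minimal-match version on a short line. The sample string and variable names are hypothetical, chosen only to keep the output readable. It shows that a later '>' in the text no longer affects the first two groups.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical short sample (not the original DATA), with a stray
# <a ...> tag in the text portion.
my $line = '<T><![X[<p>one<B><![X[<p>two <a href="u">three';

# Non-greedy: each <.+?> stops at the first '>' that lets the match succeed.
my ($first, $second, $rest) = $line =~ /(<.+?>)(<.+?>)(.*)/;

print "1: $first\n";     # the first tag only
print "2: $second\n";    # the next angle-bracket item
print "3: $rest\n";      # everything else, stray tag included
```

Here the third group starts right after the first <p>, so it still contains the <B> marker and the second <![X[<p> prefix; separating title from body needs a more specific pattern.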
      The code if($_ =~ /<.+><.+>(.*)/){ works fine unless something like <a some characters> exists in the text portion. How can I check whether the character '>' is present in the <TITLE> or <BODY> portion, and only then use the code
      if($_ =~ /(<.+?>)(<.+?>)(.*)/)
      This rule breaks if '>' is not present in the text.
        I am still not quite "getting it" as far as what you want to do. The only information that I have available is what you have given me, which is ONE test case, and by the way one that is a lot longer than it needed to be. It's not appropriate to tell me: hey, this works in a lot of test cases that I haven't shown you.

        Let's concentrate on the question at hand. I think you should be telling me exactly what you want in terms of output! I can only answer questions based on the info that I have!

        What I am supposing is that you want to get the <TITLE> and the <BODY>. The following code does that.

        #!/usr/bin/perl -w
        use strict;

        while (<DATA>) {
            if ( my ($title, $body) =
                 ($_ =~ /<TITLE>.+?<p>(.+?)<BODY>.*?<p>(.*)/)[0,1] ) {
                print "<TITLE>\n$title\n\n",
                      "<BODY>\n$body\n";
            }
        }
        __END__
        Prints: (I did re-format lines to 72 chars in my editor.)

        <TITLE>
        Dogs may not smarter than 6-year-olds, but researchers suggest
        canines might be on par with 2-year-olds.< Psychologist Stanley Coren
        says, "We do know that dogs understand far more than we credit them
        with, from about 165 words to 250 words." Even better than
        understanding our words, dogs know our hand gestures and body
        postures. Dogs may, in fact, far exceed 2-year-olds when it comes to
        reading emotions.

        <BODY>
        Developmentally, 2-year-olds are generally more interested in
        themselves, while dogs do care how their people feel, and instantly
        recognize a change in emotion.< "While your dog can't comprehend that
        you just received a traffic violation, he can tell that you're upset
        the second you walk through the door," Coren says. "In fact, dogs can
        detect some subtle changes which even adults can't," adds Coren. "We
        can't smell cancer or predict seizures, as dogs can."< When I posted
        this story on my Facebook Fan page recently (<a
        href="http://www.new.facebook.com/pages/ Steve-Dale/50057343596?ref=ts">www.new.f
        acebook.com/pages/Steve-Dale/50057343596?ref=ts, or simply type Steve
        Dale into the Facebook search), I received some interesting
        responses:< Kelle: "Heck, my Italian Greyhound is smarter than most
        college students."< Karen: "Depends on how you define smart.
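
A shorter test case makes it easier to check how this pattern copes with a '>' or an extra tag inside the text. The following is only a sketch under stated assumptions: the helper name title_and_body and the sample line are invented for illustration, and only the regex itself comes from the code above. Because the pattern anchors on the literal <TITLE>, <p> and <BODY> markers rather than on "anything in angle brackets", a stray '>' in the text should not shift the captures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: in list context returns (title, body),
# or an empty list when the line does not match.
sub title_and_body {
    my ($line) = @_;
    return $line =~ /<TITLE>.+?<p>(.+?)<BODY>.*?<p>(.*)/;
}

# Hypothetical sample: a bare '>' in the title text, a tag in the body text.
my $sample = '<TITLE><![CDATA[<p>A title with 1 > 0 in it.'
           . '<BODY><![CDATA[<p>Body with <a href="u">a link.';

my ($title, $body) = title_and_body($sample);
print "<TITLE>\n$title\n\n<BODY>\n$body\n";

# A line without the markers yields no captures at all.
my @none = title_and_body('no tags here');
print "no match\n" unless @none;
```

The helper must be called in list context; in scalar context the match would return only a true or false value instead of the captures.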