in reply to Re^4: How to output the words that you want that came thru an html file?
in thread How to output the words that you want that came thru an html file?

I should have said in my previous post that the sample code is rather fragile: it depends heavily on how the website developer chooses to lay out the HTML.

As Ea says below, HTML::TokeParser is a much better choice for robust processing.
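
For illustration, a minimal sketch of the HTML::TokeParser approach could look like the following. The helper name visible_text and the sample HTML are made up for this example, and it assumes HTML::TokeParser (part of the HTML-Parser distribution on CPAN) is installed. It keeps only the text tokens and ignores every tag, however the page happens to be laid out.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the text of an HTML string, tags removed.
sub visible_text {
    my ($html) = @_;

    # A reference to a scalar is treated as the literal document to parse.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";

    my @chunks;
    while (my $token = $parser->get_token) {
        next unless $token->[0] eq 'T';    # 'T' marks a text token
        my $text = $token->[1];
        $text =~ s/^\s+|\s+$//g;           # trim each chunk
        push @chunks, $text if length $text;
    }
    return join ' ', @chunks;
}

print visible_text('<h2>Some text</h2><p>text1 <br> text2</p>'), "\n";
```

Note that this sketch does not decode entities such as &amp;amp; and would also return the contents of script or style elements as text, so a real page will likely need more filtering.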

Having said that, to answer your questions (I'm assuming these are your lines 20 and 21):

    m|^\s*<[^/>]+>(.+)</| and $_=$1; # Zap tags on both sides, if any
    # The line above looks for text enclosed in html tokens, and extracts the text.
    # Eg: applying the regex to : "<h2>Some text</h2>" places "Some text" into "$1",
    # which is then copied into "$_"

    s|<[^>]+>||g; # Zap single </onetag> tags
    # The line above handles left-over single tags:
    # Eg: it zaps "<sometag/>" from "text1 <sometag/> text2"
    # Actually, it is rather crude, and does not care about tag termination, or matching.

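To see what the two statements do, here is a small sketch that runs them on a few sample strings. The wrapper sub strip_tags_from_line and the third sample are made up for this example; the two regex lines are the ones discussed above.

```perl
use strict;
use warnings;

# Hypothetical wrapper: apply both statements to one line of input.
sub strip_tags_from_line {
    my ($line) = @_;
    local $_ = $line;
    m|^\s*<[^/>]+>(.+)</| and $_ = $1;  # keep what lies between the first tag and the last "</"
    s|<[^>]+>||g;                       # delete any tags that are still there
    return $_;
}

for my $sample ('<h2>Some text</h2>',
                'text1 <sometag/> text2',
                '<p>a <b>bold</b> word</p>') {
    printf "%-28s => [%s]\n", $sample, strip_tags_from_line($sample);
}
```

The second sample shows the crudeness: the tag is removed but the spaces on both sides of it stay, so "text1" and "text2" end up two spaces apart.
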
In order to format the text better, you need to collect it into a scalar. Instead of "print", collect it using:
$collected_text .= $_;
Of course, you should declare $collected_text outside the loop.
Then, after the loop, you will need to parse and clean $collected_text, before printing it.
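
Put together, the collect-then-clean idea might be sketched like this. The @lines array stands in for whatever your loop reads from the HTML file, and clean_collected_text is a made-up helper that only normalizes whitespace; what "clean" should really mean depends on your page. The sketch also appends a space after each piece (an assumption, not in the original line) because the first regex can drop the trailing newline, which would otherwise glue two pieces together.

```perl
use strict;
use warnings;

# Hypothetical cleanup: collapse whitespace runs and trim both ends.
sub clean_collected_text {
    my ($text) = @_;
    $text =~ s/\s+/ /g;
    $text =~ s/^ | $//g;
    return $text;
}

# Stand-in for the lines read from the HTML file.
my @lines = ("<h2>Some text</h2>\n", "  text1 <sometag/> text2\n");

my $collected_text = '';    # declared outside the loop
for (@lines) {
    m|^\s*<[^/>]+>(.+)</| and $_ = $1;
    s|<[^>]+>||g;
    $collected_text .= "$_ ";    # collect instead of printing
}

print clean_collected_text($collected_text), "\n";
```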

             I hope life isn't a big joke, because I don't get it.
                   -SNL