As Ea says below, HTML::TokeParser is a much better choice for robust processing.
Having said that - to answer your questions: (I'm assuming that your lines 20,21 are these)
    m|^\s*<[^/>]+>(.+)</| and $_=$1; # Zap tags on both sides, if any
    # The line above looks for text enclosed in html tokens, and extracts the text.
    # Eg: applying the regex to : "<h2>Some text</h2>" places "Some text" into "$1",
    # which is then copied into "$_"
    s|<[^>]+>||g; # Zap single </onetag> tags
    # The line above handles left-over single tags:
    # Eg: it zaps "<sometag/>" from "text1 <sometag/> text2"
    # Actually, it is rather crude, and does not care about tag termination, or matching.

In order to format the text better, you need to collect it into a scalar. Instead of "print", collect it using:

    $collected_text .= $_;

Of course, you should declare $collected_text outside the loop.
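Putting the pieces together, here is a minimal sketch of how the loop might look. The sample lines in @lines are made up for illustration (your real input comes from your html file), and appending "\n" after each piece is my assumption about how you want the text separated:

```perl
use strict;
use warnings;

# Hypothetical sample input; replace with the lines read from your file.
my @lines = (
    '<h2>Some text</h2>',
    'text1 <sometag/> text2',
    '<p>More <b>bold</b> words</p>',
);

my $collected_text = '';    # declared outside the loop, so it keeps growing

for (@lines) {
    m|^\s*<[^/>]+>(.+)</| and $_ = $1;    # keep the text between the outer tags
    s|<[^>]+>||g;                          # zap whatever tags are left
    $collected_text .= "$_\n";             # collect instead of "print"
}

print $collected_text;    # format or print the whole text once, after the loop
```

Once everything is in $collected_text, you can reformat it in one place (squeeze whitespace, wrap lines, and so on) before printing. The regexes are still as crude as described above, so this is only suitable for simple, well-behaved input.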
I hope life isn't a big joke, because I don't get it.
-SNL