I need help with some logic

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

RE: I need help with some logic by Russ (Deacon) on Jul 11, 2000 at 06:21 UTC
By "blocks of text" I assumed you meant the pieces between <P> and <br> tags. I'm still not sure what you mean by titles, though... If your HTML uses tables, that would make this a lot easier. Good luck.

Russ
Brainbench 'Most Valuable Professional' for Perl
RE: I need help with some logic by ZZamboni (Curate) on Jul 11, 2000 at 18:07 UTC
First, this should have been posted to Seekers of Perl Wisdom, not here.

Second, I'm not a seasoned user of HTML::Parser, but I believe it calls a function for each opening and closing tag it encounters, and for each piece of text between tags. If that's the case, you can set a flag when you encounter certain opening tags, then accumulate all the text in a variable until you encounter the corresponding closing tag, at which point you can store the text wherever you want. Using the HTML::Parser version 2 subclassing, something like this (untested code, based on sample code from the HTML::Parser documentation):

    {
        package MyParser;
        use base 'HTML::Parser';

        # State shared by the three handlers below
        my (%capturing, %text);

        # Called for each opening tag: start capturing at <blockquote>
        sub start {
            my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
            if ($tagname eq 'blockquote') {
                $capturing{blockquote} = 1;
                $text{blockquote}      = "";
            }
        }

        # Called for each closing tag: stop capturing at </blockquote>
        sub end {
            my ($self, $tagname, $origtext) = @_;
            if ($tagname eq 'blockquote') {
                $capturing{blockquote} = 0;
                # Do whatever you want to do with $text{blockquote}
            }
        }

        # Called for each piece of text between tags
        sub text {
            my ($self, $origtext, $is_cdata) = @_;
            $text{blockquote} .= $origtext if $capturing{blockquote};
        }
    }

    my $p = MyParser->new;
    $p->parse_file("foo.html");

This will capture all the text between BLOCKQUOTE tags. Of course, you can use more complex rules for capturing what you want and storing it where you want it, but the general idea should be the same.

--ZZamboni
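As one possible variation on the flag-and-buffer idea above, here is a minimal sketch that uses the HTML::Parser version 3 handler interface instead of subclassing and collects every BLOCKQUOTE into a list. It assumes HTML::Parser 3.x is installed; `extract_blockquotes` and the sample HTML are hypothetical names and data made up for illustration, not part of the original question.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns the text of each top-level <blockquote> in $html.
sub extract_blockquotes {
    my ($html) = @_;
    my @blocks;         # finished blocks, in document order
    my $depth  = 0;     # nesting depth, so nested blockquotes stay in one block
    my $buffer = '';    # text collected for the block currently open

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Opening tag: begin a fresh buffer when entering the outermost block
        start_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'blockquote';
            $buffer = '' if $depth == 0;
            $depth++;
        }, 'tagname' ],
        # Closing tag: store the buffer once the outermost block closes
        end_h => [ sub {
            my ($tag) = @_;
            return unless $tag eq 'blockquote' && $depth > 0;
            $depth--;
            push @blocks, $buffer if $depth == 0;
        }, 'tagname' ],
        # Text between tags (entities decoded): keep it only while capturing
        text_h => [ sub {
            my ($text) = @_;
            $buffer .= $text if $depth > 0;
        }, 'dtext' ],
    );

    $parser->parse($html);
    $parser->eof;       # flush any text the parser is still holding
    return @blocks;
}

my $html = '<p>Intro</p><blockquote>First <b>quoted</b> block</blockquote>'
         . '<p>Between</p><blockquote>Second block</blockquote>';
print "$_\n" for extract_blockquotes($html);
```

Run against the sample string, this should print "First quoted block" and "Second block" on separate lines. Swapping 'blockquote' for another tag name, or keying the buffer by tag name, extends the same pattern to other kinds of blocks.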
(crazyinsomniac) RE: I need help with some logic by crazyinsomniac (Prior) on Jul 11, 2000 at 12:21 UTC
$MULTILINE_MATCHING
$*

Set to 1 to do multi-line matching within a string, 0 to tell Perl that it can assume that strings contain a single line, for the purpose of optimizing pattern matches. Pattern matches on strings containing multiple newlines can produce confusing results when ``$*'' is 0. Default is 0. (Mnemonic: * matches multiple things.) Note that this variable influences the interpretation of only ``^'' and ``$''. A literal newline can be searched for even when $* == 0.

$.

The current input line number for the last file handle from which you read (or performed a seek or tell on). An explicit close on a filehandle resets the line number. Because ``<>'' never does an explicit close, line numbers increase across ARGV files (but see examples under eof()). Localizing $. has the effect of also localizing Perl's notion of ``the last read filehandle''. (Mnemonic: many programs use ``.'' to mean the current line number.)

cRaZy is co01.
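For anyone trying the two variables quoted above: $* was deprecated and has since been removed from Perl (as of 5.10), and the per-pattern /m modifier gives the same multi-line treatment of ``^'' and ``$''. The following is a small sketch of /m and $. using only core Perl; the sample text and variable names are invented for illustration.

```perl
use strict;
use warnings;

my $text = "title one\nbody line\ntitle two\n";

# Without /m, ^ matches only at the start of the string and $ only at the
# end (or before a final newline), so no whole line in the middle can match.
my @single = $text =~ /^(title \w+)$/g;

# With /m, ^ and $ match at the start and end of every line in the string.
my @multi = $text =~ /^(title \w+)$/mg;

printf "without /m: %d match(es)\n", scalar @single;
printf "with /m:    %d match(es)\n", scalar @multi;

# $. holds the line number of the last filehandle read from.
# Here the "file" is the in-memory string above.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @numbered;
while (my $line = <$fh>) {
    chomp $line;
    push @numbered, "$.: $line";
}
close $fh;

print "$_\n" for @numbered;
```

The first pattern finds nothing and the second finds both title lines, which is the "confusing results" case the quoted documentation warns about. The loop then prints each line prefixed with its number as tracked by $. for that filehandle.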