I need help with some logic.

shaba has asked for the wisdom of the Perl Monks concerning the following question:

(ZZamboni) Re: I need help with some logic by ZZamboni (Curate) on Jul 11, 2000 at 22:13 UTC
I'm reposting my response, since the question was originally posted to Perl Monks Discussion.

I'm not a seasoned user of HTML::Parser, but I believe it calls a function for each opening and closing tag it encounters, and for each piece of text between tags. If that's the case, you can set a flag when you encounter certain opening tags, then store all the text in a variable until you encounter the corresponding closing tag, at which point you can store the text wherever you want. Using the HTML::Parser version 2 subclassing, something like this (untested code, based on sample code from the HTML::Parser documentation):

    {
        package MyParser;
        use base 'HTML::Parser';

        # State shared by the three callbacks below
        my (%capturing, %text);

        # Called for each opening tag
        sub start {
            my($self, $tagname, $attr, $attrseq, $origtext) = @_;
            if($tagname eq 'blockquote') {
                $capturing{blockquote} = 1;
                $text{blockquote} = "";
            }
        }

        # Called for each closing tag
        sub end {
            my($self, $tagname, $origtext) = @_;
            $capturing{blockquote} = 0 if $tagname eq 'blockquote';
            # Do whatever you want to do with $text{blockquote}
        }

        # Called for each piece of text between tags
        sub text {
            my($self, $origtext, $is_cdata) = @_;
            $text{blockquote} .= $origtext if $capturing{blockquote};
        }
    }

    my $p = MyParser->new;
    $p->parse_file("foo.html");

This will capture all the text between BLOCKQUOTE tags. Of course, you can use more complex rules for capturing what you want and storing it where you want it, but the general idea should be the same.

--ZZamboni
Re: I need help with some logic. by Adam (Vicar) on Jul 11, 2000 at 20:23 UTC
Why can you only read one line at a time? Line breaks carry no meaning in HTML (\n is just whitespace, except inside <pre> blocks). Why not read the whole file first and parse it that way? It wouldn't take much then to find the plain text. Easy ways to read the whole file include `@file = <FILEHANDLE>;` or `{ local $/ = undef; $file = <FILEHANDLE>; }`
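To make the "read the whole file first" idea concrete, here is a minimal sketch in core Perl. The helper names `slurp` and `split_html` are made up for illustration and are not from the original post, and the regex is a deliberately naive assumption: it treats anything between `<` and `>` as a tag, so it will trip over a `>` inside an attribute value or a comment. HTML::Parser is the sturdier choice for real pages.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                 # undef record separator: read everything at once
    my $html = <$fh>;
    close $fh;
    return $html;
}

# Hypothetical helper: split a string into tags and the text between them.
# The capturing parentheses make split return the tags as well as the text.
# Naive assumption: a tag is anything from '<' to the next '>'.
sub split_html {
    my ($html) = @_;
    return grep { length } split /(<[^>]*>)/, $html;
}
```

Something like `my @pieces = split_html(slurp("foo.html"));` then yields tags and text in document order, so a run of text is no longer cut wherever the file happens to have a newline.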
RE: Re: I need help with some logic. by Anonymous Monk on Jul 11, 2000 at 22:15 UTC
The reason is that I need to parse out the HTML tags in the file and print them to the browser. The way I am doing this is as follows: take in a line, see if there is a starting tag (like `<body> or <html>`) and send it to $start, check if there is an ending tag (like `</table>, </html>` etc.) and send that to $end, and finally see if there is plain text and send that to $text. This works just peachy for printing just the HTML tags to the browser, but now I am left with several $text(s) and I don't know how to keep blocks of text together. Shaheeb
RE: RE: Re: I need help with some logic. by Adam (Vicar) on Jul 11, 2000 at 22:29 UTC
I didn't ask why you need to parse the HTML; I asked why you can't read the whole file. But I see what the issue is now: you are worried about font tags and line breaks breaking up your plain text. My advice is either to forget all the breaks and `join ' ', @lines`, or to use a state machine to keep track of where you are as you parse.
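As a rough illustration of the state-machine route, the sketch below carries one piece of state across lines: the run of text collected so far. Inline tags leave the run open, and any other tag closes it. The helper name `text_blocks` and the list of "inline" tags are assumptions made for this example, not anything from the thread, and the sketch assumes no tag is split across two lines.

```perl
use strict;
use warnings;

# Hypothetical helper: group plain text into blocks, even when a block is
# spread over several input lines or interrupted by inline tags.
sub text_blocks {
    my @lines = @_;
    my %inline = map { $_ => 1 } qw(font b i br);   # assumed inline tags
    my @blocks;
    my $current = '';          # the state: text collected since the last block tag

    for my $line (@lines) {
        # Split the line into tags and the text between them.
        for my $piece (split /(<[^>]*>)/, $line) {
            if ($piece =~ /^</) {
                my ($name) = $piece =~ m{^</?\s*(\w+)};
                # Inline tags do not end the current run of text.
                next if defined $name && $inline{ lc $name };
                # Any other tag closes the run, if it holds real text.
                push @blocks, $current if $current =~ /\S/;
                $current = '';
            }
            else {
                $current .= $piece;
            }
        }
        $current .= ' ';       # a line break counts as whitespace
    }
    push @blocks, $current if $current =~ /\S/;

    # Collapse runs of whitespace and trim each block.
    for (@blocks) { s/\s+/ /g; s/^ //; s/ $//; }
    return @blocks;
}
```

Called as `text_blocks(<FILEHANDLE>)`, this can still be fed a file one line at a time; the point is only that `$current` survives from one line to the next instead of being reset for every line.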