
I don't know how to use the "get_token" method of the HTML::TokeParser module to skip over the following set of HTML tags:
<DD> <A NAME="394893"></A><FONT FACE="helvetica, arial, sans-serif" SIZE="-1"><zindex1>changing dates <a href="chview.htm#1052431">1</a>, <a href="chcncpt.htm#1052501">2</a> </zindex1> <DD> <A NAME="394896"></A><FONT FACE="helvetica, arial, sans-serif" SIZE="-1"><zindex2> jump to <a href="chcncpt.htm#1046200">1</a> </zindex2>
I want to extract the text between <zindex1> and </zindex1>, together with the href and link text of each <a href=...>1</a> inside it, and likewise the text between the sub-index tags <zindex2> and </zindex2> and the <a href=..>1</a> link inside those. The ideal result would look like this:

changing dates chview.htm#1052431 1 chcncpt.htm#1052501 2 jump to chcncpt.htm#1046200 1
I have not been able to do this with the get_tag() method, and I don't know how to skip the
<DD> <A NAME=...></A><FONT FACE=...>
tags using the "get_token" method. Any suggestions?

(ichi) Re: Skipping HTML tags using the "get_token" method with HTML::TokeParser module
by ichimunki (Priest) on Mar 14, 2002 at 19:14 UTC
    Same as above, only instead of
    while( $token = $p->get_token() ) { if( $token->[0] eq 'S' && $token->[1] eq 'a' ) {
    use something like
    while( $token = $p->get_token() ) { if( $token->[0] eq 'S' && $token->[1] =~ /zindex/ ) {
    I've never heard of a zindex# tag in HTML, though, so I can't say whether HTML::Parser catches it.
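
    As far as I know, HTML::Parser does not check tag names against any HTML specification, so non-standard tags such as <zindex1> should come back from get_token as ordinary 'S' and 'E' tokens, with the name lowercased. Below is a rough sketch of the full loop, run against the sample markup from the question. The helper name extract_index, the $inside flag, and the trimming of commas and whitespace are my own assumptions for illustration, not part of HTML::TokeParser; the idea is simply to ignore every token until a zindexN start tag turns the flag on.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup copied from the question.
my $html = <<'HTML';
<DD> <A NAME="394893"></A><FONT FACE="helvetica, arial, sans-serif" SIZE="-1"><zindex1>changing dates <a href="chview.htm#1052431">1</a>, <a href="chcncpt.htm#1052501">2</a> </zindex1> <DD> <A NAME="394896"></A><FONT FACE="helvetica, arial, sans-serif" SIZE="-1"><zindex2> jump to <a href="chcncpt.htm#1046200">1</a> </zindex2>
HTML

# extract_index is a hypothetical helper, not an HTML::TokeParser method.
sub extract_index {
    my ($doc) = @_;
    my $p = HTML::TokeParser->new( \$doc )
        or die "Could not create parser";
    my @out;
    my $inside = 0;    # true between <zindexN> and </zindexN>

    while ( my $token = $p->get_token ) {
        my ( $type, $tag ) = @$token;

        if ( $type eq 'S' && $tag =~ /^zindex\d+$/ ) {
            $inside = 1;
        }
        elsif ( $type eq 'E' && $tag =~ /^zindex\d+$/ ) {
            $inside = 0;
        }
        elsif ( !$inside ) {
            next;      # skips <DD>, <A NAME=...>, <FONT ...> and stray text
        }
        elsif ( $type eq 'S' && $tag eq 'a' && defined $token->[2]{href} ) {
            push @out, $token->[2]{href};    # attribute hash is element 2
        }
        elsif ( $type eq 'T' ) {
            my $text = $token->[1];          # for 'T' tokens, element 1 is the text
            $text =~ s/^[\s,]+|[\s,]+$//g;   # assumption: drop separator commas
            push @out, $text if length $text;
        }
    }
    return join ' ', @out;
}

print extract_index($html), "\n";
```

    With the sample above this should print the line the question asks for. One caveat: the trimming also removes leading or trailing commas from real index text, so adjust that regex if your entries can legitimately start or end with a comma.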