The script below doesn't print the CDATA tags themselves, but it prints everything between them, including other tags. To do that, the main loop has to process every "token" (every tag and every stretch of intervening text in the whole document) one at a time, and a state variable has to keep track of whether you are currently inside a cdata section.

#!/usr/bin/perl
use strict;
use HTML::TokeParser;

my $sample_HTML = <<EOD;
<HTML>
blah.
<CDATA>
Just some random whatever. It might have some <b>real</b> HTML like a table
or CSS styling or even some <H1>IMPORTANT</H1> words.
Maybe even a form <form method=post>...</form>
</CDATA>
</HTML>
EOD

my $p = HTML::TokeParser->new( \$sample_HTML );
my $in_cdata = 0;    # greater than 0 while inside a cdata section

while ( my $token = $p->get_token ) {
    my ( $tkn_type, $tkn_content, @rest ) = @$token;
    if ( $tkn_type =~ /[SE]/ ) {
        $tkn_content = pop @rest;    # last array element is full tag string
    }
    # match case-insensitively so that neither <CDATA> nor </CDATA> is printed
    print $tkn_content if ( $in_cdata and $tkn_content !~ /cdata/i );
    if ( $tkn_content =~ /cdata/i ) {
        $in_cdata += ( $tkn_type eq 'S' ) ? 1 : -1;
    }
}
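In case the token layout is the confusing part, here is a minimal sketch of the same state-variable idea that runs without the parser. The tokens are hand-built in the array shapes that HTML::TokeParser documents for start tags, end tags and text. The helper name text_inside_cdata and the sample tokens are made up for illustration. Note that it is a variation on the loop above rather than the same code: it compares the tag name (which, as I understand the module's documentation, is lowercased by default) instead of pattern-matching the tag text, so ordinary text that happens to contain the word "cdata" cannot disturb the counter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-built tokens in the shapes HTML::TokeParser documents:
#   start tag: [ 'S', $tag, \%attr, \@attrseq, $text ]
#   end tag:   [ 'E', $tag, $text ]
#   text:      [ 'T', $text, $is_data ]
my @tokens = (
    [ 'S', 'html',  {}, [], '<HTML>' ],
    [ 'T', ' blah. ', '' ],
    [ 'S', 'cdata', {}, [], '<CDATA>' ],
    [ 'T', 'some ', '' ],
    [ 'S', 'b',     {}, [], '<b>' ],
    [ 'T', 'real', '' ],
    [ 'E', 'b',     '</b>' ],
    [ 'T', ' HTML', '' ],
    [ 'E', 'cdata', '</CDATA>' ],
    [ 'E', 'html',  '</HTML>' ],
);

# Hypothetical helper (not part of HTML::TokeParser): returns everything
# found between <cdata> and </cdata>, without the cdata tags themselves.
sub text_inside_cdata {
    my @toks  = @_;
    my $depth = 0;     # state variable: greater than 0 while inside cdata
    my $out   = '';
    for my $token (@toks) {
        my ( $type, $name_or_text ) = @$token;
        my $is_tag = ( $type eq 'S' or $type eq 'E' );
        # For tags, the original source text is the last array element;
        # for text tokens, it is the second element.
        my $text = $is_tag ? $token->[-1] : $name_or_text;
        if ( $is_tag and $name_or_text eq 'cdata' ) {
            $depth += ( $type eq 'S' ) ? 1 : -1;
            next;      # never emit the cdata tags themselves
        }
        $out .= $text if $depth > 0;
    }
    return $out;
}

print text_inside_cdata(@tokens), "\n";    # prints: some <b>real</b> HTML
```

To use it on real input, you would feed it the tokens returned by get_token instead of the hand-built list. Using a counter rather than a simple on/off flag means nested cdata sections, if they ever occur, are still handled.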
In reply to Re: HTML::TokeParser Frustration by graff, in thread HTML::TokeParser Frustration by Anonymous Monk