in reply to HTML-Parser usage

What exactly are you asking? Are you saying that, given a URL, you already know how to fetch the page and count the words, but you can't figure out how to keep that count separate from the counts for the other pages? That seems so trivial that I must be misunderstanding what you've written, even though that's what you wrote. Can you post some code to clarify what you're really asking? It seems like what you want to do is really simple:

    use HTML::TokeParser::Simple 3.13;

    my %words;
    foreach my $url (@urls) {
        $words{$url} = 0;
        my $parser = HTML::TokeParser::Simple->new(url => $url);
        while (my $token = $parser->get_token) {
            next unless $token->is_text; # assumes you only search visible text
            $words{$url} += some_word_counting_function($token->as_is);
        }
    }
    # %words now has the count of words per url
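(For completeness: `some_word_counting_function` above is a placeholder, not part of HTML::TokeParser::Simple. A minimal sketch, assuming a "word" is just a whitespace-separated token, might be:)

```perl
use strict;
use warnings;

# Hypothetical helper: counts whitespace-separated words in a text chunk.
# The special split pattern ' ' skips leading whitespace automatically.
sub some_word_counting_function {
    my ($text) = @_;
    my @words = split ' ', $text;
    return scalar @words;
}
```

You'd refine the definition of "word" (punctuation, hyphenation, etc.) to suit your needs.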

Admittedly, that uses a different module, but it shows how easy it is to track the word count per URL. Did I misunderstand what you were asking?

Cheers,
Ovid

New address of my CGI Course.

Replies are listed 'Best First'.
Re^2: HTML-Parser usage
by Deib (Sexton) on Mar 30, 2005 at 05:21 UTC
    It works perfectly, thanks a lot Ovid! You saved me again :)
    It seems I got more scolding than answering on this thread :s
    I'll make sure not to post such vague and "codeless" questions in the future >_<