What exactly are you asking? Are you saying that, given a URL, you already know how to fetch the page and count the words, but you can't figure out how to keep that count separate from the counts for the other pages? That seems so trivial that I must be misunderstanding, even though that's what you wrote. Can you post some code to clarify what you're really asking? It seems like what you want to do is really simple:
use HTML::TokeParser::Simple 3.13;

my %words;
foreach my $url (@urls) {
    $words{$url} = 0;
    my $parser = HTML::TokeParser::Simple->new(url => $url);
    while (my $token = $parser->get_token) {
        next unless $token->is_text;    # assumes you only search visible text
        $words{$url} += some_word_counting_function($token->as_is);
    }
}
# %words now has the count of words per url
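In case it helps, here's a minimal sketch of what some_word_counting_function might look like. The name is just the placeholder from the code above, and splitting on whitespace is one rough definition of a "word":

sub some_word_counting_function {
    my $text = shift;
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my @words = split ' ', $text;
    return scalar @words;
}

You could count matches of /\w+/g instead if you want a stricter notion of what a word is.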
Admittedly, that uses a different module, but it shows how easy it is to track a word count per URL. Did I misunderstand what you were asking?