Deib has asked for the wisdom of the Perl Monks concerning the following question:

Good evening monks!

I'm somewhat stuck. Since I can't decipher perldoc even though I read it over and over again I need your help once again.

I'm designing a web-search kind of application. Here's how it works: it first receives a word, then reads a file full of URLs into an array, searches each page to see how many times the word appears, and uploads the data into a database. Everything works except one thing: I can't figure out how to keep a count of how many times the word appears on EACH URL. I suppose the solution can be found in HTML-Parser, but I can't get it.

Thanks for answering.

Replies are listed 'Best First'.
Re: HTML-Parser usage
by Ovid (Cardinal) on Mar 17, 2005 at 02:21 UTC

    What exactly are you asking? Are you saying that, given a URL, you already know how to fetch the page and count the words, but you can't figure out how to separate that count from the count of the other pages? That seems so trivial that I must be misunderstanding what you've written, even though that's what you wrote. Can you post some code to clarify what you're really asking? It seems like what you want to do is really simple:

        use HTML::TokeParser::Simple 3.13;

        my %words;
        foreach my $url (@urls) {
            $words{$url} = 0;
            my $parser = HTML::TokeParser::Simple->new(url => $url);
            while (my $token = $parser->get_token) {
                next unless $token->is_text; # assumes you only search visible text
                $words{$url} += some_word_counting_function($token->as_is);
            }
        }
        # %words now has the count of words per url

    Admittedly, that uses a different module, but that shows how easy it is to track word count per url. Did I misunderstand what you were asking?
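
    The snippet leaves some_word_counting_function undefined. A minimal sketch of what it might look like follows; the name count_word, the extra $word parameter, and the whole-word, case-insensitive matching rule are all assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for some_word_counting_function: counts whole-word,
# case-insensitive occurrences of $word in a chunk of text.
sub count_word {
    my ($text, $word) = @_;
    # Assigning the match list to () and reading it in scalar context
    # yields the number of matches; \Q...\E escapes regex metacharacters.
    my $matches = () = $text =~ /\b\Q$word\E\b/gi;
    return $matches;
}

print count_word('Perl monks like Perl, not pearls.', 'perl'), "\n";
```

    Called once per text token, its return value can be added to the per-URL total in the same way as in the loop above.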

    Cheers,
    Ovid

    New address of my CGI Course.

      It works perfectly, thanks a lot Ovid! You saved me again :)
      It seems I got more scolding than answering on this thread :s
      I'll make sure not to post such vague and "codeless" questions in the future >_<
Re: HTML-Parser usage
by tlm (Prior) on Mar 17, 2005 at 01:50 UTC

    HTML::Parser works through callbacks that are triggered by various parsing events, so I'd imagine that one such callback (text_h ?) would be in charge of updating the count, but without more specific details (and some source code to look at) it is hard for me to give you anything more concrete.
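
    As a rough sketch of the callback approach, assuming the page has already been fetched into a string: a text_h handler receives each run of text and adds to a counter. The helper name count_word_in_html and the sample HTML are made up for illustration, and the matching rule (whole word, case-insensitive) is an assumption.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: count whole-word, case-insensitive matches of $word
# in the text portions of an HTML string.
sub count_word_in_html {
    my ($html, $word) = @_;
    my $count = 0;

    my $parser = HTML::Parser->new(
        api_version => 3,
        # text_h fires for each run of text; the 'dtext' argspec passes
        # the text to the handler with HTML entities decoded.
        text_h => [
            sub {
                my ($text) = @_;
                my $n = () = $text =~ /\b\Q$word\E\b/gi;
                $count += $n;
            },
            'dtext',
        ],
    );
    $parser->unbroken_text(1);                   # avoid splitting text runs
    $parser->ignore_elements(qw(script style));  # skip non-visible text
    $parser->parse($html);
    $parser->eof;

    return $count;
}

my $html = '<html><body><p>Perl is fun. I like perl.</p>'
         . '<script>var perl = 1;</script></body></html>';
print count_word_in_html($html, 'perl'), "\n";
```

    Creating a fresh counter for each page, as this helper does, is what keeps one URL's count separate from the next.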

    the lowliest monk

Re: HTML-Parser usage
by punkish (Priest) on Mar 17, 2005 at 02:17 UTC
    With absolutely no code, not even pseudo code to go on, it is hard to say, but I assume by...
    how to keep the count of how many times the word appears on EACH URL

    You really mean each PAGE referenced by each URL, no?

    I mean, what is the problem with

    my $wordcount;
    loop_over_each_url {
        search_through_each_page_content {
            $wordcount++ (everytime_word_match);
        }
    }
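
    A runnable take on that loop, as a sketch only: it keeps one counter per URL in a hash instead of a single shared $wordcount. The %page_text data, the example.com URLs, and the count_per_url name are hypothetical, and the page text is assumed to be already fetched so nothing here touches the network.

```perl
use strict;
use warnings;

# Hypothetical, already-fetched page text keyed by URL.
my %page_text = (
    'http://example.com/a' => 'perl perl python',
    'http://example.com/b' => 'ruby and Perl',
    'http://example.com/c' => 'nothing relevant',
);

# Return a hash ref mapping each URL to its own count of $word.
sub count_per_url {
    my ($word, $pages) = @_;
    my %count;
    for my $url (keys %$pages) {
        my $n = () = $pages->{$url} =~ /\b\Q$word\E\b/gi;
        $count{$url} = $n;    # one counter per URL, not one shared total
    }
    return \%count;
}

my $counts = count_per_url('perl', \%page_text);
printf "%-22s %d\n", $_, $counts->{$_} for sort keys %$counts;
```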
    --

    when small people start casting long shadows, it is time to go to bed
Re: HTML-Parser usage
by saintmike (Vicar) on Mar 17, 2005 at 02:16 UTC
    Since you said "Everything works", what have you got so far? Did you write it yourself or just cut-and-paste it?

    It's easier for us to point you in the right direction if you show what you've accomplished so far and what exactly it is that you need help with.