I'm attempting to write a link popularity checker as a CGI script (for now I'm developing and testing it on the command line, to avoid unneeded hassle until it's ready), and for something of this magnitude, speed is definitely a concern. www.marketleap.com has something similar to what I want to create, and it takes about 10 seconds at most to parse all the URLs and give you the report. My script takes almost 10 seconds for just 1 or 2 engines. Can someone give me tips on how to drastically increase the speed of this using modules that come with Perl?

Code so far:

#!/usr/bin/perl
use strict;
use warnings;
# Import $ua as well: get() takes only a URL, so the User-Agent
# has to be set on the underlying user agent object.
use LWP::Simple qw(get $ua);

$| = 1;

my $url = "http://sulfericacid.perlmonk.org";
my $altavista = "http://www.altavista.com/web/results?q=link:$url&kl=XX&search=Search";
my $google = "http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=link%3A$url&btnG=Google+Search";

########################
# Altavista!
########################
# get() returns undef on failure, so fall back to an empty page.
my $altavista_content = get($altavista) || '';
my @altavista_lines = split /\n/, $altavista_content;
my $altavista_results = 'unknown';

foreach my $altavista_line (@altavista_lines) {
    $altavista_results = $1 if $altavista_line =~ m/AltaVista found (.*) results/;
}

print "searched: $altavista\n";
print "results: $altavista_results\n";

########################
# Google!
########################
$ua->agent('Mozilla/4.76 [en] (win-98; U)');
my $google_content = get($google) || '';
my @google_lines = split /\n/, $google_content;
my $hits = 'unknown';

foreach my $google_line (@google_lines) {
    # "about" is optional: a single hit reads
    # Results <b>1</b> - <b>1</b> of <b>1</b>.
    if ($google_line =~ /Results <b>\d+<\/b> - <b>\d+<\/b> of (?:about )?<b>((\d{1,3}\,?)+)<\/b>/) {
        $hits = $1;
    }
}

print "searched: $google\n";
print "results: $hits\n";
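One thing that might be worth trying on the parsing side is to skip the split into lines and run a single regex over the whole page. The following is only a minimal sketch under stated assumptions: `extract_count` is a hypothetical helper that is not in the script above, and the two sample strings are made-up stand-ins for fetched pages, not real engine output. The network fetches may well dominate the run time, so this would probably trim only the parsing step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): match one
# pattern against the whole page and return its first capture,
# or undef if the page is missing or nothing matches.
sub extract_count {
    my ($html, $pattern) = @_;
    return undef unless defined $html;
    return $html =~ $pattern ? $1 : undef;
}

# Made-up sample snippets standing in for fetched pages (assumption).
my $altavista_sample = "<html>\n<b>AltaVista found 1,234 results</b>\n</html>";
my $google_sample    = "<html>\nResults <b>1</b> - <b>10</b> of about <b>5,670</b>\n</html>";

# Precompiled patterns; no per-line loop is needed.
my $altavista_re = qr/AltaVista found ([\d,]+) results/;
my $google_re    = qr{Results <b>\d+</b> - <b>\d+</b> of (?:about )?<b>([\d,]+)</b>};

my $altavista_hits = extract_count($altavista_sample, $altavista_re);
my $google_hits    = extract_count($google_sample,    $google_re);

print "altavista: ", (defined $altavista_hits ? $altavista_hits : 'unknown'), "\n";
print "google: ",    (defined $google_hits    ? $google_hits    : 'unknown'), "\n";
```

In the real script the sample strings would be replaced by the content returned from `get()`. Timing the fetch and the parse separately (for instance with the core Time::HiRes module) would show whether the parsing step is worth tuning at all.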


"Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

sulfericacid

In reply to Speeding up HTML parsing by sulfericacid
