Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

use LWP::Simple;

@html=get("http://setiathome.ssl.berkeley.edu/stats/team/team_2414.html");
foreach (@html) {
    $_=~s/\cM//g;
    $_=~s/<(br|p)>/\n\n/ig;
    $_=~s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
}
foreach (@html) {
    chomp($_);
    if ($_=~/\bHelmet\b/) {
        print $_;
    }else {
        print "no\n";
    }
}

I am going to use this script to fetch my seti@home stats from
my team page. The problem is I cannot get it to match
my username. I included the parsing snippet above. HELP!!

Replies are listed 'Best First'.
Re: HTML HELP
by lhoward (Vicar) on May 18, 2000 at 01:37 UTC
    The LWP::Simple get function returns a scalar, not an array. To make your code work the way you want, add a split so that each line of the page becomes its own array element:
    @html=split "\n",get("http://setiathome.ssl.berkeley.edu/stats/team/team_2414.html");
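
To see the difference without touching the network, here is a minimal sketch. It assumes a hard-coded string as a stand-in for whatever get() returns; the sample text and the helper name lines_of are made up for illustration and are not part of LWP::Simple.

```perl
use strict;
use warnings;

# Stand-in for the scalar that get() would return
# (hypothetical content, not the real team page).
my $page = "1) Alice 200\n2) Bob 150\n3) Helmet 119\n";

# Assigning a single scalar to an array gives a one-element array.
my @whole = ($page);

# Hypothetical helper: one array element per line of the page.
sub lines_of {
    my ($text) = @_;
    return split /\n/, $text;
}

my @lines = lines_of($page);

print "Without split: ", scalar(@whole), " element(s)\n";
print "With split:    ", scalar(@lines), " element(s)\n";
```

With a real fetch it is probably worth checking that get() returned a defined value before splitting, since it returns undef when the request fails.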
Re: HTML HELP
by chromatic (Archbishop) on May 18, 2000 at 01:55 UTC
    Well, it works for me. Of course, there are some strange things afoot. First of all, my copy of LWP::Simple only returns one scalar from its get() method:
    my @html=get("http://setiathome.ssl.berkeley.edu/stats/team/team_2414.html");
    print "Number of lines: ", scalar(@html), "\n";
    That prints Number of lines: 1. You probably need a split in there to make a real array, or else some other kind of regular expression to pull out your stats:
    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    my $line = get("http://setiathome.ssl.berkeley.edu/stats/team/team_2414.html");
    $line =~ s/\cM//g;
    $line =~ s/<(br|p)>/\n\n/ig;
    $line =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
    if ($line =~ m/(\d+\)\sHelmet\s\d+\s+([\d.]+\s\w+\s){5})/) {
        print $1;
    } else {
        print "Not found\n";
    }
    Hardly beautiful, but the results speak for themselves:
    4) Helmet 119 1890 hr 58 min 15 hr 53 min 25.8 sec
    That said, there are HTML parsing modules available on CPAN, including ones that deal specifically with tables. I would prefer one of those to wrestling with the regex.
Re: HTML HELP
by orthanc (Monk) on May 19, 2000 at 18:05 UTC

    Hi ppl,

    Thought this was a great little time saver by Anon Monk, so I did a bit of modification to my own ends. I thought I should post my code just in case anyone finds it useful. It's not the most bulletproof piece of code, but it does the job.

    Provided they don't change the web pages!

    Enjoy
    Orthanc

    #!/usr/bin/perl
    use LWP::Simple;

    $| = 1;
    $url = "http://setiathome.ssl.berkeley.edu/cgi-bin/cgi?cmd=user_stats&email=";
    $email = "INSERT DEFAULT EMAIL HERE";
    $res_okay = 1;
    $email = $ARGV[0] if(defined $ARGV[0]);
    $email =~ s/@/%40/g;
    @html=get("$url$email");
    foreach (@html) {
        $_=~s/\cM//g;
        $_=~s/<(br|p)>/\n\n/ig;
        $_=~s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
    }
    foreach (@html) {
        chomp($_);
        m/(.*@.*)/; $user = "$1";
        m/returned: (.*)/; $received = "$1";
        m/this rank: (\d*)/; $peers = "$1";
        m/rank out of (\d*).*is: (\d*)/; $rank = "$2/$1";
        m/ (\d*)\n(\d* hr \d* min)\n(\d* hr \d* min \d*?\.\d*)/;
        $results = "$1";
        $cputime = "$2";
        $avgworktime = "$3";
        m/(\d*?\.\d*%)/; $morework = $1;
        $res_okay = 0 if(/No user with that name was found/);
    }
    if($res_okay) {
        print "Seti stats ($user)\n";
        print " Results: $results\n";
        print " Tot CPU Time: $cputime\n";
        print "Avg Work Time: $avgworktime\n";
        print " Last Result: $received\n";
        print " Rank: $rank (Peers $peers)\n";
        print " \% Position: $morework\n";
    } else {
        print "No details for : $email\n";
    }
Re: HTML HELP
by Anonymous Monk on May 18, 2000 at 05:12 UTC
    Thanks for the info. I ran both scripts and still get the same thing: it prints out all the text of the HTML document and matches nothing. Weird.
      Your original prints out the whole thing because, in your match statement, you print the whole of the matching line. Since the whole document is put into one line, the whole thing prints.

      In my program, I only print out what matches that particular regex, if and only if anything matches. It won't ever print out anything more than what's specified.
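
The difference between the two approaches can be sketched roughly as follows. This assumes a short made-up string standing in for the tag-stripped page; the function names whole_on_match and row_on_match, and the row pattern itself, are hypothetical and only meant to illustrate the point.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the whole page held in ONE scalar.
my $page = "1) Alice 200\n2) Bob 150\n3) Helmet 119\n";

# Like the original: test for a match, then hand back the whole "line".
# Because the page is a single string, that means the whole document.
sub whole_on_match {
    my ($text) = @_;
    return $text =~ /\bHelmet\b/ ? $text : undef;
}

# Like the reply: capture only the matching row and hand back $1.
# The /m flag lets ^ and $ match at line boundaries inside the string.
sub row_on_match {
    my ($text) = @_;
    return $text =~ /^(\d+\)\s+Helmet\b.*)$/m ? $1 : undef;
}

print "Whole-document version returns ", length(whole_on_match($page)), " characters\n";
print "Capture version returns: ", row_on_match($page), "\n";
```

Capturing with parentheses and printing $1 keeps the output limited to the row of interest, whether or not the page has been split into lines first.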