HTML HELP

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: HTML HELP by lhoward (Vicar) on May 18, 2000 at 01:37 UTC
The LWP::Simple `get` function returns a scalar, not an array. To make it work the way you want, add a `split` that breaks the result apart so that each line becomes an array element: `@html = split "\n", get("http://setiathome.ssl.berkeley.edu/stats/team/team_2414.html");`
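To illustrate the scalar-versus-array point without needing network access, here is a minimal sketch. The `$page` string is a made-up stand-in for whatever `get` would return, not the real SETI@home page, and the variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the single string that get() hands back.
my $page = "<html>\n<body>\nHelmet 119\n</body>\n</html>\n";

# Assigning one scalar to an array yields an array with exactly one element.
my @unsplit = ($page);

# Splitting on newlines yields one element per line.
my @lines = split /\n/, $page;

print "Without split: ", scalar(@unsplit), " element(s)\n";
print "With split:    ", scalar(@lines), " element(s)\n";
```

With this sample string the first count is 1 and the second is 5, which is why a `foreach` over the unsplit array only ever sees the whole document at once.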
Re: HTML HELP by chromatic (Archbishop) on May 18, 2000 at 01:55 UTC
Well, it works for me. Of course, there are some strange things afoot. First of all, my copy of LWP::Simple only returns one scalar from its get() function:

    my @html = get("http://setiathome.ssl.berkeley.edu/stats/team/team_2414.html");
    print "Number of lines: ", scalar(@html), "\n";

That prints `Number of lines: 1`. You probably need a split in there to make a real array, or else some kind of regular expression to pull out your stats:

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    my $line = get("http://setiathome.ssl.berkeley.edu/stats/team/team_2414.html");
    $line =~ s/\cM//g;                           # strip carriage returns
    $line =~ s/<(br|p)>/\n\n/ig;                 # turn <br> and <p> into blank lines
    $line =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;   # strip the remaining tags
    if ($line =~ m/(\d+\)\sHelmet\s\d+\s+([\d.]+\s\w+\s){5})/) {
        print $1;
    } else {
        print "Not found\n";
    }

Hardly beautiful, but the results speak for themselves:

    4) Helmet 119 1890 hr 58 min 15 hr 53 min 25.8 sec

That said, there are HTML parsing modules available on CPAN, especially ones dealing with tables. I would prefer one of those to dealing with the regex.
Re: HTML HELP by orthanc (Monk) on May 19, 2000 at 18:05 UTC
Hi ppl,

Thought this was a great little time saver by Anon Monk, so I modified it a bit to my own ends. I thought I should post my code just in case anyone finds it useful. Not the most bulletproof piece of code, but it does the job, provided they don't change the web pages!

Enjoy,
Orthanc

    #!/usr/bin/perl
    use LWP::Simple;

    $| = 1;    # unbuffered output
    $url = "http://setiathome.ssl.berkeley.edu/cgi-bin/cgi?cmd=user_stats&email=";
    $email = "INSERT DEFAULT EMAIL HERE";
    $res_okay = 1;
    $email = $ARGV[0] if(defined $ARGV[0]);
    $email =~ s/@/%40/g;    # URL-encode the @ sign

    # get() returns the whole page as one string, so @html has one element
    @html=get("$url$email");

    # strip carriage returns and HTML tags
    foreach (@html) {
        $_=~s/\cM//g;
        $_=~s/<(br|p)>/\n\n/ig;
        $_=~s/<(?:[^>'"]*|(['"]).*?\1)*>//gs
    }

    # pull the individual stats out of the remaining text
    foreach (@html) {
        chomp($_);
        m/(.*@.*)/;
        $user = "$1";
        m/returned: (.*)/;
        $received = "$1";
        m/this rank: (\d*)/;
        $peers = "$1";
        m/rank out of (\d*).*is: (\d*)/;
        $rank = "$2/$1";
        m/ (\d*)\n(\d* hr \d* min)\n(\d* hr \d* min \d*?\.\d*)/;
        $results = "$1";
        $cputime = "$2";
        $avgworktime = "$3";
        m/(\d*?\.\d*%)/;
        $morework = $1;
        $res_okay = 0 if(/No user with that name was found/);
    }

    if($res_okay) {
        print "Seti stats ($user)\n";
        print " Results: $results\n";
        print " Tot CPU Time: $cputime\n";
        print "Avg Work Time: $avgworktime\n";
        print " Last Result: $received\n";
        print " Rank: $rank (Peers $peers)\n";
        print " \% Position: $morework\n";
    } else {
        print "No details for : $email\n";
    }
Re: HTML HELP by Anonymous Monk on May 18, 2000 at 05:12 UTC
Thanks for the info. I ran both scripts and still get the same thing: it prints out all the text of the HTML document and matches nothing. Weird.
RE: Re: HTML HELP by chromatic (Archbishop) on May 18, 2000 at 05:22 UTC
Your original prints out the whole thing because, in your match statement, you print the whole of the matching line. Since the whole document is put into one line, the whole thing prints. In my program, I only print out what matches that particular regex, if and only if anything matches. It won't ever print out anything more than what's specified.
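The difference between printing the matching "line" and printing only the captured text can be sketched as follows. The `$doc` string is invented sample data, not the real stats page, and the shortened pattern is only an assumption about what one would want to capture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical document held in one string, as happens when get() is not split.
my $doc = "Rank Name Results\n3) Other 200\n4) Helmet 119\n5) Someone 80\n";

# Approach 1: treat the string as a "line" and keep it when it matches.
# The match succeeds somewhere inside, so the entire string is kept.
my $whole = ($doc =~ /Helmet/) ? $doc : '';

# Approach 2: capture only the wanted part and keep just the capture.
my $captured = ($doc =~ /(\d+\)\sHelmet\s\d+)/) ? $1 : '';

print "Printing the matching 'line':\n$whole\n";
print "Printing only the capture:\n$captured\n";
```

With this sample, the first approach prints all four lines, while the second prints only `4) Helmet 119`.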