Some may have seen
eliza ineptly answering questions in the CB. This is the code that drove her. I'm posting it because it's a) amusing, b) randomly inspiring coding-wise b) occasionally very accurate.
When parsing through the responses, it'll try to choose the one which has the most words matching the input phrase. If there's one node that matches the request, it'll return that. There's probably a lot of room for improvement in this; I know that fuzzy matching and "matching concepts" are a big field, and this is a crude routine to determine similarity.
It ignores a lot of little words, and will return the "best fit" match as well as a random suggestion.
As always, I welcome your comments and suggestions, especially when they concern strengthening matching.
Sample output:
C:\>perl perlbot_randomnode_request.pl CGI cookie help
Out of 431 possibilities
I think this url : "<A HREF="/index.pl?node_id=20485&lastnode_id=864">
+cookie usi
ng cgi.pm</a>" may best help you understand CGI cookie help
. You may also try this random hit : "<A HREF="/index.pl?node_id=18572
+&lastnode_
id=864">Answer: How can I use a CGI script to return an image?</a>"
the code :
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Request::Common;
my @junk=qw(what's what i need want how to can do does has the had sho
+uld will anyone help me did for please hey);
my $request="@ARGV";
my $original_request= "@ARGV";
my @monkurls;
my %responses;
my $responsecount;
# strip out crap words.
foreach (@junk) { $request=~s/\b$_\b//ig};
#
# arrayify the request.
# if there's only one word, bail.
#
my @requestwords = split/ /,$request;
if (scalar(@requestwords)==1){print "You expect me to search on that?
+No way. Maybe with some more useful words."; exit}
#
# start the request.
#
my $ua = LWP::UserAgent->new;
my $url = "http://www.perlmonks.com/index.pl?node_id=864;node=$reques
+t";
my $response = $ua->request(GET $url);
#
# if the title is 'not found' then there's no matches.
# if the title matches $request exactly, there's a node with the exa
+ct title.
# return that.
#
if ($response->content!~m/<title>(search results)<\/title>/i) {
if ($response->content=~m/<title>(not found)<\/title>/i) {
print "I couldn't find any nodes about $request. Sorry
+, eh?.";
exit;
}
if ($response->content=~m/<title>($original_request)<\/title>/
+i) {
print "I found but one node that matched your request
+: \"http://www.perlmonks.com/index.pl?node=$request\" may help you un
+derstand $request\n";
exit;
}
}
my @text=split /\n/, $response->content;
while ($text[0] !~ /here's the stuff:/i) {shift @text}
shift @text;
my $thisline = shift @text;
while ($thisline !~ /<FORM METHOD=GET ACTION="http:\/\/search.cpan.org
+\/search">/i){
if ($thisline=~m/(<title>(.*)<\/title>)/i) {
print "$1\n";
}
$thisline =~ s/<LI>//;
if ($thisline !~ />re:/i) {
$countgrep=grep{$thisline =~/$_/i }@requestwords;
if ($countgrep) {
$responsecount++;
$responses{$thisline}=$countgrep;#urls, %{line=>$thisline,
+ confidence=>$countgrep};
}
}
$thisline = shift @text;
}
;
@monkurls = sort {$responses{$a} <=> $responses{$b}} keys %responses;
my $index = rand @monkurls;
my $quip = $monkurls[rand @monkurls];
print "Out of ". scalar@monkurls . " possibilities\n";
print "I think this url : \"$monkurls[-1]\" may best help you understa
+nd @ARGV\n. ";
scalar@monkurls > 1 && print "You may also try this random hit : \"$qu
+ip\"";