I decided to polish the code I posted previously and turn it into a command-line utility (a joke?). You can call it from your .profile or run it from the command line. It could be useful for a quick answer now and then, or you can just ask a silly question for fun. :-)

And yes, I could have used Text::Autoformat to format the plain text, but I was too lazy to modify my (already working) code.

#!/usr/local/bin/perl -w
use strict;
use WWW::Mechanize;
use Getopt::Long;
use Data::Dumper;

# Parse command line arguments and assign corresponding variables
GetOptions
(
    'q|question=s' => \( my $QUESTION = undef),
    'f|format=s'   => \( my $FORMAT = 'TEXT' ),
    'v|verbose'    => \( my $VERBOSE = 0 ),
);

unless ( defined $QUESTION && $FORMAT =~ /^(text|html)$/i )
{
    print <<USAGE;
Description:
    Ask the MIT AI (START) a question and get some answer.
    Just for fun of course. ;-)

Usage: $0 [option]

Options:
    -q|--question ["text"]     Question to ask AI
    -f|--format [TEXT|HTML]    The output format. Default is TEXT.
    -v|--verbose               Print more info
USAGE
    exit(1);
}

my $URL = "http://www.ai.mit.edu/projects/infolab/";
$FORMAT = uc $FORMAT;

print ">> Asking START the question:\n" .
      ">> $QUESTION\n" if $VERBOSE;

my $robot = new WWW::Mechanize;

print ">> Fetching query form...\n" if $VERBOSE;
$robot->get($URL);

print ">> Submitting query...\n" if $VERBOSE;
$robot->form_number('1');
$robot->set_fields('query' => $QUESTION);   # ask a question
$robot->click();

# Get the reply to my question
print ">> Fetching answer...\n" if $VERBOSE;
my $html = $robot->content();

# Extract the answer
my ($text) = $html =~ /(<H1>START(?:.|\n)*(?:<HR>|line-rain.gif" width=100% height=3>))/m;

if (!defined $text || $text =~ /^\s+$/) {
    $text = NoAnswer();
}

if ($FORMAT eq 'TEXT') {
    # Reformat the text
    $text =~ s/(<\/P>|<br>|<li>|<option>)/\n\n/gi;   # Add some \n's
    $text =~ s/<[^>]*>//g;            # Strip HTML tags
    $text =~ s/<!--|-->//mg;          # Strip comments
    $text =~ s/&gt;/>/g;              # Quick and dirty fix
    $text =~ s/&lt;/</g;
    $text =~ s/&nbsp;/ /g;
    $text =~ s/&amp;/&/g;
    $text =~ s/&eacute;/e/g;
    $text =~ s/^[^\S\n]+//mg;         # Strip leading spaces
    $text =~ s/(?<=\n)\n+/\n/mg;      # Squash multiple empty lines
    $text =~ s/(?<!\n)\n(?!\n)/ /mg;  # Combine lines
    $text =~ y/\t / /s;               # Squash multiple spaces & tabs

    # after some playing around, I came up with the
    # following regex that does wrapping at column
    # 60 perfectly. I love perl. ;-)
    $text =~ s/(.{50,60}(?<=\s\b))/$1\n/mg;
}

print "$text\n";
exit(0);

sub NoAnswer
{
    my @responses = (
        'is silent',
        'looks puzzled',
        'refuses to give an answer',
        'shakes his head',
        'gives no answer',
        'could not understand the question',
        'says: please try again',
        'is currently off-line',
        'Ur?',
        'Which question?',
        'Why?',
        'Can you repeat the question again?',
        'May I have your name please?',
        'I am just a robot, what do you expect?',
        'Please ask a different question',
    );
    print Dumper(\@responses) if $VERBOSE;
    my $r = $responses[rand($#responses+1)];
    $r = 'START ' . $r if $r =~ /^[a-z]/;
    $r =~ s/([^\?])$/$1./;
    return $r;
}

Update: I have fixed the pattern match and the usage text: use double quotes instead of single quotes to quote the question. Thanks to zentara for pointing out the bug. ;-)

Update: A new version is now available in my follow-up post. It does pretty much the same thing, but uses CPAN modules to do the text extraction and formatting.
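
For anyone who wants module-based wrapping without installing anything, here is a minimal sketch of the column-60 wrapping step using the core Text::Wrap module. This is not the follow-up version mentioned above, and it swaps Text::Wrap in for Text::Autoformat purely for illustration. The helper name wrap_answer() and the sample sentence are assumptions made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Wrap a block of text so that no line is longer than $width characters.
# wrap_answer() is a hypothetical helper, not part of the script above.
sub wrap_answer {
    my ($text, $width) = @_;
    # Text::Wrap produces lines of at most $columns - 1 characters,
    # so add one to get lines of at most $width characters.
    local $Text::Wrap::columns = $width + 1;
    return wrap('', '', $text);
}

# Made-up sample text standing in for a START answer.
my $sample = join ' ', ('The Moon is about 384,467 km from Earth.') x 4;
print wrap_answer($sample, 60), "\n";
```

Unlike the regex, this breaks only at whitespace between words and does not depend on a 50 to 60 character window, so short trailing lines are handled the same way as long ones.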

Re: Asking START (MIT AI) a question
by zentara (Cardinal) on Dec 02, 2003 at 18:11 UTC
    I'm getting an error when running this:

    Use of uninitialized value in pattern match (m//) at ./You-can-call-me-AI line 55.

    I entered: You-can-call-me-AI -q 'how far is the moon?'

    I get the correct answer when I go to the site and enter it manually. START is pretty cool.

    If I print $html right away, I get the answer, but something in that "extract text from html" regex isn't working.

    my $html = $robot->content();
    print "$html\n";
      zentara,
      If you change line 55 to:
      #my ($text) = $html =~ /(<H1>START(?:.|\n)*<HR>)/mg;
      my $text = $html;
      You get results. I may update with a fix more in line with Roger's intentions if he doesn't first.

      Cheers - L~R

      Hi zentara, thanks for pointing out the bug. I have fixed the text extraction. ;-)
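
For anyone who hits the same "Use of uninitialized value in pattern match" warning, here is a minimal sketch of the likely cause and the guard. It assumes the warning came from matching against $text after the extraction regex failed. The helper extract_answer(), the simplified pattern (using .*? with /s instead of the script's pattern), and both sample pages are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the answer block, or undef if the markers are not found.
# extract_answer() is a hypothetical helper with a simplified pattern.
sub extract_answer {
    my ($html) = @_;
    # In list context a failed match returns an empty list,
    # so $text is left undefined rather than set to anything.
    my ($text) = $html =~ /(<H1>START.*?<HR>)/s;
    return $text;
}

# Two made-up pages: one with the expected markers, one without.
for my $page ('<H1>START</H1><P>384,467 km<HR>',
              '<html>no markers here</html>') {
    my $text = extract_answer($page);
    # Check definedness first; matching against undef is what warns.
    if (!defined $text || $text =~ /^\s*$/) {
        print "no answer found\n";
    }
    else {
        print "answer: $text\n";
    }
}
```

The order of the two tests matters: because || short-circuits, the pattern match is never attempted on an undefined value.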

      I am getting the correct response now -
      P:\Perl>perl robot.pl -q "A"
      START's reply
      ===> A
      I think you're going to have to run that by me again (maybe phrased a bit differently).

      P:\Perl>perl robot.pl -q "How far is the moon?"
      START's reply
      ===> How far is the moon?
      Moon
      Distance from Earth (km) 384,467
      Source: Planetary Sciences at the National Space Science Data Center

      P:\Perl>perl robot.pl -q "How big is the moon?"
      START's reply
      ===> How big is the moon?
      Moon
      The volumetric mean radius of the Moon is 1737.1 (km).
      Source: Planetary Sciences at the National Space Science Data Center

      I have updated the robot, adding text filtering to translate meta characters and combine multiple empty lines.
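
The two filtering steps can be tried offline without hitting the START site. The following is only a sketch under stated assumptions: the entity table covers just the five entities the script handles, tidy_text() is a hypothetical helper name, and the single-pass substitution is a variation on the script's one-regex-per-entity approach rather than the script's own code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Only the entities handled by the script; anything else is left as-is.
my %ENTITY = (
    'gt'     => '>',
    'lt'     => '<',
    'nbsp'   => ' ',
    'amp'    => '&',
    'eacute' => 'e',
);

# tidy_text() is a hypothetical helper used for illustration.
sub tidy_text {
    my ($text) = @_;
    # Translate known entities in one pass, so text produced by
    # decoding &amp; is never decoded a second time.
    $text =~ s/&(\w+);/exists $ENTITY{$1} ? $ENTITY{$1} : "&$1;"/ge;
    # Collapse runs of three or more newlines into a single empty line.
    $text =~ s/\n{3,}/\n\n/g;
    return $text;
}

print tidy_text("1 &lt; 2 &amp;&amp; 3 &gt; 2\n\n\n\nCaf&eacute;\n");
```

Doing the translation in one pass avoids an ordering pitfall of separate substitutions, where input such as &amp;eacute; would first become &eacute; and then be decoded again.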

      The following is what START thought about Perl. ;-)



Re: Asking START (MIT AI) a question
by Roger (Parson) on Dec 03, 2003 at 04:21 UTC
    Ok, I am finally fed up with converting HTML to text and formatting plain text by hand. Let's use some modules. I have rewritten the robot with the additional CPAN modules HTML::Strip and Text::Autoformat. Here's the new version...