OK, I am finally fed up with converting HTML to text and formatting plain text by hand. Let's use some modules. I have rewritten the robot using two additional CPAN modules, HTML::Strip and Text::Autoformat. Here's the new version:

#!/usr/local/bin/perl -w
use strict;
use WWW::Mechanize;
use Getopt::Long;
use Data::Dumper;
use HTML::Strip;
use Text::Autoformat qw(autoformat);

# Parse command line arguments and assign corresponding variables
GetOptions
(
    'q|question=s' => \( my $QUESTION = undef  ),
    'f|format=s'   => \( my $FORMAT   = 'TEXT' ),
    'v|verbose'    => \( my $VERBOSE  = 0      ),
);

unless ( defined $QUESTION && $FORMAT =~ /^(text|html)$/i )
{
    print <<USAGE;

Description:
    Ask the MIT AI (START) a question and get some answer.
    Just for fun of course. ;-)

Usage: $0 [option]

Options:
    -q|--question ["text"]    Question to ask AI
    -f|--format [TEXT|HTML]   The output format. Default is TEXT.
    -v|--verbose              Print more info

USAGE
    exit(1);
}

my $URL = "http://www.ai.mit.edu/projects/infolab/";
$FORMAT = uc $FORMAT;

print ">> Asking START the question:\n" .
      ">> $QUESTION\n" if $VERBOSE;

my $robot = WWW::Mechanize->new();

print ">> Fetching query form...\n" if $VERBOSE;
$robot->get($URL);

print ">> Submitting query...\n" if $VERBOSE;
$robot->form_number('1');
$robot->set_fields('query' => $QUESTION);   # ask a question
$robot->click();

# Get the reply to my question
print ">> Fetching answer...\n" if $VERBOSE;
my $html = $robot->content();

# Extract the answer
my ($text) = $html =~
    /(<H1>START(?:.|\n)*(?:<HR>|line-rain.gif" width=100% height=3>))/m;

if (!defined $text || $text =~ /^\s+$/) {
    $text = NoAnswer();
}

if ($FORMAT eq 'TEXT') {
    # Reformat the text
    my $hs = HTML::Strip->new();
    $text = $hs->parse( $text );
    $text =~ s/&gt;/>/g;              # Quick and dirty fix
    $text =~ s/&lt;/</g;
    $text =~ s/&nbsp;/ /g;
    $text =~ s/&amp;/&/g;
    $text =~ s/&eacute;/e/g;
    $text =~ s/^[^\S\n]+//mg;         # Strip leading spaces
    $text =~ s/(?<=\n)\n+/\n/mg;      # Squash multiple empty lines
    $text =~ s/(?<!\n)\n(?!\n)/ /mg;  # Combine lines
    $text =~ y/\t / /s;               # Squash multiple spaces & tabs
    $text = autoformat($text, { left=>1, right=>60, all=>1 });
}

print "$text\n";

exit(0);

sub NoAnswer
{
    my @responses = (
        'is silent',
        'looks puzzled',
        'refuses to give an answer',
        'shakes his head',
        'gives no answer',
        'could not understand the question',
        'says: please try again',
        'is currently off-line',
        'Ur?',
        'Which question?',
        'Why?',
        'Can you repeat the question again?',
        'May I have your name please?',
        'I am just a robot, what do you expect?',
        'Please ask a different question',
    );

    print Dumper(\@responses) if $VERBOSE;

    my $r = $responses[rand($#responses+1)];
    $r = 'START ' . $r if $r =~ /^[a-z]/;
    $r =~ s/([^\?])$/$1./;
    return $r;
}
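For anyone puzzling over the regex chain in the TEXT branch, here is a minimal sketch of just the whitespace clean-up, run on a made-up string with core Perl only. The helper name squash_whitespace and the sample input are my own, for illustration, and are not part of the script. HTML::Strip and autoformat are left out, so this only shows what the substitutions do between those two calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the script above): applies the same
# whitespace clean-up substitutions to a string and returns the result.
sub squash_whitespace {
    my ($text) = @_;
    $text =~ s/^[^\S\n]+//mg;         # strip leading spaces/tabs on every line
    $text =~ s/(?<=\n)\n+/\n/mg;      # collapse a run of blank lines into one
    $text =~ s/(?<!\n)\n(?!\n)/ /mg;  # turn a lone line break into a space
    $text =~ tr/\t / /s;              # squash runs of spaces and tabs (tr is y)
    return $text;
}

# Made-up sample: indented lines, a hard-wrapped sentence,
# several blank lines and a run of spaces.
my $sample = "  START says:\n\thello\n\n\n\nworld   again\n";

# Brackets make leading and trailing whitespace visible.
print "[", squash_whitespace($sample), "]\n";
```

Hard-wrapped lines inside a paragraph get joined, while a blank line between paragraphs survives as the paragraph break. Note that a lone trailing newline is also turned into a space by the third substitution; in the script the result is then handed to autoformat.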



In reply to Re: Asking START (MIT AI) a question by Roger
in thread Asking START (MIT AI) a question by Roger
