I was out at a birthday party for a friend of my finacee this evening, so I didn't have much time to work on this.
If you take this code (See code below) and add it to your bot - call the "grabMarkovPhrase" and it will return a string for you to return to the user talking to your bot. That way instead of just random text, it will return text that is more in the style of... well, whatever text that you train it on.
(you could log all of the responses that users give your bots into a file, then you could train this bot against that file - that way it would talk back to them in a manner similar to the way they talk to it... that would likely degrade since there will be a lot of mimicry that it will learn on)
The variables at the top control the minimum and maximum size of the string that it will return (counting "words" - really it just splits on a space and then looks to see how many array spots that creates).
I just real quickly tore this code out of my radiohead song generator stuff and also created just a flat file of all of the radiohead text.
There are a ton of things that could make this far better:
- It will cut off words at the end. I just haven't bothered to address this issue yet in my markov stuff. Judging by the fractured conversations that I've had with people on AIM, I'm not sure it matters too much what this thing does in terms of grammar or "typing".
- It could written much better in terms of looks and especially in terms of performance. All I did was spend 5 mins and glance through your code (enough to thinnk that you could just call this method and return that string to the user) and then yank this stuff out so that it could work in your code.
- One huge performance issue is that each time this is called, it reads in the file and generates the hash. It would obviously be much better to create the hash from a fixed set of text, and then save out that premade hash and then only load that in when run. On my Athlon 1G M laptop running ActiveState Perl, it takes around 1 second to return its string.
- It would be nice if it could look at the user response, and then determine how to respond to that instead of totally randomly. Like if they use words that are indicitive of a question, then respond with a certain range of starting words, etc. But that is beyond what I feel like doing right now - I feel like sleeping, as always.
- There is likely more, but I think I'm off to bed - this was just a fun thing for me to think about and try to quickly bang out - not like I would recommend this in a production environment. The text file isn't anything special - it can be replaced with any text file - I don't strip out that many special characters - so if you change the text, be wary of characters that you don't want showing up - if there are any.
Some examples of running this a few times:
- White-washed me alive we've gotta dig it up on the main
- Immerse I'd telling to shreds. You find another mess
- Gave me a minute the underground. Let down Why don't really since in a jack the best you ca
- Bill My uncle Bill to try Oh, it's OK And why should turn animals Living in the beach wi
- And he's see me a message I can't afford to
- Karma police arrest when I got something My
- A Kid A Kinetic Kinetic Kinetic Kinetic Kinetic Kinetic Kinetic... oh no, pop it died a bag
- IA, the world astral carefully Burning in a prises, I wish I could be and it hard, t
(See code below)
#!/usr/bin/perl -w
my $MAX_STRING_LENGTH = 20;#what is the max number of words you want t
+o return
my $MIN_STRING_LENGTH = 3;#what is the min number of words you want to
+ return
#seems to look "best" (still not terriffic) when the max is 20 and the
+ min is 3
#yee ol global hash
my %dataHash;
###########################
#pass in file name and then put its text in a string - stripping out a
+ll of the newlines
#looks at 'words' of size 4 at a time
###########################
sub createQuadHash{
my $fileName = shift(@_);
my $allText = "";
open(PHRASES, $fileName) or die "Can't open that file: $!\n";
while(<PHRASES>){
$allText .= $_;
}
close(PHRASES);
$allText =~ tr/\n/ /d;#replace the newlines with spaces (just chom
+ping sometimes results in words mashed together)
$allText =~ tr/[]()\"//d;#strip out chars that we don't want (beyo
+nd just newlines)
#now take $allText and split it into an array
@allTextArr = split('', $allText);
#iterate the array and fill a hash, looking at the $_ and the one
+to the right
for(my $x = 0; $x < $#allTextArr; $x++){
#make sure that the $pos2 isn't gonna be out of the array
+
if($x+7 <= $#allTextArr){
$pos1 = $allTextArr[$x].$allTextArr[$x+1].$allTextArr[$x+2
+].$allTextArr[$x+3];
$pos2 = $allTextArr[$x+4].$allTextArr[$x+5].$allTextArr[$x
++6].$allTextArr[$x+7];
$dataHash{$pos1}{$pos2}++;
}
}
return;
}
###########################
###########################
#from Perl Cookbook 2.10
#take in a hash - convert the values to percentages, and then return t
+hat hash
###########################
sub weight_to_dist {
my %weights = @_;
my %dist = ();
my $total = 0;
my ($key, $weight);
local $_;
foreach (values %weights) {
$total += $_;
}
while ( ($key, $weight) = each %weights ) {
$dist{$key} = $weight/$total;
}
return %dist;
}
###########################
###########################
#from Perl Cookbook 2.10
#take in a hash and via a weighted random, pick one of the keys based
+on the value and return it
###########################
sub weighted_rand {
my %dist = @_;
my ($key, $weight);
my $rand;
while (1) { # to avoid floating point inaccura
+cies
$rand = rand;
while ( ($key, $weight) = each %dist ) {
return $key if ($rand -= $weight) < 0;
}
}
}
###########################
###########################
#randomly pick and return a starting letter
###########################
sub startLetter{
my $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
my $startLetter = substr($alphabet, int(rand(26)), 1);
return $startLetter;
}
###########################
###########################
#randomly pick a key from the songsHash
###########################
sub randomKeyFromFirstHash{
return ((keys %dataHash)[int rand keys %dataHash]);
}
###########################
###########################
#this returns the phrase to show to the screen/user
###########################
sub grabMarkovPhrase {
createQuadHash('radiohead.txt');
my $s = "";
my $start = "";
my $wordCount = 0;
my %tempHash;
my @tempArr;
my $retStr = "";
#randomly pick a phrase length - between 1 and the MAX_STRING_LENG
+TH
my $stoppinTime = int((rand($MAX_STRING_LENGTH-$MIN_STRING_LENGTH)
+ + $MIN_STRING_LENGTH));
#randomly pick a spot in the hash, then check to see if the first
+letter is a cap letter
#if it is, then carry on, if not, pick again
my $fail = 0;
while($fail != 1){
$start = randomKeyFromFirstHash();#will be three chars
#if the first position is a capital letter, then cool, (ord wi
+ll only look at the first)
if(ord($start) > 64 && ord($start) < 91){
$fail = 1;
}
}
#take the start letter and get rolling...
$s = $start;
$retStr .= $s;
#count the letters followed by spaces
while($wordCount < $stoppinTime){
%tempHash = weight_to_dist(%{$dataHash{$s}});
$s = weighted_rand(%tempHash);
$retStr .= $s;
#count the number of words, split on the space
@tempArr = split(' ',$retStr);
$wordCount = scalar(@tempArr);
}
return $retStr;
}
###########################
print grabMarkovPhrase();
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.