wolis has asked for the wisdom of the Perl Monks concerning the following question:
I'm working on a little game, using Perl to parse simple English phrases like 'create a small white mouse'. The result is a 'mouse' object added to a database, with the attributes 'size=small' and 'colour=white'.
Although I have the basics working and am enjoying solving this on my own, I thought I'd see what others have done or think on this topic.
Has anyone done any work on parsing things like this, or can anyone point me to some relevant texts on the subject?
Thanks
-- "What is the world coming to?" www.wolispace.com
Re: Parsing english
by allolex (Curate) on Oct 07, 2003 at 09:18 UTC
Unless you want your command entry to quickly become the main focus of your game, consider a simplified grammar and lexicon that shoots for about 90% interpretation accuracy at about 70% precision. Just make sure you know which of your words are verbs (V), nouns (N), and adjectives (A). As far as syntax is concerned, English gives you an advantage for commands: commands start with a verb, followed by zero or more arguments that have something to do with the verb's action. (I imagine you are *doing* stuff in your game, so I wouldn't bother accounting for stative verbs.) Your command boils down to:
create(mouse)
mouse(small, white)
So you have the combinations V(N) and N(A,A), and there you have your objects and their properties. You might also want some stemming; the Lingua:: modules on CPAN (e.g. Lingua::Stem::En) should help there. Each verb will need its arguments defined. Your example "create" takes one argument type: whatever you are creating. You will also have to check that the thing being created is creatable, i.e. your lexicon has to know which actions apply to which things. You might also need to account for movement. Luckily, movement is well studied and very well formalized: you move FROM (LOCATION) VIA (LOCATION) TO (LOCATION), so you can use those keywords to map your path.
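A minimal sketch of the V(N) / N(A,A) idea in Perl, assuming a tiny hand-built lexicon. The `%VERBS`, `%NOUNS` and `%ADJECTIVES` tables and the `parse_command` function are hypothetical, there is no stemming, and only 'create' gets the creatable check described above.

```perl
use strict;
use warnings;

# Hypothetical lexicon. Each verb lists its argument slots (this sketch only
# handles one-argument verbs); each noun records whether it can be created.
my %VERBS = (
    create  => ['thing'],
    destroy => ['thing'],
);
my %NOUNS = (
    mouse => { creatable => 1 },
    sword => { creatable => 1 },
    moon  => { creatable => 0 },
);
my %ADJECTIVES = map { $_ => 1 } qw(small large white black red);
my %ARTICLES   = map { $_ => 1 } qw(a an the);

# "create a small white mouse" => create(mouse) plus mouse(small, white).
# Returns a hash ref, or undef if the command is not understood.
sub parse_command {
    my ($text) = @_;
    my @words = split ' ', lc $text;
    my $verb  = shift @words;
    return undef unless defined $verb && exists $VERBS{$verb};

    my (@adjectives, $noun);
    for my $word (@words) {
        next if $ARTICLES{$word};
        if    ($ADJECTIVES{$word})                     { push @adjectives, $word }
        elsif (exists $NOUNS{$word} && !defined $noun) { $noun = $word }
        else                                           { return undef }   # unknown or extra word
    }
    return undef unless defined $noun;

    # Semantic check: the lexicon must allow creating this kind of thing.
    return undef if $verb eq 'create' && !$NOUNS{$noun}{creatable};

    return { verb => $verb, noun => $noun, adjectives => \@adjectives };
}

my $cmd = parse_command('create a small white mouse');
printf "%s(%s)\n%s(%s)\n", $cmd->{verb}, $cmd->{noun},
    $cmd->{noun}, join(', ', @{ $cmd->{adjectives} });
# prints:
# create(mouse)
# mouse(small, white)
```

Unknown words make the whole command fail rather than being guessed at, so the lexicon, not the parser, decides what the game understands.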
You might want to look at http://citeseer.nj.nec.com/ and search for "parsing english". You'll find lots of academic articles; somewhere there's bound to be some introductory material. If that fails, look for a copy of Natural Language Understanding by James Allen at the University of Rochester.
Update: Fixed some sloppy grammar, added more detail.
by wolis (Scribe) on Oct 09, 2003 at 03:00 UTC
...looks like I need that PhD after all :-)
by allolex (Curate) on Oct 09, 2003 at 08:58 UTC
Yes, I suppose finding only information like "This routine applies the Porter Stemming Algorithm to its parameters, returning the stemmed words" in the stemmer docs could be a little confusing to someone who is just trying to find out what a stemmer does. What a stemmer does is fairly straightforward: it extracts the stem or root of a word, whatever form the word is in. For example, given the text "He gets up, takes his pills, all the while leaving the water running.", a stemmer would return the stems of all the words in the sentence: "he get up take his pill all the while leave the water run". Stemmers often make mistakes with ambiguous forms like 'running' in "his running improved his circulation". Here 'running' is a noun, but a stemmer may treat it as a verb and return 'run'. Interesting links:
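To make the idea concrete, here is a deliberately naive suffix-stripping stemmer. It is not the Porter algorithm that Lingua::Stem::En implements; the two rules and the `toy_stem` name are assumptions for illustration only. Note that stems need not be dictionary words: this one returns 'leav', not 'leave'.

```perl
use strict;
use warnings;

# A deliberately naive stemmer: strips "-ing" and a plural/3rd-person "-s".
# NOT the Porter algorithm; the rules are crude (e.g. "thing" -> "th").
sub toy_stem {
    my ($word) = @_;
    $word = lc $word;
    return $word if length $word <= 3;          # leave short words ("his", "the") alone

    if ($word =~ s/ing$//) {
        $word =~ s/([bdgmnprt])\1$/$1/;         # undo consonant doubling: runn -> run
    }
    elsif ($word =~ /[^s]s$/) {
        chop $word;                             # pills -> pill, takes -> take
    }
    return $word;
}

my $sentence = 'He gets up, takes his pills, all the while leaving the water running.';
my @stems = map { toy_stem($_) } $sentence =~ /([A-Za-z]+)/g;
print "@stems\n";
# prints: he get up take his pill all the while leav the water run
```

Like a real stemmer, it has no idea whether 'running' is a noun or a verb; it only looks at the letters.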
Re: Parsing english
by Roger (Parson) on Oct 07, 2003 at 05:18 UTC
You can check out the Parse::RecDescent module from CPAN if you decide to build on a strictly defined grammar. Otherwise, the simplest approach is to grep for recognised words in the tokenised text and then act on the words you recognise. Beyond that, you would end up writing a natural language parser with a functional dependency grammar... (Good for a PhD thesis perhaps?)
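The "grep for recognised words" approach might look like the sketch below. The `%KNOWN` vocabulary and `spot_keywords` are hypothetical; unrecognised words (including the politeness around the command) are simply dropped, which is both the strength and the weakness of this approach.

```perl
use strict;
use warnings;

# Hypothetical vocabulary: recognised word => role. Everything else is ignored.
my %KNOWN = (
    create => 'verb', drop  => 'verb', take => 'verb',
    mouse  => 'noun', stick => 'noun', pile => 'noun',
    small  => 'adj',  white => 'adj',  red  => 'adj',
);

# Tokenise, keep only recognised words, and group them by role (order kept).
sub spot_keywords {
    my ($text) = @_;
    my @tokens = map { lc } $text =~ /([A-Za-z]+)/g;
    my %found;
    for my $token (grep { exists $KNOWN{$_} } @tokens) {
        push @{ $found{ $KNOWN{$token} } }, $token;
    }
    return \%found;
}

my $hit = spot_keywords('Please could you create a small, white mouse for me?');
print "verb: @{ $hit->{verb} }; noun: @{ $hit->{noun} }; adj: @{ $hit->{adj} }\n";
# prints: verb: create; noun: mouse; adj: small white
```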
by wolis (Scribe) on Oct 07, 2003 at 07:28 UTC
However, I do assume the sentences being typed in will follow a logical structure, so: All these would (and currently do) work in my parser. However, I am working within the confines of creating objects, so 'sticks are to be created that are made of wood and grouped into a pile' is outside my 'world view' :-) and is ignored, or more accurately 'said' to the other players rather than acted upon like a 'create' command. And yes, you guessed it: Will also be parsed logically and 'work', so players will see 'a pile of sticks with a small white rabbit on it. On the rabbit is a small bundle of red sticks of dynamite' etc.
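One way to get the behaviour described above (act on what the parser understands, otherwise pass the text on as speech) is a dispatch table of verb handlers with a 'say' fallback. The handler set, the noun list and `handle_input` are hypothetical; wolis's actual parser is not shown in this thread.

```perl
use strict;
use warnings;

my %KNOWN_NOUN = map { $_ => 1 } qw(mouse rabbit stick pile);

# Hypothetical command handlers, keyed by verb. Each returns a result
# string, or undef if it cannot make sense of its arguments.
my %HANDLERS = (
    create => sub {
        my @words = grep { !/^(?:a|an|the)$/ } @_;
        my $noun  = pop @words;
        return undef unless defined $noun && $KNOWN_NOUN{$noun};
        # adjectives are passed through unchecked in this sketch
        return "created $noun" . (@words ? " (@words)" : '');
    },
);

# Act on input the parser understands; otherwise "say" it to the room.
sub handle_input {
    my ($player, $text) = @_;
    my ($verb, @args) = split ' ', lc $text;
    if (defined $verb && $HANDLERS{$verb}) {
        my $result = $HANDLERS{$verb}->(@args);
        return $result if defined $result;
    }
    return qq{$player says "$text"};
}

print handle_input('wolis', 'create a small white mouse'), "\n";
print handle_input('wolis', 'sticks are to be created that are made of wood'), "\n";
# prints:
# created mouse (small white)
# wolis says "sticks are to be created that are made of wood"
```

Adding a verb means adding one entry to `%HANDLERS`; anything the handlers reject still reaches the other players as speech.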
Re: Parsing english
by Abigail-II (Bishop) on Oct 07, 2003 at 10:44 UTC
If you want to parse English, go get a linguistics degree at a university. However, for a game you can get away with something else: parsing simple sentences, which are often of the form:
Text-based games have been around for over three decades, including Colossal Cave, Zork and thousands of MUDs. For many games the source is available, and mudlibs are available too. Granted, they are typically written in a language other than Perl, but that shouldn't be a problem. The algorithms remain the same, and they'll be much easier to implement in Perl than in C or Fortran.
Abigail
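The exact sentence form Abigail gave is not preserved here, so as an assumption the sketch below uses a common adventure-game shape: a verb, an optional noun phrase, and an optional preposition followed by a second noun phrase, where a noun phrase is an optional article, any adjectives, and a noun. The word lists and function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical word lists for a tiny adventure-game vocabulary.
my %ARTICLE     = map { $_ => 1 } qw(a an the);
my %ADJECTIVE   = map { $_ => 1 } qw(small white rusty brass);
my %NOUN        = map { $_ => 1 } qw(mouse key door lamp box);
my %PREPOSITION = map { $_ => 1 } qw(in on with to under);

# Consume "[article] adjective* noun" from the front of @$words.
sub noun_phrase {
    my ($words) = @_;
    shift(@$words) if @$words && $ARTICLE{ $words->[0] };
    my @adjectives;
    push @adjectives, shift(@$words) while @$words && $ADJECTIVE{ $words->[0] };
    return undef unless @$words && $NOUN{ $words->[0] };
    return { noun => shift(@$words), adjectives => \@adjectives };
}

# VERB [noun phrase] [PREPOSITION noun phrase]; the verb is not checked here.
sub parse_sentence {
    my ($text) = @_;
    my @words = split ' ', lc $text;
    my %parse = (verb => shift @words);
    return undef unless defined $parse{verb};

    if (@words && !$PREPOSITION{ $words[0] }) {
        $parse{object} = noun_phrase(\@words) or return undef;
    }
    if (@words) {
        return undef unless $PREPOSITION{ $words[0] };
        $parse{preposition} = shift @words;
        $parse{indirect}    = noun_phrase(\@words) or return undef;
    }
    return @words ? undef : \%parse;            # leftover words: no parse
}

my $p = parse_sentence('unlock the door with the brass key');
print "$p->{verb} / $p->{object}{noun} / $p->{preposition} / ",
      "@{ $p->{indirect}{adjectives} } $p->{indirect}{noun}\n";
# prints: unlock / door / with / brass key
```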
by EdwardG (Vicar) on Oct 07, 2003 at 11:40 UTC
Time flies like an arrow? How about: Fruit flies like a banana. Sorry.
by kiat (Vicar) on May 17, 2004 at 02:14 UTC
1) Fruit flies the way a banana flies. ('like' as conjunction)
2) Fruit flies like (e.g. the taste of) a banana. ('like' as verb)
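Both readings can be produced mechanically. The sketch below assumes a toy lexicon that lists every part of speech a word may have, plus two hypothetical tag patterns, and enumerates all tag sequences. 'like' is simply tagged P (preposition) in the first reading, where kiat calls it a conjunction.

```perl
use strict;
use warnings;

# Hypothetical mini-lexicon listing every part of speech a word may take:
# N = noun, V = verb, P = preposition, D = determiner.
my %LEXICON = (
    time  => [qw(N V)], fruit  => [qw(N)],
    flies => [qw(N V)], like   => [qw(P V)],
    a     => [qw(D)],   an     => [qw(D)],
    arrow => [qw(N)],   banana => [qw(N)],
);

# Hypothetical sentence patterns, written as tag sequences:
#   N V P D N  "fruit | flies | like a banana"   (flies is the verb)
#   N N V D N  "fruit flies | like | a banana"   (like is the verb)
my %PATTERN = map { $_ => 1 } ('N V P D N', 'N N V D N');

# Enumerate every tag assignment and keep the ones that fit a pattern.
sub readings {
    my ($sentence) = @_;
    my @sequences = ([]);
    for my $word (split ' ', lc $sentence) {
        my $tags = $LEXICON{$word} or return ();
        @sequences = map { my $seq = $_; map { [@$seq, $_] } @$tags } @sequences;
    }
    return grep { $PATTERN{$_} } map { "@$_" } @sequences;
}

print "$_\n" for readings('Fruit flies like a banana');
# prints:
# N N V D N
# N V P D N
```

The number of candidate sequences grows multiplicatively with each ambiguous word, which is one reason real parsers use chart parsing or statistics rather than brute-force enumeration.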
Re: Parsing english
by ViceRaid (Chaplain) on Oct 07, 2003 at 10:05 UTC
As people have said above, you'll make your life a lot easier if you can constrain the complexity of the sentences which you're trying to parse, and perhaps also limit the vocabulary which can be employed by the user. If you can end up with a grammar like (in pseudo-regex-code)
You can parse quite an expressive range of sentences this way, especially if you know which parts of speech (verbs, nouns, adverbs etc.) the different tokens are. This is the Parse::RecDescent approach. A deeper, more "linguistic" approach might be to use something like a link parser, which is a package that analyses the structure of natural language sentences. There are several available free on the web, although I've never used one with Perl, only with Other Languages. There are lots of other free linguistic resources on the web which you might find useful, including WordNet, which you can use to look up hyponyms, synonyms and hypernyms... but this is probably all a bit much for making a small mouse...
ViceRaid
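Perl's own regexes can express a grammar of this kind directly. ViceRaid's pseudo-regex is not preserved here; the sketch below is an assumed equivalent with a closed vocabulary, one alternation per part of speech, and named captures (available since Perl 5.10). The word lists and `parse_command` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical closed vocabulary: one alternation per part of speech.
my $VERB = qr/(?:create|make|drop|take)/;
my $DET  = qr/(?:a|an|the|some)/;
my $ADJ  = qr/(?:small|large|white|black)/;
my $NOUN = qr/(?:mouse|rabbit|box|table)/;
my $PREP = qr/(?:in|on|under)/;

# VERB DET? ADJ* NOUN ( PREP DET? NOUN )?  as one regex with named captures.
my $COMMAND = qr/
    ^ \s* (?<verb> $VERB ) \s+
    (?: $DET \s+ )? (?<adjs> (?: $ADJ \s+ )* ) (?<noun> $NOUN )
    (?: \s+ (?<prep> $PREP ) \s+ (?: $DET \s+ )? (?<where> $NOUN ) )?
    \s* $
/x;

sub parse_command {
    my ($text) = @_;
    return undef unless lc($text) =~ $COMMAND;
    my %m = %+;                                 # copy the named captures that matched
    my %parse = (
        verb       => $m{verb},
        noun       => $m{noun},
        adjectives => [ split ' ', $m{adjs} ],
    );
    @parse{qw(prep where)} = @m{qw(prep where)} if exists $m{prep};
    return \%parse;
}

my $c = parse_command('Drop the small white mouse on the table');
print "$c->{verb} $c->{noun} [@{ $c->{adjectives} }] $c->{prep} $c->{where}\n";
# prints: drop mouse [small white] on table
```

A single regex stays readable only while the grammar is this flat; once phrases nest, a Parse::RecDescent grammar is the more maintainable route.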
Re: Parsing english
by dragonchild (Archbishop) on Oct 07, 2003 at 14:44 UTC
Good MUDs (which were a small minority ten years ago) will strip out words like 'a', 'the' and the like, so that parsing is easier. Better MUDs will use those words to help figure things out. So you could say something like "kill all the floobers with my magic sword" and the MUD will actually set your attack flag to attack all the floobers in the room, and will use your magic sword (as opposed to your non-magic sword or your magic spear). But that command pre-processing is hard to place in the code, because it is a common activity that (potentially) needs a ton of information spanning all the data structures (the room, the character, the other PCs/NPCs in the room, etc.). (The standard DikuMUD would complain "I see no 'floobers' in this room!" or some such if you tried the second line.)
------
The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6
Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.
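A sketch of the "better MUD" behaviour, under heavy assumptions: the room and inventory are plain arrays, plurals are made singular by dropping a trailing 's', and only 'my' tools are searched. `interpret` and the game state are hypothetical, not DikuMUD code.

```perl
use strict;
use warnings;

# Hypothetical game state: what is in the room and what the player carries.
my @room      = qw(floober floober floober goblin);
my @inventory = ('sword', 'magic sword', 'magic spear');

# "VERB [all] [the] TARGETS [with my TOOL]", using the small words as hints.
sub interpret {
    my ($text) = @_;
    my ($verb, $rest) = lc($text) =~ /^(\w+)\s+(.+)$/ or return undef;
    my ($target_part, $tool_part) = split /\s+with\s+/, $rest, 2;

    my $all = $target_part =~ s/^all\s+//;      # "all" => every match, not just one
    $target_part =~ s/^(?:the|a|an)\s+//;       # the article itself is just skipped
    (my $target = $target_part) =~ s/s$//;      # crude singular: floobers -> floober

    my @targets = grep { $_ eq $target } @room;
    return undef unless @targets;               # "I see no 'floobers' in this room!"
    @targets = ($targets[0]) unless $all;

    my $tool;
    if (defined $tool_part) {
        # "my" means: look in the inventory (the only place this sketch searches)
        $tool_part =~ s/^my\s+// or return undef;
        ($tool) = grep { $_ eq $tool_part } @inventory or return undef;
    }
    return { verb => $verb, targets => \@targets, tool => $tool };
}

my $r = interpret('kill all the floobers with my magic sword');
printf "%s %d target(s) using the %s\n", $r->{verb}, scalar @{ $r->{targets} }, $r->{tool};
# prints: kill 3 target(s) using the magic sword
```

Even this toy has to reach into both the room and the inventory, which illustrates why the pre-processing step is hard to place in a real codebase.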
Re: Parsing english
by halley (Prior) on Oct 07, 2003 at 14:49 UTC
The verb is the only requirement. Real subjects, dobjects and iobjects all follow the basic grammar:
There are quite a few alternative grammars to the main sentence type, but the overall fields are fixed and, once determined, all have the same meaning.
For instance, it's okay to type the adverb before the verb. The adverb usually describes a different tradeoff but the same basic verb behavior (run quickly) vs (run quietly). I would recommend against supporting multiple adverbs, especially adverbs modifying adverbs (like 'very').
The subject, if specified, must be first and followed by a comma. It's up to the subject to "consent" to the request; they can decide for themselves whether or not to allow the command (floyd, give me the circuit board).
The dobject is either singular, a list of objects, or a string literal. For a list of objects, the word "and" and/or a comma must separate items. Special pseudo-articles such as qw(all some the my) can help a search strategy for multiple objects within a given search domain (put all goo in the box). Lastly, a string literal is used for things like dialogue (say "hello" to floyd). An alternative sentence grammar would assume that if the sentence consists only of a string literal, then the verb is either 'say', 'exclaim' or 'ask', depending on any final punctuation.
The overall effect of multiple dobjects is a simple iteration, with the sentence applied once identically to each dobject. Throw exceptions to interrupt the processing if desired.
Iobjects are always singular prepositional targets. An alternative sentence syntax allows the iobject to precede the dobject, but it really swaps them and supplies a default preposition: (give floyd the broom) becomes (give the broom to floyd). This is detected while parsing by noting the missing comma/'and' between two noun phrases.
There's a lot more to my scheme; as I said, I have developed the code but it's not something I can freely share in detail at this time. You're welcome to e-mail for other ideas, though.
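halley's code is not shared in the thread, so the following is only a sketch of a subset of the rules described above: subject with comma, required verb, a dobject list separated by 'and'/commas, a string literal as dobject or as the whole sentence, and preposition plus iobject. The preposition list, the regexes and `parse_fields` are assumptions, and the double-object swap is a crude heuristic that only fires when the first noun phrase is a single bare word.

```perl
use strict;
use warnings;

# Split a command into subject, verb, dobject list and preposition + iobject.
# Returns a hash ref, or undef for empty input.
sub parse_fields {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;

    # Sentence that is only a string literal: choose the verb by punctuation.
    if ($text =~ /^"(.*)"$/) {
        my $said = $1;
        my $verb = $said =~ /\?$/ ? 'ask' : $said =~ /!$/ ? 'exclaim' : 'say';
        return { verb => $verb, dobjects => [$said] };
    }

    my %f = (dobjects => []);

    # Optional subject: a first word followed by a comma ("floyd, give ...").
    $f{subject} = $1 if $text =~ s/^(\w+)\s*,\s*//;

    # The verb is the only required field.
    ($f{verb}, my $rest) = split ' ', $text, 2;
    return undef unless defined $f{verb};
    $rest //= '';

    # A quoted string can be the dobject: say "hello" to floyd
    push @{ $f{dobjects} }, $1 if $rest =~ s/^"([^"]*)"\s*//;

    # Preposition + iobject, split off at the first (assumed) preposition.
    if ($rest =~ s/\s*\b(to|in|on|into|with|from)\s+(.+)$//) {
        @f{qw(preposition iobject)} = ($1, $2);
    }

    # Remaining text: dobjects separated by "and" and/or commas.
    my @phrases = grep { length } split /\s*(?:,|\band\b)\s*/, $rest;

    # Heuristic swap: a bare word directly followed by an article-led phrase,
    # with no comma/"and" between them, is "iobject dobject" (give floyd the broom).
    if (@phrases == 1 && !defined $f{iobject}
        && $phrases[0] =~ /^(\w+)\s+((?:a|an|the)\s+.+)$/) {
        @f{qw(preposition iobject)} = ('to', $1);
        @phrases = ($2);
    }
    push @{ $f{dobjects} }, @phrases;
    return \%f;
}

my $f = parse_fields('give floyd the broom');
print "$f->{verb} [@{ $f->{dobjects} }] $f->{preposition} $f->{iobject}\n";
# prints: give [the broom] to floyd
```

Returning the dobjects as a list fits the "simple iteration" rule above: the caller applies the same verb, preposition and iobject to each entry in turn.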
Re: Parsing english
by EdwardG (Vicar) on Oct 07, 2003 at 12:19 UTC
Really? I'd like to see your code if you're feeling brave enough. I once (maybe ten years ago) tried thinking about algorithms for parsing natural language and very quickly concluded that I wasn't up to the job. As Abigail so forthrightly intimates, I was probably missing a linguistics degree (or PhD more like). Professor Higgins I ain't.
by wolis (Scribe) on Oct 09, 2003 at 01:30 UTC
Here is my basic code (very uncommented at present, and not very elegant). It doesn't do anything with the 'attributes' yet; that will be a lookup in the database (finding objects of class=Attribute name={value}). Please don't hold back: rip apart, suggest and improve where applicable.
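wolis's code is not preserved in this thread, but the attribute lookup described above could be sketched like this, with an in-memory array of rows standing in for the database. The row layout, `attribute_for` and `build_object` are hypothetical; a real database would answer the lookup with an indexed query rather than a linear scan.

```perl
use strict;
use warnings;

# Stand-in for the database: rows of class=Attribute, each saying which
# attribute (name) a word (value) sets. Hypothetical data.
my @ATTRIBUTE_ROWS = (
    { class => 'Attribute', name => 'size',   value => 'small' },
    { class => 'Attribute', name => 'size',   value => 'large' },
    { class => 'Attribute', name => 'colour', value => 'white' },
    { class => 'Attribute', name => 'colour', value => 'red'   },
);

# Which attribute does this word set? e.g. 'white' -> 'colour'.
sub attribute_for {
    my ($word) = @_;
    my ($row) = grep { $_->{class} eq 'Attribute' && $_->{value} eq $word } @ATTRIBUTE_ROWS;
    return $row ? $row->{name} : undef;
}

# "create a small white mouse": the last word is the object's class and
# earlier words that are known attribute values become attributes.
sub build_object {
    my ($phrase) = @_;
    my @words = grep { !/^(?:a|an|the)$/ } split ' ', lc $phrase;
    shift @words if @words && $words[0] eq 'create';
    my $class = pop @words or return undef;
    my %object = (class => $class);
    for my $word (@words) {
        my $name = attribute_for($word) or next;   # unknown words are skipped
        $object{$name} = $word;
    }
    return \%object;
}

my $obj = build_object('create a small white mouse');
print join(', ', map { "$_=$obj->{$_}" } sort keys %$obj), "\n";
# prints: class=mouse, colour=white, size=small
```

Keeping the word-to-attribute mapping as data means new adjectives can be added to the database without touching the parser.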
Re: Parsing english
by PetaMem (Priest) on Oct 07, 2003 at 14:26 UTC
Take a look at Lingua::LinkParser. Quite nice for toy systems.
Bye
by erix (Prior) on Oct 23, 2004 at 20:22 UTC
Hi fatvamp, I have been looking at regex possibilities in Perl that tend a bit in the direction of NLP, and consequently came across your nodes and website. You say of Lingua::LinkParser 'Quite nice for toy systems.' I don't have (much) experience with it yet, but it looks very well done and well behaved to me when I paste in some test sentences. May I ask how negative/positive/experienced you are with the module, and especially with the underlying Link Parser and its quality now? (I realise your post/opinion is a year old.) Thanks!
Re: Parsing english
by artist (Parson) on Oct 08, 2003 at 04:33 UTC
If the message conveyed can be placed at a definite point in the scheme of things, then it becomes much easier to ask questions. Consider the case: Travel Booking:: "I want to go from London to New York."
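A sketch of the travel-booking case as slot filling: each slot is introduced by a keyword (from, via, to), and any required slot left empty becomes a follow-up question. The slot table, the questions and `fill_frame` are hypothetical, and the sketch assumes place names are capitalised, which is what stops "want to go" from being read as a destination.

```perl
use strict;
use warnings;

# Hypothetical travel-booking frame: slot name, the keyword that introduces
# it, and the question to ask if it is missing (undef = optional slot).
my @SLOTS = (
    [ origin      => 'from', 'Where are you leaving from?' ],
    [ via         => 'via',  undef ],
    [ destination => 'to',   'Where do you want to go?' ],
);

# Fill slots from "KEYWORD Place Name" phrases; return the filled frame
# and the follow-up questions for required slots that are still empty.
sub fill_frame {
    my ($text) = @_;
    my %frame;
    for my $slot (@SLOTS) {
        my ($name, $keyword) = @$slot;
        # Assumption: place names are capitalised, so "want to go" is skipped.
        $frame{$name} = $1 if $text =~ /\b\Q$keyword\E\s+([A-Z]\w*(?:\s+[A-Z]\w*)*)/;
    }
    my @questions = map  { $_->[2] }
                    grep { defined $_->[2] && !exists $frame{ $_->[0] } } @SLOTS;
    return (\%frame, \@questions);
}

my ($frame, $questions) = fill_frame('I want to go from London to New York.');
print "$_: $frame->{$_}\n" for sort keys %$frame;

my (undef, $follow_up) = fill_frame('I want to go to Paris');
print "$_\n" for @$follow_up;
# prints:
# destination: New York
# origin: London
# Where are you leaving from?
```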
artist
Perl is fast. So I spend more time doing fast things.
Re: Parsing english
by toma (Vicar) on Oct 09, 2003 at 05:31 UTC
CiteSeer is an excellent site for finding scholarly papers on this sort of thing. Look for papers on ontology. If you limit your problem to a little world, you can do a very nice job and only miss awkward phrases. I would be interested to know how much effort is required to scale up a small world, and whether tools like WordNet can be leveraged to reduce this effort. Don't be discouraged by a lack of reported successes in this area. Much of the work is outside of public view.
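A "little world" can start as a tiny is-a ontology (the same hypernym relation WordNet records), which lets rules written for 'animal' apply to anything below it. The hierarchy and the `hypernyms` / `is_a` functions below are hypothetical toy data, not WordNet's.

```perl
use strict;
use warnings;

# Hypothetical little-world ontology: each concept names its parent
# (its hypernym, in WordNet terms).
my %IS_A = (
    mouse     => 'rodent',
    rodent    => 'mammal',
    rabbit    => 'mammal',
    mammal    => 'animal',
    animal    => 'thing',
    stick     => 'object',
    dynamite  => 'explosive',
    explosive => 'object',
    object    => 'thing',
);

# Walk up the is-a links: mouse -> rodent -> mammal -> animal -> thing.
sub hypernyms {
    my ($concept) = @_;
    my (@chain, %seen);
    while (defined($concept = $IS_A{$concept}) && !$seen{$concept}++) {
        push @chain, $concept;
    }
    return @chain;
}

# True if $category appears anywhere above $concept.
sub is_a {
    my ($concept, $category) = @_;
    return scalar grep { $_ eq $category } hypernyms($concept);
}

print join(' -> ', 'mouse', hypernyms('mouse')), "\n";
print 'Is a rabbit an animal? ', (is_a('rabbit', 'animal') ? 'yes' : 'no'), "\n";
# prints:
# mouse -> rodent -> mammal -> animal -> thing
# Is a rabbit an animal? yes
```

Scaling the world up then means adding entries to `%IS_A` (or importing them from a resource like WordNet), while rules stay attached to the broader categories.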
It should work perfectly the first time! - toma