spx2 has asked for the wisdom of the Perl Monks concerning the following question:

I'm searching for something that will assess the grammatical correctness of a paragraph of English text. So far, I have found Lingua::LinkParser and this, and although they use the same parser (link-grammar), they show different results on the same piece of text.

My idea was to just count the number of linkages Lingua::LinkParser gives for a piece of text and, if it's 0, interpret that as "grammatically incorrect". Are there any alternatives on CPAN to this package and, if so, what are they and have you used them? What conclusions have you reached?


Replies are listed 'Best First'.
Re: Grammatical Correctness module/package/tool
by Old_Gray_Bear (Bishop) on Apr 04, 2009 at 16:06 UTC
    Whilst a mechanism to check the correctness of grammar or spelling is a laudable goal, you must remember that these are naught but mechanical tools that can not be perfectly relied upon.

    The preceding sentence contains two (or possibly four, all the votes aren't tabulated yet) 'grammatical' solecisms. These 'errors' are (were) valid in the mid-18th century. And in fact some of the spelling, while correct for the modern day, would be considered at least 'eccentric' if not downright erroneous ('naught' versus 'nought', for example) in that period.

    Language and Grammar are mutable things, growing and changing with time. You have to specify "when-ness" to have a valid checker. In addition, Grammar, Spelling, and Usage each have the property of "context". What is proper grammar in one environment, say a school-yard ("Yo Bro, Wassup?") is inappropriate in an other, say Work ("Good Morning, what broke last night?"). The spelling and grammar-checkers that I use flag 'non-standard American-Business' usage. They sputter and choke on some of the faintly archaic idioms that I routinely use. Every time I change Word Processors, I have to re-educate the Dictionary, or put up with Bleeding Copy. I have become sanguine about it.

    Parsing, in the general case, is an 'easy' problem (it can be automated); it is not a 'simple' problem (the automation is difficult to get inarguably correct). And even when you do, there are enough legitimate variations in the Linguistic Context to confound your code. Add to that the fact that Language is a motile target and you have a recipe for constant scope-creep.

    In addition consider the 'I spell-checked it, what do you mean their are spelling errors?!' problem.....

    ----
    I Go Back to Sleep, Now.

    OGB

      See also Language Log's take. Basically, even popular industry-standard implementations of grammar checking are, in reality, worse than useless.

      Out of interest, what were your "two (or possibly four) grammatical solecisms" in your first sentence? The only possibility I can spot is the singular/plural mismatch of "a mechanism" vs. "these are", and even that's stretching a point. But maybe I've been reading too much 18th-century literature. :)