Re^3: How to count the vocabulary of an author?

http://qs1969.pair.com?node_id=11133808

in reply to Re^2: How to count the vocabulary of an author?
in thread How to count the vocabulary of an author?

Well, I have a PhD in mathematical linguistics. Stemming was done in the first year ;-)

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]


Replies are listed 'Best First'.
OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by Bod (Parson) on Jun 14, 2021 at 12:01 UTC
> Well, I have a PhD in mathematical linguistics

Wow! Genuinely impressive.

Can I ask your opinion on Hemingway Editor? I use it extensively in producing content for our business marketing, blogs, etc. However, I have started writing something that performs a similar task but is more tailored to our needs. For example, in marketing the ratio of first person to second person pronouns is (thought to be) important. My version makes extensive use of Lingua::EN::Fathom.

My attempt is not very far developed and I'd love some informed input before I go much further.
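To make the pronoun-ratio idea concrete, here is a minimal sketch in core Perl. It is only an illustration, not the code from the tool described above: the two word lists, the sub names `pronoun_counts` and `pronoun_ratio`, and the sample sentence are all assumptions made up for the example, and it deliberately does not use Lingua::EN::Fathom.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical word lists (an assumption, not from the thread);
# adjust them to your own style guide.
my %first  = map { $_ => 1 } qw(i me my mine myself we us our ours ourselves);
my %second = map { $_ => 1 } qw(you your yours yourself yourselves);

# Count first- and second-person pronouns in a string.
# Contractions such as "we're" split at the apostrophe, so "we" still counts.
sub pronoun_counts {
    my ($text) = @_;
    my ($n1, $n2) = (0, 0);
    for my $word (lc($text) =~ /\b([a-z]+)\b/g) {
        $n1++ if $first{$word};
        $n2++ if $second{$word};
    }
    return ($n1, $n2);
}

# Ratio of first- to second-person pronouns;
# undef when the text contains no second-person pronoun at all.
sub pronoun_ratio {
    my ($text) = @_;
    my ($n1, $n2) = pronoun_counts($text);
    return $n2 ? $n1 / $n2 : undef;
}

my $copy = "We built this for you. Your team will love our support, and you can call us.";
my ($n1, $n2) = pronoun_counts($copy);
printf "first: %d, second: %d\n", $n1, $n2;
```

A plain word-list lookup like this cannot tell generic "you" from direct address, so treat the number as a rough signal rather than a measurement.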
Re: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by choroba (Cardinal) on Jun 14, 2021 at 13:42 UTC
In fact, the idea is craftily clever. Their stemmer and parser can only stem and parse simple sentences, so if they can't process a sentence with sufficiently high certainty, they flag it as too complex :-)

I don't know what technology they use in the editor. Also, I quit academia almost ten years ago, so things might have moved a bit since I worked on similar stuff. But generally, English is one of the easier languages to process. Its morphology is simple (almost no declension, simple conjugation) and the training data for statistical methods are huge.

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^2: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by LanX (Saint) on Jun 14, 2021 at 15:20 UTC
> But generally, English is one of the easier languages to process.

For a stemmer, sure! But the lack of grammar makes context and interpretation harder...

Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
Re^3: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by choroba (Cardinal) on Jun 14, 2021 at 15:35 UTC
Re^4: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by erix (Prior) on Jun 14, 2021 at 16:30 UTC
Re^4: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by LanX (Saint) on Jun 14, 2021 at 16:26 UTC
Re^2: OT - Hemingway Editor (was: Re^4: How to count the vocabulary of an author?) by Bod (Parson) on Jun 14, 2021 at 23:39 UTC
> I don't know what technology they use in the editor

JavaScript, apparently... There is an explanation here

In Section Seekers of Perl Wisdom