Mordan has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I'd like to know whether something is possible in Perl. With help from the monastery, I have managed to use Perl to get POS tags for a series of files.

I would now like to do the same with phrases rather than individual words: scan text files and write the count of each PCFG (probabilistic context-free grammar) tag to a CSV file. I have had a look around, and the Stanford parser seems popular, but I can't get it to do this, perhaps because it is Java and I'm not used to that. NLTK would also suit, as would Marpa::XS if that is the best tool here, or any more suitable Perl program.

Examples of the PCFG tags I'd like to output are: ADJP ADVP CONJP FRAG INTJ LST NAC NP NX PP PRN PRT QP RRC S SBAR SBARQ SINV SQ UCP VP WHADJP WHADVP WHNP WHPP

The only Perl example I can find here is this one on the Penn Treebank.

Thanks

Replies are listed 'Best First'.
Re: Parsing context free grammar tags into CSV
by SuicideJunkie (Vicar) on Feb 10, 2014 at 17:23 UTC

    Are you hoping for some sort of flag to give the parser that will make it output a CSV? That doesn't seem likely.

    Seems to me it should be almost trivial to just let the parser do its thing, and then count and print the number of result nodes yourself.

    my %typeCounts;
    $typeCounts{$_->{type}}++ for (@parsedOutputTokens);
    print $csvFileHandle "$_, $typeCounts{$_}\n" for keys %typeCounts;
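
To make that a bit more concrete, here is a minimal sketch of the counting step. It assumes the parser has been told to emit Penn Treebank-style bracketed trees as plain text, e.g. `(ROOT (S (NP ...) (VP ...)))`; check what format your parser actually produces. The helper names `count_phrase_tags` and `counts_to_csv` and the sample tree are made up for illustration, and the tag list is simply the one from the question.

```perl
use strict;
use warnings;

# Phrase-level tags, copied from the list in the question.
my @PHRASE_TAGS = qw(
    ADJP ADVP CONJP FRAG INTJ LST NAC NP NX PP PRN PRT QP RRC S
    SBAR SBARQ SINV SQ UCP VP WHADJP WHADVP WHNP WHPP
);
my %is_phrase = map { $_ => 1 } @PHRASE_TAGS;

# Hypothetical helper: count phrase tags in one bracketed parse string.
# Every label that directly follows "(" is examined; word-level (POS)
# labels such as DT or NN are skipped because they are not in %is_phrase.
sub count_phrase_tags {
    my ($tree) = @_;
    my %count;
    while ($tree =~ /\(([A-Z]+)/g) {
        $count{$1}++ if $is_phrase{$1};
    }
    return \%count;
}

# Hypothetical helper: render the counts as two-column CSV text, sorted by tag.
sub counts_to_csv {
    my ($count) = @_;
    my $csv = "tag,count\n";
    $csv .= "$_,$count->{$_}\n" for sort keys %$count;
    return $csv;
}

# Assumed sample of parser output, for illustration only.
my $sample = '(ROOT (S (NP (DT The) (NN cat)) (VP (VBD sat) '
           . '(PP (IN on) (NP (DT the) (NN mat)))) (. .)))';

print counts_to_csv(count_phrase_tags($sample));
```

For real files you would read each parser output file, sum the counts per file, and print to a filehandle opened on your CSV instead of STDOUT. A plain regex is enough here only because the labels are simple uppercase words; if you need the tree structure itself, a proper tree reader would be the safer choice.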