fernandes has asked for the wisdom of the Perl Monks concerning the following question:
Thanks!

# $Id: CStatiBR.pm,v 1.0 2007/06/12 09:17:36 rpfernandes Exp $
# Copyright (c) 2007 Rodrigo Panchiniak Fernandes. All rights reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.

=head1 NAME

Text::CStatiBR - performs corpora statistical analyses

=head1 SYNOPSIS

 use Text::CStatiBR;
 &Text::CStatiBR::CSTATIBR();

=head1 DESCRIPTION

Text::CStatiBR creates a seven-column CSV output file with one line per
token per text. The input is a corpus whose file names follow the pattern
'1 (1).txt', '1 (2).txt', ..., '1 (n).txt', that is,

 1 \(([1-9]|[1-9][0-9]+)\)\.txt

The columns store the following statistical information:

 (1) number of word forms in document d;
 (2) number of tokens in d;
 (3) Id number of d, i.e., n;
 (4) frequency of term t in d;
 (5) corpus frequency of t;
 (6) document frequency of t (number of documents where t occurs at least once);
 (7) t, the token string (Latin text encoded in UTF-8), delimited by
     /[ -@]|[\[-`]|[{-¿]|[ɐ-˩]|[ʹ-�]/

The main output file is named '1 (n + 5).txt' and is stored in the same
directory as the corpus, together with residual files for each input file,
which carry the ad hoc extensions .txu and .txv.

This code was written under CAPES BEX-09323-5

=head2 Methods

Example:

 #!/usr/bin/perl
 use strict;
 use Text::CStatiBR;
 &Text::CStatiBR::CSTATIBR("5"); # 5 files are analyzed.
                                 # The main output file created is
                                 # 1 (10).txt

=over

=cut
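For readers trying to picture the seven columns, here is a minimal sketch of how they might be computed. It is not the module's code: the helper names `doc_id` and `corpus_stats`, the in-memory sample corpus, and the tokenizer (which simply splits on non-ASCII-letter characters rather than using the delimiter class from the POD) are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: extract the document number n from a file name
# that follows the pattern quoted in the POD, or return undef.
sub doc_id {
    my ($name) = @_;
    return $name =~ /^1 \(([1-9]|[1-9][0-9]+)\)\.txt$/ ? $1 : undef;
}

# Hypothetical helper: build the seven-column rows from a hash of
# document id => text. The tokenizer is a simplification (assumption):
# it lowercases and splits on anything that is not an ASCII letter.
sub corpus_stats {
    my ($docs) = @_;
    my (%tf, %cf, %df, %tokens_in);
    for my $id (keys %$docs) {
        for my $tok (grep { length } split /[^A-Za-z]+/, lc $docs->{$id}) {
            $tokens_in{$id}++;                    # column 2
            $df{$tok}++ unless $tf{$id}{$tok}++;  # columns 6 and 4
            $cf{$tok}++;                          # column 5
        }
    }
    my @rows;
    for my $id (sort { $a <=> $b } keys %tf) {
        my $forms = keys %{ $tf{$id} };           # column 1
        for my $tok (sort keys %{ $tf{$id} }) {
            push @rows, [ $forms, $tokens_in{$id}, $id,
                          $tf{$id}{$tok}, $cf{$tok}, $df{$tok}, $tok ];
        }
    }
    return \@rows;
}

# Sample corpus (made up for this sketch), keyed by document number.
my %corpus = (
    1 => 'a casa azul',
    2 => 'a casa e a rua',
);

print join(',', @$_), "\n" for @{ corpus_stats(\%corpus) };
```

Each printed line corresponds to one token type in one document, in the column order listed in the DESCRIPTION. Keeping the counts in plain hashes keyed by document and token keeps the sketch short; a real corpus would also need proper UTF-8 decoding and the module's own delimiter class.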
Re: help for naming a module that aims latin utf8 coded corpus statistical analysis
by GrandFather (Saint) on Jun 19, 2007 at 22:31 UTC

Re: help for naming a module that aims latin utf8 coded corpus statistical analysis
by graff (Chancellor) on Jun 20, 2007 at 03:59 UTC
by fernandes (Monk) on Jul 18, 2007 at 06:02 UTC