# $Id: CStatiBR.pm,v 1.0 2007/06/12 09:17:36 rpfernandes Exp $
#Copyright (c) 2007 Rodrigo Panchiniak Fernandes. All rights reserved.
#
#This program is free software; you can redistribute it and/or
# modify it under the same terms as Perl itself.

=head1 NAME

Text::CStatiBR - performs statistical analyses of corpora

=head1 SYNOPSIS

  use Text::CStatiBR;
  &Text::CStatiBR::CSTATIBR();

=head1 DESCRIPTION

Text::CStatiBR writes a seven-column CSV file containing one line per
token per text. The input is a corpus whose file names follow the
pattern '1 (1).txt', '1 (2).txt', ..., '1 (n).txt', that is:

  1 \(([1-9]|[1-9][0-9]+)\)\.txt

The columns store the following statistical information:
(1) number of word forms in document d;
(2) number of tokens in d;
(3) Id number of d, i.e., n;
(4) frequency of term t in d;
(5) corpus frequency of t;
(6) document frequency of t (number of documents where t occurs at
least once);
(7) t, the UTF-8 Latin-encoded token string, delimited by

  /[ -@]|[\[-`]|[{-¿]|[ɐ-˩]|[ʹ-�]/

The main output file is named '1 (n + 5).txt' and is stored in the same
directory as the corpus, together with residual files for each input
file, which carry the ad hoc extensions .txu and .txv.

This code was written under CAPES BEX-09323-5

=head2 Methods

Example:

  #!/usr/bin/perl
  use strict;
  use Text::CStatiBR;
  &Text::CStatiBR::CSTATIBR("5"); # 5 files are analyzed.
                                  # The main output file created is
                                  # 1 (10).txt

=over

=cut