# $Id: CStatiBR.pm,v 1.0 2007/06/12 09:17:36 rpfernandes Exp $
#Copyright (c) 2007 Rodrigo Panchiniak Fernandes. All rights reserved.  
# 
#This program is free software; you can redistribute it and/or 
# modify it under the same terms as Perl itself.  
=head1 NAME 
 
Text::CStatiBR - performs statistical analyses of corpora
 
=head1 SYNOPSIS 
 
  use Text::CStatiBR;
  &Text::CStatiBR::CSTATIBR(); 
 
=head1 DESCRIPTION 
 
Text::CStatiBR takes as input a corpus whose file names follow the pattern
'1 (1).txt', '1 (2).txt', ..., '1 (n).txt', that is, file names matching

    1 \(([1-9]|[1-9][0-9]+)\)\.txt

For such a corpus it creates a seven-column CSV output file with one line
per token per text. The columns store the following statistical information:
(1) number of word forms in document d;  
(2) number of tokens in d;  
(3) ID number of d, i.e., n;
(4) frequency of term t in d;  
(5) corpus frequency of t;
(6) document frequency of t (number of documents where t occurs at least once);  
(7) t, the UTF-8 Latin-coded token string, delimited by /[ -@]|[\[-`]|[{-ż]|[\x{0250}-\x{02E9}]|[\x{0374}-\x{FFFD}]/.
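
The seven columns can be illustrated with a minimal sketch on a tiny
in-memory corpus. This is not the module's own code: the corpus, the
C<rows> helper and the ASCII-only delimiter set are assumptions made for
illustration, and the sketch assumes one row per distinct token per document.

```perl
use strict;
use warnings;

# Hypothetical in-memory corpus: document id => text.
my %corpus = (
    1 => 'the cat saw the dog',
    2 => 'the dog ran',
);

# Simplified ASCII-only delimiter set (an assumption, not the module's full pattern).
my $delim = qr/[ -@\[-`{-~]+/;

my (%tf, %cf, %df, %tokens);
for my $d (keys %corpus) {
    for my $t (grep { length } split $delim, $corpus{$d}) {
        $tokens{$d}++;                   # column 2: tokens in d
        $df{$t}++ unless $tf{$d}{$t};    # column 6: first sighting of t in d
        $tf{$d}{$t}++;                   # column 4: frequency of t in d
        $cf{$t}++;                       # column 5: corpus frequency of t
    }
}

# Build one CSV row per distinct token per document, in column order 1..7.
sub rows {
    my @rows;
    for my $d (sort { $a <=> $b } keys %tf) {
        my $forms = keys %{ $tf{$d} };   # column 1: word forms in d
        for my $t (sort keys %{ $tf{$d} }) {
            push @rows, join ',', $forms, $tokens{$d}, $d,
                                  $tf{$d}{$t}, $cf{$t}, $df{$t}, $t;
        }
    }
    return @rows;
}

print "$_\n" for rows();
```

For document 1 the row for "the" reads 4,5,1,2,3,2,the: four word forms and
five tokens in the document, two occurrences there, three in the corpus,
and two documents containing it.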
 
The main output file is named '1 (n + 5).txt'. It is stored in the same
directory as the corpus, together with residual files for each input file,
which carry the ad hoc extensions .txu and .txv.
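
The naming scheme can be sketched as follows. The helpers C<input_names>
and C<output_name> are hypothetical and not part of Text::CStatiBR; they
only restate the file name pattern and the n + 5 rule given above.

```perl
use strict;
use warnings;

# Pattern quoted in the description, anchored here for a whole-name match.
my $name_re = qr/^1 \(([1-9]|[1-9][0-9]+)\)\.txt$/;

# Hypothetical helpers, not part of Text::CStatiBR.
sub input_names { my ($n) = @_; return map { "1 ($_).txt" } 1 .. $n }
sub output_name { my ($n) = @_; return sprintf '1 (%d).txt', $n + 5 }

my $n = 5;
print "$_\n" for input_names($n);
print "main output: ", output_name($n), "\n";
```

With n = 5 the main output name is '1 (10).txt', which agrees with the
example in the Methods section.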
 
This code was written under CAPES BEX-09323-5 
 
=head2 Methods 
 
Example:

  #!/usr/bin/perl
  use strict;
  use Text::CStatiBR;

  &Text::CStatiBR::CSTATIBR("5");     # 5 files are analyzed.
                                      # The main output file
                                      # created is '1 (10).txt'.
 
=over 
=cut