Dear Brothers in code:
First of all, I'll say that I've only been using Perl for a few months, but since I'm using it intensively (many hours every day) I suppose that soon I'll have enough level to participate more actively ;)
I hold a degree in Librarianship and Information Science, and I would like to talk about the possible uses of Perl in the 'cultural' sciences, that is, linguistics, literature, history, sociology, etc. (I'm sorry, I'm Spanish and I don't know exactly what those sciences are called in English).
Traditionally, at least in my country, those scientists (I'm one of them really, though a bit different) have shunned computers and technology, declaring that machines 'dehumanize' people, etc. Well, I think that we MUST make them understand that the computer is only a TOOL and can be very useful in their work.
And in the case of Perl, a very powerful tool, especially when you think of the regexp stuff and the possible uses of arrays and hashes...
The title of this meditation says 'bibliometric'. Bibliometrics is the science that analyzes the advance, growth, relations, etc. of science based on its output; in simple words: applying statistical methods to scientific publications and analyzing the results with a series of bibliometric laws. And what is better than Perl for analyzing INFORMATION, that is, THOUSANDS of records, mostly in plain text preformatted by a database export? Suppose I have a database of 50000 records of scientific publications, and it includes data about the citations each work makes to other authors (or to the author himself). Using Perl it would be possible to recreate a 'network of citations' that says "the author Smith is cited by Jones and cites...", and if you use a graphics library such as GD to VIEW the relations... you know what I mean ;) A rough sketch of the idea follows below.
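To make the hash-of-arrays idea concrete, here is a minimal sketch. It assumes a made-up export format (one citation per line, "citing;cited" -- your real database export will surely look different, so adjust the split), and it only prints the relations instead of drawing them with GD:

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: one citation per line, "CitingAuthor;CitedAuthor"
my (%cites, %cited_by);

while (my $line = <DATA>) {
    chomp $line;
    my ($citing, $cited) = split /;/, $line;
    next unless defined $cited;
    push @{ $cites{$citing} },   $cited;    # Smith cites ...
    push @{ $cited_by{$cited} }, $citing;   # ... and is cited by ...
}

foreach my $author (sort keys %cites) {
    my $out = join ', ', @{ $cites{$author} };
    my $in  = join ', ', @{ $cited_by{$author} || [] };
    print "$author cites: $out";
    print " and is cited by: $in" if $in;
    print "\n";
}

__DATA__
Smith;Garcia
Jones;Smith
Garcia;Smith
Smith;Jones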
Here I post a practical example I made myself that makes use of 'Zipf's Law', which analyzes the frequency of each word in a text and draws from it a series of very interesting conclusions (for example, automatic extraction of the significant words of a text, automatic abstracts, determination of the 'empty words' of a language, etc.). It isn't very optimized (surely it can be optimized a lot), but it works. If you pass it a plain text file as an argument (I've tried it with texts from Project Gutenberg, available at http://www.promo.net/pg/), it generates a CSV file with the same name that can be imported directly into Excel or similar and contains 3 columns: word;frequency of the word in the text;relative frequency of the word:
#!/usr/bin/perl
use strict;
use warnings;

# Take the text file from the command line
my $file = $ARGV[0] or die "Usage: $0 textfile\n";

open my $libro, '<', $file or die "Cannot open $file: $!";

# Strip the extension to build the name of the output CSV
my ($ar) = $file =~ /(.*)\.(.*)/;
$ar = $file unless defined $ar;

my @contenido = <$libro>;
close $libro;
chomp @contenido;    # chomp, not chop: remove only the newlines

my $contenido = "@contenido";

# Turn punctuation and digits into spaces, then uppercase everything
$contenido =~ tr/.;,:"'()?!_*0123456789-/ /;
$contenido = uc $contenido;

# Count the frequency of each word
my %PF;
foreach my $palabra (split /\s+/, $contenido) {
    next if $palabra eq '';
    $PF{$palabra}++;
}

# Total number of words in the text, to compute the relative frequency
my $npalabras = 0;
$npalabras += $_ for values %PF;

my $transfor = '';
while (my ($k, $v) = each %PF) {
    my $freq = $v / $npalabras;
    $freq =~ tr/./,/;    # decimal comma, so a Spanish-locale Excel imports it correctly
    $transfor .= "$k;$v;$freq\n";
}

open my $libroout, '>', "$ar.csv" or die "Cannot write $ar.csv: $!";
print $libroout $transfor;
close $libroout;
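In case it helps: assuming you save the script as, say, zipf.pl (the name is just my example), running perl zipf.pl alice.txt should leave an alice.csv next to the original text, one line per distinct word, ready to open in Excel or similar.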
Well, after all this boring stuff ;) I'll finish here. Comments and suggestions are welcome, of course.
Byes
Ignatius Monk, the Cyberlibrarian Monk of the Perl Order ;)