Some comments:
- Use strict, warnings and diagnostics, or die.
- The file regex may produce some unwanted results
with Unix-like filenames such as file.txt.bak. Decide which
part you want to retain. You can find the first and last dot
with index and rindex, respectively.
- You slurp the contents in list context only to join the
lines again. You can instead set $/ to undef and read the
whole file into a scalar:
{
local $/ = undef;
$contenido = <LIBRO>;
}
You can leave the newlines intact; they will be caught
by '\s'. Even better, tr will take care of them.
- Use lc or uc to change the case.
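- You can simplify the translation by complementing the
list to the alphabetic range (see perlop). With /c every
character outside A-Z is translated, and /s squeezes each
run of them into a single space:
$contenido = uc $contenido;
$contenido =~ tr/A-Z/ /cs;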
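- Use '\s+' rather than '\s' in the split, so you don't have
to test for empty fields between words. A leading empty
field can still turn up when the string starts with
whitespace; split ' ' avoids that as well.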
- You can get the total number without array assignment:
$npalabras = keys %PF;
In scalar context, keys returns the number of keys directly.
- I would print to LIBROUT inside the loop, so the
system gets the chance to buffer the output nicely.
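For the filename remark above, here is a minimal sketch of what index and rindex give you. The name file.txt.bak and the variables $name, $short and $long are made up for illustration; they are not from your script.

```perl
use strict;
use warnings;

# Hypothetical example name, not taken from the original script.
my $name = 'file.txt.bak';

my $first = index( $name, '.' );    # position of the first dot, -1 if none
my $last  = rindex( $name, '.' );   # position of the last dot,  -1 if none

# Keep everything before the first dot ...
my $short = $first >= 0 ? substr( $name, 0, $first ) : $name;
# ... or everything before the last dot.
my $long  = $last >= 0 ? substr( $name, 0, $last ) : $name;

print "$short\n";   # file
print "$long\n";    # file.txt
```

Which of the two you want depends on whether .txt.bak or only .bak counts as the extension.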
It's quite a list, but I hope it gives you the chance
to learn some new idioms. Result:
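#....
my $contenido;
{
    local $/ = undef;          # slurp mode: read the whole file at once
    $contenido = <LIBRO>;
}
$contenido = uc $contenido;
$contenido =~ tr/A-Z/ /cs;     # every run of non-letters becomes one space
my %PF;
# split ' ' also skips leading whitespace, so no empty first field
$PF{$_}++ for split ' ', $contenido;
open LIBROUT, ">$ar.csv" or die "Cannot open $ar.csv: $!";
my $npalabras = keys %PF;      # number of distinct words
for ( keys %PF ) {
    my $f = $PF{$_};           # declare first; 'my $f' is not visible inside its own statement
    print LIBROUT join( ';', $_, $f, $f / $npalabras ), "\n";
}
close LIBROUT or die "Cannot close $ar.csv: $!";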
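If you want to try the counting part without a file, this is a small self-contained sketch of the same idioms on an in-memory string. The sample text is made up and simply stands in for whatever LIBRO contains.

```perl
use strict;
use warnings;

# Made-up sample text standing in for the contents of LIBRO.
my $contenido = "El libro, el LIBRO... y la casa.\n";

$contenido = uc $contenido;
$contenido =~ tr/A-Z/ /cs;    # each run of non-letters becomes one space

my %PF;
$PF{$_}++ for split ' ', $contenido;

my $npalabras = keys %PF;     # scalar context: number of distinct words

print "$_;$PF{$_}\n" for sort keys %PF;
print "distinct words: $npalabras\n";
```

Note that tr/A-Z/ /cs only knows the ASCII letters, so accented letters and ñ would be turned into separators as well.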
Well, you see how the use of $_ simplifies things.
Hope this helps,
Jeroen
"We are not alone"(FZ)