Re: Wordlist maker
by chromatic (Archbishop) on Sep 17, 2000 at 07:44 UTC
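Instead of the join, try slurp mode. See local and $/ (the latter is documented in perlvar).
Instead of using s///, try tr///. It's more efficient for deleting or translating individual characters, because it works from a fixed character list instead of running the regex engine.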
Always check the return values of system calls, like open. An array in scalar context gives the number of elements.
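my $file;
my $out = 'wordlist.txt';
{
    local $/;      # undefine the input record separator: slurp mode
    $file = <>;    # read all input (files in @ARGV, or STDIN) in one go
}
$file =~ tr/\n / /s;          # newlines become spaces; runs of spaces squeezed to one
$file =~ tr/A-Za-z0-9 //dc;   # delete every character that isn't a letter, digit, or space
my %wordlist;
$wordlist{$_}++ foreach (split ' ', $file);   # count each whitespace-separated word
open(LIST, ">$out") or die "Can't open $out: $!";
print LIST join("\n", keys %wordlist);
close LIST;
# Parenthesize only the count: "print (...), ..." would print just the count
# and silently drop the rest of the list
print scalar(keys %wordlist), " words found. Saved in $out\n";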
That's untested, but that's how I'd do it. (Minus any bugs, of course.)
Update: Removed the problematic /d switch from the first tr/// statement, prompted by turnstep's defense of his more comprehensive post.
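A small sketch of the tr/// suggestion, on a made-up sample string: `tr/A-Za-z0-9 //dc` deletes exactly the characters that `s/[^A-Za-z0-9 ]//g` would, but as a plain character transliteration rather than a regex match. As a side benefit, tr/// returns the number of characters it matched.

```perl
use strict;
use warnings;

# Hypothetical sample input
my $text = "Hello, world!\nIt's a test: 42 words?";

# As in the code above: newlines become spaces, runs of spaces squeezed
(my $by_tr = $text) =~ tr/\n / /s;
my $by_re = $by_tr;

# tr/// with /c (complement the search list) and /d (delete matched
# characters that have no replacement): removes every character that is
# NOT a letter, digit, or space, and returns how many it removed
my $removed = ($by_tr =~ tr/A-Za-z0-9 //dc);

# The same deletion written as a regex substitution
$by_re =~ s/[^A-Za-z0-9 ]//g;

print "tr///: '$by_tr' ($removed removed)\n";
print "s///:  '$by_re'\n";
```

One thing the sample avoids: a tab is not in the kept set, so the second tr/// would delete it and glue the neighbouring words together. Adding `\t` to the first statement (`tr/\n\t / /s`) turns tabs into spaces before the deletion.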
Re: Wordlist maker
by merlyn (Sage) on Sep 17, 2000 at 12:41 UTC
print LIST "$_\n" for keys %wordlist;
or perhaps
print LIST "$_\n" while defined($_ = each %wordlist);
or going the other direction in efficiency (worse {grin}):
print LIST map "$_\n", keys %wordlist;
-- Randal L. Schwartz, Perl hacker
There's something to be said for having only one print, efficiency-wise,
though I haven't benchmarked it, so:
print LIST join("\n", keys %wordlist, '');
Actually, that's an interesting question. With one large string you have the overhead of allocating memory to build it. I don't know the details of the memory management internals involved, but there is certainly some overhead.
On the other hand, multiple prints, each ending in a newline, can make the stdio routines flush to the file or console after every line (when the output is line-buffered, as on a terminal, or when autoflush is on with $| = 1), so you pay the overhead of the system I/O routines for each line, as opposed to writing the one big string at once. And if it's not flushing per line (the default for a file, which is block-buffered), you still have the overhead of buffer management within stdio.
Does anyone know more details on that? Is it more efficient to let Perl do its memory management on a big string, or to let stdio do its thing?
--Chris
e-mail jcwren
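For reference, a minimal sketch of the $| mechanics mentioned above (the file name and words are placeholders): $| only controls the currently selected output handle, so turning autoflush on for a handle like LIST means selecting it first and restoring the previous selection afterwards. IO::Handle's `autoflush` method, as in `$list->autoflush(1)`, does the same thing.

```perl
use strict;
use warnings;

my $out = 'wordlist.txt';
open(my $list, '>', $out) or die "Can't open $out: $!";

# $| applies only to the currently selected output handle (STDOUT by
# default), so select $list, turn autoflush on, then restore the old one.
my $previous = select($list);
$| = 1;              # from here on, every print to $list is flushed at once
select($previous);

print {$list} "$_\n" for qw(alpha beta gamma);
close($list) or die "Can't close $out: $!";
```

With autoflush on, every print becomes its own write, which is the per-line I/O cost described above; leaving $| at 0 lets the buffer batch the writes.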
Well, in that case, go with my slow one:
print LIST map "$_\n", keys %wordlist;
At least, I think that'll be slightly faster than having one big fat string.
Update: Duh. Apparently not. So much for my gut-level feel. Don't trust me anymore, I guess. {grin}
-- Randal L. Schwartz, Perl hacker
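One way to settle the question empirically, sketched with the core Benchmark module (the data and variant names are invented for illustration): time the one-big-join, print-per-key, and map versions against in-memory filehandles, after checking that all three write identical output. Results will vary by Perl version and platform, and writing to a real file adds I/O costs this sketch deliberately leaves out.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: 10,000 made-up words
my %wordlist = map { ("word$_" => 1) } 1 .. 10_000;

# Run one printing variant against an in-memory filehandle and return
# what it wrote (this measures Perl's side of the work, not disk I/O)
sub run_variant {
    my ($code) = @_;
    my $buf = '';
    open(my $fh, '>', \$buf) or die "Can't open in-memory handle: $!";
    $code->($fh);
    close($fh);
    return $buf;
}

my %variants = (
    join_once  => sub { my $fh = shift; print {$fh} join("\n", keys %wordlist, '') },
    print_each => sub { my $fh = shift; print {$fh} "$_\n" for keys %wordlist },
    map_list   => sub { my $fh = shift; print {$fh} map "$_\n", keys %wordlist },
);

# Sanity check: every variant must write exactly the same bytes
my %out = map { ($_ => run_variant($variants{$_})) } keys %variants;
for my $name (sort keys %out) {
    die "$name writes different output\n" unless $out{$name} eq $out{join_once};
}

# Time each variant for at least 1 CPU second and print a comparison table
my %bench = map { my $code = $variants{$_}; ($_ => sub { run_variant($code) }) } keys %variants;
cmpthese(-1, \%bench);
```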
Re: Wordlist maker
by turnstep (Parson) on Sep 17, 2000 at 17:30 UTC
A quick and simple way, especially if you don't want to read
the whole file into memory first, would be:
s/([A-Z0-9]{5,})/$seenit{$1}++ or print "$1\n"/egi while <>;
Better yet, save the printing until the end, so you can
sort the words alphabetically, or perhaps by the number of
appearances:
s/([A-Z0-9]{5,})/$seenit{$1}++/egi while <>;
## Sorted by name
for (sort keys %seenit) {
print "$_: $seenit{$_}\n";
}
## Sorted by frequency (least frequent first), then by name:
for (sort {$seenit{$a} <=> $seenit{$b} or $a cmp $b} keys %seenit) {
print "$_: $seenit{$_}\n";
}
As a final suggestion, you may want to disregard the
case of the words, in which case you'd want to use
$seenit{lc $1}. That is probably best, as words at the
start of a sentence tend to be capitalized.
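Putting the lc suggestion together with a most-frequent-first sort, a minimal sketch (the sub name and sample lines are hypothetical). It swaps the s///e trick for list-context m//g, since nothing actually needs to be substituted; the 5-or-more character class is the one used above.

```perl
use strict;
use warnings;

# Count words of 5 or more letters/digits, folding case with lc.
# Takes a list of lines; in a script you would call count_words(<>).
sub count_words {
    my %seenit;
    for my $line (@_) {
        # List-context m//g returns every captured run on the line
        $seenit{ lc $_ }++ for $line =~ /([A-Z0-9]{5,})/gi;
    }
    return \%seenit;
}

# Hypothetical sample input
my $seen = count_words("Hello monks. HELLO again, hello\n", "Monks say hello\n");

## Sorted by frequency (most frequent first), then by name
for (sort { $seen->{$b} <=> $seen->{$a} or $a cmp $b } keys %$seen) {
    print "$_: $seen->{$_}\n";
}
```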
Re: Wordlist maker
by Anonymous Monk on Sep 17, 2000 at 20:02 UTC
Thanks for all your replies; it looks like people here at perlmonks.org really like to help beginners like me :)
Well, I've benchmarked all the suggestions, and the fastest is chromatic's:
one (my original code): 30 wallclock secs (28.59 usr + 1.79 sys = 30.38 CPU)
two (chromatic's code): 15 wallclock secs (13.54 usr + 1.72 sys = 15.26 CPU)
three (turnstep's code): 22 wallclock secs (19.01 usr + 2.29 sys = 21.30 CPU)
four (turnstep's code, using merlyn's way to print): 24 wallclock secs (20.96 usr + 2.30 sys = 23.26 CPU)
I had already made the correction and added the length check before benchmarking.
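For comparison, a sketch of chromatic's approach with a length check added. The exact correction used in the benchmark isn't shown, so the 5-character threshold, the sub name, and the sample text are assumptions. It is wrapped in a sub so it can be tried on a literal string, and it follows the earlier advice about return values: three-argument open with a lexical filehandle, and a checked close.

```perl
use strict;
use warnings;

# Hypothetical helper: build a word list from $text and write it to $out,
# one word per line, keeping only words of 5 or more letters/digits.
# Returns the number of distinct words written.
sub write_wordlist {
    my ($text, $out) = @_;
    $text =~ tr/\n / /s;           # newlines become spaces; runs of spaces squeezed
    $text =~ tr/A-Za-z0-9 //dc;    # delete everything except letters, digits, spaces

    my %wordlist;
    $wordlist{$_}++ for grep { length >= 5 } split ' ', $text;

    # Three-argument open with a lexical handle; check both open and close
    open(my $list, '>', $out) or die "Can't open $out: $!";
    print {$list} map "$_\n", sort keys %wordlist;
    close($list) or die "Can't close $out: $!";

    return scalar keys %wordlist;
}

# Usage on a literal string; in a script you would pass the slurped
# input instead: write_wordlist(do { local $/; <> }, $out)
my $out = 'wordlist.txt';
my $count = write_wordlist("Hello monks!\nHello again, Perl hackers.\n", $out);
print "$count words found. Saved in $out\n";
```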
BTW, how come I can't register here at perlmonks.org?
I've tried to register twice, and both times I didn't receive the email with my password...
RE: Wordlist maker
by Zarathustra (Beadle) on Sep 18, 2000 at 03:13 UTC
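open(LIST, ">wordlist.txt") or die "Can't open wordlist.txt: $!";
my $i = 0;
while (<>) {
    # Assumes one word per input line.
    # Strip non-word characters (including the newline) and all digits first...
    s/[\W0-9]//g;
    # ...then skip anything left with fewer than 5 characters
    length($_) >= 5 or next;
    $i++;
    print LIST "$_\n";
}
close(LIST);
print "$i words found. Saved in wordlist.txt\n";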
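The loop above treats each input line as a single word. If lines can hold several words, a variant can split each line on whitespace and still stream the input without slurping it. This is a sketch: the sub name and sample text are hypothetical, and in-memory handles stand in for real files.

```perl
use strict;
use warnings;

# Read $in line by line, split each line into words, clean each word as
# the loop above does, and write words of 5+ characters to $out.
# Returns how many words were written.
sub stream_words {
    my ($in, $out) = @_;
    my $i = 0;
    while (my $line = <$in>) {
        for my $word (split ' ', $line) {
            $word =~ s/[\W0-9]//g;    # same cleanup as above
            next if length($word) < 5;
            $i++;
            print {$out} "$word\n";
        }
    }
    return $i;
}

# Usage with in-memory handles; a script could pass \*STDIN and a handle
# opened on wordlist.txt instead.
my $input  = "Hello monks, hello again\ntiny bits: a be sea\n";
my $result = '';
open(my $in,  '<', \$input)  or die "Can't open input: $!";
open(my $out, '>', \$result) or die "Can't open output: $!";
my $count = stream_words($in, $out);
close($out);
print "$count words found:\n$result";
```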
Re: Wordlist maker
by shlomoy (Novice) on Sep 18, 2000 at 13:45 UTC
my $file = do { local $/; <> };  ## slurp the whole input into $file
$file =~ s/[^\w\s]//g;  ## remove all non-alphanumeric characters, but keep the whitespace between words
my @words = split ' ', $file;  ## put all words in @words
my @good_words = ();
foreach (@words) {
    push @good_words, $_ if length $_ >= 5;  ## lose words shorter than 5 characters
}
## do with @good_words whatever you want
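The foreach/push loop above can also be written with grep, which keeps the elements of a list for which a condition is true. A minimal sketch on a made-up word list:

```perl
use strict;
use warnings;

my @words = qw(apple fig banana kiwi cherry);   # hypothetical sample

# Keep only words of 5 or more characters (same effect as the foreach/push loop)
my @good_words = grep { length($_) >= 5 } @words;

print "@good_words\n";
```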
Re: Wordlist maker
by Anonymous Monk on Apr 03, 2020 at 11:38 UTC
#!/usr/bin/perl
use strict;
use warnings;

# Plain decimal literals: with leading zeros (0001000000000000) Perl reads
# the number as octal
my $startingNum = 1000000000000;
my $EndNum      = 9999000000000000;

# The loop prints ($EndNum - $startingNum) lines of 16 digits plus a newline,
# i.e. 17 bytes each; the parentheses make the subtraction happen first
my $KiloBytes = ($EndNum - $startingNum) * 17 / 1024;
my $MegaByte  = $KiloBytes / 1024;
my $GigaByte  = $MegaByte / 1024;
my $Terabyte  = $GigaByte / 1024;
print "The File will take up: ", $KiloBytes, "kb\n", $MegaByte, "mb\n", $GigaByte, "gb\n", $Terabyte, "TB\n";

while ($startingNum++ < $EndNum) {
    printf "%016d\n", $startingNum;
}
2020-04-03 Athanasius added code tags.
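To see what the size estimate works out to, here is the arithmetic for the corrected loop, assuming it starts at decimal 1000000000000, prints every number after that up to 9999000000000000, and writes 17 bytes per line (16 digits plus a newline), with 1 TiB taken as $1024^4$ bytes:

```latex
\begin{aligned}
N &= 9\,999\,000\,000\,000\,000 - 1\,000\,000\,000\,000 = 9\,998\,000\,000\,000\,000 \text{ lines} \\
S &= 17N = 169\,966\,000\,000\,000\,000 \text{ bytes} \\
  &= \frac{169\,966\,000\,000\,000\,000}{1024^{4}} \text{ TiB} \approx 154\,583 \text{ TiB} \approx 151 \text{ PiB}
\end{aligned}
```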