PerlMonks  

Re: improving the efficiency of a script

by TedPride (Priest)
on Jun 19, 2006 at 04:42 UTC ([id://556181])


in reply to improving the efficiency of a script

This really takes two passes unless you want to load the whole thing into memory. The first pass counts how many words begin with each letter; random indices are then chosen for each letter, and the second pass retrieves the chosen words from the file.
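use strict;
use warnings;
use List::Util qw(min);

my $words = 100;                 # how many words to pick per letter
my $file  = 'dictionary.txt';
my (%range, %c, %n, @words);
$range{$_} = undef for 'a' .. 'z';   # only lowercase initials are counted

# Pass 1: count how many words start with each letter
open my $handle, '<', $file or die "open: $file: $!\n";
while (<$handle>) {
    my $c = substr($_, 0, 1);
    next if !exists $range{$c};
    $c{$c}++;
}
close $handle;

# Replace each count with a descending list of randomly chosen indices
# ($_ aliases the hash values, so choose() updates %c in place)
choose($_) for values %c;

# Pass 2: keep the nth word of a letter when n is its next chosen index
open $handle, '<', $file or die "open: $file: $!\n";
while (<$handle>) {
    my $c = substr($_, 0, 1);
    next if !exists $c{$c} || $n{$c}++ < $c{$c}[-1];
    chomp;
    push @words, $_;
    pop @{ $c{$c} };
    delete $c{$c} if !@{ $c{$c} };   # this letter is finished
    last if !%c;                     # every letter is finished
}
close $handle;
print join("\n", sort @words), "\n";

### Pick min($words, count) distinct random indices in 0 .. count-1
sub choose {
    my $count = $_[0];
    my $pick  = min($words, $count); # letters with fewer words than $words
    my @c = 0 .. $count - 1;
    # Partial Fisher-Yates shuffle: only the first $pick slots matter
    for my $i (0 .. $pick - 1) {
        swap(\@c, $i, $i + int rand($count - $i));
    }
    # Descending order puts the smallest index last, ready for pop
    $_[0] = [ sort { $b <=> $a } @c[0 .. $pick - 1] ];
}

### Swap two array items
sub swap {
    my ($r, $x, $y) = @_;
    @$r[$x, $y] = @$r[$y, $x];
}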
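The `choose` step is a partial Fisher-Yates shuffle: to pick k distinct indices out of n, only the first k slots of 0 .. n-1 need to be shuffled. A minimal standalone sketch follows; `pick_indices` is a hypothetical name, and capping k at n (so letters with fewer words than requested still work) is an assumption added here.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: return min($k, $n) distinct indices from 0 .. $n-1,
# sorted ascending, using a partial Fisher-Yates shuffle.
sub pick_indices {
    my ($n, $k) = @_;
    $k = min($k, $n);
    my @pool = 0 .. $n - 1;
    for my $i (0 .. $k - 1) {
        my $j = $i + int rand($n - $i);   # random slot in $i .. $n-1
        @pool[$i, $j] = @pool[$j, $i];    # move it into the chosen prefix
    }
    return sort { $a <=> $b } @pool[0 .. $k - 1];
}

my @picked = pick_indices(10, 3);
print "@picked\n";    # three distinct sorted indices from 0 .. 9
```

Shuffling only the prefix costs O(k) swaps after building the pool, rather than shuffling every index for every letter.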

Re^2: improving the efficiency of a script
by GrandFather (Saint) on Jun 19, 2006 at 05:02 UTC

    For a modest-size file (1 M words x about 6 characters = 6 MB), loading it all into memory is likely to be much faster than going out to the file system twice. On the other hand, the file system may cache it for you; ya just gotta suck it and see. Even then, the result may be different tomorrow than it is today.
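The in-memory alternative could look something like the sketch below: read the file once, bucket the words by initial letter, and shuffle each bucket with `List::Util::shuffle`. The helper name `pick_per_letter` and the lowercase-only filter (borrowed from the two-pass code in this thread) are assumptions, not anything posted here.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: bucket an in-memory word list by lowercase initial
# letter, then take up to $k words from each bucket at random.
sub pick_per_letter {
    my ($k, @words) = @_;
    my %by_letter;
    for my $w (@words) {
        next unless $w =~ /^([a-z])/;     # same filter as the two-pass code
        push @{ $by_letter{$1} }, $w;
    }
    my @picked;
    for my $c (sort keys %by_letter) {
        my @shuffled = shuffle @{ $by_letter{$c} };
        my $n = $k < @shuffled ? $k : @shuffled;
        push @picked, @shuffled[0 .. $n - 1];
    }
    return sort @picked;
}

# Usage sketch (the file name is an assumption):
#   open my $fh, '<', 'dictionary.txt' or die "open: $!\n";
#   chomp(my @all = <$fh>);
#   close $fh;
#   print "$_\n" for pick_per_letter(100, @all);
```

Whether this beats two passes depends on file size and OS caching, so it is worth timing both on the real dictionary.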


    DWIM is Perl's answer to Gödel
Re^2: improving the efficiency of a script
by johngg (Canon) on Jun 19, 2006 at 22:44 UTC
    It doesn't necessarily require two passes.

    It seems reasonable to assume that a dictionary file is sorted case-insensitively (and it is simple to make it so if it isn't already). Then you can read the file, building up a list of words beginning with the letter 'a'. As soon as you come across the first word beginning with 'b', make your random selection of 100 words beginning with 'a', then discard the word list and re-initialise it with the 'b' word just read. Repeat until you've gone through the alphabet.

    That way the job is done in a single pass, and you don't have to keep the entire dictionary in memory, just one letter's worth.

    use strict;
    use warnings;

    my $howMany       = shift || 100;    # words to pick per letter
    my $done          = 0;
    my $rxValidWord   = qr{^([A-Za-z])[-a-z]+};
    my $totalWordCt   = 0;
    my $validWordCt   = 0;
    my @letters       = (q{a} .. q{z});
    my $currentLetter = q{};
    my $rxCurrent     = getNextLetter();
    my @wordCache     = ();
    my $dictFile      = q{../Web2};

    open my $dictFH, q{<}, $dictFile or die qq{open: $dictFile: $!\n};
    while (<$dictFH>) {
        $totalWordCt++;
        next if $done;
        next unless /$rxValidWord/;
        my $initLetter = $1;
        $validWordCt++;
        chomp;
        if ($initLetter =~ $rxCurrent) {
            push @wordCache, $_;
            next;
        }
        # First word of a new letter: report the finished letter, then
        # advance (skipping any letters with no words) and restart the cache
        reportForLetter();
        $rxCurrent = getNextLetter() until $done || $initLetter =~ $rxCurrent;
        @wordCache = $done ? () : ($_);
    }
    reportForLetter() if $currentLetter;
    close $dictFH or die qq{close: $dictFile: $!\n};

    print qq{\n},
        qq{Total words in dictionary - $totalWordCt\n},
        qq{Valid words processed - $validWordCt\n};

    # Sorted indices of $howMany distinct random picks from 0 .. $total - 1
    sub generateSlice {
        my ($total, $howMany) = @_;
        my $rlPool = [0 .. ($total - 1)];
        return $rlPool if $total <= $howMany;
        my $rlSlice = [];
        for (1 .. $howMany) {
            push @$rlSlice, splice @$rlPool, int rand scalar @$rlPool, 1;
        }
        return [sort { $a <=> $b } @$rlSlice];
    }

    # Move on to the next letter; sets $done when the alphabet is exhausted
    sub getNextLetter {
        if (@letters) {
            $currentLetter = shift @letters;
            my $charClass = q{[} . $currentLetter . uc($currentLetter) . q{]};
            return qr{$charClass};
        }
        $currentLetter = q{};
        $done++;
        return 0;
    }

    # Print a random selection from the words cached for the current letter
    sub reportForLetter {
        print qq{\n}, qq{Found @{[scalar @wordCache]} words for letter $currentLetter\n};
        my $rlSlice     = generateSlice(scalar @wordCache, $howMany);
        my @randomWords = @wordCache[@$rlSlice];
        print scalar(@randomWords), qq{ words selected at random\n};
        print qq{ $_\n} for @randomWords;
    }
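This version relies on the dictionary being sorted. If that can't be guaranteed, a different single-pass technique, reservoir sampling (Algorithm R), keeps at most k words per letter in memory whatever the input order. This is a swapped-in technique rather than anything posted in the thread, and `reservoir_per_letter` is a hypothetical name; each letter's reservoir ends up holding a uniform random sample of that letter's words.

```perl
use strict;
use warnings;

# Hypothetical helper: reservoir sampling (Algorithm R), one reservoir per
# initial letter. Single pass over $fh, any order, at most $k words kept
# per letter.
sub reservoir_per_letter {
    my ($fh, $k) = @_;
    my (%reservoir, %seen);
    while (my $word = <$fh>) {
        chomp $word;
        next unless $word =~ /^([A-Za-z])/;
        my $c = lc $1;                        # 'A' and 'a' share a reservoir
        my $n = ++$seen{$c};                  # words seen so far for this letter
        if ($n <= $k) {
            push @{ $reservoir{$c} }, $word;  # fill the reservoir first
        }
        else {
            my $j = int rand $n;              # uniform in 0 .. $n-1
            $reservoir{$c}[$j] = $word if $j < $k;   # keep with probability k/n
        }
    }
    return \%reservoir;
}

# Usage with an in-memory filehandle standing in for the dictionary file
my $text = join '', map { "$_\n" } qw(banana apple Avocado cherry apricot);
open my $fh, '<', \$text or die "open: $!\n";
my $res = reservoir_per_letter($fh, 2);
print "$_: @{ $res->{$_} }\n" for sort keys %$res;
```

The trade-off: memory is bounded by 26 x k words rather than one letter's worth, and no sort order is required, at the cost of one `rand` call per word beyond the first k of each letter.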

    When run to look for 5 random words per letter, it produces this:

    Cheers,

    JohnGG
