Re: How To Do This Better?
by ahunter (Monk) on Apr 14, 2000 at 23:07 UTC
Well, one of the great things about Perl is that there's
always a way of cheating. In this case, the cheat is to
note that s/// returns the number of replacements (and
gets rid of the letters we've already looked at).
So:
{
    local $/ = undef;                 # Read everything in one go
    $_ = lc(<STDIN>);
    s/[^a-z]//g;                      # Get rid of non-alphabetic characters
    for my $letter ('a'..'z')
    {
        my $count = s/$letter//g || 0;  # Magic! s///g returns the number of replacements ("" if none)
        print "$letter = $count\n";
    }
}
Now, that's many times faster than your original, and a
fair bit shorter. I suspect it can be made faster still,
though (for instance, is the cost of deleting characters
with s/// offset by the speedup from scanning a shorter
string on the next iteration?)
perl /home/ahunter/original.pl 154.02s user 2.39s system 88% cpu 2:56.29 total
perl /home/ahunter/flib.pl < xlib.ps 4.48s user 0.14s system 93% cpu 4.942 total
Well, that *was* impressive...
-- Andrew
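Here is a minimal sketch of the s/// trick above, wrapped in a sub so it can be tried on a string instead of STDIN (the name count_by_subst is hypothetical). It also shows that the string is empty once all 26 passes have run. The count needs || 0 because s///g returns an empty string, not 0, when nothing matched.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping ahunter's loop so it can run on a string.
# s///g returns the number of substitutions it made, and because each pass
# deletes the letter it just counted, the string shrinks as the loop goes.
sub count_by_subst {
    my ($text) = @_;
    local $_ = lc $text;
    s/[^a-z]//g;                               # keep only the letters
    my %count;
    for my $letter ('a' .. 'z') {
        $count{$letter} = s/$letter//g || 0;   # s///g returns "" when nothing matched
    }
    return (\%count, length $_);               # leftover length is 0 by now
}

my ($counts, $left) = count_by_subst("Hello, World!");
print "$_ = $counts->{$_}\n" for grep { $counts->{$_} } sort keys %$counts;
```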
RE: How To Do This Better?
by japhy (Canon) on Apr 15, 2000 at 14:51 UTC
while (<FILE>) { $count{lc $1}++ while /([a-zA-Z])/g }
Is that too short or succinct for anyone?
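The following minimal sketch runs japhy's statement over a list of lines instead of a FILE handle (count_by_match is a hypothetical name). In scalar context, /([a-zA-Z])/g resumes where the previous match on that string stopped, so the inner while visits each letter once and then ends.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping japhy's loop: $_ is aliased to each line in
# turn, and each successful //g match leaves the letter it found in $1.
sub count_by_match {
    my @lines = @_;
    my %count;
    for (@lines) {
        $count{lc $1}++ while /([a-zA-Z])/g;
    }
    return \%count;
}

my $counts = count_by_match("Abba\n", "cab 42\n");
print "$_=$counts->{$_}\n" for sort keys %$counts;
```

Unlike ahunter's version, letters that never appear get no hash key at all.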
I will roast myself alive for showing you this code:
{ $count{lc $1}++ while (defined $_ and /([a-zA-Z])/g) or (defined($_ = <FILE>) and redo) }
That's the kind of thing I strive for :)
Re: How To Do This Better?
by mikfire (Deacon) on Apr 15, 2000 at 00:58 UTC
Boy, you guys work hard.
If you are slurping from a file, try this one:
$/ = undef;          # slurp mode: read the whole file at once
$line = <FILE>;
close FILE;
$line =~ s/\W+//g;   # drop everything that isn't a word character
print "I found ", length( $line ), " characters in the file\n";
which will count letters (okay, it will include underscores
and digits too). Remove the substitution for a character
count. For exactly letters, use:
$line =~ s/[^a-zA-Z]+//g;
Just another way to do it,
Mik
Mik Firestone ( perlus bigotus maximus )
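Here is a minimal sketch of the same strip-then-measure idea as a sub that works on a copy of the string, so the original text is left intact (letter_total is a hypothetical name). If you slurp with $/ = undef, consider doing it inside a block with local, as ahunter's version does, so later reads elsewhere in the program aren't affected.

```perl
use strict;
use warnings;

# Hypothetical helper following mikfire's approach: strip every non-letter
# from a copy of the text, then the length of what's left is the letter count.
sub letter_total {
    my ($text) = @_;
    (my $letters = $text) =~ s/[^a-zA-Z]+//g;   # the caller's string is untouched
    return length $letters;
}

print "I found ", letter_total("abc1235ABC"), " letters\n";   # prints "I found 6 letters"
```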
$string = "abc1235ABC";
$number = ($string =~ tr[a-zA-Z][a-zA-Z]);
print "I counted $number alphabetical characters.\n";
print "My string is still ->$string<-\n";
(can't believe I forgot about that one)
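tr/// can also give per-letter counts, with one catch: the transliteration table is built at compile time and does not interpolate variables, so a loop over $letter needs a string eval. The following minimal sketch compiles one counting sub per letter up front and reuses them; the %counter_for table and count_by_tr are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical per-letter counters: tr/aA// counts both cases of one letter
# without modifying the string. The tr table can't interpolate $letter, so
# each counter is compiled once with a string eval.
my %counter_for;
for my $letter ('a' .. 'z') {
    my $uc = uc $letter;
    $counter_for{$letter} = eval "sub { \$_[0] =~ tr/$letter$uc// }" or die $@;
}

sub count_by_tr {
    my ($text) = @_;
    my %count;
    $count{$_} = $counter_for{$_}->($text) for 'a' .. 'z';
    return \%count;
}

my $counts = count_by_tr("abc1235ABC");
print "$_=$counts->{$_}\n" for grep { $counts->{$_} } 'a' .. 'z';   # a=2, b=2, c=2
```

Because tr/// with an empty replacement list leaves the string alone, the same text can be scanned 26 times without copying it.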
Re: How To Do This Better?
by btrott (Parson) on Apr 14, 2000 at 22:43 UTC
I came up with this. It's less code than yours (well, if
you spread your code out a bit :), but it's
actually a bit slower, in benchmarking. So there may be
better ways of going about it.
#!/usr/local/bin/perl -w
use strict;
my %count;
s/([a-zA-Z])/{ $count{lc $1}++; $1 }/eg
    while <>;
for my $letter (sort keys %count) {
    print $letter, "=", $count{$letter}, "\n";
}
It works by finding each letter ([a-zA-Z]), lower-casing
it, and incrementing the count for that letter. The /e
modifier evaluates the replacement as code, and since that
code returns $1, each letter is replaced by itself and the
line is left unchanged.
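Here is a minimal sketch of the s///e approach as a sub that returns both the counts and the (unchanged) text, so the "replace each letter by itself" behaviour can be checked. count_by_subst_e is a hypothetical name. The replacement uses do { ... } to make the two-statement block explicit.

```perl
use strict;
use warnings;

# Hypothetical helper based on btrott's s///e: the /e flag runs the
# replacement as Perl code, which bumps the count and returns $1, so every
# letter is replaced by itself.
sub count_by_subst_e {
    my ($text) = @_;
    my %count;
    $text =~ s/([a-zA-Z])/do { $count{lc $1}++; $1 }/eg;
    return (\%count, $text);
}

my ($counts, $same) = count_by_subst_e("Perl 5!");
print "$_=$counts->{$_}\n" for sort keys %$counts;
print "text is still: $same\n";
```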
Re: How To Do This Better?
by chromatic (Archbishop) on Apr 14, 2000 at 23:09 UTC
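One optimization right off the bat, if the string contains much more than 52 characters: count upper- and lowercase letters separately (at most 52 hash keys) and fold them together once at the end, instead of calling lc() on every match: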
s/([a-zA-Z])/{ $count{$1}++; $1 }/eg while <>;
foreach ('A' .. 'Z') {
    # fold each uppercase count into its lowercase key, then drop the uppercase key
    $count{lc($_)} += delete($count{$_}) || 0;
}
Disclaimer: untested, but theoretically valid. (corrected on 15 April as turnstep noticed a typo)
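Here is a minimal sketch of the count-then-fold idea as a sub (count_then_fold is a hypothetical name). It swaps in japhy's match loop for the counting pass instead of s///e, and only creates a lowercase key when that letter was actually seen in some case.

```perl
use strict;
use warnings;

# Hypothetical helper following chromatic's idea: count letters with their
# case intact (at most 52 keys), then fold the uppercase counts into the
# lowercase keys once at the end instead of calling lc on every match.
sub count_then_fold {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        $count{$1}++ while $line =~ /([a-zA-Z])/g;
    }
    for my $upper ('A' .. 'Z') {
        my $n = delete $count{$upper};          # undef if never seen
        $count{lc $upper} += $n if defined $n;
    }
    return \%count;
}

my $counts = count_then_fold("AaB", "ab");
print "$_=$counts->{$_}\n" for sort keys %$counts;
```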
Re: How To Do This Better?
by NoTwoGroo (Initiate) on Apr 14, 2000 at 23:29 UTC
If you're pulling your input from a file, then
while(my $c = getc(FILE)) {
    $count{lc($c)}++ if $c=~/[a-zA-Z]/;
}
is going to be quite a bit faster than anything involving
split() or a regexp.
getc() has some issues, but this seems like an ideal
use for it.
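The following minimal sketch shows the getc() approach with one change: the loop tests defined() rather than the character itself. A "0" character is false in Perl and would otherwise end the loop early, while getc() returns undef only at end of file or on error. count_by_getc is a hypothetical name. The in-memory filehandle (Perl 5.8 or later) is only there to make the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: a getc() loop guarded with defined(), so a "0"
# character doesn't stop the loop before EOF.
sub count_by_getc {
    my ($fh) = @_;
    my %count;
    while (defined(my $c = getc($fh))) {
        $count{lc $c}++ if $c =~ /[a-zA-Z]/;
    }
    return \%count;
}

# Example: read from an in-memory filehandle instead of a real file.
my $text = "a0b\nA";
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $counts = count_by_getc($fh);
close $fh;
print "$_=$counts->{$_}\n" for sort keys %$counts;   # prints a=2 and b=1
```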
You probably meant to say:
while(!eof(FILE))
{
    my $c = getc(FILE);
    $count{lc($c)}++ if $c=~/[a-zA-Z]/;
}
As your original seems to give up early: the loop tests the
character itself, so a "0" in the input (false in Perl)
ends it before the end of the file.
Plus, you have to remember that perl compiles regular
expressions to make them run faster, particularly when
they don't require backtracking (the pattern is compiled
once into an internal program that the regex engine then
runs). As these are executed in C, writing perl to do the
same job is *always* going to be slower.
Plus getc() isn't exactly a star performer, either. Maybe
use the unbuffered I/O functions (sysread()) if you want to
improve performance in this area, though you'd really have
to be after squeezing the last ounce of speed out of the
thing in that case (and you'd have to remember never to use
any of the buffered routines on that filehandle).
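Here is a minimal sketch of the unbuffered approach, assuming sysread() is called in fixed-size chunks. count_by_sysread and its 64 KB default chunk size are assumptions, not measured choices. Every letter here is a single byte, so splitting the file at arbitrary chunk boundaries can't cut a letter in half.

```perl
use strict;
use warnings;

# Hypothetical helper: read the file with unbuffered sysread() in fixed-size
# chunks and count letters in each chunk. Don't mix this with buffered reads
# (<FH>, read, getc, eof) on the same handle.
sub count_by_sysread {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # assumed default; tune by benchmarking
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my (%count, $buf);
    while (1) {
        my $n = sysread($fh, $buf, $chunk_size);
        die "Read error on $path: $!" unless defined $n;
        last if $n == 0;          # 0 bytes read means end of file
        $count{lc $1}++ while $buf =~ /([a-zA-Z])/g;
    }
    close $fh;
    return \%count;
}
```

For example, count_by_sysread('xlib.ps') would count the letters in the test file timed earlier in this thread.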
The important thing is to try it, of course, especially
where perl performance is concerned. Remember that perl
is interpreted (it compiles the script to an internal op
tree at startup and then runs that), but the built-in
functions are compiled C, and are always faster than the
equivalent Perl code.
perl /home/ahunter/grob.pl < xlib.ps 39.04s user 0.24s system 95% cpu 41.126 total
Well, a roughly 4x speed-up over the original (154s vs. 39s
of user time) isn't really all that bad, I suppose.
-- Andrew
Well, unless getc() works differently on non-Unix systems
(and it wouldn't surprise me), it's documented to read
until EOF, so the while(not(eof(X))) doesn't seem
necessary. getc() doesn't care about newlines.
Even given the possible less-than-best read
performance of getc() relative to <FILE>,
I'd still expect the getc loop to be faster, since even
though regexps are fast, not having to use them at all
is faster still. Doing a split or a s///
on each line of input is likely to kill your speed gain.
Of course, there are trade-offs... if all your files are small,
you probably don't care whether you have the fastest three lines.
For anything other than looking at each character in the file,
using getc() will probably suck. But if it's applicable,
benchmark whatever alternatives you're looking at...
On my box, a getc() loop was significantly faster than
<FILE>...
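One way to check that claim on your own box is a sketch like the following, using the core Benchmark module's cmpthese(). The sample text is made up and is held in an in-memory filehandle (Perl 5.8 or later), so this times the two loops rather than the disk. Results on real files, and on other machines, may well differ.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test input: 1000 identical lines held in memory.
my $text = "The quick brown fox jumps over 13 lazy dogs.\n" x 1000;

# getc() loop, one character per call
sub by_getc {
    open(my $fh, '<', \$text) or die $!;
    my %count;
    while (defined(my $c = getc($fh))) {
        $count{lc $c}++ if $c =~ /[a-zA-Z]/;
    }
    return \%count;
}

# <FILE> loop, one line per read, letters found with a //g match
sub by_readline {
    open(my $fh, '<', \$text) or die $!;
    my %count;
    while (my $line = <$fh>) {
        $count{lc $1}++ while $line =~ /([a-zA-Z])/g;
    }
    return \%count;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, { getc => \&by_getc, readline => \&by_readline });
```

Both subs should return identical counts, so any difference in the table comes from the reading strategy alone.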
RE: How To Do This Better?
by Anonymous Monk on Apr 15, 2000 at 20:23 UTC
The code by ahunter is still the fastest:
chromatic "52"    : 27 secs (25.76 usr  0.04 sys = 25.80 cpu)
NoGrooTwo+ahunter : 17 secs (16.72 usr  0.04 sys = 16.76 cpu)
japhy 1           : 16 secs (13.84 usr  0.04 sys = 13.88 cpu)
japhy 2           : 12 secs (11.71 usr  0.07 sys = 11.78 cpu)
ahunter           :  5 secs ( 4.65 usr  0.00 sys =  4.65 cpu)
ahunter+turnstep  :  3 secs ( 3.62 usr  0.03 sys =  3.65 cpu)
Re: How To Do This Better?
by turnstep (Parson) on Apr 15, 2000 at 02:27 UTC
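For the first example by ahunter, the line
s/[^a-z]//g;
is not needed and just slows things down: the loop only
ever counts and deletes a-z, so any other characters left
in the string never affect the counts.
(The 'my's are superfluous as well.)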
I like it though, and haven't been able to
find anything faster (while also playing by the rules,
by making every iteration in my Benchmark tests
open the file and slurp it in itself...).
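Here is a sketch of a benchmarking setup in that spirit, assuming the core Benchmark module. Each timed call opens and slurps the file itself, and a flag switches ahunter's s/[^a-z]//g pre-pass on or off. slurp_and_count and the iteration count of 20 are hypothetical choices. Pass the file to test on the command line.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical helper: every call opens and slurps the file itself, so the
# reading cost is part of each timed iteration. $prefilter chooses whether
# to run the s/[^a-z]//g pass before the counting loop.
sub slurp_and_count {
    my ($path, $prefilter) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!";
    my $text = lc do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    $text =~ s/[^a-z]//g if $prefilter;
    my %count;
    for my $letter ('a' .. 'z') {
        $count{$letter} = $text =~ s/$letter//g || 0;
    }
    return \%count;
}

# Usage: perl this_script.pl FILE   (times each variant 20 times)
if (@ARGV) {
    my $path = $ARGV[0];
    timethese(20, {
        with_prefilter    => sub { slurp_and_count($path, 1) },
        without_prefilter => sub { slurp_and_count($path, 0) },
    });
}
```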