Re: promoting array to a hash
by Zaxo (Archbishop) on Jun 13, 2004 at 05:05 UTC
|
I don't see anything wrong with what you have, but you may be looking for a hash slice. Leaving out the grep selection stuff,
my %words;
while (<>) {
@words{ split } = ();
}
I'm not sure what you mean by sorted here; hashes don't keep their keys in any stable order, so you have to sort the keys yourself when you read them back.
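For illustration, here is a minimal sketch of the slice approach with the sort applied on the way out. The @lines array is made-up sample data standing in for whatever <> would read; it is not from the original question.

```perl
use strict;
use warnings;

# Hypothetical input standing in for the lines read from <>.
my @lines = ("the quick brown fox", "jumps over the lazy dog", "the end");

my %words;
for my $line (@lines) {
    # Hash slice: every word becomes a key; the values are left undef.
    @words{ split ' ', $line } = ();
}

# keys returns the words in no particular order, so sort explicitly.
my @sorted = sort keys %words;
print "$_\n" for @sorted;
```

Duplicates collapse because assigning to an existing key just overwrites it.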
| [reply] [d/l] |
|
|
Yeah, there's nothing wrong with my code above, it's just that I was wondering how to get rid of the unnecessary temporary variable %words. For example the following snippet...
@a = keys (a=>1,b=>2,c=>3);
...produces the following error...
Type of arg 1 to keys must be hash (not list), blah, blah, blah
...but I'm willing to bet that there is some syntax to fix the problem.
#This doesn't work
@a = keys %{(a=>1,b=>2,c=>3)};
| [reply] [d/l] [select] |
|
|
Oh, Ok, you almost have it,
@a = sort keys %{{a=>1,b=>2,c=>3}};
or in terms of your original problem,
@a = sort keys %{{ map {$_ => undef} map {split} <> }};
Notice the replacement of parens with curlies. The inner curlies build an anonymous hash from the list and return a reference to it, and the outer %{} dereferences that reference so keys has a real hash to work on.
I agree with your desire to avoid temporary variables. I try to do that, too, in perl.
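A small self-contained sketch of the same idiom, using a made-up word list (@words is an assumption, not part of the original problem) so it can be run without any input:

```perl
use strict;
use warnings;

my @words = qw(pear apple pear fig apple);   # made-up sample data

# Inner { ... } builds an anonymous hash and returns a reference to it;
# outer %{ ... } dereferences that reference so keys can see a real hash.
my @unique = sort keys %{ { map { $_ => undef } @words } };
print "@unique\n";
```

The anonymous hash is discarded as soon as the statement finishes, which is what makes the named temporary unnecessary.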
| [reply] [d/l] [select] |
|
|
Re: promoting array to a hash
by dragonchild (Archbishop) on Jun 13, 2004 at 05:08 UTC
|
sub unique { my %x;@x{@_}=@_;values %x}
my @sorted_unique = sort { $a cmp $b } unique(split ' ', do { local $/; <> });
In other words, use the hash.
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
I shouldn't have to say this, but any code, unless otherwise stated, is untested
| [reply] [d/l] |
|
|
sub uniq2{ my %x; @x{ @_ } = (); keys %x }
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] [select] |
|
|
very important update: Please see Re^4: promoting array to a hash by BrowserUk for why the following code is horrifically wrong. Of course, that said, it requires one very simple s!my!our! to correct.
I know that the map vs. slice benchmark has been done before, but just to do it again as a reminder :)
#!perl -w
use strict;
use Benchmark ':all';
my @unsorted = map {
join('', map { ('a'..'z','A'..'Z',0..9)[rand 62] } 1..50)
} 1..5000;
sub uniq_dragonchild { my %x; @x{@_} = @_; values %x }
sub uniq_BrowserUk { my %x; @x{@_} = (); keys %x }
sub uniq_Zaxo { keys %{ { map { $_ => undef } @_ } } }
cmpthese(
timethese(-60, {
uniq_dragonchild => 'my @x = uniq_dragonchild(@unsorted)',
uniq_BrowserUk => 'my @x = uniq_BrowserUk(@unsorted)',
uniq_Zaxo => 'my @x = uniq_Zaxo(@unsorted)'
} )
);
__END__
C:\>uniq.pl
Benchmark: running uniq_BrowserUk, uniq_Zaxo, uniq_dragonchild for at least 60 CPU seconds...
uniq_BrowserUk: 64 wallclock secs (63.19 usr + 0.02 sys = 63.20 CPU) @ 421025.41/s (n=26610069)
uniq_Zaxo: 59 wallclock secs (60.08 usr + 0.03 sys = 60.11 CPU) @ 186939.31/s (n=11237109)
uniq_dragonchild: 64 wallclock secs (63.05 usr + 0.02 sys = 63.06 CPU) @ 399674.64/s (n=25204682)
                     Rate        uniq_Zaxo uniq_dragonchild   uniq_BrowserUk
uniq_Zaxo        186939/s               --             -53%             -56%
uniq_dragonchild 399675/s             114%               --              -5%
uniq_BrowserUk   421025/s             125%               5%               --
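For comparison, here is a hedged sketch of a variant that passes code references instead of strings. Strings handed to Benchmark are compiled elsewhere and cannot see a file-scoped "my @unsorted", whereas closures can, so this is an alternative to the s!my!our! fix rather than the fix the update refers to. The sub names, the smaller data set and the one-second run time are my own choices, and no timings are claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Smaller made-up data set than the original, with deliberate duplicates.
my @unsorted = map { join '', map { ('a'..'z')[rand 26] } 1..3 } 1..2000;

sub uniq_values { my %x; @x{@_} = @_; values %x }
sub uniq_keys   { my %x; @x{@_} = (); keys %x }
sub uniq_map    { keys %{ { map { $_ => undef } @_ } } }

# Code refs are closures, so they can see the lexical @unsorted.
cmpthese(-1, {
    uniq_values => sub { my @x = uniq_values(@unsorted) },
    uniq_keys   => sub { my @x = uniq_keys(@unsorted) },
    uniq_map    => sub { my @x = uniq_map(@unsorted) },
});
```

Passing code refs adds a sub call to every iteration, but it affects all three candidates equally.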
| [reply] [d/l] [select] |
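It will be slightly quicker. However, it is of less use. My version will DWIM with references while yours won't: hash keys are always strings, so keys hands back stringified references, whereas values returns the original references intact.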
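A short sketch of that difference, assuming the two subs quoted earlier in the thread (renamed uniq_values and uniq_keys here for clarity) and a made-up list of array references:

```perl
use strict;
use warnings;

my @refs = ([1, 2], [3, 4]);
my @list = (@refs, $refs[0]);          # one duplicate reference

sub uniq_values { my %x; @x{@_} = @_; values %x }
sub uniq_keys   { my %x; @x{@_} = (); keys %x }

my @from_values = uniq_values(@list);  # still array references
my @from_keys   = uniq_keys(@list);    # plain strings like "ARRAY(0x...)"

my $kind_v = ref $from_values[0];      # "ARRAY"
my $kind_k = ref $from_keys[0];        # "" (empty string: not a reference)
print "values: '$kind_v', keys: '$kind_k'\n";
```

Both versions remove the duplicate, but only the values version returns something you can still dereference.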
| [reply] |
Re: promoting array to a hash
by ambrus (Abbot) on Jun 13, 2004 at 10:22 UTC
|
Now Zaxo has solved your original problem, but let me have a different
question about your code.
Do you realize that grep /^[a-z]+$/, (split /\s/, join(" ",<>)) will return only those words that appear without
punctuation in the text? For example, if you input
"hello, world" to that program, it will output only
world, as split splits it to "hello," and "world"
but /^[a-z]+$/ does not match the first one.
If that's what you want, ok. If you want to match
those words with punctuation too, you should do something like
grep /^[a-z]+$/, (join(" ",<>)=~/(\w+)/g)
instead of the above grep.
This makes the code look like this:
print "$_\n" for sort keys %{{
map {$_, 1} grep /^[a-z]+$/, (join(" ",<>)=~/(\w+)/g)
}};
or, more simply,
print "$_\n" for sort keys %{{
map {$_, 1} join(" ",<>)=~/\b[a-z]+\b/g
}};
Also, instead of eliminating the temp hash, one could
use a temp hash but eliminate map, which is IMO more
elegant. (Update: I now see this has been brought up before.)
my %hash;
$hash{$_}++ for
join(" ",<>)=~/\b[a-z]+\b/g;
print "$_\n" for sort keys %hash;
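To make the split-versus-match difference concrete, here is a small sketch run on the "hello, world" example from above, with a fixed string in place of <> (variable names are mine):

```perl
use strict;
use warnings;

my $text = "hello, world";   # sample input from the discussion above

# Split on whitespace first: "hello," keeps its comma and fails the filter.
my @by_split = grep { /^[a-z]+$/ } split /\s/, $text;

# Extract word-character runs first: the comma never reaches the filter.
my @by_match = grep { /^[a-z]+$/ } $text =~ /(\w+)/g;

print "split: @by_split\n";   # split: world
print "match: @by_match\n";   # match: hello world
```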
| [reply] [d/l] [select] |
Re: promoting array to a hash
by hsinclai (Deacon) on Jun 13, 2004 at 05:07 UTC
|
With an even number of elements, I thought you could just assign it:
use strict;
my @friends = ("noc", "john", "brightland", "christine", "marsh", "brandon");
# create hash from array
my %friends = @friends;
foreach my $entry (keys %friends) {
print "Company $entry has buddy $friends{$entry}\n";
}
__OUTPUT__
Company brightland has buddy christine
Company marsh has buddy brandon
Company noc has buddy john
IIRC, with an odd number of array elements, the last key ends up with an undefined value (and perl warns about it if warnings are enabled).
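A quick sketch of the odd-count case, using a shortened, made-up version of the list above. The warning is silenced locally only to keep the output clean:

```perl
use strict;
use warnings;

my @odd = ("noc", "john", "brightland");   # three elements, so one key has no partner

my %friends;
{
    no warnings 'misc';   # silence "Odd number of elements in hash assignment"
    %friends = @odd;
}

print "brightland is ", defined $friends{brightland} ? "defined" : "undef", "\n";
```

The key exists in the hash, but its value is undef.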
| [reply] [d/l] [select] |
|
|
What does your answer have to do with the question? He's asking about using the keys of a hash to generate a list of words with duplicates filtered out. Hashes are good for this. You're talking about assigning array elements to hash key/value pairs. Hashes are good for that too, but those are two different, mostly unrelated subjects.
| [reply] |
|
|
I totally missed the point, sorry for posting that.
| [reply] [d/l] [select] |
Re: promoting array to a hash
by Jasper (Chaplain) on Jun 14, 2004 at 12:38 UTC
|
If all you are doing is printing a list of unique words from stdin, why not save a lot of wasted code and do:
print "$_\n" for sort <> =~ /\b(\S+)\b(?!.*\b\1\b)/g
That is, use a negative lookahead to check the word doesn't appear again. Saves you joining, splitting, grepping, and mapping :). I have not benchmarked it, though.
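Here is a minimal sketch of the lookahead idea on a made-up one-line string, using [a-z]+ rather than \S+ to match the word filter used elsewhere in the thread. The match is captured into an array first so that sort is handed a plain list.

```perl
use strict;
use warnings;

my $text = "the cat saw the dog";   # made-up one-line input

# A word matches only if the same word does not occur again later,
# so each word is kept at its last occurrence.
my @matches = $text =~ /\b([a-z]+)\b(?!.*\b\1\b)/g;
my @unique  = sort @matches;
print "@unique\n";   # cat dog saw the
```

For input spanning several lines, the whole text would need to be in one string and the regex would need the /s modifier so that . can cross newlines.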
|
|
Benchmarking is worthwhile in this instance. The regex backtracking turns an N*log(N) problem (assuming the sort dominates) into an N^2 problem, because the lookahead rescans the rest of the text for every candidate word. Here's the result of applying the two algorithms to the Net-HOWTO (which is 100 times smaller than the data set I initially used).
greg@spark:~/test$ cat sleepingsquirrel
#!/usr/bin/perl
print "$_\n" for sort keys %{{map {$_,()} grep /^[a-z]+$/, (split /\s/, join(" ",<>))}};
greg@spark:~/test$ time sleepingsquirrel Net-HOWTO >words.txt
real 0m0.178s
user 0m0.158s
sys 0m0.016s
greg@spark:~/test$ cat jasper
#!/usr/bin/perl
$/=undef;
print "$_\n" for sort <> =~ /\b([a-z]+)\b(?!.*\b\1\b)/sg
greg@spark:~/test$ time jasper Net-HOWTO >words2.txt
real 1m8.477s
user 1m8.471s
sys 0m0.003s
...only about 385x slower going by the wall-clock times (68.477s vs. 0.178s). YMMV