sleepingsquirrel has asked for the wisdom of the Perl Monks concerning the following question:

I'm surely overlooking something simple here, but what's the syntax for turning an array into a hash? I'm trying to eliminate the superfluous %words in the code below...
# reduces a file to a sorted list of unique words
my %words = map {$_,1} grep /^[a-z]+$/, (split /\s/, join(" ",<>));
print "$_\n" for sort keys %words;

Replies are listed 'Best First'.
Re: promoting array to a hash
by Zaxo (Archbishop) on Jun 13, 2004 at 05:05 UTC

    I don't see anything wrong with what you have, but you may be looking for a hash slice. Leaving out the grep selection stuff,

    my %words; while (<>) { @words{ split } = (); }
    I'm not sure what you mean by sorted here; hashes don't keep their keys in any stable order.

    After Compline,
    Zaxo
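
A minimal sketch of the hash-slice idea above, assuming a small hard-coded list of lines (the @lines array is hypothetical and stands in for what <> would read):

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines read from <>.
my @lines = ("the cat sat\n", "on the mat\n", "the end\n");

my %words;
for my $line (@lines) {
    # Hash slice: every word returned by split becomes a key.
    # Assigning the empty list leaves all the values undef.
    @words{ split ' ', $line } = ();
}

# The hash itself is unordered, so sort the keys when printing.
my @unique = sort keys %words;
print "$_\n" for @unique;
```

Duplicates collapse because a hash can hold each key only once; the ordering comes entirely from the explicit sort.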

      Yeah, there's nothing wrong with my code above, it's just that I was wondering how to get rid of the unnecessary temporary variable %words. For example, the following snippet...
      @a = keys (a=>1,b=>2,c=>3);
      ...produces the following error...
      Type of arg 1 to keys must be hash (not list), blah, blah, blah
      ...but I'm willing to bet that there is some syntax to fix the problem.
      # This doesn't work
      @a = keys %{(a=>1,b=>2,c=>3)};

        Oh, OK, you almost have it:
        @a = sort keys %{{a=>1,b=>2,c=>3}};
        or, in terms of your original problem,
        @a = sort keys %{{ map {$_ => undef} map {split} <> }};
        Notice the replacement of parens with curlies. That turns the hashlike list into a reference to an anonymous hash holding its contents, and the outer %{} dereferences it.

        I agree with your desire to avoid temporary variables; I try to do that, too, in Perl.

        After Compline,
        Zaxo
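
A small sketch of the anonymous-hash trick described above, assuming a hypothetical in-memory word list rather than input from <>:

```perl
use strict;
use warnings;

# Hypothetical word list with duplicates.
my @words = qw(pear apple pear fig apple);

# The inner { ... } builds an anonymous hash and yields a reference to it;
# the outer %{ ... } dereferences that reference so keys can see a real hash.
my @unique = sort keys %{ { map { $_ => undef } @words } };

print "$_\n" for @unique;
```

No named hash ever appears; the anonymous hash is discarded as soon as keys has returned its list.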

Re: promoting array to a hash
by dragonchild (Archbishop) on Jun 13, 2004 at 05:08 UTC
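    sub unique { my %x; @x{@_} = @_; values %x }
    # local $/ (not $\) switches <> into slurp mode; the explicit sort block
    # keeps sort from treating unique as its comparison routine
    my @sorted_unique = sort { $a cmp $b } unique( split ' ', do { local $/; <> } );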

    In other words, use the hash.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      A slightly quicker version of dragonchild's unique() sub.

      sub uniq2{ my %x; @x{ @_ } = (); keys %x }

      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail

        Very important update: Please see Re^4: promoting array to a hash by BrowserUk for why the following code is horrifically wrong. Of course, that said, it requires one very simple s!my!our! to correct.

        I know that the map vs. slice benchmark has been done before, but just to do it again as a reminder :)

        #!perl -w
        use strict;
        use Benchmark ':all';

        my @unsorted = map {
            join('', map { ('a'..'z','A'..'Z',0..9)[rand 62] } 1..50)
        } 1..5000;

        sub uniq_dragonchild { my %x; @x{@_} = @_; values %x }
        sub uniq_BrowserUk   { my %x; @x{@_} = (); keys %x }
        sub uniq_Zaxo        { keys %{ { map { $_ => undef } @_ } } }

        cmpthese( timethese(-60, {
            uniq_dragonchild => 'my @x = uniq_dragonchild(@unsorted)',
            uniq_BrowserUk   => 'my @x = uniq_BrowserUk(@unsorted)',
            uniq_Zaxo        => 'my @x = uniq_Zaxo(@unsorted)'
        } ) );

        __END__
        C:\>uniq.pl
        Benchmark: running uniq_BrowserUk, uniq_Zaxo, uniq_dragonchild for at least 60 CPU seconds...
        uniq_BrowserUk: 64 wallclock secs (63.19 usr + 0.02 sys = 63.20 CPU) @ 421025.41/s (n=26610069)
        uniq_Zaxo: 59 wallclock secs (60.08 usr + 0.03 sys = 60.11 CPU) @ 186939.31/s (n=11237109)
        uniq_dragonchild: 64 wallclock secs (63.05 usr + 0.02 sys = 63.06 CPU) @ 399674.64/s (n=25204682)
                             Rate        uniq_Zaxo uniq_dragonchild   uniq_BrowserUk
        uniq_Zaxo        186939/s               --             -53%             -56%
        uniq_dragonchild 399675/s             114%               --              -5%
        uniq_BrowserUk   421025/s             125%               5%               --
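
A rough illustration of why the update suggests s!my!our!. Benchmark compiles string snippets with eval in a scope that cannot see the caller's lexical variables. The run_snippet helper below is a hypothetical stand-in for that behaviour, not Benchmark's actual code, and the variable names are made up for the example:

```perl
use strict;
use warnings;

# Hypothetical stand-in for how a module evaluates a code string:
# the snippet is compiled in package main, but from a scope that was
# compiled before (and so cannot see) the lexicals declared further down.
sub run_snippet {
    my ($code) = @_;
    no strict 'vars';    # assumption: the evaluating code is not under strict
    return eval "package main; $code";
}

my  @lexical_words = qw(a b c);    # lexical: invisible to the string eval
our @package_words = qw(a b c);    # package variable: visible as @main::package_words

my $seen_lexical = run_snippet('scalar @lexical_words');
my $seen_package = run_snippet('scalar @package_words');

print "elements seen via my:  $seen_lexical\n";
print "elements seen via our: $seen_package\n";
```

With my, the snippet silently refers to an empty package array of the same name, so a benchmark written this way would time the subs on an empty list. Passing code references instead of strings avoids the issue, because closures capture lexicals.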
        It will be slightly quicker. However, it is of less use. My version will DWIM references while yours won't.

        ------
        We are the carpenters and bricklayers of the Information Age.

        Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

        I shouldn't have to say this, but any code, unless otherwise stated, is untested
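
A short sketch of the difference mentioned above, assuming a hypothetical hash reference as the data being de-duplicated. Hash keys are always strings, so a reference used as a key is stringified, while hash values keep the reference intact:

```perl
use strict;
use warnings;

sub unique_values { my %x; @x{@_} = @_; values %x }   # dragonchild's approach
sub unique_keys   { my %x; @x{@_} = ();  keys %x }    # BrowserUk's approach

# Hypothetical data: the same reference passed twice.
my $config = { name => 'example' };

my ($from_values) = unique_values($config, $config);
my ($from_keys)   = unique_keys($config, $config);

# ref() returns a true value only for real references.
print ref($from_values) ? "values: still a reference\n" : "values: plain string\n";
print ref($from_keys)   ? "keys: still a reference\n"   : "keys: plain string\n";
```

For plain strings the two subs return the same set; the trade-off only matters once the list contains references or objects.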

Re: promoting array to a hash
by ambrus (Abbot) on Jun 13, 2004 at 10:22 UTC

    Now Zaxo has solved your original problem, but let me have a different question about your code.

    Do you realize that grep /^[a-z]+$/, (split /\s/, join(" ",<>)) will return only those words that appear without punctuation in the text? For example, if you feed "hello, world" to that program, it will output only world, because split breaks the input into "hello," and "world", and /^[a-z]+$/ does not match the first one. If that's what you want, OK. If you want to match words next to punctuation too, you should do something like

    grep /^[a-z]+$/, (join(" ",<>)=~/(\w+)/g)
    instead of the above grep. This makes the code look like this:
    print "$_\n" for sort keys %{{ map {$_, 1} grep /^[a-z]+$/, (join(" ",<>)=~/(\w+)/g) }};
    or, more simply,
    print "$_\n" for sort keys %{{ map {$_, 1} join(" ",<>)=~/\b[a-z]+\b/g }};

    Also, instead of eliminating the temp hash, one could use a temp hash but eliminate map, which is IMO more elegant. (Update: I now see this has been brought up before.)

    my %hash;
    $hash{$_}++ for join(" ",<>)=~/\b[a-z]+\b/g;
    print "$_\n" for sort keys %hash;
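
The punctuation point above can be checked with a small sketch that uses the post's own "hello, world" input held in a variable (the variable names are hypothetical):

```perl
use strict;
use warnings;

my $text = "hello, world";    # the sample input from the post

# Splitting on whitespace keeps the comma attached to "hello,",
# so the all-lowercase filter rejects that word.
my @by_split = grep { /^[a-z]+$/ } split /\s/, $text;

# Matching runs of lowercase letters directly skips the punctuation.
my @by_match = $text =~ /\b[a-z]+\b/g;

print "split + grep: @by_split\n";
print "regex match:  @by_match\n";
```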
Re: promoting array to a hash
by hsinclai (Deacon) on Jun 13, 2004 at 05:07 UTC
    With an even number of elements, I thought you could just assign it:

    use strict;
    my @friends = ("noc", "john", "brightland", "christine", "marsh", "brandon");

    # create hash from array
    my %friends = @friends;
    foreach my $entry (keys %friends) {
        print "Company $entry has buddy $friends{$entry}\n";
    }
    __OUTPUT__
    Company brightland has buddy christine
    Company marsh has buddy brandon
    Company noc has buddy john


    IIRC, for an odd number of array elements, the last key in the hash is assigned an undefined value.
      What does your answer have to do with the question? He's asking about using the keys of a hash to generate a list of words with duplicates filtered out. Hashes are good for this. You're talking about assigning array elements to hash key/value pairs. Hashes are good for that too, but those are two different, mostly unrelated subjects.

      Dave

        I totally missed the point, sorry for posting that.





Re: promoting array to a hash
by Jasper (Chaplain) on Jun 14, 2004 at 12:38 UTC
    If all you are doing is printing a list of unique words from stdin, why not save a lot of wasted code and do:
    print "$_\n" for sort <> =~ /\b(\S+)\b(?!.*\b\1\b)/g
    That is, use a negative lookahead to check the word doesn't appear again. Saves you joining, splitting, grepping, and mapping :). I have not benchmarked it, though.
      Benchmarking is worthwhile in this instance. The regex backtracking turns an N*log(N) problem (assuming the sort dominates) into an N^2 problem. Here's the result of applying the two algorithms to the Net-HOWTO (which is 100 times smaller than the data set I initially used).
      greg@spark:~/test$ cat sleepingsquirrel
      #!/usr/bin/perl
      print "$_\n" for sort keys %{{map {$_,()} grep /^[a-z]+$/, (split /\s/, join(" ",<>))}};
      greg@spark:~/test$ time sleepingsquirrel Net-HOWTO >words.txt

      real 0m0.178s
      user 0m0.158s
      sys 0m0.016s
      greg@spark:~/test$ cat jasper
      #!/usr/bin/perl
      $/=undef;
      print "$_\n" for sort <> =~ /\b([a-z]+)\b(?!.*\b\1\b)/sg
      greg@spark:~/test$ time jasper Net-HOWTO >words2.txt

      real 1m8.477s
      user 1m8.471s
      sys 0m0.003s
      ...only about 350x slower. YMMV
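
A small sketch comparing the two approaches on a hypothetical sample string. It only checks that they produce the same word list; it says nothing about speed, which is what the timings above measure:

```perl
use strict;
use warnings;

my $text = "the cat and the hat";    # hypothetical sample input

# Lookahead approach: keep a word only if it does not occur again later,
# so each word's last occurrence is the one that survives. Every candidate
# rescans the rest of the text, which is where the quadratic cost comes from.
my @last_occurrences = $text =~ /\b([a-z]+)\b(?!.*\b\1\b)/sg;
my @by_regex = sort @last_occurrences;

# Hash approach: one pass over the words, duplicates collapse into one key.
my %seen;
$seen{$_}++ for $text =~ /\b[a-z]+\b/g;
my @by_hash = sort keys %seen;

print "regex: @by_regex\n";
print "hash:  @by_hash\n";
```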