Speed up my code

arivu198314 has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I'm looking for someone to optimize my code.

input file

picturesque liar
fight nor fly
24 hours
love life
wrinkled
a million dollars
love life
even plan
you attempt things
a million dollars
many hardships
many hardships
married
head
many hardships
this year
secret shame
present
apple pie
many hardships
elephant
careful
peace
many hardships
apple pie
good afternoon
seven levels

The script should count the items and find which ones occur most often in the input. I need to print the 5 most frequent items. For this input, the output is:

many hardships
love life
apple pie
a million dollars
this year

I have written a script, but it's taking some time to finish this task.

#!/usr/bin/perl
open(INP, '<', $ARGV[0]);
while ($item=<INP>)
{
    chomp($item);
    (exists $query{$item}) ? ($query{$item}=$query{$item}+1) : ($query{$item}=1);
}
print join("\n", (sort {$query{$b}<=>$query{$a}} (keys %query))[0..4])."\n";

Basically, I want to know how to optimize this code for ninja speed.


Replies are listed 'Best First'.
Re: Speed up my code by toolic (Bishop) on Dec 14, 2011 at 13:22 UTC
I doubt it will have much impact on speed, but `(exists $query{$item}) ? ($query{$item}=$query{$item}+1) : ($query{$item}=1);` is more simply written as `$query{$item}++;` Did you profile your code (see perldoc perlrun)?
Re: Speed up my code by BrowserUk (Patriarch) on Dec 14, 2011 at 13:46 UTC
"but its taking some time to finish this task."

It won't make any discernible difference on such a small dataset as your sample, but avoiding the sort for much larger datasets should be a win:

```perl
#! perl -slw
use strict;

my %hash;
++$hash{ <> } until eof();

my @top5;
for( keys %hash ) {
    for my $i ( 0 .. 4 ) {
        if( !defined( $top5[ $i ] ) or $hash{ $_ } > $hash{ $top5[ $i ] } ) {
            splice @top5, $i, 0, $_;
            pop @top5 if @top5 > 5;
            last;
        }
    }
}
print @top5;
__END__
many hardships
a million dollars
love life
apple pie
you attempt things
```

Note: the difference between my output and your expected output is due to there being no clear winner for 5th place.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
The start of some sanity?
Re^2: Speed up my code by Tux (Canon) on Dec 14, 2011 at 14:50 UTC
Much shorter :)

```
$ perl -ne'$x{$_}++}END{print"$x{$_}\t$_"for(sort{$x{$b}<=>$x{$a}}keys %x)[0..4]' test.txt
5	many hardships
2	a million dollars
2	love life
2	apple pie
1	you attempt things
```

Enjoy, Have FUN! H.Merijn
Re^2: Speed up my code by BrowserUk (Patriarch) on Dec 14, 2011 at 14:54 UTC
Update: Ignore this. The optimisation broke it.
Re: Speed up my code by jethro (Monsignor) on Dec 14, 2011 at 13:35 UTC
I assume your input file is much bigger in reality; otherwise I can't imagine this script taking a lot of time. In that case, sorting the whole %query hash only to get the 5 highest might be slow. Instead you could use this:

```perl
my @result=([0,''],[0,''],[0,''],[0,''],[0,'']);
foreach (keys %query) {
    my $sw;
    my $n= [$query{$_},$_];
    # Compare the counts ($result[$i][0]), not the array references themselves
    if ($n->[0]>$result[4][0]) {
        if ($n->[0]>$result[3][0]) {
            if ($n->[0]>$result[2][0]) {
                if ($n->[0]>$result[1][0]) {
                    if ($n->[0]>$result[0][0]) {
                        $sw=$result[0]; $result[0]=$n; $n=$sw;
                    }
                    $sw=$result[1]; $result[1]=$n; $n=$sw;
                }
                $sw=$result[2]; $result[2]=$n; $n=$sw;
            }
            $sw=$result[3]; $result[3]=$n; $n=$sw;
        }
        $sw=$result[4]; $result[4]=$n; $n=$sw;
    }
}
print join("\n",map{$_->[1]},@result);
```

The speedup of this somewhat ugly contraption lies in the expectation that the first 'if' will very seldom be true for a big hash.
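For readers who want the same idea without the nested 'if' ladder, here is a minimal sketch of a single-pass top-N selection. The helper name `top_n` and the small sample hash are assumptions made for illustration; they do not come from the thread. It relies on the same expectation as above: for a big hash, most keys fail the first comparison and are skipped cheaply.

```perl
use strict;
use warnings;

# top_n is a hypothetical helper name, not something from the thread.
# It keeps at most $n keys, ordered by descending count, in one pass.
sub top_n {
    my ($counts, $n) = @_;
    return if $n < 1;
    my @top;
    for my $key (keys %$counts) {
        my $c = $counts->{$key};
        # Fast path: the list is full and this key cannot beat the last entry.
        next if @top == $n && $c <= $counts->{ $top[-1] };
        # Find the first slot whose count is strictly smaller.
        my $i = 0;
        $i++ while $i < @top && $counts->{ $top[$i] } >= $c;
        splice @top, $i, 0, $key;
        pop @top if @top > $n;
    }
    return @top;
}

# Illustrative counts only (assumed sample data).
my %query = ('many hardships' => 5, 'love life' => 2, 'head' => 1, 'peace' => 1);
my @top2 = top_n(\%query, 2);
print join("\n", @top2), "\n";
```

Whether this actually beats a plain sort depends on the data size, so it is worth profiling both on the real input.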
Re: Speed up my code by RichardK (Parson) on Dec 14, 2011 at 13:47 UTC
In your data there are lots of lines that only appear once, so why is 'this year' the correct choice to fill out your top five?
Re^2: Speed up my code by arivu198314 (Sexton) on Dec 14, 2011 at 14:04 UTC
The last one isn't important, since all the other items occur only once.
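If a repeatable answer for the tied places is ever wanted, one option is a secondary sort key. The sketch below is only an illustration: the `%count` hash is made-up sample data, and alphabetical order is an arbitrary choice of tie-breaker, not something the original poster asked for.

```perl
use strict;
use warnings;

# Hypothetical sample counts, loosely mirroring the thread's input.
my %count = (
    'many hardships'    => 5,
    'love life'         => 2,
    'apple pie'         => 2,
    'a million dollars' => 2,
    'this year'         => 1,
    'elephant'          => 1,
    'careful'           => 1,
);

# Sort by count descending, then alphabetically to break ties.
my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
my @top5   = @ranked[0 .. 4];
print "$_\n" for @top5;
```

Without the second comparison, the order of tied keys follows Perl's hash ordering, which can differ from run to run.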
Re: Speed up my code by locked_user sundialsvc4 (Abbot) on Dec 14, 2011 at 17:57 UTC
It seems to me that a simple hash would do nicely. Perl has many shortcuts to simplify this sort of thing, for example:

```perl
my $foo = "bar";
my $counts = {};
$$counts{$foo}++;    # or $counts->{$foo}++;
# yields: $$counts{"bar"} == 1
```

You don’t have to worry if the string is already in the hash; you don’t have to initialize the bucket to zero. If the bucket isn’t in there, it is automagically created and ends up with the value 1. If it is, it’s incremented. It Just Works.™

There are many ways to get the final tallies out of the structure, depending on your needs. This straight-ahead strategy works excellently for any data volume that can be reasonably expected to fit entirely in memory without incurring page-faults. (Which, given the beefy size of computers these days, is a pretty safe bet.)

The original choice of the name Perl was as an acronym for P(ractical | ragmatic) Extraction and Reporting Language, otherwise known as The Swiss Army® Knife. And this is one of the reasons why. Text-handling tasks that lots of folks have to do lots of ... those bread-and-butter tasks ... are easy to code and ruggedly implemented.
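To make the counting idiom concrete, here is a small sketch of the whole read-and-count loop. The in-memory string `$data` stands in for the real input file and is an assumption for this example, as are the variable names.

```perl
use strict;
use warnings;

# In-memory stand-in for the input file (an assumption for this sketch).
my $data = "love life\napple pie\nlove life\n";
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

my %count;
while (my $line = <$in>) {
    chomp $line;
    $count{$line}++;    # a missing key is treated as 0, so no exists check is needed
}
close $in;

printf "%s: %d\n", $_, $count{$_} for sort keys %count;
```

Swapping the in-memory handle for `open my $in, '<', $ARGV[0]` should give the same loop over a real file.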
Re^2: Speed up my code by AnomalousMonk (Archbishop) on Dec 15, 2011 at 02:59 UTC
Way I heard it, the original choice of Perl by Larry was because he wanted to name his creation Pearl, but there was already a 'pearl' application on the system on which he was developing. The individual letters meant nothing. Perl was then backronymed as, among others, P(ractical | ragmatic) Extraction and Reporting Language.