
Re^2: Hash table manipulation

by sarvan (Sexton)
on Jul 12, 2011 at 07:32 UTC

in reply to Re: Hash table manipulation
in thread Hash table manipulation

Hello everyone, let me explain the problem more clearly.

My program takes Google's top 10 results for a query, extracts the snippet or title from each result, and computes the similarity between that snippet and the original query.

I store each computed similarity value in a hash as the key, with the corresponding URL stored directly as the value.

For example:

    $url   = ...;    # will contain the 10 URLs grepped from the XML result file
    $value = sim();  # will contain the similarity computed for each of the ten results
    my %hash;

    sub sim {
        # here I compute the similarity between the query title and the returned snippet
        return $val;  # I return the similarity value for each result
    }

    $hash{$value} = $url;  # store the key and value in the hash
    # I sort the hash keys in descending order to get the highest value on top,
    # then print the URL associated with it as the output
Now I want to filter the results by a threshold, for example keeping only the keys higher than 0.4.
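A runnable sketch of the loop described above, under stated assumptions: the query, URLs and snippets are made-up placeholders, and `sim` here is a hypothetical stand-in that scores word overlap (Jaccard similarity of lowercase word sets), not the poster's actual similarity measure. It stores one `similarity => URL` entry per result and prints them highest first.

```perl
use strict;
use warnings;

# Hypothetical inputs; in the real program these come from the Google XML results.
my $query    = 'perl hash threshold';
my @urls     = ('http://example.com/a', 'http://example.com/b', 'http://example.com/c');
my @snippets = ('sort a perl hash by value', 'threshold filtering in perl', 'unrelated text');

# Stand-in similarity (an assumption): Jaccard overlap of lowercase word sets,
# i.e. shared words divided by all distinct words.
sub sim {
    my ($x, $y) = @_;
    my %x_words = map { $_ => 1 } split ' ', lc $x;
    my %y_words = map { $_ => 1 } split ' ', lc $y;
    my $common  = grep { $y_words{$_} } keys %x_words;   # count of shared words
    my %union   = (%x_words, %y_words);
    my $total   = keys %union;                          # count of distinct words
    return $total ? $common / $total : 0;
}

# One hash entry per result: similarity => URL.
my %hash;
for my $i (0 .. $#urls) {
    $hash{ sim($query, $snippets[$i]) } = $urls[$i];
}

# Highest similarity first.
print "$_: $hash{$_}\n" for sort { $b <=> $a } keys %hash;
```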

Re^3: Hash table manipulation
by GrandFather (Saint) on Jul 12, 2011 at 11:31 UTC

    OK, given what you are doing, your choice of using a hash is fair enough. The precision of the floating-point values is fairly unimportant, so any rounding or truncation that happens when the numbers are used as keys is very unlikely to matter. To select the top N URLs I'd do something like:

    use strict;
    use warnings;

    my %urls = (
        0.999 => '',
        0.65  => '',
        0.451 => '',
        0.222 => '',
        0.12  => '',
    );
    my @inOrder  = sort {$b <=> $a} keys %urls;   # numeric sort, highest first
    my @topThree = splice @inOrder, 0, 3;         # take the first three
    print "$_: $urls{$_}\n" for @topThree;


    0.999:
    0.65:
    0.451:
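One caveat with using the similarity as the key: two results with exactly the same score share one key, so the later assignment silently overwrites the earlier URL. If the URLs are unique, a sketch that sidesteps this keys the hash by URL and sorts by value instead. The URLs and scores below are invented for illustration, and breaking ties by URL is just one reasonable choice.

```perl
use strict;
use warnings;

# Hypothetical data: URL => similarity. URLs are unique, so equal scores
# no longer overwrite each other.
my %score = (
    'http://example.com/a' => 0.999,
    'http://example.com/b' => 0.65,
    'http://example.com/c' => 0.65,
    'http://example.com/d' => 0.222,
);

# Sort URLs by descending score; ties are ordered by URL so the result is stable.
my @inOrder  = sort { $score{$b} <=> $score{$a} || $a cmp $b } keys %score;
my @topThree = splice @inOrder, 0, 3;

print "$score{$_}: $_\n" for @topThree;
```

With this layout the threshold test becomes `grep { $score{$_} > 0.4 } keys %score`.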
    True laziness is hard work
      Hi GrandFather,

      Thanks for the post, but I have already done the sorting and getting the highest value.

      What I asked is how I can get the values that are higher than a certain threshold. Please let me know if my explanation is not adequate.

        Change the selection line to:

        my @topThree = grep {$_ > 0.4} @inOrder;

        replacing 0.4 with whatever your selection threshold value is.
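Combining the sort with a block-form `grep`, a self-contained sketch of the threshold filter follows. The placeholder URLs and the `$threshold` variable are illustrative additions, not part of the original code. Filtering before sorting means only the surviving keys get sorted.

```perl
use strict;
use warnings;

# Similarity => URL (URLs are placeholders).
my %urls = (
    0.999 => 'http://example.com/a',
    0.65  => 'http://example.com/b',
    0.451 => 'http://example.com/c',
    0.222 => 'http://example.com/d',
    0.12  => 'http://example.com/e',
);

my $threshold = 0.4;    # selection threshold; adjust as needed

# Keep only keys above the threshold, then order them highest first.
my @selected = sort { $b <=> $a } grep { $_ > $threshold } keys %urls;

print "$_: $urls{$_}\n" for @selected;
```

The expression form with a comma, `grep($_ > $threshold, keys %urls)`, is equivalent to the block form used here.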

