maheshkumar has asked for the wisdom of the Perl Monks concerning the following question:

Can I find the average number of hops for a particular country? For example, in the following file China appears with different hop counts, e.g. 20 hops or 12 hops. So is it possible that I can find the average?

So far, I only print the lines where China appears along with the hops:

    use Locale::Country;

    open my $in, '<', 'num_traces' or die $!;
    open my $out, '>', 'analysis' or die $!;

    my @names;
    my $value = 0;
    my $value_1 = 0;

    while ( my $line = <$in>) {
        @names = all_country_names();
        if ($line =~ /China\s\d+\shops/m) {
            print {$out} "$line\n";
            $value++;
        }
    }

Any thoughts on this? The text file looks like the following:

    Trace for 216.137.61.109 - Seattle -United States 10 hops
    Trace for 192.168.178.1 - 1 hops
    Trace for 8.8.8.8 - Mountain View -United States 8 hops
    Trace for 92.123.72.112 - -Europe
    Trace for 62.41.85.112 - -United Kingdom
    Trace for 213.155.157.35 - -United Kingdom 9 hops
    Trace for 216.137.63.227 - Seattle -United States 12 hops
    Trace for 77.75.76.22 - -Czech Republic 9 hops
    Trace for 80.237.208.84 - Höst -Germany 8 hops
    Trace for 208.67.222.222 - San Francisco -United States
    Trace for 193.93.125.43 - Paris -France
    Trace for 218.30.82.201 - Beijing -China
    Trace for 91.203.186.13 - -France
    Trace for 59.53.86.5 - Beijing -China 21 hops
    Trace for 217.72.203.5 - -Germany 9 hops
    Trace for 123.235.38.123 - Jinan -China
    Trace for 174.129.215.122 - Ashburn -United States 30 hops
    Trace for 77.75.72.22 - Ricany -Czech Republic 30 hops
    Trace for 123.235.39.126 - Jinan -China 18 hops
    Trace for 123.235.43.75 - Jinan -China 17 hops
    Trace for 122.226.213.92 - Yongkang -China 21 hops
    Trace for 60.28.178.50 - Tianjin -China 19 hops
    Trace for 60.29.252.89 - Tianjin -China 16 hops
    Trace for 209.85.227.101 - Mountain View -United States 10 hops
    Trace for 209.85.148.101 - Mountain View -United States 7 hops
    Trace for 218.30.66.7 - Beijing -China 19 hops
    Trace for 204.93.163.233 - Chicago -United States 30 hops
    Trace for 222.88.95.14 - Beijing -China 30 hops
    Trace for 124.225.135.224 - -China
    Trace for 204.9.177.195 - San Francisco -United States 30 hops
    Trace for 211.100.56.204 - Beijing -China 21 hops
    Trace for 193.93.124.172 - Paris -France 30 hops
    Trace for 122.218.102.11 - Osaka -Japan 18 hops
    Trace for 202.58.48.1 - -Australia 30 hops
    Trace for 202.58.49.1 - -Australia 30 hops
    Trace for 87.248.210.185 - London -United Kingdom
    Trace for 87.248.201.77 - -Italy
    Trace for 95.140.237.60 - -United Kingdom 7 hops
    Trace for 69.175.33.75 - Chicago -United States 18 hops
    Trace for 122.228.242.240 - Beijing -China 30 hops
    Trace for 121.207.229.250 - Fuzhou -China
    Trace for 60.28.213.151 - Tianjin -China
    Trace for 61.155.199.240 - Beijing -China 30 hops
    Trace for 114.80.182.250 - Shanghai -China
    Trace for 65.203.229.217 - Seattle -United States 16 hops
    Trace for 65.242.27.136 - Seattle -United States 16 hops
    Trace for 67.195.160.134 - Sunnyvale -United States 30 hops
    Trace for 67.221.32.222 - Rancho Cucamonga -United States 18 hops
    Trace for 204.11.109.24 - Emeryville -United States
    Trace for 70.42.35.80 - Atlanta -United States 30 hops
    Trace for 198.41.0.4 - Sterling -United States 30 hops
    Trace for 92.123.64.26 - -Netherlands 10 hops
    Trace for 195.59.150.146 - -United Kingdom
    Trace for 92.123.69.75 - -Europe 6 hops
    Trace for 92.122.212.42 - -Europe
    Trace for 92.123.66.243 - -Europe
    Trace for 77.67.20.43 - -Netherlands 9 hops
    Trace for 121.14.24.252 - Guangzhou -China 30 hops

Replies are listed 'Best First'.
Re: Finding average from numbers appearing in a file
by davido (Cardinal) on Aug 05, 2012 at 01:48 UTC

    Use your regular expression to capture the hops count for China. Every time China is found, add the value captured to an accumulator for total number of hops. Increment a separate variable by one each time China is found. After the loop terminates, divide the total number of hops by the number of times China was found. That's your average.
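
    The steps above could be sketched roughly as follows. This is only an illustration, not a finished solution: it assumes one trace per line, uses a few sample lines copied from the question instead of reading the file, and the names average_hops_for, $total_hops and $found are made up for the example.

    ```perl
    use strict;
    use warnings;

    # Sample lines copied from the question (assumption: one trace per line).
    my @lines = (
        'Trace for 59.53.86.5 - Beijing -China 21 hops',
        'Trace for 123.235.38.123 - Jinan -China',
        'Trace for 123.235.39.126 - Jinan -China 18 hops',
        'Trace for 8.8.8.8 - Mountain View -United States 8 hops',
    );

    # Hypothetical helper: average hop count for one country,
    # or undef if the country never appears with a hop count.
    sub average_hops_for {
        my ($country, @input) = @_;
        my ($total_hops, $found) = (0, 0);
        for my $line (@input) {
            # The parentheses capture the digits into $1.
            if ($line =~ /-\Q$country\E\s+(\d+)\s+hops?\b/) {
                $total_hops += $1;    # accumulator for the hop total
                $found++;             # separate counter of matches
            }
        }
        return $found ? $total_hops / $found : undef;
    }

    my $avg = average_hops_for('China', @lines);
    printf "China avg: %.2f\n", $avg if defined $avg;    # prints "China avg: 19.50"
    ```

    Returning undef when nothing matched avoids a division by zero for a country that is absent from the file.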

    If you need to keep track of more than just China, store each country as a hash key, and let that hash point to anonymous arrays containing a "found" count and a "hops" total for that country. After the primary loop ends, loop over the countries stored in the hash, and for each one, do the division as described above.
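
    A minimal sketch of that hash layout, under some assumptions: one trace per line, a country name that starts with a letter right after the last hyphen, and no hyphens or digits inside the name. The helper name tally_hops and the variable %stats are invented for the example, and the sample lines are copied from the question.

    ```perl
    use strict;
    use warnings;

    # Sample lines copied from the question (assumption: one trace per line).
    my @lines = (
        'Trace for 213.155.157.35 - -United Kingdom 9 hops',
        'Trace for 95.140.237.60 - -United Kingdom 7 hops',
        'Trace for 122.226.213.92 - Yongkang -China 21 hops',
        'Trace for 60.28.178.50 - Tianjin -China 19 hops',
        'Trace for 124.225.135.224 - -China',
    );

    # Hypothetical helper: builds country => [ found count, hops total ].
    sub tally_hops {
        my @input = @_;
        my %stats;
        for my $line (@input) {
            # $1 is the country (may contain spaces), $2 the hop count.
            next unless $line =~ /-([A-Za-z][^-\d]*?)\s+(\d+)\s+hops?\b/;
            my ($country, $hops) = ($1, $2);
            $stats{$country} ||= [ 0, 0 ];    # new anonymous array on first sight
            $stats{$country}[0]++;            # found count
            $stats{$country}[1] += $hops;     # hops total
        }
        return %stats;
    }

    my %stats = tally_hops(@lines);
    for my $country (sort keys %stats) {
        my ($found, $total) = @{ $stats{$country} };
        printf "%s avg: %.2f\n", $country, $total / $found;
    }
    ```

    With these five lines it prints "China avg: 20.00" and "United Kingdom avg: 8.00"; the China line without a hop count is skipped.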

    If you're having trouble with capturing the hops, you're almost there, and will be 100% there once you skim through perlrequick and perlretut, looking for how to use capturing parentheses.

    I doubt you're having trouble with the math. And it seems that you understand how to store a value in variables. The += operator will come in handy for your hops accumulator.

    If you end up going the hash of arrays route, you might need to look at perlreftut, and maybe perldsc.

    It would have saved a lot of typing for me to just provide the solution, but then (a) you never ascend the learning curve, (b) you never gain the pleasure and confidence that come with figuring it out yourself, and (c) your client would owe me, not you.


    Dave

      I am really having big trouble understanding and using hashes. Can you recommend a link or anything that explains them clearly with practical examples? I have found that arrays work all the time... Thanks

Re: Finding average from numbers appearing in a file
by 2teez (Vicar) on Aug 05, 2012 at 05:15 UTC
    Hi,

    Using the text file given in the question above, let me give you a heads-up.

    So is it possible that I can find the average?

    Yes, it is possible!
    Say you want to find the average for China only; you might do something like this:

        use warnings;
        use strict;

        my $total_number_of_hops = 0;
        my $number_of_matches = 0;

        while (<>) {
            chomp;
            if (/.+?-(china\s+?[0-9]+?\s+?hops?)/is) {
                ++$number_of_matches;
                my ( $country, $number_of_hops, $hop ) = split /\s+/, $1, 3;
                $total_number_of_hops += $number_of_hops;
            }
        }

        printf "Average Number of Hops for China is %.2f",
            $total_number_of_hops / $number_of_matches;

    However, it becomes a whole new ball game if you want to do the same for all the countries in the file. But that is not as difficult as it may look.
    This is what you can do, modifying the previous code a bit:
    • change the regex in the if() to match all countries, including those that have spaces within their names,
    • use a data structure that collects all the hops for each country individually, then
    • use a for loop, a while loop, or whichever you prefer to get the countries and their averages.
    • It's that simple ;-)
    Here is the heads-up:

        use warnings;
        use strict;
        use Data::Dumper;

        my $country_hop_ref = {};

        while (<>) {
            chomp;
            if (/.+?-(([a-z\s+]+?)\s+?([0-9]+?)\s+?hops?)/is) {
                push @{ $country_hop_ref->{$2} }, $3;
            }
        }

        print Dumper $country_hop_ref;

    That is the heads-up; you need to find the average yourself!
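
    For readers who want to check their own attempt, here is a minimal sketch of the averaging step. It assumes a structure shaped like the one dumped above (country name pointing to an array reference of hop counts). The hash contents below are a small hand-typed subset of the posted sample, and the helper name mean is made up for the example. sum comes from the core List::Util module.

    ```perl
    use strict;
    use warnings;
    use List::Util qw(sum);

    # Stand-in for the structure built by the loop: country => [ hop counts ].
    # Values are a hand-picked subset of the posted sample, for illustration only.
    my $country_hop_ref = {
        'Germany'        => [ 8, 9 ],
        'Czech Republic' => [ 9, 30 ],
        'Japan'          => [ 18 ],
    };

    # Hypothetical helper: arithmetic mean of a non-empty array reference.
    sub mean {
        my ($hops_ref) = @_;
        return sum(@$hops_ref) / scalar @$hops_ref;
    }

    for my $country (sort keys %$country_hop_ref) {
        printf "%s: %.2f\n", $country, mean( $country_hop_ref->{$country} );
    }
    ```

    Every array in the hash holds at least one value, because a key is only created when a hop count is pushed, so the division is safe here.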

    Please check the following:
    perldoc perldsc,
    perldoc perllol,
    perldoc perlref

    UPDATE: run all the scripts in this post, using the text file given by the OP as an argument on the CLI.

    Hope this helps.

Re: Finding average from numbers appearing in a file
by trizen (Hermit) on Aug 05, 2012 at 08:44 UTC
    use strict;
    use warnings;
    use Locale::Country;

    my @names = all_country_names();

    # Better use Regexp::Trie
    my $countries_re = do {
        local $" = q{|};
        qr/@{[map quotemeta, @names]}/i;
    };

    my %hash;
    while (<>) {
        #if (/^.*?-([A-Z][A-Za-z\s\-]+[a-z])\s+(\d+)\s+hops?\b/is) {
        if (/^.*?-($countries_re)\s+(\d+)\s+hops?\b/is) {
            my ($country, $hops) = ($1, $2);
            $hash{$country} = {
                sum => ($hash{$country}{sum} || 0) + $hops,
                num => ++$hash{$country}{num}
            };
        }
    }

    foreach my $key (sort keys %hash) {
        printf "$key avg: %.3f\n", $hash{$key}{sum} / $hash{$key}{num};
    }

    __END__
    Australia avg: 30.000
    China avg: 22.667
    Czech Republic avg: 19.500
    France avg: 30.000
    Germany avg: 8.500
    Japan avg: 18.000
    Netherlands avg: 9.500
    United Kingdom avg: 8.000
    United States avg: 19.667