in reply to Adding data to hashes & comparing

Some quick observations about your code so far:

As to your question about data structures and comparisons, perhaps you could be more specific about how you want to compare fields - are you counting the GET/POST requests for each user agent? Each IP? Each date? Some combination of the three?

Your plans will affect your choice of data structure. For instance, if you want to count all HTTP requests with the same user agent, you will need a HoAoH (hash of arrays of hashes) where each key is a user agent and each value is an array containing all of the request hashes sharing that user agent.

You can build this hash most efficiently while you are reading in the data. Depending on your goals, you may be able to eliminate the AoH entirely and rely solely on the HoAoH.
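A minimal sketch of building such a HoAoH while reading, grouping by user agent. The log lines and the `ip|user_agent|method` field layout here are invented for illustration, not taken from the original post:

```perl
use strict;
use warnings;

# Illustrative log lines (assumed format: ip|user_agent|method).
my @lines = (
    '1.2.3.4|Mozilla/5.0|GET',
    '5.6.7.8|Mozilla/5.0|POST',
    '9.9.9.9|curl/7.64|GET',
);

# HoAoH: key = user agent, value = array ref of request hashes.
my %requests_by_agent;

for my $line (@lines) {
    my ($ip, $agent, $method) = split /\|/, $line;

    # Autovivification creates the inner array ref on the first push,
    # so no separate "does this key exist yet?" check is needed.
    push @{ $requests_by_agent{$agent} },
        { ip => $ip, agent => $agent, method => $method };
}

for my $agent (sort keys %requests_by_agent) {
    printf "%s: %d request(s)\n", $agent,
        scalar @{ $requests_by_agent{$agent} };
}
```

Counting requests per user agent then falls out of the structure: the count is just the length of each value array.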

Best, beth

Re^2: Adding data to hashes & comparing
by hallikpapa (Scribe) on Mar 30, 2009 at 20:47 UTC
    For instance, I want to compare IP's in the get and post arrays to see if there are any matches.

    Then I will look at the User Agent, to see if those are matching, and finally will also check the time stamps to see if they fall within a certain alloted time.

    Basically for a few days something wasn't being tracked, so I would like to go back and get the numerical data out of the GET request and associate it to the correct post request (which will have some user data in it). So I need to track timestamps, user agent, and the IP address to get as close as possible.
      If I am understanding you correctly, you could store each record in a HoHoHoA. For each record you read in, you would do something like the following pseudocode:
      my $IP        = ...;  # extract from %data
      my $userAgent = ...;  # extract from %data
      my $date      = ...;  # extract from %data
      my $aRequests = $hRequests{$IP}{$userAgent}{$date};
      push @$aRequests, \%data;

      Then after you have read in all the requests, loop through %hRequests using the hash keys to select the requests that interest you. The pseudo code would look something like this:

      while (my ($IP, $hUserAgents) = each %hRequests) {
          next if ...;  # IP is boring
          while (my ($userAgent, $hDates) = each %$hUserAgents) {
              next if ...;  # user agent is boring
              while (my ($date, $aRequests) = each %$hDates) {
                  # do something if date is in range
                  # wanted for $IP, $userAgent
              }
          }
      }
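      A runnable version of that traversal, with invented records and a simple leaf count standing in for the real "is this interesting?" filtering:

```perl
use strict;
use warnings;

# HoHoHoA keyed by IP, then user agent, then date; each leaf is an
# array of request records. All data here is made up for illustration.
my %hRequests;
push @{ $hRequests{'1.2.3.4'}{'Mozilla/5.0'}{'2009-03-30'} }, { uri => '/a' };
push @{ $hRequests{'1.2.3.4'}{'Mozilla/5.0'}{'2009-03-30'} }, { uri => '/b' };
push @{ $hRequests{'5.6.7.8'}{'curl/7.64'}{'2009-03-29'} },   { uri => '/c' };

my $total = 0;
while (my ($ip, $hUserAgents) = each %hRequests) {
    while (my ($userAgent, $hDates) = each %$hUserAgents) {
        while (my ($date, $aRequests) = each %$hDates) {
            # A real filter would "next" on boring IPs/agents or
            # out-of-range dates here.
            $total += @$aRequests;
            print "$ip / $userAgent / $date: ",
                scalar @$aRequests, " request(s)\n";
        }
    }
}
print "total: $total\n";
```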

      There's a lot of work navigating references to hashes and arrays here. As moritz said previously, studying perldata, perlref (or perlreftut) and perldsc might be well worth your time.

      Best, beth

         my $aRequests = $hRequests{$IP}{$userAgent}{$date};

        So this line would be making a "hash key" and pushing it into $aRequests

        push @$aRequests, \%data;

        And this line Stores the whole line from the apache log in that hash key reference that was just created?

        If that is correct, I am still getting errors and I don't believe it's storing the data correctly. Mostly because it's not sending back all the data, only one line from the log. It's been so long since I've done Perl. :) I'll keep hacking away and continue looking over those links.

        This is so simple, I know I have done something similar before!

        Anyways, the deal is I get an "uninitialized value in hash element" warning for every line. And when I try to print the contents of @get_array, it's only one line? It looks as though $ip, $userAgent, and $date aren't being populated/created at all.
        foreach (@get_logs) {
            @get_array = &logToHash($_);
        }
        foreach (@get_array) {
            print Dumper($_);
        }

        sub logToHash {
            my $file = $_;
            my @AoH;
            open LOG, $file or die $!;
            our ($aRequests, $ip, $userAgent, $date, $hRequests, $host);
            while ( my $line_from_logfile = <LOG> ) {
                eval { %data = $lr->parse($line_from_logfile); };
                if (%data) {    # We have data to process
                    while ( my ($key, $value) = each(%data) ) {
                        if ( $key =~ '%h' ) {
                            ($host, $ip) = split(/:/, $value);
                        }
                        if ( $key =~ '%{User-Agent}i\""' ) {
                            $userAgent = $value;
                        }
                        if ( $key =~ '%t' ) {
                            $date = $value;
                        }
                    }
                    $aRequests = $hRequests{$ip}{$userAgent}{$date};
                    push @$aRequests, \%data;
                }
            }
            return @$aRequests;
        }
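        An editorial observation on the symptom above (my diagnosis, not from the thread): `$aRequests = $hRequests{$ip}{$userAgent}{$date};` copies out `undef` the first time, and `push @$aRequests` then autovivifies a fresh array ref that is never stored back into the hash - so only the last pushed data survives. (Note also that the code reads `%hRequests` while `our` declares an unrelated scalar `$hRequests`.) Pushing through the hash slot itself makes autovivification store the array inside the hash:

```perl
use strict;
use warnings;

my %hRequests;

# Two records sharing the same keys, as might come from parsed log lines
# (values invented for illustration).
my ($ip, $userAgent, $date) = ('1.2.3.4', 'Mozilla/5.0', '30/Mar/2009');
my %data1 = (uri => '/first');
my %data2 = (uri => '/second');

# Pushing through the hash slot itself: autovivification creates the
# array ref *inside* %hRequests, so successive records accumulate.
push @{ $hRequests{$ip}{$userAgent}{$date} }, \%data1;
push @{ $hRequests{$ip}{$userAgent}{$date} }, \%data2;

my $aRequests = $hRequests{$ip}{$userAgent}{$date};
print scalar(@$aRequests), " records stored\n";    # prints: 2 records stored
```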
      For instance, I want to compare IP's in the get and post arrays to see if there are any matches

      Then it would make sense to store the data in a hash of hashes, with the IP as the key. Finding common IPs is then as simple as iterating over the keys of the first hash and looking them up in the second (which doesn't require another iteration). See for example perlfaq4, "How can I get the unique keys from two hashes?", for inspiration.
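      A minimal sketch of that lookup, in the style of the perlfaq4 recipe referenced above (IPs and counts invented for illustration):

```perl
use strict;
use warnings;

# Two hashes keyed by IP, e.g. one built from GET requests, one from POSTs.
my %get  = ('1.2.3.4' => 2, '5.6.7.8' => 1, '9.9.9.9' => 4);
my %post = ('5.6.7.8' => 3, '9.9.9.9' => 1);

# One pass over the first hash's keys; exists() on the second hash is a
# direct lookup, so no nested loop is needed.
my @common = grep { exists $post{$_} } keys %get;

print "common IPs: @{[ sort @common ]}\n";
```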