in reply to Adding data to hashes & comparing

Some quick observations about your code so far:

As to your question about data structures and comparisons, perhaps you could be more specific about how you want to compare fields - are you counting the GET/POST requests for each user agent? Each IP? Each date? Some combination of the three?

Your plans will affect your choice of data structure. For instance, if you want to count all HTTP requests with the same user agent, you will need a HoAoH (hash of arrays of hashes) where each key is a user agent and each value is an array containing all of the request hashes sharing that user agent.

You can build this hash most efficiently while you are reading in the data. Depending on your goals, you may be able to eliminate the AoH entirely and rely solely on the HoAoH.
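A minimal sketch of building such a HoAoH while reading, grouping by user agent. The log lines and the `ip|user_agent|method` field layout here are invented for illustration, not taken from the original post:

```perl
use strict;
use warnings;

# Illustrative log lines (assumed format: ip|user_agent|method).
my @lines = (
    '1.2.3.4|Mozilla/5.0|GET',
    '5.6.7.8|Mozilla/5.0|POST',
    '9.9.9.9|curl/7.64|GET',
);

# HoAoH: key = user agent, value = array ref of request hashes.
my %requests_by_agent;

for my $line (@lines) {
    my ($ip, $agent, $method) = split /\|/, $line;

    # Autovivification creates the inner array ref on the first push,
    # so no separate "does this key exist yet?" check is needed.
    push @{ $requests_by_agent{$agent} },
        { ip => $ip, agent => $agent, method => $method };
}

for my $agent (sort keys %requests_by_agent) {
    printf "%s: %d request(s)\n", $agent,
        scalar @{ $requests_by_agent{$agent} };
}
```

Counting requests per user agent then falls out of the structure: the count is just the length of each value array.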

Best, beth

Re^2: Adding data to hashes & comparing
by hallikpapa (Scribe) on Mar 30, 2009 at 20:47 UTC
    For instance, I want to compare IP's in the get and post arrays to see if there are any matches.

    Then I will look at the User Agent, to see if those are matching, and finally will also check the time stamps to see if they fall within a certain alloted time.

    Basically for a few days something wasn't being tracked, so I would like to go back and get the numerical data out of the GET request and associate it to the correct post request (which will have some user data in it). So I need to track timestamps, user agent, and the IP address to get as close as possible.
      If I am understanding you correctly, you could store each record in a HoHoHoA. For each record you read in, you would do something like the following pseudocode:
      my $IP        = ...;  # extract from %data
      my $userAgent = ...;  # extract from %data
      my $date      = ...;  # extract from %data
      my $aRequests = $hRequests{$IP}{$userAgent}{$date};
      push @$aRequests, \%data;

      Then after you have read in all the requests, loop through %hRequests using the hash keys to select the requests that interest you. The pseudo code would look something like this:

      while (my ($IP, $hUserAgents) = each %hRequests) {
          next if ...;  # IP is boring
          while (my ($userAgent, $hDates) = each %$hUserAgents) {
              next if ...;  # user agent is boring
              while (my ($date, $aRequests) = each %$hDates) {
                  # do something if date is in range
                  # wanted for $IP, $userAgent
              }
          }
      }
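      A runnable version of that traversal, with invented records and a simple leaf count standing in for the real "is this interesting?" filtering:

```perl
use strict;
use warnings;

# HoHoHoA keyed by IP, then user agent, then date; each leaf is an
# array of request records. All data here is made up for illustration.
my %hRequests;
push @{ $hRequests{'1.2.3.4'}{'Mozilla/5.0'}{'2009-03-30'} }, { uri => '/a' };
push @{ $hRequests{'1.2.3.4'}{'Mozilla/5.0'}{'2009-03-30'} }, { uri => '/b' };
push @{ $hRequests{'5.6.7.8'}{'curl/7.64'}{'2009-03-29'} },   { uri => '/c' };

my $total = 0;
while (my ($ip, $hUserAgents) = each %hRequests) {
    while (my ($userAgent, $hDates) = each %$hUserAgents) {
        while (my ($date, $aRequests) = each %$hDates) {
            # A real filter would "next" on boring IPs/agents or
            # out-of-range dates here.
            $total += @$aRequests;
            print "$ip / $userAgent / $date: ",
                scalar @$aRequests, " request(s)\n";
        }
    }
}
print "total: $total\n";
```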

      There's a lot of work navigating references to hashes and arrays here. As moritz said previously, studying perldata, perlref (or perlreftut) and perldsc might be well worth your time.

      Best, beth

         my $aRequests = $hRequests{$IP}{$userAgent}{$date};

        So this line would be making a "hash key" and pushing it into $aRequests

        push @$aRequests, \%data;

        And this line Stores the whole line from the apache log in that hash key reference that was just created?

        If that is correct, I am still getting errors and I don't believe it's storing the data correctly. Mostly because it's not sending back all the data, only one line from the log. It's been so long since I've done Perl. :) I'll keep hacking away and continue looking over those links.

        This is so simple, I know I have done something similar before!

        Anyways, the deal is I get an "uninitialized value in hash element" warning for every line. And when I try to print the contents of @get_array, it's only one line? It looks as though $ip, $userAgent, and $date aren't being populated/created at all.
        foreach (@get_logs) {
            @get_array = &logToHash($_);
        }
        foreach (@get_array) {
            print Dumper($_);
        }

        sub logToHash {
            my $file = $_;
            my @AoH;
            open LOG, $file or die $!;
            our ($aRequests, $ip, $userAgent, $date, $hRequests, $host);
            while ( my $line_from_logfile = <LOG> ) {
                eval { %data = $lr->parse($line_from_logfile); };
                if (%data) {    # We have data to process
                    while ( my ($key, $value) = each(%data) ) {
                        if ( $key =~ '%h' ) {
                            ($host, $ip) = split(/:/, $value);
                        }
                        if ( $key =~ '%{User-Agent}i\""' ) {
                            $userAgent = $value;
                        }
                        if ( $key =~ '%t' ) {
                            $date = $value;
                        }
                    }
                    $aRequests = $hRequests{$ip}{$userAgent}{$date};
                    push @$aRequests, \%data;
                }
            }
            return @$aRequests;
        }
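        An editorial observation on the symptom above (my diagnosis, not from the thread): `$aRequests = $hRequests{$ip}{$userAgent}{$date};` copies out `undef` the first time, and `push @$aRequests` then autovivifies a fresh array ref that is never stored back into the hash - so only the last pushed data survives. (Note also that the code reads `%hRequests` while `our` declares an unrelated scalar `$hRequests`.) Pushing through the hash slot itself makes autovivification store the array inside the hash:

```perl
use strict;
use warnings;

my %hRequests;

# Two records sharing the same keys, as might come from parsed log lines
# (values invented for illustration).
my ($ip, $userAgent, $date) = ('1.2.3.4', 'Mozilla/5.0', '30/Mar/2009');
my %data1 = (uri => '/first');
my %data2 = (uri => '/second');

# Pushing through the hash slot itself: autovivification creates the
# array ref *inside* %hRequests, so successive records accumulate.
push @{ $hRequests{$ip}{$userAgent}{$date} }, \%data1;
push @{ $hRequests{$ip}{$userAgent}{$date} }, \%data2;

my $aRequests = $hRequests{$ip}{$userAgent}{$date};
print scalar(@$aRequests), " records stored\n";    # prints: 2 records stored
```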
      For instance, I want to compare IP's in the get and post arrays to see if there are any matches

      Then it would make sense to store the data in a hash of hashes, with the IP as the key. Finding common IPs is then as simple as iterating over the keys of the first hash and looking them up in the second (which doesn't require another iteration). See for example perlfaq4, "How can I get the unique keys from two hashes?", for inspiration.
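      A minimal sketch of that lookup, in the style of the perlfaq4 recipe referenced above (IPs and counts invented for illustration):

```perl
use strict;
use warnings;

# Two hashes keyed by IP, e.g. one built from GET requests, one from POSTs.
my %get  = ('1.2.3.4' => 2, '5.6.7.8' => 1, '9.9.9.9' => 4);
my %post = ('5.6.7.8' => 3, '9.9.9.9' => 1);

# One pass over the first hash's keys; exists() on the second hash is a
# direct lookup, so no nested loop is needed.
my @common = grep { exists $post{$_} } keys %get;

print "common IPs: @{[ sort @common ]}\n";
```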