terminaljunkie has asked for the wisdom of the Perl Monks concerning the following question:

OK, I am rephrasing my question so that it is easier to understand :) Hello all. I am new here at PerlMonks. I am curious what the best way to handle this data would be. I have a log file with information laid out in the following way:

system1,data,variable1,data,data1,data1

Let's say, for instance, we are talking about used car dealerships. This would be the data:

lot1,new,honda,civic,cincinnati,oh
lot1,used,chevy,impala,cincinnati,oh
lot1,new,honda,civic,cincinnati,oh
lot1,used,chevy,impala,cincinnati,oh
lot1,new,cadillac,escalade,cincinnati,oh
lot2,new,buick,sentry,houston,tx
lot2,used,ford,ranger,houston,tx
lot2,new,buick,sentry,houston,tx
lot2,used,ford,ranger,houston,tx
lot2,used,ford,ranger,houston,tx
lot3,new,ford,ranger,lexington,ky
lot3,used,cadillac,escalade,lexington,ky
lot3,used,cadillac,escalade,lexington,ky
lot4,new,ford,f150,chicago,illinois
lot4,new,ford,f150,chicago,illinois
lot4,new,ford,f150,chicago,illinois

As you can see, there are different lots. Each lot has different makes and models, but each lot has a unique city/state. I don't care if the cars are new or used, and I don't care about the model. I need to know how many cars of each make each lot has, and the city/state of that lot. So, the output might be...

lot1,cincinnati,oh has 2 hondas, 2 chevys, 1 cadillac
lot2,houston,tx has 2 buicks, 3 fords
lot3,lexington,ky has 1 ford, 2 cadillacs
etc...

Hopefully my description is thorough. I am not sure exactly the best way to approach this, hence my question. I hope that someone will be able to shed some light on this for me. Thanks in advance!

Re: What is the best way to handle this data?
by kennethk (Abbot) on Apr 28, 2009 at 15:48 UTC
    In addition to toolic's advice, I'd point out you'll likely want to use a module such as Text::CSV to read and parse your file.
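
    A minimal sketch of what reading this data with Text::CSV might look like, assuming the module is installed from CPAN. The helper name count_makes, the nested 'makes' key, and the in-memory sample are illustrative choices, not something taken from this thread:

```perl
use strict;
use warnings;
use Text::CSV;

# count_makes is a hypothetical helper: it reads CSV rows from an open
# filehandle and counts makes per lot, remembering each lot's location.
sub count_makes {
    my ($fh) = @_;
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 })
        or die "Cannot create Text::CSV parser: " . Text::CSV->error_diag;

    my %data;
    while (my $row = $csv->getline($fh)) {
        # Columns: lot, new/used, make, model, city, state
        my ($lot, undef, $make, undef, $city, $state) = @$row;
        $data{$lot}{makes}{$make}++;
        $data{$lot}{location} = "$city,$state";
    }
    return \%data;
}

# An in-memory filehandle stands in for the real log file here.
my $sample = join '',
    "lot1,new,honda,civic,cincinnati,oh\n",
    "lot1,used,chevy,impala,cincinnati,oh\n",
    "lot1,new,honda,civic,cincinnati,oh\n",
    "lot2,used,ford,ranger,houston,tx\n";

open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my $data = count_makes($fh);
close $fh;

for my $lot (sort keys %$data) {
    print "$lot ($data->{$lot}{location}): ",
        join(', ', map { "$_=$data->{$lot}{makes}{$_}" } sort keys %{ $data->{$lot}{makes} }),
        "\n";
}
```

    Keeping the makes under their own 'makes' key is a design choice in this sketch: it keeps a make name from ever colliding with the 'location' key. For the real file, the in-memory handle would be replaced by a handle opened on the log.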
Re: What is the best way to handle this data?
by toolic (Bishop) on Apr 28, 2009 at 15:27 UTC
    You could load your data into a Perl data structure, such as a hash of hashes (HoH), to get a count of all variables:
    $data{$system}{$variable}++

    It's not clear to me what you want to do with your data elements. If you provide a small example of your actual input and expected output, then I can try to provide more of a concrete code sample.

      Thanks for the reply. I am trying to think of a way to describe the data. Let's say, for instance, we are talking about used car dealerships. This would be the data:

      lot1,new,honda,civic,cincinnati,oh
      lot1,used,chevy,impala,cincinnati,oh
      lot1,new,honda,civic,cincinnati,oh
      lot1,used,chevy,impala,cincinnati,oh
      lot1,new,cadillac,escalade,cincinnati,oh
      lot2,new,buick,sentry,houston,tx
      lot2,used,ford,ranger,houston,tx
      lot2,new,buick,sentry,houston,tx
      lot2,used,ford,ranger,houston,tx
      lot2,used,ford,ranger,houston,tx
      lot3,new,ford,ranger,lexington,ky
      lot3,used,cadillac,escalade,lexington,ky
      lot3,used,cadillac,escalade,lexington,ky
      lot4,new,ford,f150,chicago,illinois
      lot4,new,ford,f150,chicago,illinois
      lot4,new,ford,f150,chicago,illinois

      As you can see, there are different lots. Each lot has different makes and models, but each lot has a unique city/state. I don't care if the cars are new or used, and I don't care about the model. I need to know how many cars of each make each lot has, and the city/state of that lot. So, the output might be...

      lot1,cincinnati,oh has 2 hondas, 2 chevys, 1 cadillac
      lot2,houston,tx has 2 buicks, 3 fords
      lot3,lexington,ky has 1 ford, 2 cadillacs
      etc...

      That is the best I can explain it. I hope that helps. Thanks!
        Your input is simple enough that you could parse it using split, but for anything more complex, you should heed kennethk's advice. Here is the parsing piece; I'll leave the printout as an exercise for you:
        use strict;
        use warnings;
        use Data::Dumper;

        my %data;
        while (<DATA>) {
            chomp;
            my ($lot, undef, $make, undef, $city, $state) = split /,/;
            $data{$lot}{$make}++;
            $data{$lot}{location} = "$city,$state";
        }

        print Dumper(\%data);

        __DATA__
        lot1,new,honda,civic,cincinnati,oh
        lot1,used,chevy,impala,cincinnati,oh
        lot1,new,honda,civic,cincinnati,oh
        lot1,used,chevy,impala,cincinnati,oh
        lot1,new,cadillac,escalade,cincinnati,oh
        lot2,new,buick,sentry,houston,tx
        lot2,used,ford,ranger,houston,tx
        lot2,new,buick,sentry,houston,tx
        lot2,used,ford,ranger,houston,tx
        lot2,used,ford,ranger,houston,tx
        lot3,new,ford,ranger,lexington,ky
        lot3,used,cadillac,escalade,lexington,ky
        lot3,used,cadillac,escalade,lexington,ky
        lot4,new,ford,f150,chicago,illinois
        lot4,new,ford,f150,chicago,illinois
        lot4,new,ford,f150,chicago,illinois

        which prints out:

        $VAR1 = {
                  'lot3' => {
                              'location' => 'lexington,ky',
                              'ford' => 1,
                              'cadillac' => 2
                            },
                  'lot1' => {
                              'location' => 'cincinnati,oh',
                              'cadillac' => 1,
                              'chevy' => 2,
                              'honda' => 2
                            },
                  'lot2' => {
                              'location' => 'houston,tx',
                              'ford' => 3,
                              'buick' => 2
                            },
                  'lot4' => {
                              'location' => 'chicago,illinois',
                              'ford' => 3
                            }
                };
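
        For the printout left as an exercise, one possible sketch follows. It assumes the same hash shape as the dump (makes and 'location' sharing one inner hash). The helper name report_lines and the naive "append s" pluralization are illustrative assumptions, and makes are sorted alphabetically, so their order differs from the sample output in the question:

```perl
use strict;
use warnings;

# Same shape as the dumped structure: each lot maps makes to counts,
# plus a 'location' key holding "city,state".
my %data = (
    lot1 => { location => 'cincinnati,oh',    honda => 2, chevy => 2, cadillac => 1 },
    lot2 => { location => 'houston,tx',       buick => 2, ford  => 3 },
    lot3 => { location => 'lexington,ky',     ford  => 1, cadillac => 2 },
    lot4 => { location => 'chicago,illinois', ford  => 3 },
);

# report_lines is a hypothetical helper: it returns one summary line per lot.
sub report_lines {
    my ($data) = @_;
    my @lines;
    for my $lot (sort keys %$data) {
        my $info = $data->{$lot};

        # Every key except 'location' is treated as a make.
        my @makes = sort grep { $_ ne 'location' } keys %$info;

        my @counts = map {
            my $n = $info->{$_};
            # Naive pluralization: append "s" unless the count is exactly 1.
            $n == 1 ? "$n $_" : "$n ${_}s";
        } @makes;

        push @lines, "$lot,$info->{location} has " . join(', ', @counts);
    }
    return @lines;
}

print "$_\n" for report_lines(\%data);
```

        With this data, the first line printed is "lot1,cincinnati,oh has 1 cadillac, 2 chevys, 2 hondas". Lots are sorted as plain strings, so a lot10 would sort before lot2; a numeric sort on the lot number would be needed if that matters.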
Re: What is the best way to handle this data?
by leocharre (Priest) on Apr 28, 2009 at 17:05 UTC
Re: What is the best way to handle this data?
by spx2 (Deacon) on Apr 29, 2009 at 14:14 UTC

    I think the best solution to your question is to store your data in a HoH (if you don't have a lot of data). Otherwise, use the HoH as a 'buffer', then store the data in a database, where you can process it efficiently.