What is the best way to handle this data?

terminaljunkie has asked for the wisdom of the Perl Monks concerning the following question:

OK, I am rephrasing my question so that it is easier to understand :) Hello all. I am new here at PerlMonks. I am curious what the best way to handle this data would be. I have a log file with information laid out in the following way:

system1,data,variable1,data,data1,data1

Let's say, for instance, we are talking about used car dealerships. This would be the data:

lot1,new,honda,civic,cincinnati,oh
lot1,used,chevy,impala,cincinnati,oh
lot1,new,honda,civic,cincinnati,oh
lot1,used,chevy,impala,cincinnati,oh
lot1,new,cadillac,escalade,cincinnati,oh
lot2,new,buick,sentry,houston,tx
lot2,used,ford,ranger,houston,tx
lot2,new,buick,sentry,houston,tx
lot2,used,ford,ranger,houston,tx
lot2,used,ford,ranger,houston,tx
lot3,new,ford,ranger,lexington,ky
lot3,used,cadillac,escalade,lexington,ky
lot3,used,cadillac,escalade,lexington,ky
lot4,new,ford,f150,chicago,illinois
lot4,new,ford,f150,chicago,illinois
lot4,new,ford,f150,chicago,illinois

As you can see, there are different lots. Each lot has different makes and models, but each lot has a unique city/state. I don't care whether the cars are new or used, and I don't care about the model. I need to know how many cars of each make each lot has, along with the city/state of that lot. So the output might be...

lot1,cincinnati,oh has 2 hondas, 2 chevys, 1 cadillac
lot2,houston,tx has 2 buicks, 3 fords
lot3,lexington,ky has 1 ford, 2 cadillacs
etc...

Hopefully my description is thorough. I am not sure of the best way to approach this, hence my question. I hope that someone will be able to shed some light on this for me. Thanks in advance!

Replies are listed 'Best First'.
Re: What is the best way to handle this data? by kennethk (Abbot) on Apr 28, 2009 at 15:48 UTC
In addition to toolic's advice, I'd point out you'll likely want to use a module such as Text::CSV to read and parse your file.
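A minimal sketch of what parsing one record with Text::CSV might look like, assuming the module is installed from CPAN. The helper name `parse_record` and the returned field names are made up for illustration; only `new`, `parse`, and `fields` come from Text::CSV itself. The payoff over a plain `split` is that a quoted field containing a comma still parses as a single field.

```perl
use strict;
use warnings;
use Text::CSV;

# One parser object, reused for every line.
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Hypothetical helper: turn one log line into a hash of the fields we care about.
sub parse_record {
    my ($line) = @_;
    $csv->parse($line) or die "Cannot parse line: $line";
    # Skip the new/used flag and the model, as the question asks.
    my ($lot, undef, $make, undef, $city, $state) = $csv->fields;
    return { lot => $lot, make => $make, city => $city, state => $state };
}

my $rec = parse_record('lot1,new,honda,civic,cincinnati,oh');
print "$rec->{lot}: $rec->{make} in $rec->{city},$rec->{state}\n";
# prints "lot1: honda in cincinnati,oh"
```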
Re: What is the best way to handle this data? by toolic (Bishop) on Apr 28, 2009 at 15:27 UTC
You could load your data into a Perl data structure, such as a hash of hashes, to get a count of all variables: `$data{$system}{$variable}++`. It's not clear to me what you want to do with your data elements. If you provide a small example of your actual input and expected output, then I can try to provide a more concrete code sample.
Re^2: What is the best way to handle this data? by terminaljunkie (Initiate) on Apr 28, 2009 at 16:41 UTC
Thanks for the reply. I am trying to think of a way to describe the data. Let's say, for instance, we are talking about used car dealerships. This would be the data:

lot1,new,honda,civic,cincinnati,oh
lot1,used,chevy,impala,cincinnati,oh
lot1,new,honda,civic,cincinnati,oh
lot1,used,chevy,impala,cincinnati,oh
lot1,new,cadillac,escalade,cincinnati,oh
lot2,new,buick,sentry,houston,tx
lot2,used,ford,ranger,houston,tx
lot2,new,buick,sentry,houston,tx
lot2,used,ford,ranger,houston,tx
lot2,used,ford,ranger,houston,tx
lot3,new,ford,ranger,lexington,ky
lot3,used,cadillac,escalade,lexington,ky
lot3,used,cadillac,escalade,lexington,ky
lot4,new,ford,f150,chicago,illinois
lot4,new,ford,f150,chicago,illinois
lot4,new,ford,f150,chicago,illinois

As you can see, there are different lots. Each lot has different makes and models, but each lot has a unique city/state. I don't care whether the cars are new or used, and I don't care about the model. I need to know how many cars of each make each lot has, along with the city/state of that lot. So the output might be...

lot1,cincinnati,oh has 2 hondas, 2 chevys, 1 cadillac
lot2,houston,tx has 2 buicks, 3 fords
lot3,lexington,ky has 1 ford, 2 cadillacs
etc...

That is the best I can explain it. I hope that helps. Thanks!
Re^3: What is the best way to handle this data? by toolic (Bishop) on Apr 28, 2009 at 17:09 UTC
Your input is simple enough that you could parse it using split, but for anything more complex, you should heed kennethk's advice. Here is the parsing piece; I'll leave the printout as an exercise for you:

use strict;
use warnings;
use Data::Dumper;

my %data;
while (<DATA>) {
    chomp;
    my ($lot, undef, $make, undef, $city, $state) = split /,/;
    $data{$lot}{$make}++;
    $data{$lot}{location} = "$city,$state";
}
print Dumper(\%data);

__DATA__
lot1,new,honda,civic,cincinnati,oh
lot1,used,chevy,impala,cincinnati,oh
lot1,new,honda,civic,cincinnati,oh
lot1,used,chevy,impala,cincinnati,oh
lot1,new,cadillac,escalade,cincinnati,oh
lot2,new,buick,sentry,houston,tx
lot2,used,ford,ranger,houston,tx
lot2,new,buick,sentry,houston,tx
lot2,used,ford,ranger,houston,tx
lot2,used,ford,ranger,houston,tx
lot3,new,ford,ranger,lexington,ky
lot3,used,cadillac,escalade,lexington,ky
lot3,used,cadillac,escalade,lexington,ky
lot4,new,ford,f150,chicago,illinois
lot4,new,ford,f150,chicago,illinois
lot4,new,ford,f150,chicago,illinois

which prints out:

$VAR1 = {
    'lot3' => {
        'location' => 'lexington,ky',
        'ford' => 1,
        'cadillac' => 2
    },
    'lot1' => {
        'location' => 'cincinnati,oh',
        'cadillac' => 1,
        'chevy' => 2,
        'honda' => 2
    },
    'lot2' => {
        'location' => 'houston,tx',
        'ford' => 3,
        'buick' => 2
    },
    'lot4' => {
        'location' => 'chicago,illinois',
        'ford' => 3
    }
};
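toolic leaves the printout as an exercise. One possible sketch, using only core Perl, is below. The function names `tally_lots` and `format_report` are hypothetical, and the sketch deliberately differs from toolic's structure in one respect: the location is kept in its own hash, so that a make could never collide with the `location` key. It does not attempt to pluralize the make names as the question's sample output does.

```perl
use strict;
use warnings;

# Count makes per lot, and remember each lot's city/state separately.
sub tally_lots {
    my (@lines) = @_;
    my (%count, %location);
    for my $line (@lines) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($lot, undef, $make, undef, $city, $state) = split /,/, $line;
        $count{$lot}{$make}++;
        $location{$lot} = "$city,$state";
    }
    return (\%count, \%location);
}

# Build one summary line per lot; lots and makes are sorted by name.
sub format_report {
    my ($count, $location) = @_;
    my @report;
    for my $lot (sort keys %$count) {
        my @parts = map { "$count->{$lot}{$_} $_" } sort keys %{ $count->{$lot} };
        push @report, "$lot,$location->{$lot} has " . join(', ', @parts);
    }
    return @report;
}

my @sample = (
    'lot1,new,honda,civic,cincinnati,oh',
    'lot1,used,chevy,impala,cincinnati,oh',
    'lot1,new,honda,civic,cincinnati,oh',
    'lot3,new,ford,ranger,lexington,ky',
    'lot3,used,cadillac,escalade,lexington,ky',
    'lot3,used,cadillac,escalade,lexington,ky',
);
print "$_\n" for format_report(tally_lots(@sample));
# prints:
# lot1,cincinnati,oh has 1 chevy, 2 honda
# lot3,lexington,ky has 2 cadillac, 1 ford
```

Sorting the keys makes the output stable from run to run; Perl's hash ordering is not, which is why the Data::Dumper output above lists the lots in a seemingly arbitrary order.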
Re^4: What is the best way to handle this data? by terminaljunkie (Initiate) on Apr 28, 2009 at 19:16 UTC
Re^5: What is the best way to handle this data? by toolic (Bishop) on Apr 28, 2009 at 20:08 UTC
Re: What is the best way to handle this data? by leocharre (Priest) on Apr 28, 2009 at 17:05 UTC
If you know SQL, you can store this in a SQLite db, or as mentioned a CSV, and run SQL queries on it! :-)
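A rough sketch of the SQLite route, assuming the DBI and DBD::SQLite modules are installed from CPAN. The table name `cars`, its column names, and the helper `count_makes_by_lot` are invented for this example; an in-memory database is used so nothing is written to disk.

```perl
use strict;
use warnings;
use DBI;

# Hypothetical helper: load the lines into an in-memory SQLite table
# and let SQL do the counting with GROUP BY.
sub count_makes_by_lot {
    my (@lines) = @_;
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE cars (lot TEXT, cond TEXT, make TEXT,
                                 model TEXT, city TEXT, state TEXT)');
    my $insert = $dbh->prepare('INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)');
    $insert->execute(split /,/, $_) for @lines;

    # One row per (lot, make): lot, city, state, make, count.
    my $rows = $dbh->selectall_arrayref(
        'SELECT lot, city, state, make, COUNT(*) FROM cars
         GROUP BY lot, city, state, make ORDER BY lot, make');
    $dbh->disconnect;
    return $rows;
}

my $rows = count_makes_by_lot(
    'lot1,new,honda,civic,cincinnati,oh',
    'lot1,used,chevy,impala,cincinnati,oh',
    'lot1,new,honda,civic,cincinnati,oh',
);
for my $row (@$rows) {
    my ($lot, $city, $state, $make, $n) = @$row;
    print "$lot,$city,$state: $n $make\n";
}
```

For a file this small the hash of hashes is simpler; the database approach starts to pay off once the questions asked of the data multiply.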
Re: What is the best way to handle this data? by spx2 (Deacon) on Apr 29, 2009 at 14:14 UTC
I think the best solution to your question is storing your data in a HoH (if you don't have a lot of data). Otherwise, use the HoH as a 'buffer', then store the data in a db, and after that you can process it efficiently.