Hashes and Arrays - Selecting a memory structure

tdudgeon has asked for the wisdom of the Perl Monks concerning the following question:

I have spent days reading about and testing hashes and arrays, and would appreciate it if someone could please just point me in the right direction.

I am writing a script that will process multiple log files to look for a maximum value in each. Once found, I need to compare the max value found in each log file to its corresponding max value stored in a CSV file.

My question:

What memory structure would be best suited to storing this data (Array of Hashes, Hash of Hashes, etc.)?

(Maximum Log File)

DOW,Sep,18,09:31:16,440,29,142,10148,4

Russell2000,Sep,18,09:31:16,440,29,142,10148,4

RussellComposite,Sep,18,09:31:16,440,29,142,10148,4

Russell1000,Sep,18,09:31:16,440,29,142,10148,4

SP500,Sep,18,09:31:16,440,29,142,10148,4

Additional info:

1. The processed files provide the category (i.e. DOW, SP500, etc.) and I will be comparing the processed max value against the 7th element of the CSV file for the specific category.

2. If the maximum processed value is greater than the value currently stored in the CSV file, then I need to ultimately update the row in the CSV file.

3. When all file processing is complete, I need to sort the memory structure in ascending order on the 7th element then output back to a CSV file.

Please let me know if additional info is needed.

Thanks

Re: Hashes and Arrays - Selecting a memory structure by GrandFather (Saint) on Sep 20, 2007 at 05:51 UTC
I'd go for a HoH (hash of hashes), although a HoA (hash of arrays) would work. Consider:

```perl
use strict;
use warnings;
use Text::CSV;

my %lookup;
my $csv      = Text::CSV->new ();
my $filename = 'DATA';

while (<DATA>) {
    next unless $csv->parse ($_);

    my @row = $csv->fields ();

    # $row[7] is read below, so the row needs at least 8 fields
    next unless @row >= 8;

    my $key = $row[0];

    # Skip the row unless it beats the maximum already stored for this key
    next if exists $lookup{$key} && $row[7] <= $lookup{$key}{max};

    $lookup{$key} = {max => $row[7], line => $., file => $filename};
}

print "Following maximums found:\n";
print "$_ = $lookup{$_}{max} in file $lookup{$_}{file}, line $lookup{$_}{line}\n"
    for sort keys %lookup;

__DATA__
DOW,Sep,18,09:31:16,440,29,142,10148,4
Russell2000,Sep,18,09:31:16,440,29,142,10148,4
RussellComposite,Sep,18,09:31:16,440,29,142,10148,4
Russell1000,Sep,18,09:31:16,440,29,142,10148,4
SP500,Sep,18,09:31:16,440,29,142,10148,4
Russell2000,Sep,18,09:31:16,440,29,142,10120,4
SP500,Sep,18,09:31:16,440,29,142,10160,4
```

Prints:

```
Following maximums found:
DOW = 10148 in file DATA, line 1
Russell1000 = 10148 in file DATA, line 4
Russell2000 = 10148 in file DATA, line 2
RussellComposite = 10148 in file DATA, line 3
SP500 = 10160 in file DATA, line 7
```

Perl is environmentally friendly - it saves trees
Re^2: Hashes and Arrays - Selecting a memory structure by tdudgeon (Initiate) on Sep 20, 2007 at 18:01 UTC
My apologies for being a bit unclear on the requirements. I'm better with examples: I need to process multiple log files, each containing thousands of 'DOW' related records (same data structure as shown in my original post). My goal is to determine whether any record in the new log files contains a value in the 7th element that is greater than the current maximum value stored in the 'DOW' record of an index maximum CSV file. That CSV file will contain only 1 record per financial index (i.e. DOW, SP500).
Re^3: Hashes and Arrays - Selecting a memory structure by GrandFather (Saint) on Sep 20, 2007 at 19:23 UTC
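The while loop doesn't care if the lines come from one file or many, but the sample code is much more concise if I use a single 'file' with duplicated lines. You could use:

```perl
@ARGV = @filenames;
while (<>) {
    $filename = $ARGV;    # <> sets $ARGV to the name of the file being read
    ...
}
continue {
    close ARGV if eof;    # resets $. so line numbers restart for each file
}
```

to process a swag of files. Note that `$.` keeps counting across files unless ARGV is closed at the end of each one, so a test such as `$. == 1` only detects the start of a new file when that close is in place.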
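Putting the pieces of this sub-thread together, here is a minimal sketch of the whole workflow: load the one-row-per-index CSV into a hash of arrays, replace a row when a log line carries a larger value, then emit the rows sorted ascending on that value. The sub names (`load_maxima`, `update_max`, `sorted_lines`) and `$VALUE_FIELD` are made up for illustration, and index 7 is assumed because it matches `$row[7]` in the code above. Plain `split` is used only to keep the sketch free of non-core modules; it assumes no quoted or embedded commas, and Text::CSV remains the safer choice for real data.

```perl
use strict;
use warnings;

# Assumption: the value to compare sits at index 7, as in $row[7] above.
my $VALUE_FIELD = 7;

# Build a hash of arrays from CSV lines: index name => array ref of fields.
sub load_maxima {
    my @lines = @_;
    my %max;
    for my $line (@lines) {
        chomp $line;
        next unless length $line;
        my @row = split /,/, $line;
        next unless @row > $VALUE_FIELD;
        $max{ $row[0] } = \@row;
    }
    return \%max;
}

# Replace the stored row when a log line carries a larger value.
# Returns 1 if the stored row changed, 0 otherwise.
sub update_max {
    my ($max, $line) = @_;
    chomp $line;
    my @row = split /,/, $line;
    return 0 unless @row > $VALUE_FIELD;
    my $old = $max->{ $row[0] };
    return 0 if $old && $row[$VALUE_FIELD] <= $old->[$VALUE_FIELD];
    $max->{ $row[0] } = \@row;
    return 1;
}

# Rows as CSV lines, ascending on the value field, ties broken by name.
sub sorted_lines {
    my ($max) = @_;
    return map { join ',', @{ $max->{$_} } }
        sort { $max->{$a}[$VALUE_FIELD] <=> $max->{$b}[$VALUE_FIELD] or $a cmp $b }
        keys %$max;
}

# Sample rows taken from the __DATA__ section above.
my $max = load_maxima(
    "DOW,Sep,18,09:31:16,440,29,142,10148,4\n",
    "SP500,Sep,18,09:31:16,440,29,142,10148,4\n",
);
update_max($max, "SP500,Sep,18,09:31:16,440,29,142,10160,4\n");    # larger: replaces the row
update_max($max, "DOW,Sep,18,09:31:16,440,29,142,10120,4\n");      # smaller: ignored
print "$_\n" for sorted_lines($max);
```

With these sample rows the DOW line (10148) is printed before the SP500 line (10160). In a real script the lines would come from the CSV file and the `while (<>)` loop, and the sorted lines would be printed to an output filehandle instead of STDOUT.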
Re: Hashes and Arrays - Selecting a memory structure by hangon (Deacon) on Sep 20, 2007 at 06:14 UTC
You may want to consider dumping the logs into a database and manipulating the data there. Then you can export what you need back into a CSV file. This would give you much more flexibility if your requirements change later on. Look at MySQL, SQLite or others. The DBI/DBD modules can interface with a large number of databases. Of course this would mean learning some SQL, but the basics are not difficult. You may also want to look at DBD::CSV, which appears to let you treat CSV files as a database and manipulate them using SQL statements through DBI (disclaimer: I have not tried this module myself).
Re: Hashes and Arrays - Selecting a memory structure by bruceb3 (Pilgrim) on Sep 20, 2007 at 05:54 UTC
Do you think that you will ever need to do more than just find the highest value for each index? Basically you want to know when an index is making new highs, right? I ask because I have similar requirements for equities traded locally (I live in Australia). My solution was to process each log file and store the daily data for each stock in a separate file. I have code that uses WWW::Mechanize and friends to log into my broker's web site and download the end-of-day data. This file is then used to update the data for each stock. In the code that I use to try to find a profitable way to trade options, I load the stock into a hash of arrays. The keys of the hash are date, open, high, low, close and volume. I have actually wrapped the hash up in a class for convenience and to handle any possible changes to the data that might cause a change to the implementation. This method has worked well for me over a number of years and I would recommend doing something similar.
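For readers who want to picture the approach described above, here is a rough sketch of a hash of arrays wrapped in a small class. This is not bruceb3's actual code: the class name `StockHistory`, its methods, and the sample prices are all invented for illustration. Only the six keys (date, open, high, low, close, volume) come from the post.

```perl
package StockHistory;    # hypothetical class name
use strict;
use warnings;

# The six columns named in the post; each becomes one array in the hash.
my @FIELDS = qw(date open high low close volume);

sub new {
    my ($class) = @_;
    my %data = map { $_ => [] } @FIELDS;
    return bless { data => \%data }, $class;
}

# Append one day's record; arguments follow the order of @FIELDS.
sub add_day {
    my ($self, @values) = @_;
    die "expected " . scalar(@FIELDS) . " values\n" unless @values == @FIELDS;
    push @{ $self->{data}{ $FIELDS[$_] } }, $values[$_] for 0 .. $#FIELDS;
    return $self;
}

# Largest value in the 'high' column, or undef when no days are stored.
sub highest_high {
    my ($self) = @_;
    my $max;
    for my $h (@{ $self->{data}{high} }) {
        $max = $h if !defined $max || $h > $max;
    }
    return $max;
}

# True if the most recent day's high exceeds every earlier high.
sub is_new_high {
    my ($self) = @_;
    my @highs = @{ $self->{data}{high} };
    return 0 unless @highs;
    my $latest = pop @highs;
    for my $h (@highs) {
        return 0 if $h >= $latest;
    }
    return 1;
}

package main;

# Made-up sample values, purely for demonstration.
my $stock = StockHistory->new;
$stock->add_day('2007-09-18', 10.0, 10.5, 9.8,  10.2, 1000);
$stock->add_day('2007-09-19', 10.2, 11.0, 10.1, 10.9, 1500);
print "new high\n" if $stock->is_new_high;
```

Callers only see `add_day`, `highest_high` and `is_new_high`, so the storage could later change (say, to an array of hashes or a database) without touching the calling code, which is the benefit the post attributes to wrapping the hash in a class.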
Re: Hashes and Arrays - Selecting a memory structure by philc (Acolyte) on Sep 20, 2007 at 12:26 UTC
My 2 cents... I'm barely a Monk's novice's assistant helper (in Perl terms), but I would try a hash of anonymous arrays, since each item has a unique key value (DOW, SP500, Russell2000) and discrete values. The anonymous array allows you to associate multiple values with a single hash key. It's also more intuitive.

Accessing the 7th value is straightforward from there. The hash can be sorted based on the 7th value.

```perl
# Begin code example
my %hash = (
    'Dow',         [99, 123, 345, 567, 678, 756, 202],
    'SP500',       [88, 55, 55, 677, 843, 987, 345],
    'Russell1000', [456, 788, 990, 889, 876, 234, 456],
);

# Access the 7th value; arrays are zero-based, so that is index 6
print $hash{'Dow'}[6], "\n";    # should equal 202
```
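The sort on the 7th value mentioned above could look something like this. It is a small sketch that reuses the sample hash from this reply; the variable name `@by_seventh` is made up, and index 6 is used because Perl arrays count from zero.

```perl
use strict;
use warnings;

my %hash = (
    'Dow'         => [ 99,  123, 345, 567, 678, 756, 202 ],
    'SP500'       => [ 88,  55,  55,  677, 843, 987, 345 ],
    'Russell1000' => [ 456, 788, 990, 889, 876, 234, 456 ],
);

# Hash keys ordered ascending by the 7th value (index 6) of each array.
my @by_seventh = sort { $hash{$a}[6] <=> $hash{$b}[6] } keys %hash;

print "$_ = $hash{$_}[6]\n" for @by_seventh;
```

This prints `Dow = 202`, `SP500 = 345` and `Russell1000 = 456`, in that order. A hash itself has no order, so the sorted list of keys is what gets walked when writing the rows back out.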