Re: Algorithm To Select Lines Based On Attributes

Some quick thoughts:

* #1: Use Devel::NYTProf or another profiler to see what the actual hot-spots in your code are!
* Read lines from disk one at a time rather than slurping into @lines
* Consider defining rules as subroutines acting on an argument and then use Memoize to cache results (assuming attributes re-occur frequently)
* If you're re-running this against the same set of lines and rules frequently, cache the rule test results in a file or DB so you have DEFECTID and a list of rules it matches.
* Perhaps reorganize the rules (if you can): $hash->{RULETYPE}->{RULENUMBER} = value. Then iterate the list of rules for each attribute, rather than (as you have it), iterating the attributes for each rule. I think that saves a lot of if ( defined $rulelist->{$rulenum}->{REGION} ) comparisons.
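
To make the last point concrete, here is a rough, untested-against-your-data sketch of rules keyed by attribute first and rule number second, combined with reading one line at a time. The attribute names (REGION, SIZE), the comparison subs, the rule values and the "DEFECTID REGION SIZE" line layout are all assumptions for illustration, not taken from your code.

```perl
use strict;
use warnings;

# Hypothetical rules, keyed by attribute (rule type) first, then rule number.
# Only rules that actually constrain an attribute appear under it.
my $rules = {
    REGION => { 1 => 'A', 3 => 'B' },
    SIZE   => { 2 => 10,  3 => 5 },
};

# Hypothetical comparison per attribute: REGION must match exactly,
# SIZE must be at least the rule's value.
my %test = (
    REGION => sub { $_[0] eq $_[1] },
    SIZE   => sub { $_[0] >= $_[1] },
);

# Count how many attributes each rule constrains, so we can tell when a
# record satisfied every constraint of a rule.
my %needed;
for my $attr ( keys %$rules ) {
    $needed{$_}++ for keys %{ $rules->{$attr} };
}

# Return the sorted rule numbers whose constraints all hold for one record.
sub matching_rules {
    my ($record) = @_;
    my %hits;
    for my $attr ( keys %$rules ) {
        my $value = $record->{$attr};
        next unless defined $value;
        while ( my ( $rulenum, $want ) = each %{ $rules->{$attr} } ) {
            $hits{$rulenum}++ if $test{$attr}->( $value, $want );
        }
    }
    return sort { $a <=> $b } grep { $hits{$_} == $needed{$_} } keys %hits;
}

# Read one line at a time instead of slurping into @lines.
# The in-memory filehandle stands in for the real defect file.
my $data = "D1 A 12\nD2 B 7\nD3 C 3\n";
open my $fh, '<', \$data or die "Cannot open: $!";
my %matches;
while ( my $line = <$fh> ) {
    chomp $line;
    my ( $id, $region, $size ) = split ' ', $line;
    $matches{$id} = [ matching_rules( { REGION => $region, SIZE => $size } ) ];
}
close $fh;

print "$_: @{ $matches{$_} }\n" for sort keys %matches;
```

The inner loop only ever visits rules that mention the attribute in hand, so there is no per-rule "is this attribute defined?" check left to do.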

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re^2: Algorithm To Select Lines Based On Attributes by ~~David~~ (Hermit) on Jan 15, 2009 at 18:06 UTC
Thanks for the suggestions. I have one question about bullet #2: I need to read to the end of the file before I enter this subroutine, because I need to know some information at the bottom of the file before I decide which rule set to use. I figured it would be better to cache that DEFECTLIST in memory rather than re-read the file. Is that best? Or is there some way I could store the positions in the file of the beginning and the end of the defect list, and always ensure that all characters between them are the DEFECTLIST? I don't have experience with stuff like that... I will definitely think about using Memoize and see if I can implement it. Thanks again.
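
For reference, a minimal sketch of what the Memoize idea could look like. The rule sub region_matches, its pattern and the sample region names are made up for illustration; only memoize() itself comes from the Memoize module. The counter is there just to show how often the real sub runs.

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;

# Hypothetical rule: is a region name inside the area of interest?
# Stands in for any expensive, pure test on a single attribute value.
sub region_matches {
    my ($region) = @_;
    $calls++;    # count real evaluations, not cache hits
    return $region =~ /^(?:A|B)\d*$/ ? 1 : 0;
}

# Replace region_matches with a caching wrapper keyed on its arguments.
memoize('region_matches');

my @regions = qw(A1 C3 A1 B2 A1 C3);
my @results = map { region_matches($_) } @regions;

print "results: @results\n";
print "evaluated $calls times for ", scalar(@regions), " lookups\n";
```

This only pays off if the same attribute values really do recur, and only for subs whose result depends on nothing but their arguments.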
Re^3: Algorithm To Select Lines Based On Attributes by xdg (Monsignor) on Jan 15, 2009 at 18:36 UTC
"I need to know some information at the bottom of the file before I decide which rule set to use"

Maybe you can use File::ReadBackwards to find the information you need, then jump back to the start of the file and read forwards. If memory isn't an issue, then it may not matter, but any time I see a file that large being slurped for a linear scan, I wonder if it could be done line by line instead.

-xdg
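
On the question of storing positions in the file: one possible approach, sketched here with the core tell and seek functions rather than File::ReadBackwards, is to scan once while remembering byte offsets, then jump back and read only the defect list. The BEGIN/END marker lines and the RULESET trailer are assumptions about the file layout, so adjust the patterns to whatever your file really contains.

```perl
use strict;
use warnings;

# Stand-in for the real file; the BEGIN/END markers and the RULESET
# trailer are hypothetical and only illustrate the layout.
my $data = join '',
    "HEADER x\n",
    "BEGIN DEFECTLIST\n",
    "D1 A 12\n",
    "D2 B 7\n",
    "END DEFECTLIST\n",
    "RULESET strict\n";
open my $fh, '<', \$data or die "Cannot open: $!";

# Pass 1: scan once, remembering byte offsets rather than the lines.
my ( $start, $end, $ruleset );
while (1) {
    my $pos  = tell $fh;    # offset of the line about to be read
    my $line = <$fh>;
    last unless defined $line;
    if    ( $line =~ /^BEGIN DEFECTLIST/ ) { $start   = tell $fh }
    elsif ( $line =~ /^END DEFECTLIST/ )   { $end     = $pos }
    elsif ( $line =~ /^RULESET\s+(\S+)/ )  { $ruleset = $1 }
}
die "Defect list not found" unless defined $start && defined $end;

# Pass 2: jump back and read only the defect list, one line at a time.
seek $fh, $start, 0 or die "Cannot seek: $!";
my @ids;
while ( tell($fh) < $end ) {
    my $line = <$fh>;
    last unless defined $line;
    my ($id) = split ' ', $line;
    push @ids, $id;    # a real script would apply the chosen rule set here
}
close $fh;

print "ruleset=$ruleset ids=@ids\n";
```

The first pass still reads the whole file, but it keeps only two numbers and the trailer value in memory instead of every line.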

