in reply to Recommendations for efficient data reduction/substitution application
(Roll-up of answers to the responses thus far, with my appreciation!)
Re: hdb (Re: Recommendations for efficient data reduction/substitution application):
Yes, applying the substitutions is how the noise gets removed. The records may include errors that have not been seen in production before, so no, I am not aware of a way to extract the wanted data directly from the records instead of stripping out the noise.
Re: kennethk (Re: Recommendations for efficient data reduction/substitution application):
At that point, I have broken the record up into parts in a hash called %entry (which also holds things such as the host logging the message, the time stamp, etc.). While I would love to pull the entire data set into memory and run through the 100+ regexes once, the combined size of the logs to process (several have exceeded 5 GB so far) discourages the attempt. (Unless there is another way that has not come to mind yet.)
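For anyone following along, here is a minimal sketch of the line-at-a-time approach described above, not the production code. The record layout ("timestamp host message"), the three patterns in @noise, and the names reduce_message and process_log are all assumptions made up for illustration; the real list would hold the 100+ regexes. The patterns are compiled once with qr// and applied to each message as it is read, so memory use does not grow with the size of the log.

```perl
use strict;
use warnings;

# Hypothetical noise patterns, compiled once up front.
# The real list would hold the 100+ substitutions.
my @noise = (
    qr/\bpid=\d+\s*/,
    qr/\bsession=[0-9a-f]+\s*/i,
    qr/\s+\z/,
);

# Strip every noise pattern from one message and return what is left.
sub reduce_message {
    my ($message) = @_;
    $message =~ s/$_//g for @noise;
    return $message;
}

# Read records one line at a time, so only the current record is in memory.
sub process_log {
    my ($fh, $callback) = @_;
    while (my $line = <$fh>) {
        chomp $line;

        # Assumed layout: "<timestamp> <host> <message>"
        my %entry;
        @entry{qw(time host message)} = split ' ', $line, 3;
        next unless defined $entry{message};

        $entry{message} = reduce_message($entry{message});
        $callback->(\%entry);
    }
    return;
}

# Usage: an in-memory filehandle stands in for a multi-gigabyte log file.
my $sample = "2015-03-03T21:20:00 web01 pid=4242 disk full on /var session=ab12cd\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";
my @reduced;
process_log($fh, sub { push @reduced, $_[0]{message} });
close $fh;
print "$_\n" for @reduced;
```

Passing a callback keeps the reading loop separate from whatever is done with each reduced record (counting, writing out, and so on), which is one possible design choice rather than a requirement.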
Re: shmem (Re: Recommendations for efficient data reduction/substitution application):
In this case, it becomes "I don't care about this, this, or that, but I need the rest of it." I had not thought about study(), however. I will look into that.
Thank you all for your input and assistance.
Replies are listed 'Best First'.
Re^2: Recommendations for efficient data reduction/substitution application
by kennethk (Abbot) on Mar 03, 2015 at 21:20 UTC