(A roll-up of my replies to the responses so far, with my appreciation!)


Re: hdb (Re: Recommendations for efficient data reduction/substitution application):
Yes, applying the substitutions is how the noise gets removed. The records may include errors that have never been seen in production before, so no, I am not aware of a way to extract the wanted data from the records directly instead of removing the noise.
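A minimal sketch of substitution-based noise removal, assuming the substitutions can be held as an ordered list of precompiled patterns (qr//) paired with replacement text. The rules and the strip_noise() name are hypothetical placeholders, not the actual 100+ expressions from the application.

```perl
use strict;
use warnings;

# Hypothetical noise rules: each entry is [pattern, replacement].
# Order matters, because each rule sees the output of the rules before it.
my @noise_rules = (
    [ qr/\b\d{1,3}(?:\.\d{1,3}){3}\b/, 'IPADDR' ],   # IPv4 addresses
    [ qr/\bpid=\d+\b/,                 'pid=N'  ],   # process IDs
    [ qr/0x[0-9a-fA-F]+/,              '0xADDR' ],   # hex addresses
    [ qr/\s{2,}/,                      ' '      ],   # collapse whitespace runs
);

# Apply every rule, in order, to a copy of the message and return the result.
sub strip_noise {
    my ($message) = @_;
    for my $rule (@noise_rules) {
        my ($re, $replacement) = @$rule;
        $message =~ s/$re/$replacement/g;
    }
    return $message;
}
```

Two messages that differ only in addresses or PIDs reduce to the same string, which is what makes the remaining, previously unseen errors stand out.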


Re: kennethk (Re: Recommendations for efficient data reduction/substitution application):
At that point, I have already broken the record into parts in a hash called %entry (which also holds other fields, such as the host that logged the message and the timestamp). While I would love to pull the entire data set into memory and run through the 100+ regexes only once, the combined size of the logs to process (several have exceeded 5GB so far) discourages the attempt, unless there is another way that has not occurred to me yet.
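A minimal sketch of processing the logs as a stream instead of loading them whole, assuming a hypothetical one-record-per-line layout of "timestamp host message". Only the current line is held in memory, so a 5GB file needs no more RAM than a small one, and the regexes are compiled once, outside the loop. The reduce_log() name, the field layout, and tallying the reduced messages into counts are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical noise rules, compiled once before any reading starts.
my @noise_rules = (
    [ qr/\bpid=\d+\b/, 'pid=N' ],
    [ qr/\s{2,}/,      ' '     ],
);

# Read records from $fh one line at a time, so memory use is bounded by
# the longest line rather than by the size of the file.
# Returns a hashref of reduced message => number of occurrences.
sub reduce_log {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        # Hypothetical layout: "<timestamp> <host> <message>"
        my %entry;
        @entry{qw(timestamp host message)} = split /\s+/, $line, 3;
        next unless defined $entry{message};
        for my $rule (@noise_rules) {
            my ($re, $replacement) = @$rule;
            $entry{message} =~ s/$re/$replacement/g;
        }
        $count{ $entry{message} }++;
    }
    return \%count;
}
```

Any filehandle works, so the same sub can be pointed at a log file opened with open() or at STDIN in a pipeline.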


Re: shmem (Re: Recommendations for efficient data reduction/substitution application):
In this case, it amounts to "I don't care about this, this, or that, but I need the rest of it." I had not thought about study(), however; I will look into that.
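If most of the patterns simply delete the parts nobody cares about, one option (an assumption about the rule set, not something established above) is to join them into a single alternation so each message is scanned in one s///g pass rather than once per pattern. This assumes the patterns do not overlap, since an alternation tries them left to right at each position instead of applying them one after another. Whether it is actually faster depends on the patterns, so it is worth measuring against the per-pattern loop with the core Benchmark module. The patterns and drop_ignored() are hypothetical. As for study(), it has been a no-op since Perl 5.16, so on a current perl it will not change matching speed.

```perl
use strict;
use warnings;

# Hypothetical "don't care" fragments; every match is deleted outright.
my @ignore = (
    qr/\[\d+\]/,                # bracketed sequence numbers
    qr/\bsession=\w+/,          # session identifiers
    qr/\(retry \d+ of \d+\)/,   # retry counters
);

# Stringified qr// objects keep their own grouping, so they can be
# joined directly into one alternation and compiled once.
my $ignore_re = do {
    my $alt = join '|', @ignore;
    qr/$alt/;
};

# Remove every ignored fragment in a single pass, then tidy the gaps.
sub drop_ignored {
    my ($message) = @_;
    $message =~ s/$ignore_re//g;
    $message =~ s/\s{2,}/ /g;
    $message =~ s/^\s+|\s+$//g;
    return $message;
}
```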


Thank you all for your input and assistance.


In reply to Re: Recommendations for efficient data reduction/substitution application by atcroft
in thread Recommendations for efficient data reduction/substitution application by atcroft
