
They need to add a trophy or star function for superior posts like yours. Upvoting seems poor recompense for the amount of effort you put into that. Thank you--and you correctly discerned some of my failings.

Yes, I was intending to remove leading and trailing whitespace. The reason for this is that a space on either side will throw off the matching for searches in which the user specified that the match must occur at the beginning or at the end. So the space removal is for the benefit of the regex later.
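To illustrate why the trimming matters, here is a minimal sketch. The `trim` helper and the sample line are made up for the example and are not the code from the thread. It sticks to plain `s///` because the non-destructive `/r` modifier only arrived in Perl 5.14, and I'm on 5.12.4.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the text with leading and
# trailing whitespace removed.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+//;
    $text =~ s/\s+$//;
    return $text;
}

# Sample data, invented for this example.
my $line     = "  In the beginning  ";
my $anchored = qr/^In/;    # "must match at the beginning"

# The leading spaces defeat the anchor until the line is trimmed.
print "untrimmed: ", ( $line       =~ $anchored ? "match" : "no match" ), "\n";
print "trimmed:   ", ( trim($line) =~ $anchored ? "match" : "no match" ), "\n";
```

This prints "no match" for the untrimmed line and "match" for the trimmed one.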

And yes, each array that gets processed line by line holds between 4 MB at the low end and upwards of 250 MB for one particular annotated version (with full HTML mouseovers, etc.), with the average closer to 10 MB. So you were correct that each one is over a MB. These all come from a database, with each file stored in a separate table. The routine that feeds the array pulls every row of the table at once. This is done to speed up the database portion by avoiding 30,000+ calls to the DB, one per row; it was also my understanding that it is less expensive, time-wise, to use some RAM than to make repeated I/O calls. I may be mistaken--you seem to have a good grasp of these things, so feel free to clarify.

The clients have two options for forming their query, and these options are available individually on a per-column basis: 1) they can use a simple, standard search, entering a keyword or phrase of their choice, then ticking checkboxes for case-sensitivity, whole-word (\bwhole-word\b) searching, must match at beginning or end, etc.; or 2) they can tick the "Use PERL regex" option, which disables all the other options and leaves them on their own to write a regular expression for what they want to match. The subroutine I call to build the regex handles both alternatives and returns the result in qr// form.
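Roughly, a subroutine of that kind could look like the sketch below. This is not my actual code: the name `build_regex` and the option keys (`term`, `use_regex`, `whole_word`, `at_start`, `at_end`, `case_sensitive`) are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sketch; option names are assumptions, not the real form fields.
sub build_regex {
    my (%opt) = @_;

    # Option 2: the user supplied their own regular expression.
    if ( $opt{use_regex} ) {
        my $re = eval { qr/$opt{term}/ };    # qr// dies on an invalid pattern
        die "Invalid regex: $@" unless defined $re;
        return $re;
    }

    # Option 1: treat the keyword or phrase as literal text.
    my $pattern = quotemeta $opt{term};
    $pattern = "\\b$pattern\\b" if $opt{whole_word};
    $pattern = "^$pattern"      if $opt{at_start};
    $pattern = "$pattern\$"     if $opt{at_end};

    return $opt{case_sensitive} ? qr/$pattern/ : qr/$pattern/i;
}

# Usage with invented sample rows: whole-word, case-insensitive search.
my $re   = build_regex( term => 'light', whole_word => 1 );
my @hits = grep { $_ =~ $re } ( 'Let there be light', 'lightning', 'Twilight' );
print "$_\n" for @hits;
```

Only "Let there be light" is printed, since the other two rows contain "light" as part of a longer word. The `quotemeta` call is what keeps characters such as `.` or `(` in a simple search from being read as regex syntax.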

I will try out your code when I have a chance--probably won't be for another couple of days until my next window of opportunity. I very much appreciate your effort.

By the way, I didn't see much improvement, if any, from adding the /o modifier (m//o) to the match. I think this might be because $regex is already a compiled qr// object--but perhaps I'm simply not aware of how that affects things.
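For anyone following along, here is a small sketch of the two ways of matching. The subroutine names and sample data are invented. As far as I understand it, /o makes that match operator keep the first pattern it compiles for the rest of the program, so besides not buying much with a qr// object, it could give wrong results if the routine is later called with a different regex.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparison; not the code from the thread.

# Matches directly against the qr// object, with no /o.
sub count_plain {
    my ( $regex, @lines ) = @_;
    return scalar grep { $_ =~ $regex } @lines;
}

# Interpolates the qr// object into a match with /o, which locks in
# whatever pattern this match operator sees on its first execution.
sub count_with_o {
    my ( $regex, @lines ) = @_;
    return scalar grep { /$regex/o } @lines;
}

my @lines = ( 'alpha', 'beta', 'gamma' );

print count_plain( qr/alpha/, @lines ), "\n";     # 1
print count_plain( qr/a$/,    @lines ), "\n";     # 3
print count_with_o( qr/alpha/, @lines ), "\n";    # 1
print count_with_o( qr/a$/,    @lines ), "\n";    # 1, still matching /alpha/
```

The last call shows the pitfall: the second regex is ignored because the pattern was frozen on the first call.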

P.S. Oh, and by the way, I'm developing on Perl 5.12.4.



In reply to Re^2: Efficient regex search on array table by Polyglot
in thread Efficient regex search on array table by Polyglot
