in reply to Multi-line Regex Performance

Can you tell us what the hash key is supposed to be in this input? It's impossible to tell from the code without knowing whether your whitespace contains tabs. Also, are these fixed-length records? (Or do they at least have a fixed number of lines?)

If there are no hashes in the records except the doubled ones, your regex could probably be as simple as /(##[^#]*)/g. (And it would be a lot faster.)
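For illustration, here is a minimal sketch of that simpler pattern at work. The sample string is made up (it is not your data), and it only holds up under the assumption stated above: a lone # inside a field would cut that record short.

```perl
use strict;
use warnings;

# Hypothetical sample: two multi-line records, each introduced by '##'.
my $data = "##REC1 line one\nline two\n##REC2 line one\nline two\n";

# In list context, /g returns every captured record in a single pass.
# [^#]* runs from the '##' marker up to (not including) the next '#'.
my @records = $data =~ /(##[^#]*)/g;

print scalar(@records), " records found\n";
print "[$_]\n" for @records;
```

Because the negated character class never has to back up, the engine walks the string once from left to right.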

-sauoq
"My two cents aren't worth a dime.";

Re^2: Multi-line Regex Performance
by pboin (Deacon) on Nov 01, 2005 at 15:52 UTC

    The key for checking whether one of the multi-line records is a keeper or not is at position 19 for a length of ten on the first line (segment '01' in our parlance).

    There could be hash characters in some of the fields -- many of them are freeform and take addresses, comments, etc. There will not be any in the first position though, other than the ones that denote new records. Thanks sauoq.

      The key for checking whether one of the multi-line records is a keeper or not is at position 19 for a length of ten on the first line

      This doesn't really tell us anything that the substr() hadn't already told us. The thing is, we can't reliably count whitespace in the data you provided.

      But anyway, since what you want is always on that first line, it's probably a lot easier (and more efficient than using a regular expression) to just read the data line by line, ignoring lines that don't match /^##/ and doing what you want with the ones that do. This would have the added benefit of not keeping 300+ MB in RAM.

      while (<>) {
          next unless /^##/;              # only the first line of each record
          my $key = substr $_, 19, 10;    # ten-character key at offset 19
          do_stuff_with($key);
      }
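      If you need the whole record rather than just its key, the same line-by-line idea extends naturally. What follows is only a sketch under assumptions: filter_records, %keep and the sample data are hypothetical names and values, and the key is taken at substr offset 19 as in the loop above (whether your "position 19" is 0- or 1-based is something only you can confirm).

      ```perl
      use strict;
      use warnings;

      # Copy to $out only the records whose key (first line, offset 19,
      # length 10) exists in the %$keep hash. Only one line is held at a time.
      sub filter_records {
          my ($in, $out, $keep) = @_;
          my $keeping = 0;    # true while inside a record we want
          while (my $line = <$in>) {
              if ($line =~ /^##/) {
                  # New record: decide once, from the key on its first line.
                  my $key = length($line) > 19 ? substr($line, 19, 10) : '';
                  $keeping = exists $keep->{$key};
              }
              print {$out} $line if $keeping;
          }
      }

      # Hypothetical sample: three two-line records, key at offset 19.
      my $sample = '';
      for my $key (qw(AAAAAAAAAA BBBBBBBBBB CCCCCCCCCC)) {
          $sample .= sprintf "##01%-15s%s\n", 'filler', $key;
          $sample .= "02 detail line for $key\n";
      }

      my %keep = (BBBBBBBBBB => 1);    # hypothetical keeper list

      # In-memory filehandles stand in for the real input and output files.
      open my $in, '<', \$sample or die "Cannot open in-memory input: $!";
      my $kept = '';
      open my $out, '>', \$kept or die "Cannot open in-memory output: $!";
      filter_records($in, $out, \%keep);
      close $out;

      print $kept;    # only the two lines of the BBBBBBBBBB record
      ```

      For real data you would pass your actual filehandles instead, e.g. filter_records(\*STDIN, \*STDOUT, \%keep). The flag approach means a # in a freeform field does no harm, since only a ## at the start of a line can begin a new record.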

      -sauoq
      "My two cents aren't worth a dime.";