Re: Multi-line Regex Performance

Can you tell us what the hash key is supposed to be in this input? It's impossible to tell by reading the code without knowing whether there are tabs in your whitespace. And are these fixed-length records? (Or even a fixed number of lines?)

If there are no hashes in the records except the doubled ones, your regex could probably be as simple as /(##[^#]*)/g. (And it would be a lot faster.)
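Here is a minimal sketch of that suggestion. It assumes the whole file has already been slurped into one string and that no field ever contains a '#'. The subroutine name `split_records` and the sample data are made up for illustration:

```perl
use strict;
use warnings;

# Split a slurped buffer into records, assuming '#' never appears inside
# a record except in the leading "##" marker.
sub split_records {
    my ($buffer) = @_;
    # In list context, //g returns every capture: one "##..." chunk per record.
    # [^#]* crosses newlines on its own, so no /s or /m modifier is needed.
    return $buffer =~ /(##[^#]*)/g;
}

my $data    = "##01 first record\n02 more\n##01 second record\n";
my @records = split_records($data);
print scalar(@records), " records\n";    # prints "2 records"
```

Because `[^#]*` stops at the first `#` it meets, a single `#` inside a freeform field would cut that record short.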

-sauoq
"My two cents aren't worth a dime.";


Re^2: Multi-line Regex Performance by pboin (Deacon) on Nov 01, 2005 at 15:52 UTC
The key that decides whether a multi-line record is a keeper sits at position 19, with a length of ten, on the first line (segment '01' in our parlance). Some fields can contain hash characters, because many of them are freeform and hold addresses, comments, etc. However, a '#' will never appear in the first position of a line except where it marks the start of a new record. Thanks sauoq.
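A '#' can show up inside fields but never in the first column except as a record marker. Given that, one possible adjustment (not something proposed in this thread) is to split only where `##` starts a line. This sketch assumes the data is already in one string. `split_records`, `record_key` and the sample records are hypothetical, and the offset 19 is copied from the original `substr()`:

```perl
use strict;
use warnings;

# Split only where "##" begins a line. The zero-width lookahead keeps the
# marker attached to its record, and /m lets ^ match after each newline,
# so a '#' inside a freeform field no longer ends a record early.
sub split_records {
    my ($buffer) = @_;
    return split /^(?=##)/m, $buffer;
}

# Key: 10 characters at offset 19 of the record's first line.
# substr() offsets are 0-based; use 18 if "position 19" is meant 1-based.
sub record_key {
    my ($record) = @_;
    my ($first_line) = split /\n/, $record, 2;
    return substr($first_line, 19, 10);
}

my $data = "##01" . (" " x 15) . "ABCDEFGHIJ rest\n02 note: see # 12\n"
         . "##01" . (" " x 15) . "KLMNOPQRST\n";
for my $rec (split_records($data)) {
    print record_key($rec), "\n";    # prints ABCDEFGHIJ, then KLMNOPQRST
}
```

A zero-length match at the very start of the string does not produce an empty leading field in `split`, so the first record comes back intact.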
Re^3: Multi-line Regex Performance by sauoq (Abbot) on Nov 01, 2005 at 16:27 UTC
> The key for checking whether one of the multi-line records is a keeper or not is at position 19 for a length of ten on the first line

This doesn't really tell us anything that the `substr()` hadn't already told us. The thing is, we can't reliably count whitespace in the data you provided.

But anyway, since what you want is always on that first line, it's probably a lot easier (and more efficient than using a regular expression) to just read the data line by line, ignoring lines that don't match `/^##/` and doing what you want with the ones that do. This would have the added benefit of not keeping 300+MB in RAM.

    while (<>) {
        next unless /^##/;              # skip all but record-start lines
        my $key = substr $_, 19, 10;    # the 10-character key on the first line
        do_stuff_with($key);            # placeholder for your own processing
    }
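If the goal is to copy out whole records whose key is on a keep list, the same line-by-line idea can carry a flag from each `##` line to the lines that follow it. This is a sketch that assumes the keep list is a hash of keys. `filter_records` and `$keep` are hypothetical names, and the offset 19 is again taken from the original `substr()`:

```perl
use strict;
use warnings;

# Stream records from $in to $out, keeping only those whose key
# (10 chars at offset 19 of the "##" line) exists in %$keep.
# Only the current line is held in memory.
sub filter_records {
    my ($in, $out, $keep) = @_;
    my $keeping = 0;    # lines before the first "##" line are dropped
    while (my $line = <$in>) {
        if ($line =~ /^##/) {
            # A new record starts: decide once, from its first line.
            # The length check avoids substr() past the end of a short line.
            $keeping = length($line) > 19
                && exists $keep->{ substr($line, 19, 10) };
        }
        print {$out} $line if $keeping;
    }
}
```

Called as, say, `filter_records(\*STDIN, \*STDOUT, \%keep)`, memory use stays at one line plus the keep list, regardless of file size.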