Re^2: Multi-line Regex Performance

The key for checking whether one of the multi-line records is a keeper or not is at position 19, for a length of ten, on the first line (segment '01' in our parlance).
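One detail worth checking when "position 19" is translated into code: Perl's substr() counts offsets from 0, so `substr $line, 19, 10` starts at the 20th character. The sketch below uses a made-up header line (the '##' prefix, the dot filler and the key value are hypothetical) to show what offset 19 and offset 18 return.

```perl
use strict;
use warnings;

# Hypothetical header line: '##', 17 filler dots, then a 10-character key.
my $line = '##' . ('.' x 17) . 'KEY0000001' . " rest of record\n";

# substr() offsets count from 0, so offset 19 is the 20th character.
my $key_from_zero = substr $line, 19, 10;   # "position 19" counted from 0
my $key_from_one  = substr $line, 18, 10;   # "position 19" counted from 1

print "$key_from_zero\n";
print "$key_from_one\n";
```

If the record layout numbers positions from 1, the offset to use is 18, not 19.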

There could be hash characters in some of the fields -- many of them are freeform and take addresses, comments, etc. There will not be any in the first position though, other than the ones that denote new records. Thanks, sauoq.
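Because stray '#' characters can only appear inside fields and never at the start of a line, anchoring the match with `^` is enough to tell record headers apart from freeform text. A small illustration with hypothetical sample lines (the layout here is invented, not the real data):

```perl
use strict;
use warnings;

# Hypothetical sample: two records; a continuation line of the first record
# has '#' characters inside a freeform address/comment field.
my @lines = (
    "##01 first record header\n",
    "  02 Apt #4, rear entrance ## see note\n",
    "##01 second record header\n",
    "  02 no hashes here\n",
);

# Anchored: only lines that begin with '##' count as record headers.
my $anchored   = grep { /^##/ } @lines;
# Unanchored: also matches the '##' buried in the freeform field.
my $unanchored = grep { /##/ } @lines;

print "anchored: $anchored, unanchored: $unanchored\n";
```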

Re^3: Multi-line Regex Performance
by sauoq (Abbot) on Nov 01, 2005 at 16:27 UTC

> The key for checking whether one of the multi-line records is a keeper or not is at position 19 for a length of ten on the first line

This doesn't really tell us anything that the substr() hadn't already told us. The thing is, we can't reliably count whitespace in the data you provided.

But anyway, since what you want is always on that first line, it's probably a lot easier (and more efficient than a multi-line regular expression) to just read the data line by line, skipping lines that don't match /^##/ and doing what you want with the ones that do. This would have the added benefit of not keeping 300+ MB in RAM.

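use strict;
use warnings;
while (<>) {
  next unless /^##/;            # only record headers start with '##'
  my $key = substr $_, 19, 10;  # 10-character key at offset 19
  do_stuff_with($key);
}
# Placeholder: replace with whatever you need to do with each key.
sub do_stuff_with { my ($key) = @_; print "$key\n" }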
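If the goal is to keep whole records rather than just collect keys, the same streaming idea extends naturally: remember whether the most recent '##' header was a keeper and copy lines until the next header. This is a minimal sketch, assuming a hypothetical %keep hash of wanted keys, that every record begins with a '##' line, and that the key sits at offset 19 for 10 characters as in the loop in this post. `filter_records` and `header` are illustrative names, not anything from the original thread.

```perl
use strict;
use warnings;

# Hypothetical keeper keys; in real use these would come from your own criteria.
my %keep = map { $_ => 1 } qw(KEY0000001 KEY0000003);

# Copy whole records whose header key is wanted, reading one line at a time
# so only the current line is held in memory.
sub filter_records {
    my ($in, $out) = @_;
    my $keeping = 0;    # lines before the first '##' header are dropped
    while (my $line = <$in>) {
        if ($line =~ /^##/) {
            # Same key location (offset 19, length 10) as the loop in the post;
            # a header too short to hold a key gets an empty key instead.
            my $key = length($line) > 19 ? substr($line, 19, 10) : '';
            $keeping = exists $keep{$key};
        }
        print {$out} $line if $keeping;
    }
}

# Demo with in-memory filehandles and made-up records.
sub header { return '##' . ('.' x 17) . $_[0] . "\n" }
my $data = header('KEY0000001') . "  02 kept detail\n"
         . header('KEY0000002') . "  02 dropped detail\n"
         . header('KEY0000003') . "  02 also kept\n";

open my $in,  '<', \$data   or die "Cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open output: $!";
filter_records($in, $out);
close $out;
print $result;
```

On a real file you would call `filter_records(\*STDIN, \*STDOUT)` (or pass filehandles you opened yourself); memory use stays flat regardless of file size.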

-sauoq
"My two cents aren't worth a dime.";