How can i catch strings matching a regex across multiple lines?

babelfish has asked for the wisdom of the Perl Monks concerning the following question:

Dear fellow monks,

I have a problem collecting data from a record-oriented stream.

In particular, I need to collect, within each paragraph, the strings that appear on different lines and match a common regex. But my current approach does not work properly: only the first occurrence of the regex is found, and the others are skipped.

The data stream I want to process looks like this:

### HEADING OF RECORD 1 ####
Logical device ID=08E1
LINE_THAT_DOES_NOT_BOTHER_ME
ANOTHER_LINE_THAT_DOES_NOT_BOTHER_ME
29 8/0/2/1/0.18.152.0.0.6.1  c29t6d1   FA  5eA
30 8/0/3/1/0.17.152.0.0.6.1  c30t6d1   FA 12e
31 8/0/8/1/0.17.150.0.0.6.1  c31t6d1   FA 10eA
32 8/0/9/1/0.18.150.0.0.6.1  c32t6d1   FA 11eA

### HEADING OF RECORD 2 ####
Logical device ID=08E2
LINE_THAT_DOES_NOT_BOTHER_ME
ANOTHER_LINE_THAT_DOES_NOT_BOTHER_ME
29 8/0/2/1/0.18.152.0.0.4.1  c29t4d1   FA  5eA
30 8/0/3/1/0.17.152.0.0.4.1  c30t4d1   FA 12eA
31 8/0/8/1/0.17.150.0.0.4.1  c31t4d1   FA 10eA
32 8/0/9/1/0.18.150.0.0.4.1  c32t4d1   FA 11eA

### HEADING OF RECORD 3 ####
(...)

The task is as follows:

Build a hash of arrays from that stream, with the "Logical device ID" values as the keys and, as the corresponding values, arrays collecting the cXtYdZ strings:

%hash = (
'08E1' => ['c29t6d1','c30t6d1','c31t6d1','c32t6d1'],
'08E2' => ['c29t4d1','c30t4d1','c31t4d1','c32t4d1'],
(...)
)

I am using this code to process the data:

use strict;
use warnings;
use Data::Dumper;

my %hash;
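# data stream comes from here
open ( FH, "powermt display dev=all|") or die "Cannot run powermt: $!";

$/ = '';    # paragraph mode: each read returns one blank-line-separated record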
while (<FH>) {
    my ($id) = ( $_ =~ /Logical device ID=(\w+)/ );
    push (@{$hash{$id}}, $1) if /(c\d+t\d+d\d+)/;
}

print Dumper (\%hash);

But with this code I get a HoA containing only the first occurrence of the regex within each paragraph, like this:

$VAR1 = {
          '08E1' => [
                      'c29t6d1'
                    ],
          '08E2' => [
                      'c29t4d1'
                    ],
(...)

So far, the record processing itself seems to work OK, but I am missing something in the while loop when trying to catch all the cXtYdZ strings. I should also mention that the number of lines containing such a string varies: there might be just one, but there could also be 2, 3, 4, 5 or more.

The problem seems to be that I need to execute the push statement as many times as the regex pattern matches on each pass through the loop.

Can somebody enlighten me and perhaps help me improve my loop-control skills?

TIA!
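The question's own framing (run the push once per match) corresponds to a `while` loop over a `/g` match in scalar context: each test resumes searching where the previous match ended, so the loop body runs once per occurrence. A minimal sketch, assuming a single record has already been read into `$record` (a hypothetical variable holding an abbreviated copy of record 1 from the question):

```perl
use strict;
use warnings;

# Hypothetical sample: record 1 from the question, abbreviated to two data lines.
my $record = <<'END';
### HEADING OF RECORD 1 ####
Logical device ID=08E1
LINE_THAT_DOES_NOT_BOTHER_ME
29 8/0/2/1/0.18.152.0.0.6.1  c29t6d1   FA  5eA
30 8/0/3/1/0.17.152.0.0.6.1  c30t6d1   FA 12e
END

my %hash;
my ($id) = $record =~ /Logical device ID=(\w+)/;

# In scalar context, m//g resumes from pos($record) on each test,
# so the loop body (and therefore the push) runs once per match.
while ( $record =~ /(c\d+t\d+d\d+)/g ) {
    push @{ $hash{$id} }, $1;
}

print join( ',', @{ $hash{'08E1'} } ), "\n";    # prints c29t6d1,c30t6d1
```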


Replies are listed 'Best First'.
Re: How can i catch strings matching a regex across multiple lines? by aaron_baugher (Curate) on Jun 30, 2012 at 23:38 UTC
The problem is this line: `push (@{$hash{$id}}, $1) if /(c\d+t\d+d\d+)/;` That checks the input record for your pattern, captures it, and pushes it onto the array pointed to by that ID. But it only does that once, so it finds the first one, pushes it, and moves on.

To find them all, you need to tell your regex to repeat the search with the `g` modifier. In list context the match then returns every capture, and push appends them all: `push @{$hash{$id}}, /(c\d+t\d+d\d+)/g;`

Aaron B.
Available for small or large Perl jobs; see my home node.
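A sketch of the complete loop with this fix applied, under a few assumptions: the `powermt` pipe is replaced by a hypothetical in-memory string holding an abbreviated copy of the two sample records, so the example runs without the tool; `$/` is changed with `local` inside a block so paragraph mode does not leak into the rest of the program; and records without a "Logical device ID" line are skipped.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the `powermt display dev=all` output (abbreviated).
my $stream = <<'END';
### HEADING OF RECORD 1 ####
Logical device ID=08E1
LINE_THAT_DOES_NOT_BOTHER_ME
29 8/0/2/1/0.18.152.0.0.6.1  c29t6d1   FA  5eA
30 8/0/3/1/0.17.152.0.0.6.1  c30t6d1   FA 12e

### HEADING OF RECORD 2 ####
Logical device ID=08E2
LINE_THAT_DOES_NOT_BOTHER_ME
29 8/0/2/1/0.18.152.0.0.4.1  c29t4d1   FA  5eA
30 8/0/3/1/0.17.152.0.0.4.1  c30t4d1   FA 12eA
END

my %hash;
{
    local $/ = '';    # paragraph mode: one record per blank-line-separated block
    open my $fh, '<', \$stream or die "Cannot open in-memory stream: $!";
    while ( my $rec = <$fh> ) {
        my ($id) = $rec =~ /Logical device ID=(\w+)/;
        next unless defined $id;    # skip blocks without a device ID

        # m//g in list context returns every captured match at once
        push @{ $hash{$id} }, $rec =~ /(c\d+t\d+d\d+)/g;
    }
    close $fh;
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper( \%hash );
```

`$Data::Dumper::Sortkeys` only makes the dump order stable; it does not affect the hash itself.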
Re: How can i catch strings matching a regex across multiple lines? by Anonymous Monk on Jun 30, 2012 at 21:38 UTC
The problem is this line: `my ($id) = ( $_ =~ /Logical device ID=(\w+)/ );` If you add `use Data::Dump; dd { line => $_, id => $id };` right afterwards and run your program, you can see that $id gets reinitialized upon each iteration of the loop (with each new line read). You want this: `my $id; while.... { $id = $1 if /Logical device ID=(\w+)/ ; dd { line => $_, id => $id }; }`

See Tutorials: Variable Scoping in Perl: the basics, Coping with Scoping, Mini-Tutorial: Perl's Memory Management, Lexical scoping like a fox, Basic debugging checklist, brian's Guide to Solving Any Perl Problem. Also, see Parse::Report - parse Perl format-ed reports.
Re^2: How can i catch strings matching a regex across multiple lines? by CountZero (Bishop) on Jul 01, 2012 at 11:15 UTC
> you can see that $id gets reinitialized upon each iteration of the loop (with each new line read)

But he is reading records, not lines, and therefore it is perfectly OK (even recommended) to re-initialize `$id` each time through the loop. The solution is to add the `g` modifier to the regex.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics
Re^2: How can i catch strings matching a regex across multiple lines? by NetWallah (Canon) on Jun 30, 2012 at 23:14 UTC
I don't think the problem has anything to do with $id. The much bigger potential perpetrator is the "$/" setting. Since you expect to be reading lines, you need a record separator; otherwise, the entire stream will be "slurp"ed, and you will see only one record, which matches your symptoms.

I hope life isn't a big joke, because I don't get it. -SNL
Re^3: How can i catch strings matching a regex across multiple lines? by aaron_baugher (Curate) on Jun 30, 2012 at 23:42 UTC
Not quite. See perlvar, "INPUT_RECORD_SEPARATOR": setting $/ to the empty string turns on "paragraph mode", in which records are separated by one or more blank lines (a run of several consecutive blank lines counts as a single separator). That's what he wants here, since his records are separated by a blank line.
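One way to check paragraph-mode behaviour is to read a small in-memory string with `$/` set to the empty string and count the records. In this sketch the sample text is hypothetical; the run of blank lines between the second and third block is treated as a single separator, so three records come back, and `chomp` (which in paragraph mode removes all trailing newlines) leaves each record without its trailing line endings:

```perl
use strict;
use warnings;

# Hypothetical sample: three blocks, the last two separated by three blank lines.
my $text = "rec 1 line a\nrec 1 line b\n\nrec 2 line a\n\n\n\nrec 3 line a\n";

my @records;
{
    local $/ = '';    # paragraph mode
    open my $fh, '<', \$text or die "Cannot open in-memory string: $!";
    @records = <$fh>;    # list context: read every record
    close $fh;
    chomp @records;      # must run while $/ is still '' to strip all trailing newlines
}

printf "%d records\n", scalar @records;    # prints 3 records
```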
Re: How can i catch strings matching a regex across multiple lines? by 2teez (Vicar) on Jul 01, 2012 at 11:25 UTC
You can achieve what you want like so:

use warnings;
use strict;
use Data::Dumper;

my $device_id = {};
my $id        = "";
while (<DATA>) {
    chomp;
    if (m/Logical.+=(.+?)$/) {
        $id = $1;
    }
    else {
        if (m/.+?\s+?(c.+?)\s+?.+?$/) {
            push @{ $device_id->{$id} }, $1;
        }
    }
}
print Dumper($device_id);

__DATA__
### HEADING OF RECORD 1 ####
Logical device ID=08E1
LINE_THAT_DOES_NOT_BOTHER_ME
ANOTHER_LINE_THAT_DOES_NOT_BOTHER_ME
29 8/0/2/1/0.18.152.0.0.6.1  c29t6d1   FA  5eA
30 8/0/3/1/0.17.152.0.0.6.1  c30t6d1   FA 12e
31 8/0/8/1/0.17.150.0.0.6.1  c31t6d1   FA 10eA
32 8/0/9/1/0.18.150.0.0.6.1  c32t6d1   FA 11eA

### HEADING OF RECORD 2 ####
Logical device ID=08E2
LINE_THAT_DOES_NOT_BOTHER_ME
ANOTHER_LINE_THAT_DOES_NOT_BOTHER_ME
29 8/0/2/1/0.18.152.0.0.4.1  c29t4d1   FA  5eA
30 8/0/3/1/0.17.152.0.0.4.1  c30t4d1   FA 12eA
31 8/0/8/1/0.17.150.0.0.4.1  c31t4d1   FA 10eA
32 8/0/9/1/0.18.150.0.0.4.1  c32t4d1   FA 11eA

### HEADING OF RECORD 3 ####
(...)

output:

$VAR1 = {
          '08E2' => [
                      'c29t4d1',
                      'c30t4d1',
                      'c31t4d1',
                      'c32t4d1'
                    ],
          '08E1' => [
                      'c29t6d1',
                      'c30t6d1',
                      'c31t6d1',
                      'c32t6d1'
                    ]
        };

Please note: test your match regexes. Also check perldoc perldsc.

Hope this helps
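To illustrate the advice to test match regexes, this sketch runs the reply's pattern and a narrower cXtYdZ pattern against two lines. The second line is a hypothetical example of an unrelated line containing a word that starts with `c`; on such a line the looser pattern captures that word, while the narrower one does not match at all. Whether that matters depends on what the real `powermt` output can contain, which is an assumption here.

```perl
use strict;
use warnings;

my $loose  = qr/.+?\s+?(c.+?)\s+?.+?$/;    # pattern from the reply above
my $strict = qr/\b(c\d+t\d+d\d+)\b/;       # only cXtYdZ device names

# Hypothetical lines: one real data line, one unrelated line with a "c" word.
my @lines = (
    '29 8/0/2/1/0.18.152.0.0.6.1  c29t6d1   FA  5eA',
    'Some controller status line',
);

my @results;
for my $line (@lines) {
    my ($l) = $line =~ $loose;     # list context: first capture or empty
    my ($s) = $line =~ $strict;
    push @results, [ $l // '-', $s // '-' ];
    print "$line\n  loose: ", ( $l // '-' ), "  strict: ", ( $s // '-' ), "\n";
}
```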
Re^2: How can i catch strings matching a regex across multiple lines? by ww (Archbishop) on Jul 01, 2012 at 13:20 UTC
/me .oO ' ... at last -- a tested solution!'
Re^2: How can i catch strings matching a regex across multiple lines? by babelfish (Initiate) on Jul 01, 2012 at 20:11 UTC
Thanks, that one works well for me! So, I need to give myself an hour's detention on proper data munging :)