saty has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks, I have a small problem with multi-line regexes. I posted this question before, but the code I posted was incomplete. I am running this multi-line regex on a C map file. Here is the Perl code:
#!/usr/local/bin/perl -w
# analyse_map.perl
# Generate a snapshot of audio registers using register log file,

# Open map files (map_merc and rom.dld)
open(MAP, "<map_merc") || die "Unable to open map_merc";

# Read map_merc :
while(<MAP>) {
    if( ($titi, $toto) = /^\.(\w+).+$\^\t+\.\w+\W+(\w+)/m) {
        print($titi, $toto, " ", $., "\n");
    }
}
close(MAP);
The target I am trying to scan for data is given below (just a few lines, as the file is too big):
output input virtual
section section address size file
.begin 00002000 00000068
.begin a0000000 00000068 begin_flash.o
_START a0000000 00000000
.copy_flash 00000000 00001488
.copy_flash a0001000 000001ac libbsp_qt1_aspic32.a[copy_flash.o]
copy_from_flash a0001000 00000000
.copy_flash a00011ac 00000978 libbsp_qt1_aspic32.a[cach3940.o]
SysDisableDCache a00011ac 00000000
disableD a0001220 00000000
SysDisableICache a0001294 00000000
disableI a0001308 00000000
SysEnableDCache a000137c 00000000
enableD a00013f8 00000000
SysEnableICache a0001464 00000000
.text 00003000 00052be4
.boot 80003000 0000009c libbsp_qt1_aspic32.a[boot.o]
HdwInit 80003000 00000000
end_SIF 80003080 00000000
.endtext 80055be4 00000000
.endtext 80055be4 00000000 end.o
TextEnd 80055be4 00000000
.data 00180000 000004d0
.data 80201000 00000000 begin_flash.o
StartData 80201000 00000000
What I am trying to do is find each line that begins with "." (e.g. .begin) and put its name in the scalar $titi. Then, if the line just below it starts with a tab followed by ".", I want the second field of that line, which is the address. I hope this explanation is clear enough. If you need more explanation, mail me at satyajeet.navalkar@philips.com Thanks. Saty

RE: Multi-line regex
by Corion (Patriarch) on Aug 28, 2000 at 12:32 UTC

    Please use <CODE> to start your code and </CODE> to end your code postings.

    Your regex in question is :

    /^\.(\w+).+$\^\t+\.\w+\W+(\w+)/m
    This looks like a regular expression that was thought out with a small error and then mangled until Perl would accept it. Without the /m modifier, the ^ and $ special characters mean 'start of string' and 'end of string'; with /m they also match at the start and end of each line within the string. The backslash before the ^ quotes it as a literal character, which is not at all what you are looking for. To match across a line break, you can embed \n newline escapes directly in the pattern:
    /(?:^|\n)\.(\w+).+\n\t+\.\w+\W+(\w+)/m
    I didn't test this RE, but it looks more like what you described. Here's a breakdown of the RE :
    /(?:^|\n)      # Start matching either at the start of the string
                   # or at a newline. The ?: part means, that
                   # the parentheses I use don't get saved in $1
     \.(\w+).+\n   # match a line starting with a dot-word and some
                   # more stuff
     \t+\.\w+      # The line after this must start with at least one
                   # tab and then a dot-word
     \W+(\w+)      # and contain some other stuff as well ...
    /mx
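Since the pattern above is described as untested, here is a minimal sketch that tries it in /x form against a three-line string. The sample reuses values from the excerpt in the question, but the tab-separated layout is an assumption, and the names `$sample`, `$section`, and `$address` are made up for this illustration.

```perl
use strict;
use warnings;

# Assumed layout (not confirmed by the post): fields are separated by tabs,
# output-section lines start with '.', input-section lines start with a tab.
my $sample = ".begin\t\t00002000\t00000068\n"
           . "\t.begin\ta0000000\t00000068\tbegin_flash.o\n"
           . "\t\t_START\ta0000000\t00000000\n";

# In list context a successful match returns the captured groups.
my ($section, $address) = $sample =~ /
    (?:^|\n)        # start of string, or just after a newline
    \.(\w+) .+ \n   # dot-word line; first capture is the section name
    \t+ \. \w+      # next line: one or more tabs, then another dot-word
    \W+ (\w+)       # skip separators; second capture is the address
/x;

print "$section $address\n" if defined $section;
```

With this sample the sketch prints `begin a0000000`. The comments inside the pattern avoid `/` and `$`, because the delimiter still ends the regex and variables are still interpolated even inside an /x comment.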
    I hope that helps you a bit.
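On the behaviour of the anchors discussed above, a small sketch may make the difference concrete. It uses a made-up two-line string (not taken from the map file) and hypothetical variable names.

```perl
use strict;
use warnings;

# Hypothetical two-line string, made up for this illustration.
my $text = "first line\n.second line\n";

# Without /m, ^ matches only at the very start of the string,
# so the dot at the start of the second line is never found.
my $without_m = ($text =~ /^\.(\w+)/) ? $1 : "no match";

# With /m, ^ also matches immediately after each embedded newline.
my $with_m = ($text =~ /^\.(\w+)/m) ? $1 : "no match";

# A backslashed caret matches a literal '^' character, not a position.
my $literal_caret = ("a^b" =~ /a\^b/) ? "matched" : "no match";

print "without /m: $without_m\n";
print "with /m:    $with_m\n";
print "literal ^:  $literal_caret\n";
```

Only the /m version finds `second`. Either way, a pattern can only see a second line if the string being matched actually contains more than one line.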

Re: Multi-line regex
by merlyn (Sage) on Aug 28, 2000 at 14:40 UTC
    Besides the above comment, I also didn't see anything even in your mangled code that reads the file. Must have that. And if you expect more than one line to be scanned by a regex, you have to read a bunch of lines:
    open FOO, "somewhere" or die "cannot open somewhere: $!";
    $_ = join "", <FOO>;
    if (/
        (?:^|\n)      # Start matching either at the start of the string
                      # or at a newline. The ?: part means, that
                      # the parentheses I use don't get saved in $1
        \.(\w+).+\n   # match a line starting with a dot-word and some
                      # more stuff
        \t+\.\w+      # The line after this must start with at least one
                      # tab and then a dot-word
        \W+(\w+)      # and contain some other stuff as well ...
        /mx
    ) {
        ... you found one ...
    }
    Something like that.
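The code above stops at the first match. As a rough sketch of one way to collect every section/address pair, the same pattern can be run in a `while` loop with the /g modifier over the slurped text. The tab layout of the sample is an assumption, the in-memory filehandle only stands in for the real map file so the example runs on its own, and `$map_text`, `$contents`, and `@pairs` are hypothetical names.

```perl
use strict;
use warnings;

# Stand-in for the real map file; values are from the excerpt in the
# question, but the tab-separated layout is an assumption.
my $map_text = join "",
    ".begin\t\t00002000\t00000068\n",
    "\t.begin\ta0000000\t00000068\tbegin_flash.o\n",
    "\t\t_START\ta0000000\t00000000\n",
    ".data\t\t00180000\t000004d0\n",
    "\t.data\t80201000\t00000000\tbegin_flash.o\n",
    "\t\tStartData\t80201000\t00000000\n";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open my $fh, '<', \$map_text or die "cannot open in-memory file: $!";
my $contents = do { local $/; <$fh> };   # undefined $/ reads everything at once
close $fh;

# With /g in scalar context, each pass resumes where the last match ended.
my @pairs;
while ($contents =~ /(?:^|\n)\.(\w+).+\n\t+\.\w+\W+(\w+)/g) {
    push @pairs, [$1, $2];
}

print "$_->[0] $_->[1]\n" for @pairs;
```

For this sample the loop prints `begin a0000000` and `data 80201000`. Setting `$/` to undef locally is an alternative to `join "", <FOO>`; both leave the whole file in one string, which is what lets the pattern see two lines at once.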

    -- Randal L. Schwartz, Perl hacker