http://qs1969.pair.com?node_id=674647

$dancarlson has asked for the wisdom of the Perl Monks concerning the following question:

Oh wise and learned Perl Monks,

I am a novice Perl user. I read more Perl than I write, and I'm having a fair bit of trouble creating a multi-line search. I have a big log file filled with colourful and exciting SIP logging. I want to find the Call-IDs from each SIP call. The three lines I want to match look like this:

From: "Bungalo Bill" <sip:5555555555@11.11.11.111:22>;tag=SD223sd2-31233dss^M
To: <sip:6666666666@11.11.11.111:22>^M
Call-ID: SD0e1af02-4d8d3eesdfsdfsd44w5f6fdb77814d-h6030fd^M

This file is filled with all kinds of stuff. The data I have is 'Bungalo Bill' ($NAME) and '6666666666' ($INBOUND_NUMBER), and what I want to find is that long, beautiful Call-ID (SD0e1af02-4d8d3eesdfsdfsd44w5f6fdb77814d-h6030fd).

Now, I thought I could do something like the following, but it isn't working for me (at all). I'm not sure whether that's because of the carriage returns (^M, i.e. DOS-style line endings) or just my incredibly poor Perl skills.

$_=~/From.*$NAME.$INBOUND_NUMBER.*Call-ID:\w(.*)/s

Any suggestions would be greatly appreciated.

Re: Soliciting Multiline SIP Searching Suggestions (state)
by tye (Sage) on Mar 17, 2008 at 21:26 UTC
    my $found = 0;
    while ( <> ) {
        if ( 0 == $found && /^From: "\Q$Name\E"/ ) {
            $found++;
        } elsif ( 1 == $found && /^To: <sip:\Q$Number\E\@/ ) {
            $found++;
        } elsif ( 2 == $found && /^Call-ID: (\S+)/ ) {
            return $1;    # \S+ leaves off the trailing \r (^M)
        } elsif ( /^From: / || /^\r?$/ ) {
            $found = 0;   # new message or blank line: start over
        }
    }

    - tye        
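    To try tye's state machine outside a larger program, here is a minimal self-contained sketch. The sub name find_call_id, its argument order and the in-memory sample log are hypothetical, chosen only for illustration; the loop follows tye's logic, with \Q...\E around the number, (\S+) for the Call-ID so the trailing \r (^M) is not captured, and a blank-line test that tolerates \r\n endings.

```perl
use strict;
use warnings;

# Hypothetical wrapper around tye's state machine: returns the Call-ID of the
# first message whose From: line has display name $name and whose To: line
# dials $number, reading lines from the filehandle $fh.
sub find_call_id {
    my ( $fh, $name, $number ) = @_;
    my $found = 0;
    while ( my $line = <$fh> ) {
        if ( 0 == $found && $line =~ /^From: "\Q$name\E"/ ) {
            $found++;
        }
        elsif ( 1 == $found && $line =~ /^To: <sip:\Q$number\E\@/ ) {
            $found++;
        }
        elsif ( 2 == $found && $line =~ /^Call-ID: (\S+)/ ) {
            return $1;    # \S+ stops before the trailing \r (^M)
        }
        elsif ( $line =~ /^From: / || $line =~ /^\r?$/ ) {
            $found = 0;   # new message or blank line: start over
        }
    }
    return;               # no matching call in the input
}

# Sample log with \r\n line endings, read through an in-memory filehandle.
my $log = join( "\r\n",
    'From: "Bungalo Bill" <sip:5555555555@11.11.11.111:22>;tag=SD223sd2-31233dss',
    'To: <sip:6666666666@11.11.11.111:22>',
    'Call-ID: SD0e1af02-4d8d3eesdfsdfsd44w5f6fdb77814d-h6030fd',
) . "\r\n";

open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my $call_id = find_call_id( $fh, 'Bungalo Bill', '6666666666' );
print( defined $call_id ? "$call_id\n" : "not found\n" );
```

    Passing a filehandle instead of reading from <> makes the sub easy to test; for a real log, open the file and pass that handle instead.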

Re: Soliciting Multiline SIP Searching Suggestions
by kyle (Abbot) on Mar 17, 2008 at 21:23 UTC

    I got your expression to work on your test data with a couple of small changes, but if you have a huge file full of that stuff, you probably want something a little different. In particular, you want to keep .* from matching more than one record at once. Here's what I came up with:

    use strict;
    use warnings;

    $_ = <<'END_SIP_LOG';
    From: "Bungalo Bill" <sip:5555555555@11.11.11.111:22>;tag=SD223sd2-31233dss^M
    To: <sip:6666666666@11.11.11.111:22>^M
    Call-ID: SD0e1af02-4d8d3eesdfsdfsd44w5f6fdb77814d-h6030fd^M
    END_SIP_LOG
    s/\^M/\r/g;

    my $NAME           = 'Bungalo Bill';
    my $INBOUND_NUMBER = '6666666666';

    #my ($callid) = /From.*$NAME.*$INBOUND_NUMBER.*Call-ID:\s(.*)/s;
    my ($callid) = m{
        From: \s "\Q$NAME\E" .*? \r \n?
        To: \s <sip: \Q$INBOUND_NUMBER\E \@ .*? \r \n?
        Call-ID: \s (\S+) .*? \r \n?
    }xms;

    print "callid = '$callid'\n";

    __END__
    callid = 'SD0e1af02-4d8d3eesdfsdfsd44w5f6fdb77814d-h6030fd'

    I'm not sure if you have straight \r as line endings or the more likely \r\n, so the pattern matches either one.

    Updated to format the pattern prettier.
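    If you're not sure which line endings a log uses, one way to find out is simply to count them. This is a minimal sketch; the sub name line_ending_counts and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count each kind of line terminator in a chunk of text,
# so you can tell DOS (\r\n), old Mac (bare \r) and Unix (\n) logs apart.
sub line_ending_counts {
    my ($text) = @_;
    my $crlf = () = $text =~ /\r\n/g;         # CR immediately followed by LF
    my $cr   = () = $text =~ /\r(?!\n)/g;     # CR not followed by LF
    my $lf   = () = $text =~ /(?<!\r)\n/g;    # LF not preceded by CR
    return { crlf => $crlf, cr => $cr, lf => $lf };
}

my $sample = "From: x\r\nTo: y\r\nCall-ID: z\r\n";
my $counts = line_ending_counts($sample);
printf "CRLF=%d CR=%d LF=%d\n", @{$counts}{qw(crlf cr lf)};   # prints "CRLF=3 CR=0 LF=0"
```

    For a real log, you could read the first few kilobytes of the file after calling binmode on the handle and pass that chunk in.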

      /.*?/s isn't enough to prevent that part of the regex from gobbling up multiple lines and also backtracking pathologically. For example, your regex will incorrectly match (after you fix \r \n? to \r? \n or just \n):

      From: "Bungalo Bill" ...
      To: (wrong number)
      Call-ID: (wrong call)
      ...
      From: "Wrong Person" ...
      To: <sip:6666666666@...>
      Call-ID: ThisIsTheWrongCallID

      - tye        

        Good catch! If lines do end in \n, the fix is easy. Just remove the /s option. Then /./ won't match the newline. The .* parts match to the end of the line (as they're supposed to) and won't go any further. If lines are terminated in \r, I'd have to change .* to [^\r]* instead.
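    To see tye's point and the fix side by side, the sketch below runs kyle's pattern (with \r \n? changed to \r? \n, and trimmed after the capture) against a log shaped like tye's counterexample, once with /s and once without. The sample messages and Call-IDs are made up for illustration.

```perl
use strict;
use warnings;

my $NAME           = 'Bungalo Bill';
my $INBOUND_NUMBER = '6666666666';

# Made-up log shaped like tye's counterexample, with \r\n line endings:
# Bill's call went to another number; a later message dials 6666666666.
my $log = join( "\r\n",
    'From: "Bungalo Bill" <sip:5555555555@11.11.11.111:22>;tag=abc',
    'To: <sip:7777777777@11.11.11.111:22>',
    'Call-ID: RightPersonWrongNumber',
    '',
    'From: "Wrong Person" <sip:8888888888@11.11.11.111:22>;tag=def',
    'To: <sip:6666666666@11.11.11.111:22>',
    'Call-ID: ThisIsTheWrongCallID',
) . "\r\n";

# With /s, . also matches \n, so the first .*? can run across lines
# until it finds a To: line with the right number in a later message.
my ($with_s) = $log =~ m{
    From: \s "\Q$NAME\E" .*? \r? \n
    To: \s <sip: \Q$INBOUND_NUMBER\E \@ .*? \r? \n
    Call-ID: \s (\S+)
}xms;

# Without /s, each .*? stays on its own line, so the three headers
# must be consecutive.
my ($without_s) = $log =~ m{
    From: \s "\Q$NAME\E" .*? \r? \n
    To: \s <sip: \Q$INBOUND_NUMBER\E \@ .*? \r? \n
    Call-ID: \s (\S+)
}xm;

print 'with /s:    ', ( $with_s    // 'no match' ), "\n";   # ThisIsTheWrongCallID
print 'without /s: ', ( $without_s // 'no match' ), "\n";   # no match
```

    The defined-or operator // needs Perl 5.10 or later.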

Re: Soliciting Multiline SIP Searching Suggestions
by FunkyMonk (Chancellor) on Mar 17, 2008 at 21:17 UTC
    A quick search of CPAN found Net::SIP. Does that help?

    Update: Net::SIP won't help you -- it doesn't parse logs. Thanks for the heads-up, Corion.

Re: Soliciting Multiline SIP Searching Suggestions
by grizzley (Chaplain) on Mar 18, 2008 at 12:04 UTC
    /^From:\s*"\Q$NAME\E".*$ \s* ^To:\s*<sip:\Q$INBOUND_NUMBER\E@.*$ \s* ^Call-ID:\s*(.*)/mgx

    And the idea behind:

    • Use ^, $ and //m as line delimiters: we search for three consecutive lines, /^something$ \s* ^something2$ \s* ^something3$/m. That ensures the regexp matches the nearest three lines and not, say, the first, second and 300th lines. After $ and before ^ the regexp needs \s* to consume the newline characters, because ^ and $ are zero-width: under //m, $ matches the position right before a newline character without consuming it. (Notice that with \s* you don't have to worry about which system you are working on or which line delimiters you have.)
    • Now that the three parts of the regexp are each limited to a single line (I like that limitation and try to achieve it every time, to avoid //s), let's pin down the variables: I suggest delimiting $NAME with the surrounding double quotes and $INBOUND_NUMBER with 'sip:' and '@'. That ensures the regexp won't match $NAME="Kevin Black" when the log contains 'From: "Kevin Blackwood"'.

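    Here is a minimal sketch of grizzley's approach applied to the Kevin Black / Kevin Blackwood case. The sample log, the Call-IDs and the sub name call_ids_for are hypothetical, and (\S+) replaces (.*) in the capture as an assumption of this sketch: with \r\n endings, $ sits before the \n, so (.*) would keep the trailing \r.

```perl
use strict;
use warnings;

# Hypothetical sample log with \r\n endings: two messages to the same number,
# one from "Kevin Blackwood" and one from "Kevin Black".
my $log = join( "\r\n",
    'From: "Kevin Blackwood" <sip:1111111111@11.11.11.111:22>;tag=a1',
    'To: <sip:6666666666@11.11.11.111:22>',
    'Call-ID: BlackwoodCall',
    'From: "Kevin Black" <sip:2222222222@11.11.11.111:22>;tag=b2',
    'To: <sip:6666666666@11.11.11.111:22>',
    'Call-ID: BlackCall',
) . "\r\n";

# grizzley's pattern: each part is confined to one line by ^ ... $ under //m,
# and \s* between the parts consumes the line terminator that $ leaves behind.
sub call_ids_for {
    my ( $text, $name, $number ) = @_;
    return $text =~ /^From:\s*"\Q$name\E".*$ \s*
                     ^To:\s*<sip:\Q$number\E\@.*$ \s*
                     ^Call-ID:\s*(\S+)/mgx;
}

print join( ', ', call_ids_for( $log, 'Kevin Black', '6666666666' ) ), "\n";   # prints "BlackCall"
```

    Because of the //g flag, the match in list context returns the Call-ID of every matching message, not just the first.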
Re: Soliciting Multiline SIP Searching Suggestions
by $dancarlson (Initiate) on Mar 18, 2008 at 17:23 UTC
    Thanks, Perl Monks!

    I decided to go with tye's state-based model and it's working great. Thanks for all the suggestions!