
This is my data

    CONNAME(163.231.99.129)                 CURRENT
    CHLTYPE(SVRCONN)                        STATUS(RUNNING)
    LSTMSGTI(03.45.47)                      LSTMSGDA(2003-11-21)

Here is the regex (thanks for helping clean it up)

if (/CONNAME\((\d.+)\)\s.*CURRENT\s.*/is){
print $&;} 

It's still not showing the brackets around my character classes in my post, sorry.

It fails when I try to match from CONNAME through CHLTYPE.
I can match from CONNAME through CURRENT, including the newline character.
But as soon as I try to match any character on the next line, the program produces no output.


Re^3: newline in unix
by purp (Novice) on Jul 15, 2004 at 04:37 UTC

    How are you getting your data into your code? I ask because you might be doing this:

    while (<>) { .... }

    ...which reads one line at a time, and that would explain why the match fails as soon as the regex has to match anything past the end of the first line. I put your data into a file named /tmp/corpus.txt and ran this, which worked:

    cat /tmp/corpus.txt | perl -le '$_ = join("", <>); print $& if /CONNAME\((\d{1,3}(\.\d{1,3}){3})\)\s*CURRENT\s*CHL/;'
    CONNAME(163.231.99.129)                 CURRENT
    CHL

    The different regex (\d{1,3}(\.\d{1,3}){3}) is slightly better at validating an IP address. Probably not essential unless you think your data might get munged; it could still match bogus things like "999.99.9.999", but it'll filter out bits like "1.2.3.4.5" or "2555.254.0.3".
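
    To see how the pattern treats those example strings, here is a minimal sketch. The helper name conname_ip is made up for illustration, and each test string is simply wrapped in CONNAME(...) so that the literal parentheses anchor both ends of the address, as they do in the real data.

```perl
use strict;
use warnings;

# Same IP sub-pattern as above, between the literal parentheses
# that CONNAME(...) supplies in the data.
my $conname_re = qr/CONNAME\((\d{1,3}(\.\d{1,3}){3})\)/;

# Hypothetical helper: returns the captured address, or undef if no match.
sub conname_ip {
    my ($text) = @_;
    return $text =~ $conname_re ? $1 : undef;
}

# The strings mentioned above: one real address, one bogus address that
# still matches, and two that the pattern refuses.
for my $ip (qw(163.231.99.129 999.99.9.999 1.2.3.4.5 2555.254.0.3)) {
    my $got = conname_ip("CONNAME($ip)");
    printf "%-16s %s\n", $ip, defined $got ? "matches" : "rejected";
}
```

    Note that the rejection of "1.2.3.4.5" and "2555.254.0.3" depends on the surrounding \( and \); without them the sub-pattern could match part of a longer string.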

    Note that loading $_ like that is often frowned upon; you might consider:

    my $corpus = join("", <>);
    print $& if $corpus =~ /CONNAME\((\d{1,3}(\.\d{1,3}){3})\)\s*CURRENT\s*CHL/;
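
    A slightly fuller sketch of the same idea, for what it's worth. The assumptions are mine: the helper names slurp and conname_and_chltype are hypothetical, the sample text is copied from the original post with its spacing approximated, and the pattern is extended to capture the CHLTYPE value. It reads the whole input by localizing $/ and returns captures instead of relying on $&.

```perl
use strict;
use warnings;

# Sample record copied from the original post (spacing approximated).
my $sample = join "\n",
    'CONNAME(163.231.99.129)                 CURRENT',
    'CHLTYPE(SVRCONN)                        STATUS(RUNNING)',
    'LSTMSGTI(03.45.47)                      LSTMSGDA(2003-11-21)',
    '';

# Hypothetical helper: read everything from a filehandle in one go.
sub slurp {
    my ($fh) = @_;
    local $/;               # undefined record separator: read to end of input
    return scalar <$fh>;
}

# Hypothetical helper: returns (ip, channel type), or an empty list.
# The inner octet group is non-capturing so that $2 is the channel type.
sub conname_and_chltype {
    my ($corpus) = @_;
    return $corpus =~ /CONNAME\((\d{1,3}(?:\.\d{1,3}){3})\)\s*CURRENT\s*CHLTYPE\((\w+)\)/
        ? ($1, $2)
        : ();
}

# An in-memory filehandle stands in for STDIN or a real file here.
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my ($ip, $type) = conname_and_chltype(slurp($fh));
close $fh;

print "$ip $type\n" if defined $ip;
```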

    Hope that helps!

    --j

Re^3: newline in unix
by graff (Chancellor) on Jul 15, 2004 at 04:25 UTC
    First of all, look up Writeup Formatting Tips -- it explains how to post code coherently, which is like this:

    <code>

    # perl code here, with literal brackets intact: [blah]
    </code>

    As for the regex problem itself, now that I see what the data "really" looks like (though it's hard to be sure how many whitespace characters there really are), maybe something like this would work better:

    if ( /\w+.([\d.]+).\s+\w+\s+\w+/ ) { print $&, $/; }

    Or, if you really want to be specific about the characters you want to match:

    if ( /\w+.([\d.]+).\s+CURRENT\s+CHLTYPE/ ) { print $&, $/; }

    I did try those out on your data, and the print-out includes the linefeed where it belongs.

    Now, I presume that your real goal is something other than that odd-looking output from print, and depending on what your real goal is, maybe a regex isn't your best choice -- e.g. how about using split()?
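
    As a rough sketch of what the split() route might look like (the helper name parse_record is hypothetical, and it assumes that no value inside the parentheses contains whitespace, which holds for the sample shown but may not hold for all real data):

```perl
use strict;
use warnings;

# Hypothetical helper: turn one status record into a hash of KEY => value.
# Bare words such as CURRENT have no parentheses and are stored with value 1.
sub parse_record {
    my ($record) = @_;
    my %field;
    # split ' ' breaks on any run of whitespace, newlines included,
    # and ignores leading whitespace.
    for my $token (split ' ', $record) {
        if ($token =~ /^(\w+)\((.*)\)$/) {
            $field{$1} = $2;
        }
        else {
            $field{$token} = 1;
        }
    }
    return \%field;
}

# Sample record from the original post (spacing approximated).
my $record = "CONNAME(163.231.99.129)  CURRENT\n"
           . "CHLTYPE(SVRCONN)  STATUS(RUNNING)\n"
           . "LSTMSGTI(03.45.47)  LSTMSGDA(2003-11-21)\n";

my $field = parse_record($record);
print "$field->{CONNAME} is $field->{STATUS}\n";
```

    With the fields in a hash, it no longer matters which line a given KEY(value) pair happened to land on.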

    update: having seen ercparker's reply below, I should point out that I was assuming all along that you already had all three lines of text stored together in $_ -- but if you've actually been reading and matching one line at a time (as most people usually do), then ercparker is right: you can't match across a newline if $_ does not contain anything after the first newline.
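
    The difference can be shown with a small sketch. The two sample lines are abbreviated from the original post, and the variable names are only for illustration: the same regex is tried against each line separately, and then against the lines joined into one string.

```perl
use strict;
use warnings;

# Two lines of the sample data, each ending in its own newline,
# the way while (<>) would hand them over.
my @lines = (
    "CONNAME(163.231.99.129)  CURRENT\n",
    "CHLTYPE(SVRCONN)  STATUS(RUNNING)\n",
);

my $re = qr/CONNAME\(([\d.]+)\)\s+CURRENT\s+CHLTYPE/;

# One line at a time: no single string contains both CURRENT and CHLTYPE.
my $per_line_hits = grep { $_ =~ $re } @lines;

# All lines joined into one string: \s+ can step over the newline.
my $joined_hits = join('', @lines) =~ $re ? 1 : 0;

print "line by line: $per_line_hits match(es)\n";
print "joined:       $joined_hits match(es)\n";
```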

Re^3: newline in unix
by Anonymous Monk on Jul 15, 2004 at 04:04 UTC
    Here we go...

    if (/CONNAME\(([\d.]+)\)[\s.]*CURRENT[\s.]*/is) {
        print $&;
    }


      Sorry, I'm still getting the hang of the forum formatting.

      This works:

      if (/CONNAME\(([\d.]+)\)[\s.]*CURRENT[\s.]*/is) {
          print $&;
      }

      This does not:

      if (/CONNAME\((.*\..*\..*\..*)\)\[\s.\]*CURRENT\[\s.\]*CHL/is) {
          print $&;
      }

      My last post wasn't a solution to my problem, but was the actual regex with brackets included.