Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I resisted asking a simple regex question. I understand newlines, but my problem comes with matching text after my newline. I can successfully match the newline.

My text is:
CONNAME(163.231.99.129)                 CURRENT
CHLTYPE(SVRCONN)                        STATUS(RUNNING)
LSTMSGTI(03.45.47)                      LSTMSGDA(2003-11-21)

My regex is:
if (/CONNAME\((.*\..*\..*\..*)\)[\s.]*CURRENT[\s.]*CHL/is){ print "$&\n"; }


I can match the newline with: (/CONNAME\((.*\..*\..*\..*)\)[\s.]*CURRENT[\s.]*

But, when I add even one letter, it fails. There is white space before CHLTYPE, but I accounted for that.

If I take off the [\s.]* at the end, I don't match the newline, which confirms that I'm matching it when it's there (it also prints out a newline, which also confirms the match). It's that dang character after the newline.

Please help. I've looked through the forum, and most posts deal with matching the newline itself. I got that far, and I know the Unix newline is \n, which \s should, and does, match.

Thanks,
BrassMonkey
"that funky Monkey"

Edited by Chady -- added code tags.

Re: newline in unix
by ercparker (Hermit) on Jul 15, 2004 at 04:25 UTC
    you can try this:
    {
        local $/ = undef;
        open(FILE, "<file.txt") or die "open failed on file.txt: $!";
        $wholetext = <FILE>;
        close(FILE);
    }
    if ( $wholetext =~ /CONNAME\((\d+?\.\d+?\.\d+?\.\d+?)\)\s+CURRENT/ ) {
        print $1;
    }
    __DATA__
    CONNAME(163.231.99.129)                 CURRENT
    CHLTYPE(SVRCONN)                        STATUS(RUNNING)
    LSTMSGTI(03.45.47)                      LSTMSGDA(2003-11-21)
    __OUTPUT__
    163.231.99.129
Re: newline in unix
by graff (Chancellor) on Jul 15, 2004 at 03:46 UTC
    You need to have <code> tags around your code (and around your sample data) in order for your post to make sense.

    "\s" will match a newline (and carriage-return if you have any); when "." is inside square brackets (as part of a character class), it does not work as a wildcard, and will only match a literal period.

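A minimal sketch of that point, using one-character test strings (the variable names are made up for illustration):

```perl
use strict;
use warnings;

# Inside [...] a dot is a literal period, so [\s.] means
# "one whitespace character or one period", not "any character".
my $letter_in_class  = ( "x"  =~ /\A[\s.]\z/ ) ? 1 : 0;
my $period_in_class  = ( "."  =~ /\A[\s.]\z/ ) ? 1 : 0;
my $newline_in_class = ( "\n" =~ /\A[\s.]\z/ ) ? 1 : 0;

# Outside a class, a dot is a wildcard for any character except newline
# (unless the /s modifier is used).
my $letter_vs_dot = ( "x" =~ /\A.\z/ ) ? 1 : 0;

print "letter  vs [\\s.]: $letter_in_class\n";
print "period  vs [\\s.]: $period_in_class\n";
print "newline vs [\\s.]: $newline_in_class\n";
print "letter  vs .    : $letter_vs_dot\n";
```

So [\s.]* can step over whitespace and periods only; if any other character sits between CURRENT and CHL, the match stops there.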
    You could simplify your regex a lot (and make it easier to understand and less likely to break):

    if ( /CONNAME.([\d.]+).*?CURRENT\s+CHL/ ) { print $&, $/; }
    (update: removed spurious open paren from start of regex)

    The [\d.]+ bit will match a string consisting of digits and/or periods (one or more) -- i.e. any of ".", "..", "...", "1", "12", "123", "1.2.3", etc. Meanwhile, the periods outside the square brackets will match anything, like parens, whitespace, etc.
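To see this in action, here is a small sketch. The candidate strings come from the list above, plus "abc" as a control that should be rejected; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Candidates from the list above, plus one control that should be rejected.
my @candidates = ( ".", "..", "1", "123", "1.2.3", "163.231.99.129", "abc" );

# Anchored so the whole string must consist of digits and/or periods.
my @accepted = grep { /\A[\d.]+\z/ } @candidates;
print "accepted: @accepted\n";

# Outside a class, "." is a wildcard: here it stands in for "(" and ")".
my ($addr) = "CONNAME(163.231.99.129)" =~ /CONNAME.([\d.]+)./;
print "captured: $addr\n";
```

Everything except "abc" is accepted, which shows that [\d.]+ is a loose filter rather than an address validator.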

Re: newline in unix
by davidj (Priest) on Jul 15, 2004 at 03:48 UTC
    But, when I add even one letter, it fails.

    Maybe I don't understand what you are saying, so I must ask: When you add even one letter where?

      This is my data
      
          CONNAME(163.231.99.129)                 CURRENT
          CHLTYPE(SVRCONN)                        STATUS(RUNNING)
          LSTMSGTI(03.45.47)                      LSTMSGDA(2003-11-21)
      

      Here is the regex (thanks for helping clean it up)
      
      if (/CONNAME\((\d.+)\)\s.*CURRENT\s.*/is){
      print $&;} 
      

      It's still not showing the brackets around my character classes in my post, sorry.

      Where it fails is when trying to match this from CONNAME through CHLTYPE.
      I can match CONNAME through to CURRENT with the newline character.
      But if I try and match any characters on the next line, the program returns no output.


        How are you getting your data into your code? I ask because you might be doing this:

        while (<>) { .... }

        ...which gets a line at a time, explaining why adding one more character after your regex fails. I put your data into a file named /tmp/corpus.txt and did this, which worked:

        cat /tmp/corpus.txt | perl -le '$_ = join("", <>);
        print $& if /CONNAME\((\d{1,3}(\.\d{1,3}){3})\)\s*CURRENT\s*CHL/;'
        CONNAME(163.231.99.129)                 CURRENT
        CHL

        The different regex (\d{1,3}(\.\d{1,3}){3}) is slightly better at validating an IP address. Probably not essential unless you think your data might get munged; it could still match bogus things like "999.99.9.999", but it'll filter out bits like "1.2.3.4.5" or "2555.254.0.3".
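A quick way to check those claims is to run the same pattern over the addresses mentioned above. This is only a sketch: the CONNAME(...) wrapper is added by the loop, and the hash name is made up for the example.

```perl
use strict;
use warnings;

# The pattern from the one-liner, limited to the CONNAME(...) part.
my $re = qr/CONNAME\((\d{1,3}(\.\d{1,3}){3})\)/;

# One real-looking address plus the three bogus ones named above.
my %result;
for my $addr ( "163.231.99.129", "999.99.9.999", "1.2.3.4.5", "2555.254.0.3" ) {
    $result{$addr} = ( "CONNAME($addr)" =~ $re ) ? "match" : "no match";
    print "$addr: $result{$addr}\n";
}
```

The first two addresses match and the last two do not, because the pattern insists on exactly four groups of one to three digits between the parentheses.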

        Note that loading $_ like that is often frowned upon; you might consider:

        my $corpus = join("", <>);
        print $& if $corpus =~ /CONNAME\((\d{1,3}(\.\d{1,3}){3})\)\s*CURRENT\s*CHL/;

        Hope that helps!

        --j

        First of all, look up Writeup Formatting Tips -- it explains how to post code coherently, which is like this:

        <code>

        # perl code here, with literal brackets intact: [blah]
        </code>

        As for the regex problem itself, now that I see what the data "really" looks like (though it's hard to be sure how many whitespace characters there really are), maybe something like this would work better:

        if ( /\w+.([\d.]+).\s+\w+\s+\w+/ ) { print $&, $/; }
        Or, if you really want to be specific about the characters you want to match:
        if ( /\w+.([\d.]+).\s+CURRENT\s+CHLTYPE/ ) { print $&, $/; }
        I did try those out on your data, and the print-out includes the linefeed where it belongs.

        Now, I presume that your real goal is something other than that odd looking output from print, and depending on what your real goal is, maybe a regex isn't your best choice -- e.g. how about using split()?
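As a rough sketch of the split() idea, assuming the goal is to pull each NAME(value) pair out of the text: the sample spacing and the %field and @flags names are assumptions for this example, not something taken from the original program.

```perl
use strict;
use warnings;

# Sample text shaped like the posted data (exact spacing is an assumption).
my $text = join "",
    "    CONNAME(163.231.99.129)                 CURRENT\n",
    "    CHLTYPE(SVRCONN)                        STATUS(RUNNING)\n",
    "    LSTMSGTI(03.45.47)                      LSTMSGDA(2003-11-21)\n";

# split ' ' splits on runs of whitespace (newlines included) and
# ignores leading whitespace, so each token is one field.
my @tokens = split ' ', $text;

# NAME(value) tokens go into a hash; bare words such as CURRENT
# are collected separately.
my ( %field, @flags );
for my $token (@tokens) {
    if ( $token =~ /\A(\w+)\((.*)\)\z/ ) {
        $field{$1} = $2;
    }
    else {
        push @flags, $token;
    }
}

print "$_ = $field{$_}\n" for sort keys %field;
print "flags: @flags\n";
```

Because split ' ' treats a newline like any other whitespace, this approach does not care which line a field is on, as long as the whole text is in one string.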

        update: having seen ercparker's reply below, I should point out that I was assuming all along that you already had all three lines of text stored together in $_ -- but if you've actually been reading and matching one line at a time (as most people usually do), then ercparker is right: you can't match across a newline if $_ does not contain anything after the first newline.
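The difference between the two ways of reading can be sketched with an in-memory filehandle, so no file is needed. The sample data, its spacing, and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Two lines shaped like the posted data (spacing is an assumption).
my $data = "CONNAME(163.231.99.129)   CURRENT\n"
         . "   CHLTYPE(SVRCONN)   STATUS(RUNNING)\n";
my $re = qr/CONNAME\(([\d.]+)\)\s+CURRENT\s+CHL/;

# Line at a time: each read returns text up to and including one newline,
# so the pattern never sees CURRENT and CHL in the same string.
open my $by_line, '<', \$data or die "cannot open in-memory handle: $!";
my $line_hits = 0;
while ( my $line = <$by_line> ) {
    $line_hits++ if $line =~ $re;
}
close $by_line;

# Slurp: with $/ undefined, a single read returns the whole input.
open my $all, '<', \$data or die "cannot open in-memory handle: $!";
my $whole = do { local $/; <$all> };
close $all;
my $slurp_hit = ( $whole =~ $re ) ? 1 : 0;

print "line-by-line matches: $line_hits\n";
print "slurped match: $slurp_hit\n";
```

The line-by-line loop finds nothing, while the slurped string matches. The same do { local $/; ... } idiom works on STDIN, so piping the data through a file first is not strictly required.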

        Here we go...

        
        if (/CONNAME\(([\d.]+)\)[\s.]*CURRENT[\s.]*/is){
        print $&;} 
        
        


Re: newline in unix
by Anonymous Monk on Jul 15, 2004 at 04:42 UTC
    Thanks ercparker!
    That worked.
    I think my problem might have been that I was reading my input from Unix standard input. When I put the data in a file, it works fine.
    Again, many thanks!
    I can research later why my problem was happening, but for now I can at least parse the data I'm working with by piping it to a file first and opening it in Perl.
    The next time I post, I'll be sure to format better.
Re: newline in unix
by Anonymous Monk on Jul 15, 2004 at 04:58 UTC
    Thanks to Purp too!
    And all others for their help as well.
Re: newline in unix
by Anonymous Monk on Jul 15, 2004 at 03:36 UTC
    Both of the \s. bits that are underlined in the post are character classes with a * after them.
    I can't get the brackets to show up properly in the post.

    BrassMonkey