in reply to Re: Regex: Ignore \n in \S
in thread Regex: Ignore \n in \S

Apologies. I meant to say that $1 would capture the trailing "\n" as well, instead of just the non-space characters before it.
The regex I'd be using is DIP\s+\S+\s+(\S+).*?\\n

Re^3: Regex: Ignore \n in \S
by Corion (Patriarch) on Sep 12, 2010 at 20:50 UTC

    Ah. Now I understand. You have a literal "\n", not a newline in your data, but you don't want to capture that. The easiest way is to restrict the character class to not include the backslash:

    for (<DATA>) {
        print "Matched: $1\n"
            if /DIP\s+\S+\s+([^\\\s]+).*?\\n/;
    };
    __DATA__
    *** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
    *** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
    *** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant hello world \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
    *** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND

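    The character class \S will not match any whitespace, but the character sequence "\n" (that is, a backslash followed by "n") is not whitespace. Both characters are ordinary non-whitespace, so a greedy \S+ swallows them along with the word in front of them.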
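To see the difference in isolation, here is a minimal sketch. The sample line is a shortened, made-up stand-in for the alarm record, and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample. Single quotes keep the backslash and
# the "n" as two literal characters, as in the logged data.
my $line = 'INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\nEND';

# Original pattern: the greedy \S+ takes the backslash and the "n" too,
# and the trailing \\n is satisfied by the later backslash-n pair.
my ($greedy) = $line =~ /DIP\s+\S+\s+(\S+).*?\\n/;

# Excluding the backslash from the class stops the capture before it.
my ($strict) = $line =~ /DIP\s+\S+\s+([^\\\s]+).*?\\n/;

print "greedy: $greedy\n";   # greedy: theoneiwant\n  (literal backslash, n)
print "strict: $strict\n";   # strict: theoneiwant
```

Note that the unwanted capture only shows up when another literal "\n" follows later on the same line, which is the case in the alarm records.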

      A better way would be to decode the lines, i.e. convert the "\n" into newlines.
      while (<DATA>) {
          s/\\n/\n/g;
          print "Matched: $1\n"
              if /DIP\s+\S+\s+(\S+).*?\n/;
      }
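As a rough illustration of why decoding helps, the sketch below uses a shortened, made-up record (not the real alarm data). Once the backslash-n pairs become real newlines, \S+ stops on its own, because a newline is whitespace. The split at the end is an extra assumption about what you might want to do with the decoded text.

```perl
use strict;
use warnings;

# Hypothetical, shortened record with literal backslash-n pairs.
my $record = 'INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\nEND';

# Turn each backslash-n pair into a real newline character,
# working on a copy so the original record stays untouched.
(my $decoded = $record) =~ s/\\n/\n/g;

# \S+ now ends at the newline, so no special character class is needed.
my ($dip) = $decoded =~ /DIP\s+\S+\s+(\S+)/;

# The decoded text can also be processed line by line.
my @lines = split /\n/, $decoded;

print "DIP value: $dip\n";                 # DIP value: theoneiwant
print "lines: ", scalar(@lines), "\n";     # lines: 3
```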
      Holy cow. Thank you. I've spent over 3 hours trying to tackle it! Thank you!
        Corion's analysis is right. That said, you may not need to fiddle with the \n at all if you change \S to \w. A word character is one of 0-9, A-Z, a-z or _, so the backslash won't match. It's not clear to me what kinds of characters can appear in "theoneiwant", but if your data allows it, something even simpler can be done:
        for (<DATA>) {
            print "Matched: $1\n"
                if m/DIP\s+\S+\s+(\w+)/;  # use ([\w-]+) to allow "-"
        };
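A small sketch of how the two classes behave. Both sample lines are invented for illustration, and the hyphenated value is only an assumption about what the field might contain.

```perl
use strict;
use warnings;

# Hypothetical samples with a literal backslash-n after the wanted field.
my $plain  = 'INFO DIP amad theoneiwant\n RTDMA-63';
my $dashed = 'INFO DIP amad the-one-i-want\n RTDMA-63';

# \w+ stops at the backslash, so no trailing anchor is needed.
my ($plain_word) = $plain =~ /DIP\s+\S+\s+(\w+)/;

# \w+ also stops at the first hyphen, which truncates a hyphenated value.
my ($dashed_word) = $dashed =~ /DIP\s+\S+\s+(\w+)/;

# Adding "-" to the class keeps the hyphens but still excludes the backslash.
my ($dashed_full) = $dashed =~ /DIP\s+\S+\s+([\w-]+)/;

print "$plain_word\n";    # theoneiwant
print "$dashed_word\n";   # the
print "$dashed_full\n";   # the-one-i-want
```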