in reply to Re^2: Regex: Ignore \n in \S
in thread Regex: Ignore \n in \S

Ah, now I understand. Your data contains a literal "\n" (a backslash followed by "n"), not a newline, and you don't want to capture it. The easiest way is to exclude the backslash from the character class:

for (<DATA>) {
    print "Matched: $1\n" if /DIP\s+\S+\s+([^\\\s]+).*?\\n/;
}
__DATA__
*** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
*** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
*** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant hello world \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
*** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND

The character class \S matches anything that is not whitespace, and the character sequence "\n" (that is, a backslash followed by "n") is not whitespace, so \S+ matches straight through it.
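
To make the difference concrete, here is a minimal sketch that runs both character classes against one shortened record. The `$line` value and the sub names `capture_with_S` and `capture_restricted` are made up for illustration and are not part of the original data or code.

```perl
use strict;
use warnings;

# Hypothetical, shortened record. Single quotes keep \n as two characters
# (a backslash and an "n"), just like the data in the post.
my $line = 'INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n';

# \S+ stops only at real whitespace, so it runs through the literal "\n".
sub capture_with_S {
    my ($text) = @_;
    return $text =~ /DIP\s+\S+\s+(\S+)/ ? $1 : undef;
}

# [^\\\s]+ stops at whitespace or at a backslash, whichever comes first.
sub capture_restricted {
    my ($text) = @_;
    return $text =~ /DIP\s+\S+\s+([^\\\s]+)/ ? $1 : undef;
}

print "plain \\S+  : ", capture_with_S($line), "\n";
print "restricted : ", capture_restricted($line), "\n";
```

The first sub returns the value with the backslash and "n" still attached; the second returns only the text in front of the backslash.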

Re^4: Regex: Ignore \n in \S
by ikegami (Patriarch) on Sep 13, 2010 at 00:16 UTC
    A better way would be to decode the lines, i.e. convert the "\n" into newlines.
    while (<DATA>) {
        s/\\n/\n/g;
        print "Matched: $1\n" if /DIP\s+\S+\s+(\S+).*?\n/;
    }
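
The sample data also contains \" sequences, so a slightly broader decoder may be worth considering. The following is only a sketch, assuming the producer uses C-style escapes for newline, double quote and backslash (the thread does not confirm this). The names `%unescape`, `decode_line` and `capture_after_decode`, and the shortened `$raw` record, are hypothetical.

```perl
use strict;
use warnings;

# Assumed escape table: only \n, \" and \\ are handled here.
my %unescape = ( n => "\n", '"' => '"', '\\' => '\\' );

# Replace each backslash + character pair in a single left-to-right pass,
# so an escaped backslash is never decoded twice.
sub decode_line {
    my ($text) = @_;
    $text =~ s/\\(["\\n])/$unescape{$1}/g;
    return $text;
}

# After decoding, plain \S+ is enough because the field now ends at a real newline.
sub capture_after_decode {
    my ($text) = @_;
    my $decoded = decode_line($text);
    return $decoded =~ /DIP\s+\S+\s+(\S+)/ ? $1 : undef;
}

# Hypothetical, shortened record in the same style as the posted data.
my $raw = 'A2/APT \"KEN5-132/019/00\" 100831 \nINFO DIP amad theoneiwant\n RTDMA-63';
print capture_after_decode($raw), "\n";
```

Doing all replacements in one substitution avoids the ordering problems that come from chaining several s/// calls.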
Re^4: Regex: Ignore \n in \S
by manav_gupta (Acolyte) on Sep 12, 2010 at 20:55 UTC
    Holy cow. Thank you. I've spent over 3 hours trying to tackle it! Thank you!
      Corion's analysis is right. That said, you may not need to fiddle with the \n at all if you change \S to \w. A word character is one of 0-9, A-Z, a-z or _ (for ASCII data), so the backslash won't match. It's not clear to me what kinds of characters can appear in "theoneiwant" anyway; if your data allows it, something even simpler can be done...
      for (<DATA>) {
          print "Matched: $1\n" if m/DIP\s+\S+\s+(\w+)/;   # use ([\w-]+) to allow -
      }
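
Whether \w is enough depends on what the field can contain. As a rough sketch, the two variants can be compared on a value with and without hyphens. Both input strings and the sub names `capture_word` and `capture_word_or_hyphen` are invented for illustration; only the first string resembles the posted data.

```perl
use strict;
use warnings;

# \w+ stops at the first character that is not a letter, digit or underscore.
sub capture_word {
    my ($text) = @_;
    return $text =~ /DIP\s+\S+\s+(\w+)/ ? $1 : undef;
}

# [\w-]+ additionally accepts hyphens (a trailing - in a class is literal).
sub capture_word_or_hyphen {
    my ($text) = @_;
    return $text =~ /DIP\s+\S+\s+([\w-]+)/ ? $1 : undef;
}

# Hypothetical inputs; single quotes keep \n as backslash + "n".
for my $line ('DIP amad theoneiwant\n RTDMA-63', 'DIP amad the-one-i-want\n RTDMA-63') {
    printf "%-12s | %s\n", capture_word($line), capture_word_or_hyphen($line);
}
```

With the hyphenated value, the plain \w+ version captures only the part before the first hyphen, which is why the widened class may be needed.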