Re^2: Regex: Ignore \n in \S

Replies are listed 'Best First'.
Re^3: Regex: Ignore \n in \S by Corion (Patriarch) on Sep 12, 2010 at 20:50 UTC
Ah. Now I understand. You have a literal `"\n"` (a backslash followed by "n") in your data, not a newline, and you don't want to capture it. The easiest way is to exclude the backslash from the character class:

    for (<DATA>) {
        print "Matched: $1\n" if /DIP\s+\S+\s+([^\\\s]+).*?\\n/;
    }
    __DATA__
    ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
    * ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
    * ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant hello world \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
    * ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND

The character class `\S` matches any character that is not whitespace. The character sequence "\n" (that is, a backslash followed by "n") is not whitespace, so `\S+` matches it along with the text you want.
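To see the difference side by side, here is a minimal sketch, not part of the original reply. The helper name `capture` and the shortened sample line are made up for illustration. The sample is single-quoted so that `\n` stays as two characters, just as in the poster's data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group of $re on $line, or undef.
sub capture {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : undef;
}

# Single quotes keep \n as a backslash followed by "n", not a newline.
my $line = 'INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n';

my $unrestricted = capture(qr/DIP\s+\S+\s+(\S+)/,      $line);
my $restricted   = capture(qr/DIP\s+\S+\s+([^\\\s]+)/, $line);

print "unrestricted: $unrestricted\n";   # prints: unrestricted: theoneiwant\n
print "restricted:   $restricted\n";     # prints: restricted:   theoneiwant
```

The first pattern drags the trailing backslash and "n" into `$1`; the second stops at the backslash because `[^\\\s]` excludes both backslashes and whitespace.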
Re^4: Regex: Ignore \n in \S by ikegami (Patriarch) on Sep 13, 2010 at 00:16 UTC
A better way would be to decode the lines first, i.e. convert each literal "\n" into a real newline. `while (<DATA>) { s/\\n/\n/g; print "Matched: $1\n" if /DIP\s+\S+\s+(\S+).*?\n/; }`
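A rough sketch of the decode-first idea as a reusable function, assuming the same record layout as in the thread. The name `dip_field` and the shortened records are invented for this example, and the trailing `.*?\n` from the one-liner is left out because it does not affect what is captured.

```perl
use strict;
use warnings;

# Hypothetical helper: decode literal "\n" sequences, then match with plain \S+.
sub dip_field {
    my ($record) = @_;
    (my $decoded = $record) =~ s/\\n/\n/g;   # backslash + "n" becomes a real newline
    return $decoded =~ /DIP\s+\S+\s+(\S+)/ ? $1 : undef;
}

# Single-quoted so that \n is two characters, as in the original data.
my @records = (
    'INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n',
    'INFO DIP amad theoneiwant hello world \n RTDMA-63 4 BLOC 38\n',
);

for my $record (@records) {
    my $field = dip_field($record);
    print "Matched: $field\n" if defined $field;   # prints "Matched: theoneiwant" twice
}
```

Once the sequences are real newlines, `\S+` stops on its own because a newline is whitespace, and the decoded record can also be split into lines for any further parsing.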
Re^4: Regex: Ignore \n in \S by manav_gupta (Acolyte) on Sep 12, 2010 at 20:55 UTC
Holy cow. Thank you. I've spent over 3 hours trying to tackle it! Thank you!
Re^5: Regex: Ignore \n in \S by Marshall (Canon) on Sep 12, 2010 at 21:25 UTC
Corion's analysis is right. That said, you may not need to fiddle with the \n at all if you change `\S` to `\w`. A word character is one of `0-9A-Za-z_` (for ASCII data), so the backslash won't match. It's not clear to me what kinds of characters can appear in "theoneiwant", but if your data allows it, something even simpler can be done: `for (<DATA>) { print "Matched: $1\n" if m/DIP\s+\S+\s+(\w+)/; }` Use `([\w-]+)` instead to also allow hyphens.
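Whether `\w+` is enough depends on the data, so here is a small sketch of the trade-off. The helper `capture_with` and the hyphenated value are made up for illustration; nothing in the thread says the real field contains hyphens.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the field after "DIP <word>" using the given
# character class, passed as a compiled regex such as qr/\w/.
sub capture_with {
    my ($class, $line) = @_;
    return $line =~ /DIP\s+\S+\s+($class+)/ ? $1 : undef;
}

# Single-quoted so that \n is a backslash followed by "n".
my $plain      = 'INFO DIP amad theoneiwant\n RTDMA-63';
my $hyphenated = 'INFO DIP amad the-one-i-want\n RTDMA-63';   # invented value

print capture_with(qr/\w/,     $plain),      "\n";   # prints: theoneiwant
print capture_with(qr/\w/,     $hyphenated), "\n";   # prints: the
print capture_with(qr/[\w-]/,  $hyphenated), "\n";   # prints: the-one-i-want
```

With plain `\w+` the capture ends at the first character that is not a word character, which is the backslash in the first case but the hyphen in the second. Widening the class to `[\w-]` keeps hyphens while still excluding the backslash.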