manav_gupta has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I've been banging my head against this for a while, but no luck. Could you please help?

In the following strings, I'm trying to extract "theoneiwant", without the "\n".

I've tried DIP\s+\S+\s+(\S+).*?\\n, but $1 includes the "\n" as well when there is nothing else between "theoneiwant" and "\n". Here "theoneiwant" stands in for an arbitrary run of non-space characters.
*** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
*** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
*** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant hello world \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
*** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND

Re: Regex: Ignore \n in \S
by Corion (Patriarch) on Sep 12, 2010 at 20:39 UTC

    You don't show the code you use, so it's hard to say how your regular expression "returns" something.

    Maybe you want to use ([^\n]+) instead of the (\S+)? But \n would be in the \s class, so your \S shouldn't capture a \n. Please show the actual code you use.

      Apologies. I meant to convey that $1 would "return" "\n" as well, instead of the non-spaces before it.
      The regex I'd be using is DIP\s+\S+\s+(\S+).*?\\n

        Ah. Now I understand. You have a literal "\n" in your data, not a newline, and you don't want to capture it. The easiest way is to restrict the character class so that it excludes the backslash:

        for (<DATA>) {
            print "Matched: $1\n" if /DIP\s+\S+\s+([^\\\s]+).*?\\n/;
        }
        __DATA__
        *** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
        *** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
        *** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant hello world \n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND
        *** ALARM 009 A2/APT \"KEN5-132/019/00\" 100831 1511 \nSWITCHING NETWORK TERMINAL FAULT\n\nSNT TCASE STATE FCODE SUBSNT INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEXTERNAL EQUIPMENT FAILURE\n\nEXTP MG\n2-2-010109 MEAPH\nEND

        The character class \S will not match any whitespace, but the character sequence "\n" (that is, a backslash followed by "n") is not whitespace, so \S+ matches it along with the rest of the word.
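        To make the difference concrete, here is a minimal sketch that runs both patterns from this thread against shortened versions of the sample records. The helper names capture_loose and capture_strict and the trimmed records are made up for illustration; single-quoted strings are used so that "\n" stays a backslash followed by "n", as in the alarm text.

```perl
use strict;
use warnings;

# Original pattern from the question: \S+ also swallows a literal backslash-n.
sub capture_loose {
    my ($text) = @_;
    my ($word) = $text =~ /DIP\s+\S+\s+(\S+).*?\\n/;
    return $word;
}

# Suggested pattern: the capture excludes both whitespace and the backslash.
sub capture_strict {
    my ($text) = @_;
    my ($word) = $text =~ /DIP\s+\S+\s+([^\\\s]+).*?\\n/;
    return $word;
}

# Shortened stand-ins for the sample records (illustrative only).
my @records = (
    'INFO DIP amad theoneiwant\n RTDMA-63 4 BLOC 38\n\nEND',
    'INFO DIP amad theoneiwant \n RTDMA-63 4 BLOC 38\n\nEND',
    'INFO DIP amad theoneiwant hello world \n RTDMA-63 4 BLOC 38\n\nEND',
);

for my $record (@records) {
    printf "%-8s '%s'\n", 'loose:',  capture_loose($record)  // '(no match)';
    printf "%-8s '%s'\n", 'strict:', capture_strict($record) // '(no match)';
}
```

        Only the first record should differ between the two: there the word is directly followed by the literal "\n", so \S+ takes it along, while [^\\\s]+ stops at the backslash.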

Re: Regex: Ignore \n in \S
by TomDLux (Vicar) on Sep 14, 2010 at 02:41 UTC

    While the correct solution is, of course, to use the correct regexp, any time you have a string that may or may not have an undesired newline at the end, chomp can fix the problem.

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.
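    One caveat, sketched below: chomp removes a trailing real newline (more precisely, the current value of $/), so it would not strip the two-character sequence backslash plus "n" discussed earlier in this thread. The variable names are illustrative, and the substitution at the end is just one possible way to drop a trailing literal "\n".

```perl
use strict;
use warnings;

my $real    = "theoneiwant\n";   # double quotes: ends with an actual newline
my $literal = 'theoneiwant\n';   # single quotes: ends with backslash + "n"

# chomp works on a copy here so the originals stay untouched.
chomp(my $real_chomped    = $real);
chomp(my $literal_chomped = $literal);

# A trailing literal backslash-n needs an explicit substitution instead.
(my $stripped = $literal) =~ s/\\n\z//;

print length($real_chomped), "\n";     # 11: the newline was removed
print length($literal_chomped), "\n";  # 13: nothing was removed
print "$stripped\n";                   # theoneiwant
```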