in reply to Getting the last line

For the simplest of reasons, it doesn't match :)

With re 'debug'

use re 'debug';
my $sequence_to_parse = ">test\nATG\nGGG";
while ($sequence_to_parse =~ /^>.*\n(^(?!>).*$)+/gm) {
    print "$&\n";
}
__END__
Compiling REx "^>.*\n(^(?!>).*$)+"
Final program:
   1: MBOL (2)
   2: EXACT <>> (4)
   4: STAR (6)
   5:   REG_ANY (0)
   6: EXACT <\n> (8)
   8: CURLYX[0] {1,32767} (25)
  10:   OPEN1 (12)
  12:     MBOL (13)
  13:     UNLESSM[0] (19)
  15:       EXACT <>> (17)
  17:       SUCCEED (0)
  18:     TAIL (19)
  19:     STAR (21)
  20:       REG_ANY (0)
  21:     MEOL (22)
  22:   CLOSE1 (24)
  24: WHILEM[1/1] (0)
  25: NOTHING (26)
  26: END (0)
anchored ">" at 0 floating "%n" at 1..2147483647 (checking floating) anchored(MBOL) minlen 2
Guessing start of match in sv for REx "^>.*\n(^(?!>).*$)+" against ">test%nATG%nGGG"
Found floating substr "%n" at offset 5...
Found anchored substr ">" at offset 0...
Position at offset 0 does not contradict /^/m...
Guessed: match at offset 0
Matching REx "^>.*\n(^(?!>).*$)+" against ">test%nATG%nGGG"
   0 <> <>test%nATG>        |  1: MBOL(2)
   0 <> <>test%nATG>        |  2: EXACT <>>(4)
   1 <>> <test%nATG>        |  4: STAR(6)
                                  REG_ANY can match 4 times out of 2147483647...
   5 <>test> <%nATG%nGGG>   |  6: EXACT <\n>(8)
   6 <test%n> <ATG%nGGG>    |  8: CURLYX[0] {1,32767}(25)
   6 <test%n> <ATG%nGGG>    | 24: WHILEM[1/1](0)
                                  whilem: matched 0 out of 1..32767
   6 <test%n> <ATG%nGGG>    | 10: OPEN1(12)
   6 <test%n> <ATG%nGGG>    | 12: MBOL(13)
   6 <test%n> <ATG%nGGG>    | 13: UNLESSM[0](19)
   6 <test%n> <ATG%nGGG>    | 15: EXACT <>>(17)
                                  failed...
   6 <test%n> <ATG%nGGG>    | 19: STAR(21)
                                  REG_ANY can match 3 times out of 2147483647...
   9 <test%nATG> <%nGGG>    | 21: MEOL(22)
   9 <test%nATG> <%nGGG>    | 22: CLOSE1(24)
   9 <test%nATG> <%nGGG>    | 24: WHILEM[1/1](0)
                                  whilem: matched 1 out of 1..32767
   9 <test%nATG> <%nGGG>    | 10: OPEN1(12)
   9 <test%nATG> <%nGGG>    | 12: MBOL(13)
                                  failed...
                                  whilem: failed, trying continuation...
   9 <test%nATG> <%nGGG>    | 25: NOTHING(26)
   9 <test%nATG> <%nGGG>    | 26: END(0)
Match successful!
>test
ATG
Guessing start of match in sv for REx "^>.*\n(^(?!>).*$)+" against "%nGGG"
Did not find floating substr "%n"...
Match rejected by optimizer
Freeing REx: "^>.*\n(^(?!>).*$)+"
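
Reading the trace: the first pass through the group ends at MEOL (offset 9, just in front of the "\n"), and the second pass fails straight away at MBOL, because $ does not consume the newline and ^ cannot match in front of it. Here is a minimal sketch of that, assuming the goal is to grab every line up to the next ">" header. The sub names and the \n? variant are assumptions for illustration, not something from the original question.

```perl
use strict;
use warnings;

my $sequence_to_parse = ">test\nATG\nGGG";

# Original pattern: $ stops in front of "\n", so the next ^ cannot match
# and the + loop gives up after a single repetition.
sub match_original {
    my ($text) = @_;
    return $text =~ /^>.*\n(^(?!>).*$)+/m ? $& : undef;
}

# Assumed variant: \n? consumes the line break, so the next repetition
# starts at a real beginning of line.
sub match_consuming_newline {
    my ($text) = @_;
    return $text =~ /^>.*\n(?:^(?!>).*\n?)+/m ? $& : undef;
}

print "original: [", match_original($sequence_to_parse), "]\n";
print "variant:  [", match_consuming_newline($sequence_to_parse), "]\n";
```

The variant still stops at the next header line, since the (?!>) check runs at the start of every repetition.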

With YAPE::Regex::Explain

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new( qr/^>.*\n(^(?!>).*$)+/m )->explain;
__END__
The regular expression:
(?m-isx:^>.*\n(^(?!>).*$)+)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?m-isx:                 group, but do not capture (with ^ and $
                         matching start and end of line) (case-
                         sensitive) (with . not matching \n)
                         (matching whitespace and # normally):
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  \n                       '\n' (newline)
----------------------------------------------------------------------
  (                        group and capture to \1 (1 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    ^                        the beginning of a "line"
----------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
      >                        '>'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
    $                        before an optional \n, and the end of a
                             "line"
----------------------------------------------------------------------
  )+                       end of \1 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \1)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
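
The NOTE about a quantified capture keeping only its LAST repetition is easy to check in isolation. A small sketch follows; the sub names and the sample body are made up for illustration, and [^\n]+ is used in place of .* only to keep the example free of empty matches.

```perl
use strict;
use warnings;

my $body = "ATG\nGGG\n";

# A capture group inside a quantified group is overwritten on every
# repetition, so only the final line survives in $1.
sub last_repetition {
    my ($text) = @_;
    return $text =~ /^(?:([^\n]+)\n)+/ ? $1 : undef;
}

# A list-context /g match returns one capture per successful match,
# which is one way to keep every line.
sub all_lines {
    my ($text) = @_;
    return $text =~ /^([^\n]+)$/mg;
}

print "last repetition: ", last_repetition($body), "\n";
print "all lines: ", join( ",", all_lines($body) ), "\n";
```

So if every sequence line is wanted, something other than \1 of a repeated group has to collect them.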

One problem I think I see is an attempt to use the negative look-ahead, (?!pattern), as if it were a look-behind.

I would try something simple, like

my $sequence_to_parse = ">test\nATG\nGGG";
while ( $sequence_to_parse =~ m/(^>.+)|(^.+)/gm ) {
    if ( defined $1 ) {
        print "got first line \$1 ($1)\n";
    }
    elsif ( defined $2 ) {
        print "got other line \$2 ($2)\n";
    }
    else {
        print "UH OH \n";
    }
}
__END__
got first line $1 (>test)
got other line $2 (ATG)
got other line $2 (GGG)

Re^2: Getting the last line
by Anonymous Monk on Oct 06, 2011 at 03:32 UTC

    To get just the last line: a line is a run of characters other than \r and \n, so match the last such run at the end of the string,

    my $sequence_to_parse = ">test\nATG\nGGG";
    print $sequence_to_parse =~ /([^\r\n]+)$/s;
    __END__
    GGG

    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( qr/([^\r\n]+)$/s )->explain;
    __END__
    The regular expression:
    (?s-imx:([^\r\n]+)$)
    matches as follows:
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?s-imx:                 group, but do not capture (with . matching
                             \n) (case-sensitive) (with ^ and $ matching
                             normally) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [^\r\n]+                 any character except: '\r' (carriage
                                 return), '\n' (newline) (1 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
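
One caveat, as far as I can tell: $ only looks past a single trailing "\n", so data ending in "\r\n" gives no match with the pattern above, because the "\r" sits between the last line and the point where $ can match. A hedged sketch of a more tolerant version follows; last_line is a hypothetical helper name, and whether the input can have CRLF endings at all is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last non-empty line of $text,
# ignoring any run of trailing CR/LF characters, or undef if none.
sub last_line {
    my ($text) = @_;
    # [\r\n]*\z skips whatever line terminators end the string;
    # \z anchors at the very end, unlike $.
    return $text =~ /([^\r\n]+)[\r\n]*\z/ ? $1 : undef;
}

print last_line(">test\nATG\nGGG"), "\n";
print last_line(">test\r\nATG\r\nGGG\r\n"), "\n";
```

For data that ends in a bare "\n" or has no final newline, the original one-liner and this helper should agree.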