in reply to Re^3: Regular expressions: Extracting certain text from a line
in thread Regular expressions: Extracting certain text from a line

Hi Ken!

Here's my latest try; it may be of interest to you. This is full-on 5.10+, as I wanted to get away from the (??{ ... }) construct with its scary warnings and experiment some more with the (?PARNO) construct, and also with (DEFINE), which I still don't fully understand. As you can see, the (DEFINE) version requires an extra grep step; I couldn't figure out how to avoid it. Also, the definition of empty squares or curlies is now expanded to allow any amount of whitespace between the brackets. Tested under Strawberries 5.10.1.5 and 5.14.4.1.

    use 5.010;  # ++ possessive, (?PARNO), (DEFINE)

    use strict;
    use warnings;

    use Test::More
        # tests => ?? + 1  # Test::NoWarnings adds 1 test
        'no_plan'
        ;

    use Test::NoWarnings;

    my $empty_curly  = qr{  { \s*  } }xms;
    my $empty_square = qr{ \[ \s* \] }xms;
    my $not_empty    = qr{ (?! $empty_curly | $empty_square) }xms;

    my $curly  = qr{ $not_empty  { (?: [^{}]  ++ | $empty_curly  | (?R) )+  } }xms;
    my $square = qr{ $not_empty \[ (?: [^\[\]]++ | $empty_square | (?R) )+ \] }xms;

    my $re1 = qr{ $curly | $square }xms;

    my $re2 = qr{
        ( (?&SQUARE) | (?&CURLY) )             # works
        # (?<X>(?&SQUARE)) | (?<Y>(?&CURLY))   # works
        # (?&SQUARE) | (?&CURLY)               # no
        # (?: (?&SQUARE) | (?&CURLY) )         # no

        (?(DEFINE)
          (?<EMPTY_SQUARE> \[ \s* \] )
          (?<EMPTY_CURLY>   { \s*  } )
          (?<NOT_EMPTY> (?! (?&EMPTY_SQUARE) | (?&EMPTY_CURLY)))
          (?<SQUARE> (?&NOT_EMPTY) \[ (?: [^\[\]]++ | (?&EMPTY_SQUARE) | (?R) )+ \] )
          (?<CURLY>  (?&NOT_EMPTY)  { (?: [^{}]  ++ | (?&EMPTY_CURLY)  | (?R) )+  } )
        )
        }xms;

    VECTOR:
    for my $ar_vector (
        [ '...?[](...$[] = [ USER_ENTITY_NAME ], text${} = { this is a test })...',
          '[ USER_ENTITY_NAME ]', '{ this is a test }',
        ],
        [ 'a[] = a[ ] = a[ ] = [ this is a [ test ] { test2 } ]',
          '[ this is a [ test ] { test2 } ]',
        ],
        [ 'a{} = a{ } = a{ } = { this is a { test } [ test2 ] }',
          '{ this is a { test } [ test2 ] }',
        ],
        [ '{ a { b [ {}c{} ] d } e } = [ f [ g { []h[] } i ] j ]',
          '{ a { b [ {}c{} ] d } e }', '[ f [ g { []h[] } i ] j ]',
        ],
        [ '{}[]{ {}[] { } [ ] }[ ]{ } - [ ]{ }[ []{} [ ] { } ]{}[]',
          '{ {}[] { } [ ] }', '[ []{} [ ] { } ]',
        ],
        ) {

        my ($string, @expected) = @$ar_vector;

        is_deeply [ $string =~ m{ $re1 }xmsg ], \@expected,
            # qq{}
            ;

        is_deeply [ grep defined, $string =~ m{ $re2 }xmsg ], \@expected,
            # qq{}
            ;

        }  # end for VECTOR
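One possible way around the extra grep step, offered as a sketch rather than a tested drop-in: iterate the matches in scalar context and keep only $1, instead of taking everything m//g hands back in list context. The pattern below is a hypothetical cut-down version (square brackets only, no empty-bracket handling); the names $re, @raw and @got are made up for the illustration.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical, cut-down pattern: only square brackets, no empty-bracket handling.
my $re = qr/
    ( (?&SQUARE) )      # group 1: the only group the top-level match sets
    (?(DEFINE)
      (?<SQUARE> \[ (?: [^\[\]]++ | (?&SQUARE) )* \] )
    )
/x;

my $string = 'x = [ a [ b ] c ] and [ d ]';

# List context: each match contributes one value per capture group,
# so the named group inside (DEFINE) comes back as undef every time.
my @raw = $string =~ /$re/g;

# Scalar context: step through the matches one at a time and keep only $1.
my @got;
while ($string =~ /$re/g) {
    push @got, $1;
}

say scalar(@raw), ' raw values, ', scalar(@got), ' kept';
say for @got;
```

The loop is a little longer than the one-liner with grep, but it never sees the undefined slots in the first place.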

Re^5: Regular expressions: Extracting certain text from a line
by kcott (Archbishop) on Apr 09, 2014 at 03:08 UTC

    Thanks. This is of interest.

    I was aiming for a 5.8 solution: it was only after posting that I noticed ++ wasn't introduced until 5.10.0. Both the 5.8.8 and 5.18.2 doco show the same (??{ code }) example for matching (...), which I more or less copied for {...} and [...], so I wasn't too concerned about the experimental warnings for that bit.

    I noticed that SimonPratt had hinted at a (?PARNO) solution (in Re^3: Regular expressions: Extracting certain text from a line) and I did look into that yesterday; although, I didn't spend a huge amount of time on it. Like you, I'm not really across (DEFINE): I'll spend a bit more time looking at this in concert with your code.

    I ran the four tests under 5.18.1. The two you'd marked as # works passed all tests for me; the other two (# no) both failed tests 2, 4, 6, 8 and 10 with $got->[0] = Does not exist in every case.

    -- Ken

      ... the ... doco show the same  (??{ code }) example for matching  (...) ... a  (?PARNO) solution ...

      I'm interested in  (?PARNO) and  (?R) because the documentation examples for nested expressions are almost the same. (Indeed, the doc sez "Similar to "(??{ code })"..." re: (?PARNO).) The impression the docs give is that something like  (?R) is somehow 'safer'. This may be tied up with the fact that  (??{ code }) used to be broken for use with lexicals – a problem I believe was fixed somewhere around 5.16 or 5.18 (I only go as high as 5.14 right now).
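To make the comparison concrete, here is a minimal sketch of the two styles side by side for plain nested parentheses, loosely modelled on the shape of the documentation examples. The variable names are invented for the illustration, and the deferred version uses a package variable on the assumption that this sidesteps the old trouble with lexicals.

```perl
use 5.010;
use strict;
use warnings;

# Deferred-code style: the pattern refers to itself through a variable.
# A package variable is used here (an assumption, to stay clear of lexical issues).
our $deferred;
$deferred = qr/ \( (?: [^()]++ | (??{ $deferred }) )* \) /x;

# Recursive style: (?R) re-enters the whole pattern, no Perl code involved.
my $recursive = qr/ \( (?: [^()]++ | (?R) )* \) /x;

my $string = 'f(a, g(b, (c)), d) + h(e)';

my @via_deferred  = $string =~ /$deferred/g;
my @via_recursive = $string =~ /$recursive/g;

say "deferred:  $_" for @via_deferred;
say "recursive: $_" for @via_recursive;
```

One design point to keep in mind: (?R) always means "the whole pattern", so a fragment containing it changes meaning when it is interpolated into something larger, whereas (??{ ... }) keeps pointing at whatever the variable holds.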

      Thanks for the 5.18 feedback on (DEFINE). Yes, some variations work and some don't, and I don't understand the reason(s) for the differences.
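A possible explanation, offered tentatively: in list context m//g returns one value per capture group for every match, and the named groups inside (DEFINE) count as capture groups even though they are not set once the recursion has returned. If that reading is right, the two "# no" variants have no group of their own to fill in, so everything they return is undef and grep defined leaves nothing. The sketch below counts the values for each variant using a hypothetical cut-down (DEFINE) block; the labels and variable names are invented.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical cut-down (DEFINE) block: two named groups, no empty-bracket handling.
my $define = qr/
    (?(DEFINE)
      (?<SQUARE> \[ (?: [^\[\]]++ | (?&SQUARE) | (?&CURLY) )* \] )
      (?<CURLY>  \{ (?: [^{}]++   | (?&SQUARE) | (?&CURLY) )* \} )
    )
/x;

# The four top-level variants from the post, with made-up labels.
my @variants = (
    [ 'capture',     '( (?&SQUARE) | (?&CURLY) )'         ],
    [ 'named',       '(?<X>(?&SQUARE)) | (?<Y>(?&CURLY))' ],
    [ 'bare',        '(?&SQUARE) | (?&CURLY)'             ],
    [ 'non-capture', '(?: (?&SQUARE) | (?&CURLY) )'       ],
);

my $string = '[ a ] = { b }';
my (%raw_count, %defined_count);

for my $variant (@variants) {
    my ($label, $top) = @$variant;
    my $re  = qr/ $top $define /x;
    my @raw = $string =~ /$re/g;     # one slot per capture group, per match

    $raw_count{$label}     = @raw;
    $defined_count{$label} = grep { defined } @raw;

    printf "%-12s %d values returned, %d defined\n",
        $label, $raw_count{$label}, $defined_count{$label};
}
```

On this reading, the "# works" variants succeed only because they add a capture group that the top-level match itself fills in, which also fits the empty $got reported for the other two.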