rahulme81 has asked for the wisdom of the Perl Monks concerning the following question:

I happened to come across regex lookahead, lookbehind and atomic groups in Perl, though I am not expert enough to use them on the first try.

I had a thought: can I match a string name with a single regular expression using this concept?

That way I could avoid using multiple match patterns to handle what my program requires for these matches.

The strings I want to match look like this:

foo_bar_foo10.1.1.1.TEST.txt
foo_test_foo10.1.1.1.foo10.1.1.1.TEST_test.txt

"foo"

1. followed by "underscore"

2. followed by "bar|test" (bar or test)

3. followed by "underscore"

4. followed by alphanumeric string with dot at end (like foo10.1.1.1)

5. followed by one or zero occurrences of the same alphanumeric string (like foo10.1.1.1)

6. followed by dot

7. followed by 1-10 characters, containing at least one digit and one letter and underscore (e.g. TEST_test, TEST_test2, TEST1_test2)

8. ends with .txt

Thanks in advance

Re: Regex lookahead, lookbehind
by Discipulus (Canon) on Feb 23, 2017 at 08:04 UTC
    Hello rahulme81,

    Besides the wisdom received from brother choroba, you may find Using Look-ahead and Look-behind an interesting read.

    Also, the precious resource of wisdom Modern Perl (the free book!) has a chapter dedicated to regexes: look for the Assertions paragraph in chapter 6.

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Thanks Discipulus for the reference. Will go through that
Re: Regex lookahead, lookbehind
by choroba (Cardinal) on Feb 23, 2017 at 06:19 UTC
    Look-around assertions are needed when you need overlapping matches; they are zero-width. Your case seems matchable without them, too:
        #!/usr/bin/perl
        use warnings;
        use strict;

        while (<DATA>) {
            print "$. ok\n" if /^foo            # "foo"
                                _               # followed by "underscore"
                                (?:bar|test)    # followed by "bar|test" (bar or test)
                                _               # followed by "underscore"
                                ((?:\w+\.)+)    # followed by alphanumeric string with
                                                # dot at end (like foo10.1.1.1)
                                \1?             # followed by one or zero occurrences
                                                # of same alphanumeric string (like
                                                # foo10.1.1.1) followed by dot
                                \w{1,10}        # followed by 1-10 characters,
                                                # containing at least one digit and
                                                # one letter and underscore (e.g.
                                                # TEST_test, TEST_test2,
                                                # TEST1_test2)
                                \.txt$          # ends with .txt
                               /x;
        }

        __DATA__
        foo_bar_foo10.1.1.1.TEST.txt
        foo_test_foo10.1.1.1.foo10.1.1.1.TEST_test.txt
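
On the remark that look-around assertions are zero-width and come into play for overlapping matches, here is a minimal sketch. The helper names pairs_plain and pairs_lookahead are made up for illustration and are not from the thread; the sample string is arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: every two-character match, consuming as it goes.
sub pairs_plain {
    my ($text) = @_;
    # Each match consumes two characters, so matches cannot overlap.
    return $text =~ /(\w\w)/g;
}

# Hypothetical helper: the same window, but captured inside a look-ahead.
sub pairs_lookahead {
    my ($text) = @_;
    # The look-ahead consumes nothing; the engine moves on one
    # character per match, so the captured windows overlap.
    return $text =~ /(?=(\w\w))/g;
}

print join(',', pairs_plain('abcd')),     "\n";
print join(',', pairs_lookahead('abcd')), "\n";
```

The plain pattern yields two windows and the look-ahead version yields three, because only the latter can start a new match inside text that an earlier match already inspected.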

    To test the condition "at least one digit and one letter and underscore", I'd probably capture the group and test these conditions independently:

    /\d/ && /[[:alpha:]]/ && /_/
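
Since the question was about look-ahead, the three independent tests above could also be folded into one pattern. This is only a sketch under the assumption that the constrained part has already been captured on its own; is_constrained is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: true when $part is 1-10
# word characters containing a digit, a letter and an underscore.
sub is_constrained {
    my ($part) = @_;
    return $part =~ /\A
        (?=\w*\d)             # a digit somewhere ahead
        (?=\w*[[:alpha:]])    # a letter somewhere ahead
        (?=\w*_)              # an underscore somewhere ahead
        \w{1,10}              # the 1-10 characters themselves
        \z/x ? 1 : 0;
}

for my $part (qw(TEST_1 TEST_test TEST-1 T1_)) {
    printf "%-10s %s\n", $part, is_constrained($part) ? 'ok' : 'not ok';
}
```

Each look-ahead starts from the same position and consumes nothing, so the three conditions are checked independently before \w{1,10} does the actual matching. Whether this reads better than three separate tests is a matter of taste.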

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      Thank you very much choroba. This works !!!

      One last thing I need to extend to below match also

      foo_bar_foo10.1.1.1.TEST.txt
      foo_test_foo10.1.1.1.foo10.1.1.1.TEST_test.txt
      foo_test_foo10.1.1.1.foo10.1.1.1.txt
      foo_test_foo10.1.1.1.foo10.1.1.1.TEST-1.txt

      The third string is not a correct match, but it is displayed as "ok", whereas the fourth is not reported as ok, but I want to match it as well.

      For "foo_test_foo10.1.1.1.foo10.1.1.1.TEST-1.txt" I modified the expression to (\w\-{1,10}) and that works. Is this the correct way?
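
A note on (\w\-{1,10}): read literally, that is one word character followed by 1-10 dashes, because the quantifier applies only to the \-. If the intent is "1-10 characters, each a word character or a dash", a character class expresses it. The sketch below compares the two in isolation; the variable names are made up, and the full regex from the reply is deliberately left out.

```perl
use strict;
use warnings;

# As posted: one word character, then 1-10 dashes.
my $as_posted  = qr/\A\w\-{1,10}\z/;
# Assumed intent: 1-10 characters, each a word character or a dash.
my $char_class = qr/\A[\w-]{1,10}\z/;

for my $part ('TEST-1', 'T--') {
    printf "%-7s as_posted=%d char_class=%d\n",
        $part,
        ($part =~ $as_posted)  ? 1 : 0,
        ($part =~ $char_class) ? 1 : 0;
}
```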

      I am trying to get the third string reported as not ok within the same regex.

      Output for now:
      1 ok
      2 ok
      3 ok
      4 ok

      Does this approach allow using a variable inside the regex?

      Can I declare something like my $var = foo10.1.1.1 and use it in the regex?

      Yes, it does. I found the answer for the $var declaration: I replaced ((?:\w+\.)+) with ((?:($var)+\.)+). Is this correct too?
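
On interpolating $var: the value needs to be a quoted string, and its dots are regex metacharacters unless escaped, for example with \Q...\E. The following is a sketch, not the code the poster ran; the $full pattern is a hypothetical simplification that assumes the alphanumeric string is known in advance.

```perl
use strict;
use warnings;

my $var = 'foo10.1.1.1';    # quoted; an unquoted foo10.1.1.1 is not a string

my $raw    = qr/\A$var\z/;        # each "." still matches any character
my $quoted = qr/\A\Q$var\E\z/;    # \Q...\E escapes the dots

# Hypothetical full pattern: the fixed string plus its trailing dot,
# once or twice, then the 1-10 character part and ".txt".
my $full = qr/\Afoo_(?:bar|test)_(?:\Q$var\E\.){1,2}\w{1,10}\.txt\z/;

print "raw matches foo10x1y1z1\n"    if 'foo10x1y1z1' =~ $raw;
print "quoted rejects foo10x1y1z1\n" if 'foo10x1y1z1' !~ $quoted;
print "full matches\n"
    if 'foo_test_foo10.1.1.1.foo10.1.1.1.TEST_1.txt' =~ $full;
```

With the string fixed like this, a name such as foo_test_foo10.1.1.1.foo10.1.1.1.txt no longer matches $full, because nothing is left over for the 1-10 character part.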

      The last thing I am not able to get:

      3 is not correct. I want to filter this out.

      OR

      foo_test_foo10.1.1.1.foo10.1.1.1.txt - How can I match only this in a separate regex?

        I don't understand your requirements. It now seems to me none of the examples is correct, as TEST and TEST_test don't contain at least one digit, and TEST-1 doesn't contain at least one underscore. Changing the dash to underscore in the last makes it pass:
            #!/usr/bin/perl
            use warnings;
            use strict;

            while (<DATA>) {
                if (my ($alnum, $constrained) =
                        /^foo            # "foo"
                         _               # followed by "underscore"
                         (?:bar|test)    # followed by "bar|test" (bar or test)
                         _               # followed by "underscore"
                         ((?:\w+\.)+?)   # followed by alphanumeric string with
                                         # dot at end (like foo10.1.1.1)
                         \1?             # followed by one or zero occurrences
                                         # of same alphanumeric string (like
                                         # foo10.1.1.1) followed by dot
                         (\w{1,10})      # followed by 1-10 characters,
                                         # containing at least one digit and
                                         # one letter and underscore (e.g.
                                         # TEST_test, TEST_test2,
                                         # TEST1_test2)
                         \.txt$          # ends with .txt
                        /x
                ) {
                    print "<$alnum | $constrained> $. ok\n"
                        if 3 == grep $constrained =~ $_,
                                     qr/_/, qr/\d/, qr/[[:alpha:]]/;
                }
            }

            __DATA__
            foo_bar_foo10.1.1.1.TEST.txt
            foo_test_foo10.1.1.1.foo10.1.1.1.TEST_test.txt
            foo_test_foo10.1.1.1.foo10.1.1.1.txt
            foo_test_foo10.1.1.1.foo10.1.1.1.TEST-1.txt
            foo_test_foo10.1.1.1.foo10.1.1.1.TEST_1.txt

        Note that I changed the quantifier in the first capturing group to frugal to avoid matching both occurrences of $alnum as one without repetition. Nevertheless, in the third case, the whole substring foo10.1.1.1.foo10.1.1. corresponds to $alnum; it's not repeated, and $constrained is just 1.
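
To see the effect of the frugal quantifier described above, the two variants can be run side by side. This sketch reuses the patterns from the replies without the /x comments; the captures helper is hypothetical and only formats the result.

```perl
use strict;
use warnings;

# Same pattern twice; only the quantifier on the first group differs.
my $greedy = qr/^foo_(?:bar|test)_((?:\w+\.)+)\1?(\w{1,10})\.txt$/;
my $frugal = qr/^foo_(?:bar|test)_((?:\w+\.)+?)\1?(\w{1,10})\.txt$/;

# Hypothetical helper: return the two captures as "alnum|constrained".
sub captures {
    my ($re, $name) = @_;
    my @got = $name =~ $re;
    return @got ? join('|', @got) : 'no match';
}

for my $name ('foo_test_foo10.1.1.1.foo10.1.1.1.TEST_1.txt',
              'foo_test_foo10.1.1.1.foo10.1.1.1.txt') {
    print "$name\n";
    print "  greedy: ", captures($greedy, $name), "\n";
    print "  frugal: ", captures($frugal, $name), "\n";
}
```

For the first name the greedy group swallows both occurrences while the frugal one stops after the first and lets \1 match the repetition. For the second name both variants end up with the same captures, which is why the separate check on $constrained is still needed to reject it.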

        "One last thing I need to extend to below match also"

        When asking for help on regexes, please provide all the possible test cases. Also, you can use a module like Test::More - first, write as many test cases as possible, then work on your regex until all the tests pass, or post the code here if you have trouble.

            use warnings;
            use strict;
            use Test::More;

            my $regex = qr/foo/;

            like "foobar", $regex;
            unlike "quzbaz", $regex;
            # more test cases here

            done_testing;