ericwsf has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to create a regular expression for what I think should be a pretty simple, straightforward syntax, but I am not having any luck.

I want to support parsing an ini setting defined like:
RepeatingGroup = <name> [, Flatten][, in|out]

I want "Flatten" and "in|out" to be able to be used in either order.

Examples:

RepeatingGroup = Waiver, Flatten, out
RepeatingGroup = Waiver, out
RepeatingGroup = Waiver, out, Flatten
RepeatingGroup = Waiver , in
RepeatingGroup = Waiver , Flatten

This is the expression I am trying to use

"RepeatingGroup\s*=\s*(?<GroupID>\b[^ \t]+\b)\s*(?=\s*,\s*(?<Flatten>\bFlatten\b))?(?=\s*,\s*(?<Direction>\b(in|out)\b))?$"

Many examples I find online suggest the use of the ?= lookahead.

What am I doing wrong?

2017-08-05 Athanasius added code tags

Re: Regex with lookahead
by karlgoethebier (Abbot) on Aug 04, 2017 at 19:12 UTC
    "... parsing an ini..."

    In a hurry: Config::IniFiles...? Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re: Regex with lookahead
by tybalt89 (Monsignor) on Aug 04, 2017 at 19:31 UTC
    #!/usr/bin/perl
    # http://perlmonks.org/?node_id=1196747
    use strict;
    use warnings;
    use Data::Dump 'pp';

    while(<DATA>)
      {
      /RepeatingGroup\s*=\s*
        (?<GroupID>\b\S+\b)\s*
        (?=.*(?<Flatten>\bFlatten\b))?
        (?=.*(?<Direction>\b(in|out)\b))?
        /x or next;
      print;
      pp \%+;
      }

    __DATA__
    RepeatingGroup = Waiver, Flatten, out
    RepeatingGroup = Waiver, out
    RepeatingGroup = Waiver, out, Flatten
    RepeatingGroup = Waiver , in
    RepeatingGroup = Waiver , Flatten
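    A note on why the pattern in the question fails, as a minimal sketch: a lookahead is zero-width, so once the two optional (?=...) groups have been tried, the match position is still right after the group name, and the trailing $ can only succeed there if nothing follows the name. Both lookaheads also start from that same position, so the second one cannot see an option that comes after the first. The snippet uses the question's pattern, assuming \bFlatten\b is the intended spelling; the names $original and matches_original are only for illustration.

```perl
use strict;
use warnings;

# The pattern from the question, assuming \bFlatten\b was intended.
my $original = qr/RepeatingGroup\s*=\s*(?<GroupID>\b[^ \t]+\b)\s*(?=\s*,\s*(?<Flatten>\bFlatten\b))?(?=\s*,\s*(?<Direction>\b(in|out)\b))?$/;

# Hypothetical helper: returns 1 if the line matches the pattern, 0 otherwise.
sub matches_original {
    my ($line) = @_;
    return $line =~ $original ? 1 : 0;
}

# A bare name matches; as soon as an option follows, the trailing $ fails
# because the lookaheads did not consume the ", out" part.
for my $line ('RepeatingGroup = Waiver', 'RepeatingGroup = Waiver, out') {
    printf "%-30s %s\n", $line, matches_original($line) ? 'match' : 'no match';
}
```

    Dropping the $ and letting each lookahead scan ahead with .*, as in the code above, avoids both problems.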
Re: Regex with lookahead
by Marshall (Canon) on Aug 04, 2017 at 22:53 UTC
    My general advice is that you are doing way more in one regex than you need to. Perl is extremely good with regexes, both performance-wise and feature-wise. But not every feature needs to be used for every problem.

    In Perl, I would use a module like Config::Tiny to parse the .ini file. This would handle different [Sections] in the .ini file. However, it appears that you only have the "root" section (no name) to deal with.

    I would consider a sequence of regular expressions instead of one complex regex, perhaps something like the code below. In Perl it is sometimes actually faster, execution-wise, to run a couple of regexes on the same variable rather than one complex one. In this case, as I understand it, your INI file will only be parsed once at the start of your program, meaning that performance really doesn't matter much for this task. I recommend forgetting this lookahead stuff; it is not needed here. Go with multiple simple statements.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @example = (
        'RepeatingGroup = Waiver, Flatten, out',
        'RepeatingGroup = Waiver, out',
        'RepeatingGroup = Waiver, out, Flatten',
        'RepeatingGroup = Waiver , in',
        'RepeatingGroup = Waiver , Flatten',
    );

    foreach my $example (@example)
    {
        $example =~ s/^RepeatingGroup =\s*//;
        print "This is what .ini parser says: $example\n";
        my ($name,$direction,$flatten) = parseRepeatingGroup($example);
        print " name = $name\n".
              " direction = $direction\n",
              " flatten = $flatten\n";
    }

    sub parseRepeatingGroup
    {
        my $value_text = shift;
        my ($name,$rest) = $value_text =~ /^\s*(\S+)(.*)+/;
        my $direction = ($value_text =~ /\bin\b/i) ? 'in' : 'out';
        my $flatten = ($value_text =~ /\bFLATTEN\b/i)? 1: 0;
        return ($name,$direction,$flatten);
    }
    __END__
    This is what .ini parser says: Waiver, Flatten, out
     name = Waiver,
     direction = out
     flatten = 1
    This is what .ini parser says: Waiver, out
     name = Waiver,
     direction = out
     flatten = 0
    This is what .ini parser says: Waiver, out, Flatten
     name = Waiver,
     direction = out
     flatten = 1
    This is what .ini parser says: Waiver , in
     name = Waiver
     direction = in
     flatten = 0
    This is what .ini parser says: Waiver , Flatten
     name = Waiver
     direction = out
     flatten = 1

    Also in Perl, I recommend that you learn about the "//=" operator which can assign a default value to a variable which is undefined. $v //='default';
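    A minimal sketch of "//=" applied to this kind of setting, assuming Perl 5.10 or later (where the defined-or operators were introduced). The helper name apply_defaults and the choice of defaults are hypothetical; 'out' simply mirrors the default used in the code above.

```perl
use strict;
use warnings;
use 5.010;    # the defined-or operators // and //= need Perl 5.10+

# Hypothetical helper: fill in defaults for options the line did not set.
sub apply_defaults {
    my (%opt) = @_;
    $opt{direction} //= 'out';   # assigned only when direction is undef
    $opt{flatten}   //= 0;       # an already defined value is left alone
    return %opt;
}

my %opt = apply_defaults(name => 'Waiver', direction => undef, flatten => 1);
say "$_ = $opt{$_}" for sort keys %opt;
```

    Unlike "||=", the "//=" form leaves defined-but-false values such as 0 or the empty string untouched, which matters for a flag like flatten.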

    In C, I would use a very different approach. I don't see the need for regex in the C code. There are other ways of doing this, but this is not a C or C++ forum.

Re: Regex with lookahead
by ericwsf (Novice) on Aug 04, 2017 at 20:31 UTC
    Thanks for the replies. I think I am making progress. For background I am actually working on a C++ project using a library that has a Perl-compliant Regex class.
    I thought I'd verify my expression was indeed Perl correct.
    The perl 5.10 install I have here does not include Data::Dump. We have Data::Dumper but I don't know if it's comparable. But anyway I can't fully run the solutions as written.
    But I have updated my C++ code to reflect the feedback.
    But it does not handle "RepeatingGroup = LEI, out, Flatten". I get no match at all. But I did not use
    (?=.*(?<Flatten>\bFlatten\b))?
    I used
    (?:\\s*,\\s*(?P<Flatten>\\bFlatten\\b))?
    I may not be understanding lookahead syntax, but it seems like the ".*" would match anything.
    I want to make sure only spaces and commas precede the match.

      Yep, Data::Dumper also works, it's just not as "cool".

      Here's an alternate that only allows Flatten, in, or out (but note that multiples are allowed)

      #!/usr/bin/perl
      # http://perlmonks.org/?node_id=1196747
      use strict;
      use warnings;
      use Data::Dumper;

      while(<DATA>)
        {
        /RepeatingGroup\s*=\s*
          (?<GroupID>\b\w+\b)
          (\s*,\s*
            ((?<Flatten>\bFlatten\b)|(?<Direction>\b(in|out)\b))
          )*
          \s*$/x or next;
        print $_, Dumper \%+;
        }

      __DATA__
      RepeatingGroup = Waiver, Flatten, out
      RepeatingGroup = Waiver, out
      RepeatingGroup = Waiver, out, Flatten
      RepeatingGroup = Waiver , in
      RepeatingGroup = Waiver , Flatten
      RepeatingGroup = LEI, out, Flatten
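      If repeated options should be rejected, one possibility is to split the value on commas and check each piece, rather than doing everything in one regex. This is only a sketch, assuming option names are case-sensitive and that an unknown or repeated option makes the whole line invalid; parse_group is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref on success, undef on a bad line.
sub parse_group {
    my ($line) = @_;
    my ($value) = $line =~ /^\s*RepeatingGroup\s*=\s*(.*?)\s*$/ or return;

    # Limit -1 keeps trailing empty fields, so "Waiver," is seen as bad input.
    my ($name, @options) = split /\s*,\s*/, $value, -1;
    return unless defined $name && $name =~ /\A\w+\z/;

    my %result = (GroupID => $name);
    for my $option (@options) {
        my $key;
        if    ($option eq 'Flatten')          { $key = 'Flatten' }
        elsif ($option =~ /\A(?:in|out)\z/)   { $key = 'Direction' }
        else                                  { return }    # unknown option
        return if exists $result{$key};                     # repeated option
        $result{$key} = $option;
    }
    return \%result;
}

for my $line ('RepeatingGroup = LEI, out, Flatten', 'RepeatingGroup = LEI, in, out') {
    my $parsed = parse_group($line);
    my $text = $parsed
        ? join(', ', map { "$_=$parsed->{$_}" } sort keys %$parsed)
        : 'rejected';
    print "$line => $text\n";
}
```

      Because "in" and "out" share the Direction key, a line that gives both is rejected in the same way as a doubled Flatten.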

        *cool* is in the eye of the beholder, worthy monk:

        $ corelist Data::Dumper

        Data for 2017-01-14
        Data::Dumper was first released with perl 5.005

        $ corelist Data::Dump

        Data for 2017-01-14
        Data::Dump was not in CORE (or so I think)

        To some, "cool" is working without non-core modules, not necessarily what's newer or more fashionable. I'm not saying that Data::Dump is bad, I'm just pointing out that Data::Dumper is available with almost any Perl 5 install.

Re: Regex with lookahead
by Anonymous Monk on Aug 04, 2017 at 19:41 UTC
    use warnings;
    use strict;
    use Data::Dump;

    my $re = qr{
        \A \s* RepeatingGroup \s* = \s*
        (?<GroupID> \b[^\ \t]+\b ) \s*
        ( , \s*
            (?:
                (?<Flatten> \bFlatten\b )
            |
                (?<Direction> \b(?:in|out)\b )
            ) \s*
        ){1,2}
    \z }msx;

    dd $_, /$re/ ? \%+ : "FAIL" while <DATA>;

    __DATA__
    RepeatingGroup = Waiver, Flatten, out
    RepeatingGroup = Waiver, out
    RepeatingGroup = Waiver, out, Flatten
    RepeatingGroup = Waiver , in
    RepeatingGroup = Waiver , Flatten
Re: Regex with lookahead
by Anonymous Monk on Aug 04, 2017 at 19:49 UTC
Re: Regex with lookahead
by ericwsf (Novice) on Aug 04, 2017 at 20:46 UTC
    No, using the .* on "RepeatingGroup = LEI,out, Flatten", the GroupID match is "LEI,out".
      But putting a comma in the exclusion class prevents getting "out": Flatten is captured but "out" remains uncaptured.
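    One way to reconcile the two observations, as a sketch in Perl rather than a drop-in for the C++ library: exclude the comma from the GroupID class so the name stops at "LEI", and keep the .* inside each lookahead so either option can be found anywhere after the name. Moving the optional part inside each lookahead is a variation introduced here, and whether the C++ regex class keeps captures made inside lookaheads is an assumption to verify. The name parse_line is hypothetical.

```perl
use strict;
use warnings;

my $re = qr/
    RepeatingGroup \s* = \s*
    (?<GroupID> [^\s,]+ )                           # name stops at comma or space
    (?= (?: .* \b (?<Flatten>   Flatten ) \b )? )   # optional, anywhere after the name
    (?= (?: .* \b (?<Direction> in|out  ) \b )? )   # optional, anywhere after the name
/x;

# Hypothetical helper: returns a hash ref of the named captures that were
# set, or undef when the line does not match at all.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ $re;
    my %captures = %+;    # copy before another match can overwrite %+
    return \%captures;
}

for my $line ('RepeatingGroup = LEI,out, Flatten', 'RepeatingGroup = Waiver , in') {
    my $got = parse_line($line) or next;
    print join(', ', map { "$_=$got->{$_}" } sort keys %$got), "\n";
}
```

    Note that this does not check that only commas and spaces separate the options; the alternation version above is stricter in that respect.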
Re: Regex with lookahead
by tybalt89 (Monsignor) on Aug 04, 2017 at 19:35 UTC

    BTW, you HAVE TO, HAVE TO, HAVE TO, HAVE TO, HAVE TO, HAVE TO put your regex in code blocks or PM eats the character class.

Re: Regex with lookahead
by ericwsf (Novice) on Aug 04, 2017 at 20:34 UTC
    Wait, I still have to try the alternation version.