Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Okay, be kind to a newbie here...

I want to build an alternation at run-time to filter the contents of a log file. I suspect I'm not setting the appropriate switch on the regular expression. The code is below:

#!/usr/bin/perl
use strict;
# create alternation
my @list;
while (<DATA>) {
    chomp;
    push @list, $_;
}
my $regex = '[' . join('|', @list) . ']';
my $s = 'huey';
print "match:\t$s\n" if $s =~ m/\Q$regex/;
__DATA__
huey
dewey
louie

Thoughts? Can you help educate the less fortunate? Thanks.

Re: creating a regular expression's alternation dynamically?
by Zaxo (Archbishop) on Aug 06, 2004 at 14:58 UTC

    You may want to look into the regex quotation operator, qr. Setting the $" variable (the list separator) to '|' turns an interpolated array into an alternation.

    #!/usr/bin/perl
    use warnings;
    use strict;
    chomp(my @list = <DATA>);
    my $regex = do {
        local $" = '|';
        qr/@list/;
    };
    print $regex, $/;
    my $s = 'huey';
    print "match:\t$s\n" if $s =~ $regex;
    __DATA__
    huey
    dewey
    louie
    I've demonstrated some tricks which reduce the visual noise: chomp can be applied to a list, the diamond op returns all lines in list context, and the elements of an interpolated array are separated by $".

    After Compline,
    Zaxo
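
A minimal sketch of why the `local` matters in the approach above: the changed separator is only in effect inside the enclosing block, so later string interpolation is unaffected. The helper name `alternation_from` is made up for illustration and does not appear in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the thread): build an
# alternation by temporarily changing the list separator $".
sub alternation_from {
    my @words = @_;
    local $" = '|';        # restored automatically when the sub returns
    return qr/@words/;     # @words interpolates as huey|dewey|louie
}

my $regex = alternation_from(qw(huey dewey louie));
my @pair  = qw(a b);

# Outside the sub, arrays interpolate with the default single space again.
print "separator restored: '@pair'\n";
print "dewey matches\n"         if 'dewey'  =~ $regex;
print "donald does not match\n" if 'donald' !~ $regex;
```

Note that the pattern is unanchored, so it also matches strings that merely contain one of the words.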

Re: creating a regular expression's alternation dynamically?
by mifflin (Curate) on Aug 06, 2004 at 14:26 UTC
    #!/usr/bin/perl
    use strict;
    # create alternation
    my @list;
    while (<DATA>) {
        chomp;
        push @list, $_;
    }
    my $regex = '^(' . join('|', @list) . ')$';
    my $s = 'huey';
    print "regex = $regex\n";
    print "match:\t$1\n" if $s =~ m/$regex/;
    __DATA__
    huey
    dewey
    louie
    Running the program...
    C:\Temp\test>perl test.pl
    regex = ^(huey|dewey|louie)$
    match:  huey

      In the join, you might want to map quotemeta over @list so that any metacharacters (particularly |) get escaped.

      Also, be careful that none of the strings is a prefix of another; otherwise you'll find that ordering is significant in an alternation.

      --
      integral, resident of freenode's #perl
      
        yep, good point, thanks!
        use strict;
        use warnings;
        # create alternation
        my @list;
        while (<DATA>) {
            chomp;
            push @list, $_;
        }
        my $regex = '^(' . join('|', map {quotemeta} @list) . ')$';
        my $s = 'huey';
        print "regex = $regex\n";
        print "match:\t$1\n" if $s =~ m/$regex/;
        __DATA__
        huey
        de|wey
        louie
        output is...
        # perl test.pl
        regex = ^(huey|de\|wey|louie)$
        match:  huey
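
To make the quotemeta point concrete, here is a rough sketch contrasting two ways of escaping: quoting each element before the join, versus quoting the already-joined string with \Q. The variable names `$good` and `$bad` are illustrative only.

```perl
use strict;
use warnings;

my @list = ('huey', 'de|wey', 'louie');

# Escape each element first, then join with an unescaped | so the
# separators still act as alternation.
my $per_element = join '|', map { quotemeta } @list;
my $good = qr/\A(?:$per_element)\z/;

# Escaping the joined string also escapes the separators, so the
# pattern becomes one long literal string.
my $joined = join '|', @list;
my $bad = qr/\A\Q$joined\E\z/;

for my $s ('huey', 'de|wey', $joined) {
    printf "%-20s per-element: %-8s whole-string: %s\n",
        $s,
        ($s =~ $good ? 'match' : 'no match'),
        ($s =~ $bad  ? 'match' : 'no match');
}
```

With \Q applied to the whole string, only the literal text huey|de|wey|louie matches, which is the behaviour seen in the original question.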
Re: creating a regular expression's alternation dynamically?
by davido (Cardinal) on Aug 06, 2004 at 15:14 UTC

    The first problem I spotted is that you're using [ and ] as you build the regexp, and putting alternation inside the square brackets. That doesn't work. All it really means is to accept any single character that appears in the strings of @list, plus the | character. Alternation is meaningless inside a character class.

    For that section, you really want, "my $regex = join '|', @list;" (in other words, forget about the character class, unless of course you really do want a character class, in which case you'll forget about alternation).

    You might also want to use the qr// operator. If you do it that way, here's a snippet for you to work with:

    use strict;
    my( @list ) = <DATA>;
    chomp @list;
    my $buildup = join '|', @list;
    my $regexp = qr/$buildup/;
    my $s = 'huey';
    print "match:\t$s\n" if $s =~ $regexp;
    __DATA__
    huey
    dewey
    louie

    Dave
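
As a rough illustration of the character-class point above, the sketch below wraps the same joined string in square brackets and in a non-capturing group. Both patterns are anchored so the difference is visible; the names `$class` and `$alt` are illustrative only.

```perl
use strict;
use warnings;

my @list   = qw(huey dewey louie);
my $joined = join '|', @list;

# Inside [...] the string is just a set of characters: the letters that
# occur in the words, plus a literal |.
my $class = qr/\A[$joined]\z/;

# Inside (?:...) the | separates whole alternatives.
my $alt = qr/\A(?:$joined)\z/;

for my $s ('huey', 'w', '|') {
    printf "%-5s class: %-8s alternation: %s\n",
        $s,
        ($s =~ $class ? 'match' : 'no match'),
        ($s =~ $alt   ? 'match' : 'no match');
}
```

The character class accepts the single characters w and | but not the word huey; the alternation does the opposite.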

Re: creating a regular expression's alternation dynamically?
by ysth (Canon) on Aug 06, 2004 at 18:21 UTC
    Your while loop isn't needed. Just say chomp(my @list = <DATA>);

    You don't want to apply quotemeta (\Q...) to the whole regex, since that makes the | characters match literally. If it matters to you which part of the string the regex matches, also beware of list entries that are a prefix of a later entry. For instance:

    print "foobarbaz" =~ m/(foo|foobar|baz)/g;
    print "foobarbaz" =~ m/(foobar|foo|baz)/g;
    print different things (foobaz and foobarbaz, respectively). To ensure that a list element is never matched where it is a prefix of another element that would match, sort the list by descending length.

    See Re: Common Perl Idioms for an implementation.
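
This is not the implementation referred to above, just a minimal sketch of the same idea, assuming plain literal strings: sort by descending length, escape each element, then join. The helper name `build_alternation` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the linked implementation):
# longest strings first, each one escaped, wrapped in one capture group.
sub build_alternation {
    my @words = sort { length($b) <=> length($a) } @_;
    my $body  = join '|', map { quotemeta } @words;
    return qr/($body)/;
}

# "foo" is listed before "foobar", but the sort puts "foobar" first.
my $regex = build_alternation(qw(foo foobar baz));
my @found = 'foobarbaz' =~ /$regex/g;

print "@found\n";    # prints "foobar baz"
```

Because the longer string is tried first, foo can no longer win at a position where foobar would also match.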