MiggyMan has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for a way to parse regexes held in variables, so that I can move the regexes in question (substitutions) into a config file where the user can edit them. I've tried simply using variables, and although it does work, I either have to process the regex data beforehand or force the user to write ridiculously convoluted regex strings (as opposed to just convoluted :D). In short, what I'd like to know is whether there's a better way of doing this: perhaps something I've missed, or a module just for the task?

Re: User configurable regex
by Joost (Canon) on Oct 20, 2004 at 11:18 UTC
      Well, for a start I'm looking at substitutions, not matches.

      Now if I pass the substitution in a variable, I have to go and escape a whole bunch of things OR make the user do it when they set the regex (which doesn't make me happy at all).

      For example:

      ----------------------
      #!/usr/bin/perl
      $user=".admin.ecc";
      $uregex="~s/^\\.(\.*)?\\.\.*/\$1/";
      #$user =~ s/^\.(.*)?\..*/$1/;
      $regstring="\$user=$uregex";
      print "$regstring\n";
      eval($regstring);
      print "User: $user\n";
      ----------------------


      The regex to process my username is s/^\.(.*)?\..*/$1/, but to get it working with eval it has to be changed to s/^\\.(\.*)?\\.\.*/\$1/, which is not the most useful thing in the world. So what I'm looking for is some means of parsing a regex string. If it comes down to it I'll use a whole bunch of regexes for that, but I'd rather not if I can help it.
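The two spellings in the post above are actually the same string once Perl has finished processing the double-quoted literal: \\ becomes \, \$ becomes $, and the backslash in \. is simply dropped. A minimal sketch (the variable names $double and $single are illustrative) that prints both forms, compares them, and then evals the plain, single-quoted one:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Double-quoted literal: Perl turns \\ into \, \$ into $ and \. into .
my $double = "s/^\\.(\.*)?\\.\.*/\$1/";
# Single-quoted literal: the backslashes before . and $ are kept as typed
my $single = 's/^\.(.*)?\..*/$1/';

print "$double\n";
print "$single\n";
print $double eq $single ? "identical\n" : "different\n";    # prints "identical"

# Apply the plain form with a string eval, as in the original post
my $user = ".admin.ecc";
eval "\$user =~ $single; 1" or die "substitution failed: $@";
print "User: $user\n";    # prints "User: admin"
```

So the extra escaping belongs to the Perl string literal, not to the regex itself; text that never passes through a double-quoted literal needs none of it.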
        Why would your users have to type the double quotes?
        #!/usr/bin/perl -w
        use strict;

        my $substitution = '$line =~ '.<DATA>;
        my $line = "abcdef";
        eval $substitution;
        print $line;

        __DATA__
        s/abc(\w+)f/xxx$1x/;
        output:
        xxxdex
        Update: what I mean is that the problem you're describing only occurs in string literals (and it would already be a lot less annoying if you used single quotes). This interpolation does not happen to strings per se, such as a line read from a file or from DATA as above.
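Reading the substitution from a file removes the quoting problem, but eval will still run any Perl code a user manages to put in the config. One alternative, swapped in here rather than taken from the thread, is to store the pattern and the replacement as two separate fields, compile the pattern with qr//, and fill in $1..$9 in the replacement by hand. The helper user_subst and the two-field layout are hypothetical; this sketch replaces only the first match, supports no flags, and treats everything in the replacement except $1..$9 as literal text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply one user-supplied substitution without
# string eval. Only $1..$9 in the replacement are expanded; everything
# else is copied literally. Like s/// without /g, only the first match
# is replaced.
sub user_subst {
    my ($text, $pattern, $replacement) = @_;
    my $re = qr/$pattern/;    # dies on an invalid pattern
    return $text unless $text =~ $re;

    # Save the match position and capture groups before running another match
    my ($start, $end) = ($-[0], $+[0]);
    my @groups = ($1, $2, $3, $4, $5, $6, $7, $8, $9);

    # Replace each $N in the replacement with capture group N (or '' if unset)
    (my $out = $replacement) =~
        s{\$([1-9])}{ defined $groups[$1 - 1] ? $groups[$1 - 1] : '' }ge;

    substr($text, $start, $end - $start, $out);
    return $text;
}

# The username rule from the question and Joost's example, as two fields each
print user_subst('.admin.ecc', '^\.(.*)?\..*', '$1'), "\n";    # prints "admin"
print user_subst('abcdef', 'abc(\w+)f', 'xxx$1x'), "\n";       # prints "xxxdex"
```

The user writes the pattern and replacement exactly as they would inside s///, with no extra escaping, and nothing they type is executed as Perl code.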
Re: User configurable regex
by Random_Walk (Prior) on Oct 20, 2004 at 11:30 UTC

    Your question is not very clear to me, but I will take a guess. You want users to be able to edit a config file to make simple substitutions in some stream of data.

    #this is what the config file will look like...
    #string to replace, replacement
    fred, john
    paul, pete
    smelly, feet

    #!/usr/bin/perl -w
    # here is some code (untested)
    use strict;
    my $config = "substit.cfg";
    my $regex  = '';

    # set up our regexen
    open my $conf, '<', $config or die "can't open config $config: $!\n";
    while (<$conf>) {
        next if /^#/;       # ignore comments
        next if /^\s*$/;    # and empties
        chomp;
        my ($replace, $with) = split /,\s*/;
        $regex .= "s/" . $replace . "/" . $with . "/;";
    }
    close $conf;

    # apply them to the input stream
    while (<>) {
        eval $regex;
        print;
    }

    This is of course fraught with danger: because every config line ends up inside an eval, users who start getting clever about what they put in the config file can get arbitrary Perl code run and cause a lot of trouble. You would probably want to limit them to some simple character sets, perhaps allowing only \w\d\s, with a line like next if /[^\w\d\s]/ when the config is read in.

    Update

    That last check for nasty characters won't work unless you add the comma too, since every valid line contains one! next if /[^\w\d\s,]/

    Cheers,
    R.
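Because Random_Walk's config format only holds literal strings, the eval is not strictly needed: the search text can be interpolated inside \Q...\E (which escapes any regex metacharacters) and the replacement variable is inserted as plain text, so no user input is ever run as code. A minimal sketch under that assumption; parse_rules and apply_rules are hypothetical names, the config is inlined as a string for the demo, and \d is dropped from the whitelist because \w already matches digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical parser for "from, to" lines: skips comments, blank lines,
# and any line containing characters other than \w, whitespace and comma.
sub parse_rules {
    my ($text) = @_;
    my @rules;
    for my $line (split /\n/, $text) {
        next if $line =~ /^#/;          # ignore comments
        next if $line =~ /^\s*$/;       # and empties
        next if $line =~ /[^\w\s,]/;    # whitelist check (\w covers \d)
        my ($from, $to) = split /,\s*/, $line, 2;
        next unless defined $to;
        push @rules, [ $from, $to ];
    }
    return @rules;
}

# Apply each rule once per line, like s/fred/john/ in the original.
# \Q...\E quotes the search text and $to is plain text, so no eval runs.
sub apply_rules {
    my ($line, @rules) = @_;
    for my $rule (@rules) {
        my ($from, $to) = @$rule;
        $line =~ s/\Q$from\E/$to/;
    }
    return $line;
}

my @rules = parse_rules(
    "#string to replace, replacement\nfred, john\npaul, pete\nsmelly, feet\nrm -rf /, oops\n"
);
print scalar(@rules), " rules loaded\n";    # prints "3 rules loaded"
print apply_rules("fred and paul have smelly socks", @rules), "\n";
# prints "john and pete have feet socks"
```

The last config line is rejected by the whitelist because of the "-" and "/". Since \Q...\E already neutralises metacharacters, the whitelist could also be relaxed without reopening the eval hole.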

Re: User configurable regex
by erix (Prior) on Oct 20, 2004 at 13:46 UTC

    Here is my two cents' worth. I am not sure if it is what you mean. I have made the regexes deliberately somewhat convoluted :)

    #! /usr/bin/perl -w
    use warnings;
    use diagnostics;
    use strict;

    eval { action() };
    if ($@) {
        print "error:\n$@\n";
    }

    sub action {
        my @a = split(/\n/,<<'ENDTEXT');
    This is the text we will search through.
    Normally you'd have some other source like a file.
    Every line here becomes an element in array @a.
    ENDTEXT

        my $rregexes = get_regexes();               # get patterns
        my @compiled = map qr/$_/ix, @$rregexes;    # pre-compile them,
                                                    # case insensitive
        print "We got ". @$rregexes ." regexes to test:\n";

        # show regexes:
        for (my $i=0;$i< @$rregexes ;$i++) {
            print "--------\nregex $i:\n--------\n";
            print $rregexes->[$i];
        }

        print "\nHere we go into matching loop:\n";
        for (my $i=0;$i<@a;$i++) {
            for (my $j=0;$j<@compiled;$j++) {
                if ($a[$i] =~ /$compiled[$j]/) {
                    print "$j match: ".sprintf('%-50s',$a[$i])."<== matched\n";
                } else {
                    print "$j no match: ".sprintf('%-50s',$a[$i])."\n";
                }
            }
        }
    }

    sub get_regexes {
        #
        # Could also read these from config files!
        #
        my @regexes = ();
        my $i = -1;
        $regexes[++$i] =<<'REGEXTEXT';
    ^        # beginning of string
    .*       # any number of chars
    normally #
    .*       #
    $        # end
    REGEXTEXT
        $regexes[++$i] =<<'REGEXTEXT';
    ^        # beginning of string
    .*       # any number of chars
    here     #
    .*       #
    element  #
    .*       #
    $        # end
    REGEXTEXT
        return \@regexes;
    }
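erix's get_regexes notes that the patterns could also come from config files. A sketch of one way to do that, assuming a hypothetical file format in which /x patterns are separated by lines containing only %%: each pattern is compiled with qr//ix, as in the code above, but inside eval {} so that a malformed user pattern is skipped with a warning instead of killing the program. It is also worth knowing that Perl refuses to compile (?{ code }) blocks in a pattern interpolated at run time unless use re 'eval' is in effect, so compiling user patterns this way does not by itself run embedded Perl code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical loader: the config text holds /x patterns separated by
# lines consisting only of "%%". Each one is compiled case-insensitively,
# as erix's code does with qr/$_/ix.
sub load_patterns {
    my ($text) = @_;
    my @compiled;
    for my $src (split /^%%[ \t]*$/m, $text) {
        next unless $src =~ /\S/;    # skip empty chunks
        # qr// dies on a malformed pattern, so trap that with eval {}
        my $re = eval { qr/$src/ix };
        unless (defined $re) {
            warn "skipping bad pattern: $@";
            next;
        }
        push @compiled, $re;
    }
    return @compiled;
}

# Demo config, inlined here instead of being read from a file
my $config = join "\n",
    '^      # beginning of string',
    '.*     # any number of chars',
    'normally',
    '%%',
    '(      # unbalanced paren, rejected by load_patterns',
    '%%',
    'here .* element',
    '';

my @patterns = load_patterns($config);
for my $line ('Normally you would read a file.', 'Every line here becomes an element.') {
    my @hits = grep { $line =~ $patterns[$_] } 0 .. $#patterns;
    print "$line => matched pattern(s): @hits\n";
}
```

With this input the unbalanced pattern produces a warning on STDERR, the two valid patterns are kept, and each sample line matches exactly one of them.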