jeffthewookiee has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing a program that reads a list of regular expressions from a file. For example:

    s/\s*M\s*/MALE/si
    s/\s*F\s*/FEMALE/si

This is designed to provide a configurable list of regular expressions to apply to later data. Later on in the program I'm applying the rules like so:

    $value =~ $regex;

Unfortunately, nothing is happening. Am I using the regexes from the file the wrong way? Do I need to do something else to the list of regexes before Perl will treat them as it would if I had typed them into a Perl script?
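A minimal sketch of why nothing happens (the sample value ' m ' is an assumption for illustration): when the right-hand side of =~ is a plain string rather than an s/// operator, Perl performs a match and compiles the whole string, including the s/ and /MALE/si parts, as the pattern.

```perl
use strict;
use warnings;

my $rule  = 's/\s*M\s*/MALE/si';    # one line as read from the file
my $value = ' m ';                  # hypothetical input value

# Binding a string with =~ performs a match, not a substitution. The
# string is compiled as a pattern, so it looks for the literal text
# "s/" followed by "M", "/MALE/si" and so on; it fails and leaves
# $value untouched.
my $matched = ( $value =~ $rule ) ? 1 : 0;

# Substituting requires the s/// operator itself in the code; only the
# pattern part can come from a variable.
my $pattern = '\s*M\s*';
( my $fixed = $value ) =~ s/$pattern/MALE/si;

print "matched=$matched value='$value' fixed='$fixed'\n";
# prints: matched=0 value=' m ' fixed='MALE'
```

In other words, the file can safely supply the pattern, but something still has to supply the s/// operation itself, which is what the replies below address.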

Re: Reading Regexes from File
by johngg (Canon) on Apr 11, 2007 at 22:28 UTC
    You might want to explore using a hash with the regular expression as the key and the replacement text as the value. You can use compiled regular expressions as hash keys: they are stringified, but the string form is itself a valid pattern, so it still works when interpolated back into a regex. If the patterns are meant to match literal text that may contain regex metacharacters, consider using \Q and \E or quotemeta when compiling them. In the script below I read the patterns and corresponding replacements from the __END__ section via the DATA filehandle. The patterns contain no spaces, so a simple split on whitespace is sufficient. Your patterns may be more complex.

    use strict;
    use warnings;

    my %substitutions = ();
    while ( <DATA> )
    {
        chomp;
        my ($pattern, $replace) = split;
        my $rxPattern = qr{(?si)$pattern};
        $substitutions{$rxPattern} = $replace;
    }

    my $str = q{a capital M and Fish};
    print qq{$str\n};
    $str =~ s{$_}{$substitutions{$_}} for keys %substitutions;
    print qq{$str\n};

    __END__
    \s*M\s* MALE
    \s*F\s* FEMALE

    The output produced is

    a capital M and Fish
    a capitalMALEandFEMALEish
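    Hash keys come back in an unspecified order, so when rules overlap the result could depend on which substitution happens to run first. If the order in the file matters, one option is to keep the rules in an array of [pattern, replacement] pairs. A minimal sketch, assuming the same two rules (held in an array here instead of read from DATA):

```perl
use strict;
use warnings;

# The rule lines as they might appear in the configuration file:
# a pattern, whitespace, then the replacement text.
my @lines = ( '\s*M\s* MALE', '\s*F\s* FEMALE' );

# Keep the rules in an array so they are applied in file order.
my @rules;
for my $line (@lines) {
    my ( $pattern, $replace ) = split ' ', $line, 2;
    push @rules, [ qr{$pattern}si, $replace ];
}

my $str = 'a capital M and Fish';
for my $rule (@rules) {
    my ( $rx, $replace ) = @$rule;
    $str =~ s/$rx/$replace/;    # replaces the first match only, as above
}
print "$str\n";    # a capitalMALEandFEMALEish
```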

    I hope this is of use.
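    To illustrate the \Q and \E advice: if a configured search string is meant as literal text, quoting it stops metacharacters such as . from acting as wildcards. The search text M. and the input MX below are made up for illustration.

```perl
use strict;
use warnings;

my $literal = 'M.';    # hypothetical search text containing a metacharacter

my $as_regex = qr{$literal};        # "." matches any character
my $as_text  = qr{\Q$literal\E};    # "." matches only a literal dot

my $regex_hit = ( 'MX' =~ $as_regex ) ? 1 : 0;
my $text_hit  = ( 'MX' =~ $as_text )  ? 1 : 0;
print "$regex_hit $text_hit\n";    # 1 0

# quotemeta() performs the same escaping as \Q ... \E.
print quotemeta($literal), "\n";    # M\.
```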

    Cheers,

    JohnGG

      OK, I tried a slightly different approach. I'm still reading the regular expressions from a file, but now I'm using :: to separate the search pattern from the replacement text. The problem is that the replacement is being interpreted literally. For example, this line in the file:

      (\d{4})(\d{2})(\d{2})::$2-$3-$1

      when applied this way in the code:

          my ($search, $replace) = split "::", $rule;
          $value =~ s/$search/$replace/si;

      yields the literal text $2-$3-$1.

      I wanted it to rearrange the date into MM-DD-YYYY format.
        I don't think that is ever going to work, although more experienced Monks may know better. The method I described is only good for replacing a pattern with simple text. You are getting the literal $2-$3-$1 because Perl interpolates $replace only once: its contents are inserted into the replacement as plain text, and the $1, $2 and so on inside them, which are only "magical" when they appear in the code itself, are never expanded. If you try to rectify things by eval'ing $replace, i.e.

            $value =~ s/$search/eval "$replace"/sie

        then Perl evaluates $2-$3-$1 as arithmetic (a subtraction) and comes up with, for today's date, -2015.
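        For the record, Perl can be made to expand the captures by evaluating the replacement twice with the /ee modifiers, which is a variation of the eval approach. A minimal sketch, assuming the pattern::replacement rule format above and a hypothetical input of 20070411: the first e turns '"' . $replace . '"' into the text "$2-$3-$1", double quotes included, and the second e runs that text as Perl code, which interpolates the captures. Because the rules file is executed as code, this is only appropriate when the file is trusted.

```perl
use strict;
use warnings;

my $rule  = '(\d{4})(\d{2})(\d{2})::$2-$3-$1';    # rule line from the file
my $value = '20070411';                          # hypothetical input

my ( $search, $replace ) = split /::/, $rule;

# /ee: the first /e evaluates '"' . $replace . '"', producing the text
# "$2-$3-$1" with its double quotes; the second /e evaluates that text
# as Perl code, so $2, $3 and $1 are interpolated from this match.
# This runs code taken from the file, so the file must be trusted.
$value =~ s/$search/'"' . $replace . '"'/ee;

print "$value\n";    # 04-11-2007
```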

        To do anything more fancy than a simple text replacement you will almost certainly have to take the eval approach suggested by duff and expanded on by ikegami.
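        For the narrow case of numbered placeholders such as $1, one alternative that avoids string eval is to expand the placeholders by hand. A sketch under that assumption: apply_rule is a hypothetical helper, and only $1, $2, ... in the replacement text are supported, not arbitrary Perl expressions.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one "pattern::template" rule without string
# eval. Only $1, $2, ... placeholders in the template are expanded.
sub apply_rule {
    my ( $value, $rule ) = @_;
    my ( $search, $template ) = split /::/, $rule, 2;

    # Match once in list context to collect the capture groups;
    # leave the value unchanged if the pattern does not match.
    my @captures = $value =~ /$search/si
        or return $value;

    # Expand $1, $2, ... from the captures (missing groups become '').
    ( my $replacement = $template ) =~
        s/\$([1-9]\d*)/defined $captures[$1 - 1] ? $captures[$1 - 1] : ''/ge;

    # The expanded text is interpolated once, as plain text, so
    # nothing from the file is ever run as code.
    $value =~ s/$search/$replacement/si;
    return $value;
}

print apply_rule( '20070411', '(\d{4})(\d{2})(\d{2})::$2-$3-$1' ), "\n";    # 04-11-2007
```

        The trade-off is flexibility: the eval approaches allow any Perl expression in the replacement, while this one only rearranges captured text.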

        Cheers,

        JohnGG

Re: Reading Regexes from File
by duff (Parson) on Apr 11, 2007 at 19:24 UTC

    Your examples aren't regular expressions, though they contain them. What you've got is a file full of substitutions. For that you probably want to use eval.

      Isn't a substitution a regex? I'm not sure I understand what you're trying to tell me to do.
        No. \s*M\s* is a regexp. s/\s*M\s*/MALE/si is Perl code: a substitution operator whose operands are a regexp, a replacement string and some options. To run Perl code, you need to use eval EXPR.

        A simplistic example:

        my $str = 'Hello World!';

        # Alias $_ to $str.
        foreach ($str) {
            # Assumes each substitute operator is on a different line.
            while (defined(my $code = <DATA>)) {
                # A substitution that matches nothing returns a false value,
                # so check $@ rather than the return value of eval.
                eval($code);
                die("Bad code at input line $.: $@\n") if $@;
            }
        }

        print("$str\n");  # [hello world!]

        __DATA__
        s/^/[/g
        s/$/]/g
        s/([A-Z])/lc($1)/eg

        An example allowing the reuse of the substitutions:

        my @substitutions;

        # Assumes each substitute operator is on a different line.
        while (defined(my $code = <DATA>)) {
            # Compile each line once into an anonymous sub. Check eval's
            # result before pushing: "push ... or die" would never die,
            # because push returns the new array length.
            my $substitution = eval("sub { $code }")
                or die("Bad code at input line $.: $@\n");
            push @substitutions, $substitution;
        }

        my $str1 = 'Hello World!';
        my $str2 = 'Good Day!';

        # Alias $_ to the variable.
        foreach ($str1, $str2) {
            foreach my $substitution (@substitutions) {
                $substitution->();
            }
        }

        print("$str1\n");  # [hello world!]
        print("$str2\n");  # [good day!]

        __DATA__
        s/^/[/g
        s/$/]/g
        s/([A-Z])/lc($1)/eg
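        A variation on the reusable version, sketched under the assumption that the rules are already in an array (hypothetical sample lines below) rather than read from DATA: each line is wrapped in a sub that takes the string as an argument and returns a modified copy, so callers don't need to alias $_ and the original variable is left alone. apply_all is a hypothetical helper name. As with any string eval, this runs whatever the file contains, so the rules file must be trusted.

```perl
use strict;
use warnings;

# Hypothetical rule lines, as they might be read from the file.
my @lines = ( 's/^/[/g', 's/$/]/g', 's/([A-Z])/lc($1)/eg' );

# Compile each line once into a sub that works on a localized copy
# of its argument in $_ and returns the result.
my @substitutions;
for my $code (@lines) {
    my $sub = eval "sub { local \$_ = shift; $code; return \$_ }"
        or die "Bad code '$code': $@\n";
    push @substitutions, $sub;
}

# Hypothetical helper: run every compiled substitution over a copy.
sub apply_all {
    my ($string) = @_;
    for my $substitution (@substitutions) {
        $string = $substitution->($string);
    }
    return $string;
}

my $str1 = 'Hello World!';
print apply_all($str1), "\n";    # [hello world!]
print "$str1\n";                 # Hello World!
```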