Reading Regexes from File

jeffthewookiee has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Reading Regexes from File by johngg (Canon) on Apr 11, 2007 at 22:28 UTC
You might want to explore using a hash with the regular expression as the key and the replacement text as the value. You can use compiled regular expressions as hash keys. You might need to consider using `\Q` and `\E` or quotemeta when compiling the regular expressions. In the script below I read the patterns and corresponding replacements from the `DATA` file at the end of the script. The patterns contain no spaces, so a simple `split` on white space is sufficient. Your patterns may be more complex.

```perl
use strict;
use warnings;

my %substitutions = ();
while ( <DATA> )
{
    chomp;
    my ($pattern, $replace) = split;
    my $rxPattern = qr{(?si)$pattern};
    $substitutions{$rxPattern} = $replace;
}

# The "F" must have white space on both sides for \sF\s to match.
my $str = q{a capital M and F ish};
print qq{$str\n};
$str =~ s{$_}{$substitutions{$_}} for keys %substitutions;
print qq{$str\n};

__END__
\sM\s MALE
\sF\s FEMALE
```

The output produced is

```
a capital M and F ish
a capitalMALEandFEMALEish
```

I hope this is of use.

Cheers,

JohnGG
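On the `\Q`/`\E` and quotemeta point above, here is a minimal sketch of the difference escaping makes, assuming the text read from the file is meant to match literally rather than act as a pattern. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical search text that should match literally, not as a pattern.
my $literal = 'a.b (c)';

# Unescaped: '.' matches any character and the parentheses form a group.
my $rxLoose = qr{$literal};

# Escaped with \Q...\E: every metacharacter is backslashed.
my $rxExact = qr{\Q$literal\E};

# Escaped with quotemeta: same effect, done as a separate step.
my $quoted   = quotemeta $literal;
my $rxExact2 = qr{$quoted};

my $looseHit  = 'axb c'   =~ $rxLoose  ? 1 : 0;   # unescaped pattern matches other text
my $exactMiss = 'axb c'   =~ $rxExact  ? 1 : 0;   # escaped pattern does not
my $exactHit  = 'a.b (c)' =~ $rxExact2 ? 1 : 0;   # escaped pattern matches the literal text

print "loose=$looseHit exactMiss=$exactMiss exactHit=$exactHit\n";
```

If the file is supposed to hold real patterns such as `\sM\s`, escaping them this way would defeat the purpose, so it only applies to entries meant as plain text.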
Re^2: Reading Regexes from File by jeffthewookiee (Sexton) on Apr 12, 2007 at 18:43 UTC
OK, tried a slightly different approach. I'm still reading the regular expressions from a file, but I'm using `::` to delimit the search pattern from the replacement text. The problem is that the replacement is interpreted literally. For example, given this line in the file: `(\d{4})(\d{2})(\d{2})::$2-$3-$1` and this code: `my ($search, $replace) = split "::", $rule; $value =~ s/$search/$replace/si;` the result is the literal text `$2-$3-$1`. I wanted it to reformat the date into MM-DD-YYYY format.
Re^3: Reading Regexes from File by johngg (Canon) on Apr 12, 2007 at 22:57 UTC
I don't think that is ever going to work, although more experienced Monks may know better. The method I described is only good for replacing a pattern with simple text. You are getting the literal `$2-$3-$1` because the perl interpreter sees the scalar `$replace` when it parses your code and interpolates its contents as a literal string rather than seeing the "magical to regular expressions" `$1` etc. If you try to rectify things by `eval`'ing `$replace`, i.e. `$value =~ s/$search/eval "$replace"/sie`, then Perl interprets that as arithmetic and comes up with, for today's date, -2015 (that is, 04 - 12 - 2007). To do anything more fancy than a simple text replacement you will almost certainly have to take the `eval` approach suggested by duff and expanded on by ikegami.

Cheers,

JohnGG
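For what it's worth, one idiom often suggested for this situation stays within the `eval` family: wrap the replacement text in double quotes and use the `/ee` modifier, so the text is evaluated as an interpolating string instead of as arithmetic. The sketch below assumes the rule format and sample date from the post above. `$quoted` is a name introduced here for illustration, and the trick breaks if the replacement itself contains a double quote.

```perl
use strict;
use warnings;

# Rule in the "search::replace" format from the post above.
my $rule = '(\d{4})(\d{2})(\d{2})::$2-$3-$1';
my ($search, $replace) = split /::/, $rule, 2;

my $value = '20070412';

# Hypothetical helper: the replacement wrapped in double quotes,
# giving the Perl source text "$2-$3-$1" (quotes included).
my $quoted = '"' . $replace . '"';

# First /e evaluates the expression $quoted, yielding that source text.
# Second /e evaluates the source text, interpolating $2, $3 and $1.
$value =~ s/$search/$quoted/siee;

print "$value\n";
```

Because `/ee` executes whatever the rule file contains, this is only reasonable when the file comes from a trusted source.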
Re: Reading Regexes from File by duff (Parson) on Apr 11, 2007 at 19:24 UTC
Your examples aren't regular expressions, though they contain them. What you've got is a file full of substitutions. For that you probably want to use `eval`.

duff
Re^2: Reading Regexes from File by Anonymous Monk on Apr 11, 2007 at 19:34 UTC
Isn't a substitution a regex? I'm not sure I understand what you're trying to tell me to do.
Re^3: Reading Regexes from File by ikegami (Patriarch) on Apr 11, 2007 at 19:47 UTC
No. `\sM\s` is a regexp. `s/\sM\s/MALE/si` is Perl code (consisting of a substitution operator with a regexp, a replacement string and some options for operands). To run Perl code, you need to use `eval EXPR`.

A simplistic example:

```perl
my $str = 'Hello World!';

# Alias $_ to $str.
foreach ($str) {
    # Assumes each substitute operator is on a different line.
    while (defined(my $code = <DATA>)) {
        # Append "; 1" so that a substitution which matches nothing
        # (and so returns false) is not reported as bad code.
        eval("$code; 1")
            or die("Bad code at input line $.: $@\n");
    }
}

print("$str\n");  # [hello world!]

__DATA__
s/^/[/g
s/$/]/g
s/([A-Z])/lc($1)/eg
```

An example allowing the reuse of the substitutions:

```perl
my @substitutions;

# Assumes each substitute operator is on a different line.
while (defined(my $code = <DATA>)) {
    # Check the eval before pushing: "push ... or die" would test the
    # return value of push (the new array length), not that of eval.
    my $substitution = eval("sub { $code }")
        or die("Bad code at input line $.: $@\n");
    push @substitutions, $substitution;
}

my $str1 = 'Hello World!';
my $str2 = 'Good Day!';

# Alias $_ to the variable.
foreach ($str1, $str2) {
    foreach my $substitution (@substitutions) {
        $substitution->();
    }
}

print("$str1\n");  # [hello world!]
print("$str2\n");  # [good day!]

__DATA__
s/^/[/g
s/$/]/g
s/([A-Z])/lc($1)/eg
```
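Tying this back to the date rule from earlier in the thread, here is a rough sketch of the same compile-once idea reading from a filehandle instead of `DATA`. The rule contents, the sample values and the in-memory filehandle standing in for a real file are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for the real rules file (hypothetical contents). An in-memory
# filehandle keeps the sketch self-contained; a real script would open a path.
my $rules = <<'END_RULES';
s/\sM\s/ MALE /si
s/(\d{4})(\d{2})(\d{2})/$2-$3-$1/

s/\sF\s/ FEMALE /si
END_RULES

open my $fh, '<', \$rules or die "Cannot open rules: $!\n";

my @substitutions;
while (defined(my $code = <$fh>)) {
    chomp $code;
    next unless $code =~ /\S/;    # skip blank lines

    # Compile each line once into an anonymous sub that acts on $_.
    my $substitution = eval "sub { $code }"
        or die "Bad code at input line $.: $@\n";
    push @substitutions, $substitution;
}
close $fh;

my @values = ('born 20070412', 'sex: M ');

# Alias $_ to each value in turn so the compiled s/// modifies it in place.
foreach (@values) {
    foreach my $substitution (@substitutions) {
        $substitution->();
    }
}

print "$_\n" for @values;
```

Since each line is compiled as Perl code, `$2-$3-$1` in the replacement is interpolated by the substitution operator itself, which is what the `split`-based attempt could not achieve. The same caveat applies as with any `eval`: the rules file must be trusted.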