in reply to Re: approximate regular expression
in thread approximate regular expression

This is a very good option (I've tried it as you specified and it works). However, I need $pattern to hold a regular expression, and that regexp would vary in both nature and size. For instance: /^D{3,12}DJH^F{1,4}../ How could I search for approximate matches to such a regexp?

Re^3: approximate regular expression
by AnomalousMonk (Archbishop) on Mar 23, 2012 at 15:42 UTC

    As always, the Devil is in the details. Unless and until you can make clear (first to yourself, then to others) the meaning of the phrase "an approximate match to a regex", you may make but little progress. To me, for instance, the notion of a "regex match" already incorporates vague notions of fuzziness and approximation. Just what does it mean to add approximation to approximation, or to measure its degree?

      Thanks for the suggestion

      Imagine, for instance, a regexp like this

      $pattern="[^D]{2,3}FGR{2}..H";

      There is no fuzziness in this. From a list of strings, some will match this regular expression and some will NOT. The question is: considering ONLY substitutions, allow a maximum of 2 mismatches when deciding whether a string matches $pattern. This will move some strings from the 'non-matching group' to the 'matching group'.

      How to do this in Perl?

        I don't get it (I'm a different Anonymous)

        Does this help?

        #!/usr/bin/perl --
        use strict;
        use warnings;

        my $pattern = "JEJE";
        my $string = "EJKJUJHJDJEJEJEDEJOJOJJJAHJHJSHJEFEJUJEJUJKIJS";
        my $fuzzyPattern = fuzzUpEverySecond( $pattern );
        print "$pattern\n$fuzzyPattern\n";
        print " $string\n";
        while( $string =~ m/($fuzzyPattern)/g ){
            print ' ' x ( pos($string) - length($1) -1 ), "^$1\n";
        }

        sub fuzzUpEverySecond {
            my( $pat ) = @_;
            $pat =~ s/(.)./$1./g;
            $pat;
        }
        __END__
        JEJE
        J.J.
         EJKJUJHJDJEJEJEDEJOJOJJJAHJHJSHJEFEJUJEJUJKIJS
        ^JKJU
            ^JHJD
                ^JEJE
                        ^JOJO
                            ^JJJA
                                 ^JHJS
                                          ^JUJE
                                              ^JUJK
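The example above only relaxes fixed positions. For a purely literal pattern such as "JEJE", the "at most N substitutions" idea could instead be sketched by sliding a window over the string and counting differing positions. This is only an illustration: the sub names mismatches and approx_positions are made up here, it handles literal strings only (no character classes or quantifiers), and it assumes plain byte strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the positions at which two equal-length strings differ.
# String XOR yields "\0" wherever the characters are identical,
# so counting the non-"\0" characters gives the Hamming distance.
sub mismatches {
    my ( $x, $y ) = @_;
    my $diff = ( $x ^ $y ) =~ tr/\0//c;
    return $diff;
}

# Return [ offset, window ] pairs for every window of $string that is
# within $max substitutions of the literal $literal.
sub approx_positions {
    my ( $string, $literal, $max ) = @_;
    my $len = length $literal;
    my @hits;
    for my $off ( 0 .. length($string) - $len ) {
        my $window = substr $string, $off, $len;
        push @hits, [ $off, $window ]
            if mismatches( $window, $literal ) <= $max;
    }
    return @hits;
}

# Demo with the string from the post, allowing one substitution.
my $string = "EJKJUJHJDJEJEJEDEJOJOJJJAHJHJSHJEFEJUJEJUJKIJS";
for my $hit ( approx_positions( $string, "JEJE", 1 ) ) {
    printf "%2d %s\n", @$hit;
}
```

Unlike the J.J. pattern, this does not fix in advance which positions may differ; any position can be the substituted one, up to the given limit.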
Re^3: approximate regular expression
by JavaFan (Canon) on Mar 23, 2012 at 12:45 UTC
    Unless you write your own regular expression engine and hook it in through the plugin system, regular expressions are really the wrong way to attack this.
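One way to attack it without a custom engine, sketched under some loud assumptions: the pattern is supplied by hand as a list of [ single-character test, min, max ] elements (here standing in for "[^D]{2,3}FGR{2}..H"; parsing a pattern string into such a list is not attempted), only substitutions are counted, and the sub names match_from and fuzzy_match are invented for this illustration. It is a plain backtracking search, not a tuned implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written stand-in for "[^D]{2,3}FGR{2}..H":
# each element is [ regex for ONE character, min repeats, max repeats ].
my @elements = (
    [ qr/[^D]/, 2, 3 ],
    [ qr/F/,    1, 1 ],
    [ qr/G/,    1, 1 ],
    [ qr/R/,    2, 2 ],
    [ qr/./,    2, 2 ],
    [ qr/H/,    1, 1 ],
);

# True if elements $idx.. can be matched against $str from $pos on,
# having already used $reps repeats of element $idx, with at most
# $budget characters that fail their single-character test.
sub match_from {
    my ( $str, $pos, $elems, $idx, $reps, $budget ) = @_;
    return 1 if $idx == @$elems;    # every element has been consumed

    my ( $re, $min, $max ) = @{ $elems->[$idx] };

    # Option 1: enough repeats seen, move on to the next element.
    if ( $reps >= $min ) {
        return 1
            if match_from( $str, $pos, $elems, $idx + 1, 0, $budget );
    }

    # Option 2: consume one more character for the current element;
    # a character that fails the test costs one substitution.
    if ( $reps < $max && $pos < length $str ) {
        my $ch   = substr $str, $pos, 1;
        my $cost = $ch =~ /\A$re\z/ ? 0 : 1;
        return 1
            if $budget >= $cost
            && match_from( $str, $pos + 1, $elems, $idx, $reps + 1,
                           $budget - $cost );
    }
    return 0;
}

# Try every start offset, like an unanchored regex match.
sub fuzzy_match {
    my ( $str, $elems, $max_mismatches ) = @_;
    for my $start ( 0 .. length $str ) {
        return 1
            if match_from( $str, $start, $elems, 0, 0, $max_mismatches );
    }
    return 0;
}

for my $s ( "AAFGRRXXH", "DAFGRTXXH", "DAFGTTXXH" ) {
    printf "%s %s\n", $s,
        fuzzy_match( $s, \@elements, 2 ) ? "matches" : "does not match";
}
```

With a limit of 2, the first string matches exactly, the second matches with two substitutions (the leading D and the T in place of an R), and the third needs three and is rejected. For long strings or patterns, caching results on (position, element, repeats, budget) would avoid repeated work.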