What kinds of 'errors' are acceptable? Must the length and syntax match but the letters differ? Are missing characters acceptable? Are extra characters acceptable?
Look for the Levenshtein Distance: Text::Levenshtein.
There's also a command-line tool called 'agrep' that may help, even if only because reading its documentation focuses your question and widens your search for other answers.
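To make the distance idea concrete, here is a minimal, untested-in-anger sketch of the usual dynamic-programming edit distance in plain Perl. The sub name edit_distance and the sample strings are made up for illustration; as far as I know, Text::Levenshtein gives you the same number through its distance function, so treat this only as a look under the hood.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Edit distance: the minimum number of single-character insertions,
# deletions, and substitutions needed to turn $s into $t.
# (edit_distance is a hypothetical helper name, not a module API.)
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;

    # @prev is the previous row of the DP table: the cost of building
    # each prefix of $t from the empty string.
    my @prev = (0 .. scalar @t);
    for my $i (1 .. scalar @s) {
        my @cur = ($i);    # cost of deleting the first $i chars of $s
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution (or match)
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Invented sample: one character typed wrong.
print edit_distance('23DX-44SC', '23DX-445C'), "\n";
```

A distance of 0 means an exact match; anything larger is the number of single-character errors, which covers wrong, missing, and extra characters alike.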
-- [ e d @ h a l l e y . c c ]
If the general structure is the same, and only the lengths vary, you can use the "at least N but not more than M" construct in regular expressions:
if ($value =~ /\d{1,3}\s{1,2}-\d{3,5}\s{1,4}/)
This will match 1, 2, or 3 digits, then 1 or 2 whitespace, then a hyphen, then 3, 4, or 5 digits, then 1 to 4 whitespace characters. As an aside, in situations like this, the /x modifier is very handy, because it lets you put the comments right in your pattern:
if ($value =~
/\d{1,3} # 1 to 3 digits
\s{1,2} # 1 or 2 whitespace
- # exactly one hyphen
\d{3,5} # 3, 4, or 5 digits
\s{1,4} # 1 to 4 whitespace
/x)
BTW, the pattern you've given doesn't match your sample data. Did you mean \w instead of \s?
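If you want to see quickly which inputs the bounded pattern accepts, something like the following sketch may help. The sample values and the sub name looks_ok are invented for illustration; only the pattern itself comes from the post above.

```perl
use strict;
use warnings;

# The bounded-quantifier pattern, compiled once with /x so the
# comments can live inside it.
my $pattern = qr/
    \d{1,3}   # 1 to 3 digits
    \s{1,2}   # 1 or 2 whitespace
    -         # exactly one hyphen
    \d{3,5}   # 3, 4, or 5 digits
    \s{1,4}   # 1 to 4 whitespace
/x;

# Hypothetical helper: returns 1 if the pattern is found in $value.
# The pattern is unanchored, so it may also match inside a longer string.
sub looks_ok {
    my ($value) = @_;
    return $value =~ $pattern ? 1 : 0;
}

# Made-up sample values; the last two lack whitespace before the hyphen.
for my $value ("12 -3456 ", "1  -123    ", "1234-12 ", "12-3456 ") {
    my $verdict = looks_ok($value) ? 'matches' : 'does not match';
    print "'$value' $verdict\n";
}
```

Add ^ and $ anchors if the whole value has to fit the pattern rather than just some part of it.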
I just had another idea. Depending on the size of your search space, you might be interested in the thread Regexp generating strings?. In a reply there, I gave a program that generates all matching strings for regexes with certain limitations. You could extend the program to suit your needs (e.g. add $regex =~ s/\d/'(' . join('|', ('0' .. '9')) . ')'/eg;) and then calculate the minimal distance from your input to the possible matches.
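As a rough sketch of that idea, and not the program from the other thread, the following expands a tiny template into every string it can match and takes the smallest edit distance to the input. Everything here is an assumption for illustration: the helper names, the template syntax (only \d and literal characters are understood), and the sample input. It is only practical for small search spaces.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Plain dynamic-programming edit distance (insert, delete, substitute).
sub edit_distance {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my @prev = (0 .. scalar @t);
    for my $i (1 .. scalar @s) {
        my @cur = ($i);
        for my $j (1 .. scalar @t) {
            my $cost = $s[$i - 1] eq $t[$j - 1] ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Expand a template into all strings it matches. Hypothetical mini-syntax:
# "\d" stands for one digit, every other character stands for itself.
sub expand_template {
    my ($template) = @_;
    my @candidates = ('');
    for my $token ($template =~ /\\d|./g) {
        my @choices = $token eq '\d' ? (0 .. 9) : ($token);
        my @next;
        for my $head (@candidates) {
            push @next, $head . $_ for @choices;
        }
        @candidates = @next;
    }
    return @candidates;
}

# Smallest distance from $input to any string the template can produce.
sub nearest_distance {
    my ($input, $template) = @_;
    my @distances;
    push @distances, edit_distance($input, $_) for expand_template($template);
    return min @distances;
}

# Made-up input with one wrong character.
print "Minimal distance: ", nearest_distance('1x-3', '\d\d-\d'), "\n";
```

The candidate list grows tenfold with every \d, so for longer formats a per-field comparison is probably the more realistic route.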
Hope this helped.
CombatSquirrel.
Entropy is the tendency of everything going to hell.
Use a lot of ?'s?
------ We are the carpenters and bricklayers of the Information Age. The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6 Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
Would that allow me to get a count of how many errors had been introduced, though?
my $string = "abcd-ef1";
my @matches = $string =~ /^(\w{2})?(\d)?(\w{2})?(-)?(\w{2})?(\d)?$/;
my $errors = grep { !defined $_ } @matches;
print "Errors is $errors\n";
----
Errors is 1
Basically, you're expecting every group to match. Any group that doesn't match comes back undefined, and each undefined group counts as one error. You'll have to fiddle with it, I think, to get it to do exactly what you want, but that should give you a good start. (This is, of course, assuming that you have to (re)invent the wheel.)
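my $value = '23DX-445C';
if ($value =~ m!(\d+)(\w+)(-+)(\d+)(\d+)!) {
    # Sum how far each group's length is from its expected length.
    # The fifth group is \d+ as posted; \w+ may have been intended.
    print "Difference: "
        . (abs(length($1) - 2) + abs(length($2) - 2) +
           abs(length($3) - 1) + abs(length($4) - 2) +
           abs(length($5) - 2))
        . "\n";
}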
CombatSquirrel.