in reply to Re^2: Generate a regex from text
in thread Generate a regex from text
Not sure if this is of any help, but I offer it anyway! The basic idea is to reduce each line to a pattern of letter runs and digit runs, then count the patterns to show the most common ones. I've applied it to all of your data at once, but you would probably want to do each column separately.
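poj

```perl
#!perl
use strict;
use warnings;

my %pattern;
my @data = <DATA>;
chomp @data;

# identify patterns (work on a copy so @data keeps the original text)
for my $line (@data) {
    my $p = $line;
    $p =~ tr/A-Za-z/A/;        # every letter becomes A
    $p =~ tr/0-9/9/;           # every digit becomes 9
    $p = quotemeta $p;         # escape literal punctuation and spaces
    $p =~ s/A+/[A-Za-z]+/g;    # runs of letters
    $p =~ s/9+/[0-9]+/g;       # runs of digits
    ++$pattern{$p};            # count duplicates
}

# results
for (sort keys %pattern) {
    print "$pattern{$_} $_\n";
}

# build regex (the patterns are already escaped, so no quotemeta here)
my $re = join '|', sort keys %pattern;

# check data against regex
for (@data) {
    unless (/^(?:$re)$/) {
        print "No match $_\n";
    }
}

__DATA__
John DoE ABC123 1-233-123-4562
Jo M. Doeson abd123 (222)222-2222
Mc'Doe, Jim abd123 222-222-2222
MCDOE, JAN E. abd1243 (222)222-2ab2
```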
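For the "each column separately" suggestion, here is a minimal sketch rather than a finished solution. It assumes the fields are tab-separated, which the thread does not actually state, and `field_pattern` is a hypothetical helper name introduced only for this illustration. It escapes literal punctuation with `quotemeta` before inserting the character classes, so the resulting patterns can be used directly as regexes.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper: turn one field into a generalized, regex-safe pattern.
sub field_pattern {
    my ($field) = @_;
    (my $p = $field) =~ tr/A-Za-z/A/;   # every letter becomes A
    $p =~ tr/0-9/9/;                    # every digit becomes 9
    $p = quotemeta $p;                  # escape punctuation/spaces; A and 9 are untouched
    $p =~ s/A+/[A-Za-z]+/g;             # runs of letters
    $p =~ s/9+/[0-9]+/g;                # runs of digits
    return $p;
}

# Assumption: fields are tab-separated (the delimiter is a guess, not from the post).
my @rows = (
    "John DoE\tABC123\t1-233-123-4562",
    "Jo M. Doeson\tabd123\t(222)222-2222",
    "Mc'Doe, Jim\tabd123\t222-222-2222",
    "MCDOE, JAN E.\tabd1243\t(222)222-2ab2",
);

# Count patterns per column: $count[$col]{$pattern}
my @count;
for my $row (@rows) {
    my @fields = split /\t/, $row;
    for my $col (0 .. $#fields) {
        ++$count[$col]{ field_pattern($fields[$col]) };
    }
}

# Report, most common pattern first within each column
for my $col (0 .. $#count) {
    my $seen = $count[$col];
    print "Column $col\n";
    for my $p (sort { $seen->{$b} <=> $seen->{$a} || $a cmp $b } keys %$seen) {
        print "  $seen->{$p} $p\n";
    }
}
```

With the sample rows, the ID column collapses to a single pattern, `[A-Za-z]+[0-9]+`, seen four times, while the phone column yields four distinct patterns. That makes an odd value such as `(222)222-2ab2` easy to spot.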