in reply to String similarities and pattern matching

Crackers2 is correct:

"I believe you're missing the OP's point."

I did completely miss the OP's point.

On short strings like these, the time it takes to determine the degree of similarity far outweighs the time taken to simply run the regex, so I see no benefit in avoiding the match.
#! perl -slw
use strict;

# The two candidate patterns, one per error type.
my @regex = (
    qr[Error 123 on (\S+) file not found error],
    qr[Error 124 on (\S+):(\S+) no space left],
);

# Try every pattern against every sample line and report the result.
while( <DATA> ) {
    chomp;
    for my $regex ( @regex ) {
        print "'$_' ", ( $_ =~ $regex ? 'does    ' : 'does not' ), " match $regex";
    }
}

__DATA__
Error 123 on SystemA file not found error
Error 123 on SystemB file not found error
Error 123 on SystemC file not found error
Error 124 on User1:FileA no space left
Error 124 on User2:FileB no space left
Error 124 on User3:FileC no space left

Outputs:

c:\test>junk9
'Error 123 on SystemA file not found error' does     match (?-xism:Error 123 on (\S+) file not found error)
'Error 123 on SystemA file not found error' does not match (?-xism:Error 124 on (\S+):(\S+) no space left)
'Error 123 on SystemB file not found error' does     match (?-xism:Error 123 on (\S+) file not found error)
'Error 123 on SystemB file not found error' does not match (?-xism:Error 124 on (\S+):(\S+) no space left)
'Error 123 on SystemC file not found error' does     match (?-xism:Error 123 on (\S+) file not found error)
'Error 123 on SystemC file not found error' does not match (?-xism:Error 124 on (\S+):(\S+) no space left)
'Error 124 on User1:FileA no space left' does not match (?-xism:Error 123 on (\S+) file not found error)
'Error 124 on User1:FileA no space left' does     match (?-xism:Error 124 on (\S+):(\S+) no space left)
'Error 124 on User2:FileB no space left' does not match (?-xism:Error 123 on (\S+) file not found error)
'Error 124 on User2:FileB no space left' does     match (?-xism:Error 124 on (\S+):(\S+) no space left)
'Error 124 on User3:FileC no space left' does not match (?-xism:Error 123 on (\S+) file not found error)
'Error 124 on User3:FileC no space left' does     match (?-xism:Error 124 on (\S+):(\S+) no space left)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: String similarities and pattern matching
by Crackers2 (Parson) on Sep 08, 2006 at 00:20 UTC

    I believe you're missing the OP's point. The regexes are not part of the given data; in fact they're the desired answer.

    I.e. given the data

    Error 123 on SystemA file not found error
    Error 123 on SystemB file not found error
    Error 123 on SystemC file not found error

    the task is to come up with a regex that matches these, using as few variables as possible.

    That's how I read it anyway.
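
For what it's worth, here is a minimal sketch of the task as Crackers2 reads it: deriving the regex from the sample lines instead of starting with it. It is not the OP's method or anyone's posted solution. The helper name derive_regex is hypothetical, and it assumes every sample line splits into the same number of whitespace-separated tokens. Tokens that are identical in every line are kept as literals; any position that varies becomes a (\S+) capture.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): build one pattern from sample
# lines. Assumes all lines have the same number of whitespace-separated tokens.
sub derive_regex {
    my @rows = map { [ split ' ' ] } @_;
    my @parts;
    for my $i ( 0 .. $#{ $rows[0] } ) {
        # Collect the distinct tokens seen at position $i across all lines.
        my %seen = map { $_->[$i] => 1 } @rows;
        # One distinct token: keep it as an escaped literal.
        # Several distinct tokens: this position is a variable.
        push @parts, keys( %seen ) == 1 ? quotemeta( $rows[0][$i] ) : '(\S+)';
    }
    return join ' ', @parts;
}

my @samples = (
    'Error 123 on SystemA file not found error',
    'Error 123 on SystemB file not found error',
    'Error 123 on SystemC file not found error',
);

my $pattern = derive_regex( @samples );
print "$pattern\n";

# Sanity check: the derived pattern should match every sample it came from.
for my $line ( @samples ) {
    print "'$line' ", ( $line =~ /^$pattern$/ ? 'does' : 'does not' ), " match\n";
}
```

For the three samples above this yields Error 123 on (\S+) file not found error. Because it only splits on whitespace, the "Error 124" lines would get a single capture for User1:FileA, not the two captures in the hand-written regex. Splitting variable tokens further, and grouping a mixed log into families of similar lines first, is the harder part of the problem and is not attempted here.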