in reply to Ignoring patterns when comparing strings

Without looking into it any deeper, a simple approach is to stop recalculating the "cleaned" version of a string for each comparison. For that, you will need to move the cleaning out of the function compare() and up into the loop that calls compare(), so that compare() receives strings that are already cleaned:

    for my $string1 (@strings_left) {
        my $string1_clean = clean( $string1 );
        for my $string2 (@strings_right) {
            if( compare( $string1_clean, clean( $string2 ) ) ) {
                ...
            }
        }
    }
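Taking that one step further, here is a minimal sketch that also cleans every right-hand string once, before the nested loops, instead of once per comparison. The body of clean() (lowercasing and stripping non-word characters), the equality test in compare(), and the sample strings are all assumptions for illustration; substitute the patterns you actually want to ignore.

```perl
use strict;
use warnings;

# Hypothetical cleaning rule: lowercase, then drop all non-word characters.
sub clean {
    my ($s) = @_;
    (my $c = lc $s) =~ s/\W+//g;
    return $c;
}

# Hypothetical comparison; both arguments are already cleaned.
sub compare {
    my ($left_clean, $right_clean) = @_;
    return $left_clean eq $right_clean;
}

my @strings_left  = ('Foo-Bar', 'Baz!');
my @strings_right = ('foo bar', 'qux', 'BAZ');

# Clean each right-hand string exactly once, outside the loops.
my @right_clean = map { clean($_) } @strings_right;

my @matches;
for my $string1 (@strings_left) {
    my $string1_clean = clean($string1);
    for my $i (0 .. $#strings_right) {
        if ( compare($string1_clean, $right_clean[$i]) ) {
            # Keep the original, uncleaned strings for reporting.
            push @matches, [ $string1, $strings_right[$i] ];
        }
    }
}

print "$_->[0] matches $_->[1]\n" for @matches;
```

With these inputs, "Foo-Bar" pairs with "foo bar" and "Baz!" pairs with "BAZ". Keeping the cleaned copies in a parallel array means the originals are still at hand for output.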

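If you can spare some memory, you can use Memoize to cache the results of clean(), so that a string that shows up again is not cleaned a second time. This trades memory for speed and helps most when the same strings occur repeatedly.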
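A minimal sketch of that idea using the core Memoize module's memoize() function. The cleaning rule, the sample strings and the $calls counter are assumptions added for illustration; the counter only exists to show how often the real body runs (a memoized function should otherwise have no side effects).

```perl
use strict;
use warnings;
use Memoize;

my $calls = 0;   # how many times the real cleaning code executes

# Hypothetical cleaning rule: lowercase, then drop all non-word characters.
sub clean {
    my ($s) = @_;
    $calls++;
    (my $c = lc $s) =~ s/\W+//g;
    return $c;
}

# Wrap clean() in a cache: each distinct argument is cleaned once,
# and repeat calls return the stored result.
memoize('clean');

my @strings = ('Foo-Bar', 'foo bar', 'Foo-Bar', 'Foo-Bar');
my @cleaned = map { clean($_) } @strings;

print "clean() body ran $calls times for ", scalar(@strings), " calls\n";
```

Here the body runs only twice, once per distinct input. The cache grows with the number of distinct strings, which is where the extra memory goes.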

But maybe you can save even more time by reducing the number of comparisons: first sort all your strings into buckets based on the first (few) characters of their cleaned versions, and only compare strings that land in the same bucket. If you are testing for equality, there is no way that a string starting with "A" will be equal to a string starting with "B", so most pairs never need to be compared at all.
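A minimal sketch of bucketing, assuming the comparison is plain equality of cleaned strings. The cleaning rule, the two-character bucket key ($prefix_len) and the sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical cleaning rule: lowercase, then drop all non-word characters.
sub clean {
    my ($s) = @_;
    (my $c = lc $s) =~ s/\W+//g;
    return $c;
}

my $prefix_len = 2;   # assumed bucket key length

my @strings = ('Apple pie', 'apple-pie', 'Banana', 'B.a.n.a.n.a', 'Cherry');
my @clean   = map { clean($_) } @strings;

# Group string indexes by the prefix of their cleaned form.
my %bucket;
for my $i (0 .. $#strings) {
    push @{ $bucket{ substr($clean[$i], 0, $prefix_len) } }, $i;
}

# Compare each pair only within its own bucket.
my @pairs;
my $comparisons = 0;
for my $key (sort keys %bucket) {
    my @idx = @{ $bucket{$key} };
    for my $x (0 .. $#idx - 1) {
        for my $y ($x + 1 .. $#idx) {
            $comparisons++;
            my ($i, $j) = @idx[$x, $y];
            push @pairs, [ @strings[$i, $j] ] if $clean[$i] eq $clean[$j];
        }
    }
}

printf "%d comparisons, %d matches\n", $comparisons, scalar @pairs;
```

This makes 2 comparisons instead of the 10 needed to check all pairs of 5 strings. The trick only holds when equal cleaned strings must share a prefix; for fuzzier comparisons the bucket key has to be chosen more carefully.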

Re^2: Ignoring patterns when comparing strings
by TravelAddict (Acolyte) on Jun 28, 2018 at 12:48 UTC
    Thanks, Corion. Your reply made me realize that I'm actually doing the substitutions multiple times on the same strings. If I have a list of 10 strings, I compare the first one with the second one, then the third, etc. until the end, then repeat starting with the second string and compare it to the third, and so on. So instead of doing the substitution multiple times, I'll do it once, store the results in a new variable used for comparison only, and I'll use the real value when it's time to write something. Brilliant! I could also probably improve the algorithm itself, but if I stop doing the substitution multiple times on the same string, I should get an acceptable execution time. And thanks for the tip about Memoize! I'm sure that this one will be handy in some other cases. Have a great day! TA
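The plan described in this reply (clean each string once, compare each string with every later one, and use the original values for output) could be sketched like this. The cleaning rule, the equality test and the sample strings are assumptions standing in for the real substitutions.

```perl
use strict;
use warnings;

# Hypothetical cleaning rule standing in for the real substitutions.
sub clean {
    my ($s) = @_;
    (my $c = lc $s) =~ s/\W+//g;
    return $c;
}

my @strings = ('Hello, World', 'hello world', 'Goodbye', 'HELLO-WORLD');

# Do the substitutions once per string; the originals stay untouched.
my @clean = map { clean($_) } @strings;

# Compare each string with every later one (index $i < $j).
my @duplicates;
for my $i (0 .. $#strings - 1) {
    for my $j ($i + 1 .. $#strings) {
        next unless $clean[$i] eq $clean[$j];
        # Output uses the original values, not the cleaned copies.
        push @duplicates, "$strings[$i] | $strings[$j]";
    }
}

print "$_\n" for @duplicates;
```

Each string is cleaned once instead of once per pair, while the pairwise loop itself still does n*(n-1)/2 comparisons.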