in reply to Detecting transpositions

I'm looking for algorithms and/or implementations for comparing two strings and determining if they are the same barring a single transposition of two adjacent characters.

Have you considered either using or borrowing code from Algorithm::Diff? That's how I'd start.

Replies are listed 'Best First'.
Re: Re: Detecting transpositions
by BrowserUk (Patriarch) on Aug 06, 2003 at 08:20 UTC

    I hadn't actually considered Algorithm::Diff as I had assumed it to be line oriented. A brief scan of the docs show that the basic detection method employed is Longest Common Sequence, which isn't really applicable to my problem.

    I've also looked at String::Approx, Text::Levenshtien and Text::Soundex, but again, these aren't geared to detecting a single transposition. They are also incredibly slow if you are trying to do this on thousands of strings.

    I've also coded a simple looping comparison, and played with XOR, but I didn't want to influence the ideas. I have a vague memory of there being a clever way of doing this, but searching the web didn't elicite anything from the vague memory I have.

    So, I thought I'd ask:)


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      You can try Text::LevenshteinXS for the need for speed.

      You can also try Text::WagnerFischer and Text::Brew to play with operation's weights/costs to detect letters swap.

      There are also Text::PhraseDistance that is suited for phrases but that can benefit from a custom distance. The 0.01 version has also a dinamic technique (slow), instead the 0.02 solves the issue for the marriage problem.

      </advertising>

        It would be hard to beat Abigail-IIs implementation of comp for performance, and I don't need the full power of Levenshtien for this, but I''l be sure to grab the XS version next time I need it.

        Thanks for the pointers.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
        If I understand your problem, I can solve it! Of course, the same can be said for you.