PerlMonks  

Here's another vote for Text::Levenshtein, which I have found very handy for comparing strings (mostly to detect data-entry errors), especially strings that mix letters and numbers. I too wish I could get the XS version working, though.
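
For anyone who hasn't met the metric, here is a minimal sketch of the classic dynamic-programming edit distance. It is written in Python purely for illustration and does not call Text::Levenshtein; the function name `levenshtein` and the sample strings are my own assumptions, not anything from the module.

```python
def levenshtein(a, b):
    """Return the number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]


# Hypothetical data-entry slip: two digits swapped in a part number.
print(levenshtein("AB1234", "AB1243"))
```

A small distance relative to the string length is the hint that two entries were probably meant to be the same; where to put the cut-off is a judgment call for your data.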

I'd also like to point out Text::Metaphone as a soundex on steroids, since I've found soundex to be too insensitive at times. Note, however, that Metaphone ignores everything but letters, which may limit its usefulness to you.
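
To make the "too insensitive" complaint concrete, here is a rough sketch of the usual American Soundex rules, again in Python for illustration only. It is not Metaphone, and I am not claiming it matches any particular Perl module's behaviour in every corner case; the function name `soundex` is my own. Dropping non-letters is an assumption of this sketch.

```python
import string

# Consonant groups and their Soundex digits; vowels, H, W and Y get no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return a 4-character Soundex code, or '' if name has no letters."""
    # Assumption of this sketch: anything that is not A-Z is dropped.
    letters = [c for c in name.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat after it counts
    return (result + "000")[:4]


# Two different names collapse to the same code.
print(soundex("Robert"), soundex("Rupert"))
```

Both names come out as R163, which is the kind of collision that makes a finer-grained phonetic code attractive.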

I think BrowserUk points out a serious problem in the case of MP3 files. However, most cases I've seen use some sort of fairly standard separator between "fields" in the filename, so you could split each name into fields, compare every possible pairing of fields between the two MP3 names, and take the best-scoring set of pairings as the most likely match. This will of course be much slower than comparing the entire name, but with probably only 3 or 4 fields per name, you shouldn't be looking at run times greater than the lifetime of the universe either.
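
The field-pairing idea can be sketched roughly as follows, in Python for illustration. Everything specific here is an assumption on my part: the separators (" - " and underscores), the lower-casing, the helper names `split_fields` and `best_pairing`, and the choice to score a set of pairings by the sum of edit distances. Swap in whatever separators and metric suit your files.

```python
import re
from itertools import permutations


def levenshtein(a, b):
    """Plain dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def split_fields(name):
    """Split a filename into lower-cased fields.
    Assumed separators: ' - ' and runs of underscores."""
    stem = re.sub(r"\.mp3$", "", name, flags=re.IGNORECASE)
    parts = re.split(r"\s+-\s+|_+", stem)
    return [p.strip().lower() for p in parts if p.strip()]


def best_pairing(name_a, name_b):
    """Try every assignment of the shorter field list onto the longer
    one and return (total_distance, pairs) for the cheapest."""
    fa, fb = split_fields(name_a), split_fields(name_b)
    if len(fa) > len(fb):
        fa, fb = fb, fa  # pairs are then (shorter name, longer name)
    best_cost, best_pairs = None, []
    for perm in permutations(fb, len(fa)):
        cost = sum(levenshtein(x, y) for x, y in zip(fa, perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_pairs = cost, list(zip(fa, perm))
    return best_cost, best_pairs


# Hypothetical filenames with the fields in a different order.
cost, pairs = best_pairing("Pink Floyd - Time.mp3", "Time - Pink Floyd.mp3")
print(cost, pairs)
```

With 4 fields per name there are only 24 assignments to try, so brute force is fine; fields left unmatched in the longer name are simply ignored in this sketch.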

--
I'd like to be able to assign to an luser


In reply to Re: similar texts !? by Albannach
in thread similar texts !? by bugsbunny
