This is what I did. I'm sorry I don't have any code; it's currently hidden away on some publisher's system.

1) Took the Spanish-English dictionary, munged the Spanish translations into translator's notes for the Catalan team, and then keyed each entry on its headword in a hash, so that $hash{'headword'} contained the complete entry text.
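
Since the original code is not available, here is only a rough sketch of what step 1 might look like. The tab-separated input format, the sample entries, and the "[es: ...]" note style are all assumptions made for illustration; only the idea of $hash{'headword'} holding the full entry text comes from the description above.

```perl
use strict;
use warnings;

# Hypothetical input: one entry per line, "headword<TAB>rest of entry".
my @lines = (
    "hortaliza\tnf vegetable",
    "casa\tnf house",
);

my %hash;
for my $line (@lines) {
    my ($headword, $body) = split /\t/, $line, 2;

    # Keep the Spanish headword visible as a note for the Catalan translators
    # (the note format here is invented for the example).
    $hash{$headword} = "[es: $headword] $body";
}

print "$hash{'casa'}\n";
```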

2) For each word in the Catalan list:
a) checked whether the word exactly matched a Spanish headword among the hash keys; if not:
b) tried each of a list of ending heuristics in turn; if one of these produced an exact match, used it, otherwise fell back to a fuzzy match.
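
The lookup order in step 2 could be sketched roughly as follows. This is not the original program: the sample hash, the two ending rules, and the names find_headword and crude_fuzzy are hypothetical, and crude_fuzzy is a deliberately naive placeholder that only marks the spot where a real approximate matcher would go.

```perl
use strict;
use warnings;

# Stand-in for the Spanish-keyed hash from step 1 (sample data only).
my %hash = (
    casa      => 'nf house',
    ciudad    => 'nf city',
    hortaliza => 'nf vegetable',
    nacion    => 'nf nation',
);

# Hypothetical Catalan => Spanish ending rewrites; the original list is not shown.
my @ending_rules = (
    [ 'issa' => 'iza' ],    # hortalissa -> hortaliza
    [ 'tat'  => 'dad' ],    # ciutat     -> ciudad
);

# Placeholder fuzzy matcher: keys sharing the word's first three letters.
sub crude_fuzzy {
    my ($word, @keys) = @_;
    my $stem = substr($word, 0, 3);
    my @hits = grep { index($_, $stem) == 0 } @keys;
    return sort @hits;
}

sub find_headword {
    my ($catalan) = @_;

    # a) exact match against a Spanish headword
    return $catalan if exists $hash{$catalan};

    # b) try each ending rule; accept it only if the result is an exact hit
    for my $rule (@ending_rules) {
        my ($from, $to) = @$rule;
        my $guess = $catalan;
        next unless $guess =~ s/\Q$from\E$/$to/;
        return $guess if exists $hash{$guess};
    }

    # otherwise fall back to the fuzzy matcher and take its first candidate
    my @candidates = crude_fuzzy($catalan, keys %hash);
    return $candidates[0];    # undef when nothing comes close
}

for my $word (qw(casa hortalissa ciutat nacio)) {
    my $headword = find_headword($word);
    printf "%-12s => %s\n", $word, defined $headword ? $headword : '(no match)';
}
```

Trying the cheap exact checks first means the fuzzy matcher only runs on the words that nothing else could place.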

I seem to remember that String::Approx returned a list of possible matches from keys(%hash), and I simply took the first one it returned. There were probably better ways of doing it, but this seemed to be adequate.
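
For what it's worth, a minimal sketch of that fuzzy step might look like this. It needs String::Approx from CPAN, and it uses amatch in list context, which returns the inputs that approximately match the pattern. The sample hash and the explicit limit of two edits are assumptions for the example; the settings the original program used are not known.

```perl
use strict;
use warnings;
use String::Approx 'amatch';

# Sample Spanish-keyed hash (illustrative data only).
my %hash = (
    casa      => 'nf house',
    hortaliza => 'nf vegetable',
);

# Allow up to 2 edits; this limit is an assumption, not the original setting.
my @matches = amatch('hortalissa', ['2'], keys %hash);

# Take the first candidate, as described above. Note that keys() returns the
# headwords in no particular order, so "first" is arbitrary if several match.
my $best = $matches[0];

print defined $best ? "$best: $hash{$best}\n" : "no match\n";
```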

Sorry for the lack of code. If it's any consolation, I remember what I called the program: The Hortalizer. That's because hortaliza is Spanish for vegetable, while the Catalan is hortalissa. I even put in one guess, erm, heuristic, just to catch this word.

--
foreach(split('',"\3\3\3c>\0>c\177cc\0~c~``\0cc\177cc")) {$_=unpack('B8',$_);tr,01,\40#,;print$_,"\n";}##IYDKINT!


In reply to Re: Re: How I Created a Catalan-English Dictionary from a Spanish-English Dictionary Using Only String::Approx and Approximately 500 grams of Scots Tablet by Willard B. Trophy
in thread How I Created a Catalan-English Dictionary from a Spanish-English Dictionary Using Only String::Approx and Approximately 500 grams of Scots Tablet by Willard B. Trophy
