Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering if anyone knows of any perl modules that will identify obvious abbreviations when given a list of words.

For example: IBM, CLF, ESR, MS, ST.

I'm not a linguist so I don't have any idea of how to search for this so any starting point would be helpful.


Re: Determining if a word is really an abbreviation
by Joost (Canon) on Jun 15, 2004 at 19:58 UTC
      Thanks for pointing this out to me! I'll start looking at it.

      Someone once told me that there are certain rules of thumb for determining if something is pronounceable.

      For example: if a word has no vowels, it's an acronym/abbreviation. Certain letter combinations never occur in a real word... I wish there were a way to find this rulebase.
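The no-vowel rule of thumb can be tried out in a few lines. This is only a sketch: `looks_unpronounceable` is a hypothetical name, and counting 'y' as a vowel is an assumption made here, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the token contains no vowel letters at all.
# Treating 'y' as a vowel is an assumption, so that words such as "sky"
# are not flagged.
sub looks_unpronounceable {
    my ($token) = @_;
    return $token !~ /[aeiouy]/i ? 1 : 0;
}

for my $token (qw(IBM CLF ESR MS ST laser cwm)) {
    printf "%-6s %s\n", $token,
        looks_unpronounceable($token)
        ? 'no vowels, possible abbreviation'
        : 'has a vowel';
}
```

On the examples from the question this flags CLF, MS and ST but misses IBM and ESR, which both contain a vowel, and it wrongly flags the real word cwm. At best it is one weak signal among several.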

        If no vowels it's an acronym/abbreviation.

        cwm is the exception to prove your rule...

Re: Determining if a word is really an abbreviation
by Zaxo (Archbishop) on Jun 16, 2004 at 02:58 UTC

    vera is a text database of acronyms available at gnu.org. It would be pretty easy to take the initial character of your candidate acronym, lowercase it and search the vera.? file for a match.

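    my $meaning;
    if ( /\b([A-Z]+)\b/ ) {
        # Save the capture: a later successful match would reset $1.
        my $acronym = $1;
        # One vera file per initial letter, selected by the lowercased first character.
        open my $fh, '<', '/path/to/vera/vera.' . lc( substr $acronym, 0, 1 )
            or die $!;
        while (<$fh>) {
            # Take the line following the matching entry as its meaning.
            $meaning = <$fh> and last if /\b\Q$acronym\E\b/;
        }
    }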
    If you want to do a lot of that, it would probably pay to set up a simple database of the vera data. Perl can do that for you, too.
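A simple in-memory version of such a database might look like the sketch below. It assumes each vera entry is an `@item ACRONYM` line followed by one line holding the expansion; check that against the actual files before relying on it. `index_vera` is a hypothetical helper, and the sample text is made up for the demonstration rather than copied from vera.

```perl
use strict;
use warnings;

# Hypothetical helper: read "@item ACRONYM" entries from a filehandle and
# return a hash reference mapping each acronym to a list of expansions.
# Assumption: the expansion is the line right after the "@item" line.
sub index_vera {
    my ($fh) = @_;
    my %meaning;
    while ( my $line = <$fh> ) {
        next unless $line =~ /^\@item\s+(\S+)/;
        my $acronym   = $1;
        my $expansion = <$fh>;
        last unless defined $expansion;
        chomp $expansion;
        # An acronym may have several expansions, so keep them all.
        push @{ $meaning{$acronym} }, $expansion;
    }
    return \%meaning;
}

# Demonstration on an in-memory sample instead of a real vera file.
my $sample = "\@item IBM\nInternational Business Machines\n"
           . "\@item LOD\nLevel Of Detail\n";
open my $fh, '<', \$sample or die $!;
my $index = index_vera($fh);
print "$_: @{ $index->{$_} }\n" for sort keys %$index;
```

Once the hash is built, each lookup is a single hash access instead of a file scan. For data that should persist between runs, the same hash could be tied to a DBM file.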

    After Compline,
    Zaxo

Re: Determining if a word is really an abbreviation
by BigLug (Chaplain) on Jun 16, 2004 at 05:59 UTC

    The problem with any/all solutions to this problem is that of acceptable usage. For example, there are technical documents that would refer to SCUBA but travel brochures that would refer to scuba. Similarly, scientists work with LASER but companies sell laser devices.

    You can't create rules about pronounceability because both 'laser' and 'scuba' are pronounceable. 'Sky' contains no vowels and so might or might not be an acronym. Qantas contains no 'u' after the Q, nor does Iraq. Qantas is the "Queensland and Northern Territory Aerial Services"; however, no one uses that outside of a trivia game these days. It's considered a word. Iraq, of course, is from another language and doesn't follow English rules.

    Databases like vera can help you with a list of known acronyms, but in real estate advertising 'LUG' is a lock-up garage, whereas in a mechanical journal a 'lug' is a type of nut.

    Given all this, when parsing user text I normally require that there be one or more lower-case letters in the user's input. That way I know they didn't just type it with the Caps-Lock button on. If it's all in capitals I'll ask them to change or confirm what they've entered.
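A minimal sketch of that check, assuming plain ASCII input: `needs_confirmation` is a hypothetical name, and ignoring input that contains no letters at all is a choice made here rather than part of the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the text has letters but none of them are
# lower-case, which is treated here as a sign of Caps-Lock input.
sub needs_confirmation {
    my ($text) = @_;
    return ( $text =~ /[A-Z]/ && $text !~ /[a-z]/ ) ? 1 : 0;
}

for my $input ( 'LOCK UP GARAGE FOR RENT', 'Garage with LUG for rent' ) {
    print needs_confirmation($input)
        ? "confirm: $input\n"
        : "accept:  $input\n";
}
```

Only the first input is sent back for confirmation. In the second, the lower-case letters suggest that the capitals in LUG were typed on purpose.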

    "Get real! This is a discussion group, not a helpdesk. You post something, we discuss its implications. If the discussion happens to answer a question you've asked, that's incidental." -- nobull@mail.com in clpm
Re: Determining if a word is really an abbreviation
by dragonchild (Archbishop) on Jun 15, 2004 at 19:48 UTC
    Those are acronyms, not abbreviations. And, acronyms tend to be all uppercase and also tend to not be words (though there are some very obvious exceptions, especially with acronyms that are chosen to be words.)
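Those two tendencies (all uppercase, and not an ordinary word) can be combined into a rough filter. This is a sketch only: `acronym_candidates` is a hypothetical helper, the two-letter minimum is an assumption, and the tiny word list stands in for a real dictionary.

```perl
use strict;
use warnings;

# Hypothetical helper: return the all-uppercase tokens (two or more letters)
# from $text that are not in the supplied word list, without duplicates.
sub acronym_candidates {
    my ( $text, $known_words ) = @_;
    my %seen;
    return grep { !$known_words->{ lc $_ } && !$seen{$_}++ }
        $text =~ /\b([A-Z]{2,})\b/g;
}

# A tiny stand-in word list; a real one could be loaded from a dictionary file.
my %known_words = map { $_ => 1 } qw(scuba laser lug);

my @found = acronym_candidates(
    'IBM sells LASER printers to ESR and IBM', \%known_words );
print "@found\n";
```

Here LASER is dropped because its lower-case form is in the word list, which is exactly the kind of "obvious exception" that the filter cannot tell apart from shouting.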

    Good luck.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      Um, you have that backwards. An acronym can be pronounced and is accepted as a word: RADAR, LIDAR, SONAR, SCUBA, PATRIOT, and so on. IBM is merely an abbreviation (or initialism) because you say "eye bee em."

      --
      [ e d @ h a l l e y . c c ]

        The dictionary definition (external link) of "abbreviate" means simply "to make shorter". From that definition, I would say that an acronym is a specific type of abbreviation.

        The dictionary definition (another external link) of "acronym" could be interpreted to mean what you say, but I don't think it has to be. The definition given above says nothing about being able to pronounce the series of letters, just that it is a word formed by the first letters of the word series (or parts of the word series), which would include "IBM".

        I typically think of an abbreviation as a shortening of a single word, such as "Inc."

        ----
        send money to your kernel via the boot loader.. This and more wisdom available from Markov Hardburn.

        I'm backing Halley on this one, and strongly disagreeing with hardburn and dragonchild.
Re: Determining if a word is really an abbreviation
by andyf (Pilgrim) on Jun 16, 2004 at 19:12 UTC
    AFAIK the correct definition is the one given by Halley: the letters must form a word. Merely being pronounced like a word, as SCSI is, isn't sufficient. There are also 'contractions', which are one step short of abbreviations, for example COINTEL. Technical English says an abbreviation should be written with periods as separators, for example I.B.M., or with an appended period as in Mr. and Ms.
    The only answer to catching and mapping all these shortened forms is a database/list of some kind. The Oxford dictionary used to have several pages devoted to them, but I think these days there are so many from specialised fields that no one bothers to compile them anymore. BTW, LUG also = Linux User Group, and a lug is a kind of knot. My most hated confusing acronym is LOD in games design, which is 'level of detail', and 'last object drawn' in the engine, and 'land object down' in terrain building.