http://qs1969.pair.com?node_id=471329


in reply to Re^6: Capitalize First Letter of Each Word
in thread Capitalize First Letter of Each Word

Well, it isn't homework. I'm just trying out possibilities with Perl regular expressions, kind of training myself. Nothing wrong with that, or with asking those who know better than me, right? Sure, why not, unless there is a better way to do it?

Re^8: Capitalize First Letter of Each Word
by waswas-fng (Curate) on Jun 30, 2005 at 17:35 UTC
    Well, without limiting the scope of your uc/lc rules, you are in for a world of hurt. You can put a large list of known acronyms in your code and special-case them, but the text could still contain a Randomly Made Up Acronym (RMUA) that is new to you. This turns out to be a very hard problem to do well. I know a few guys at MS who do some of the backend work on Word's grammar engine, and they have spent a ton of time on this very issue. Either you settle for a known but limited use case, or you throw thousands of lines of code at the problem and still end up with something that is not perfect. I think in Word they just assume any group of capped letters is an acronym (minus some special cases), and if all words in a sentence are capped, they bypass the acronym rules altogether. An example of how ugly this can be:
    Some of the known acronyms for THE:
    • Technische Hogeschool Eindhoven
    • Technological Horizons in Education
    • Tennessee Hospitality Education (Council)
    • Teresina, Piaui, Brazil - Teresina (Airport Code)
    • The Humane Environment (Jef Raskin)
    • Theatre
    • Toronto Health Economics (Network)
    • Transportable Helicopter Enclosure
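    The two heuristics guessed at above (treat a run of capped letters as an acronym; skip that rule when the whole sentence is capped) can be sketched roughly as follows. This is a minimal illustration in Python, not Word's actual implementation; the function names and the "two or more letters" threshold for an acronym are assumptions. The same logic maps onto a Perl s///e with a callback.

```python
import re

# A "word" is a run of letters/apostrophes; everything else passes through untouched.
WORD = re.compile(r"[A-Za-z']+")

def is_acronym(word: str) -> bool:
    # Assumption: two or more letters, all uppercase, is treated as an acronym.
    return len(word) >= 2 and word.isupper()

def title_case(sentence: str) -> str:
    words = WORD.findall(sentence)
    # If every word is capped, the caps say nothing about acronyms,
    # so bypass the acronym rule altogether.
    all_capped = bool(words) and all(w.isupper() for w in words)

    def fix(m: re.Match) -> str:
        w = m.group(0)
        if not all_capped and is_acronym(w):
            return w  # keep the suspected acronym exactly as written
        return w[:1].upper() + w[1:].lower()

    return WORD.sub(fix, sentence)

print(title_case("the RMUA problem at THE"))  # The RMUA Problem At THE
print(title_case("THE BIG IBM SALE"))         # The Big Ibm Sale
```

    Both failure modes from the post show up: "THE" in a mixed-case sentence is kept as an acronym whether or not it is one, and in an all-caps sentence a real acronym ("IBM") is flattened to "Ibm".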


    -Waswas
Re^8: Capitalize First Letter of Each Word
by ww (Archbishop) on Jun 30, 2005 at 15:01 UTC
    re why not unless...?

    If merely gaining the ability to write scripts is your goal, your approach poses no problem at all. However, if building scripts that compile and appear to work (on a limited sample of data) is the extent of your ambition, you'll be missing a bet. One of the most valuable aspects of the Monastery is that you can call for help upon a community whose population includes some extraordinarily skilled programmers and whose topic is a wonderfully versatile language.

    High-level programming skills go well beyond syntax.

    If your data set is narrowly constrained and you won't have to write too many s/// special cases (where "too many" is whatever you consider small or tolerable), that's a perfectly valid approach.

    Answering that (for yourself; I can't without a better grasp of the data set) requires analysis:

    • Which elements in your data must retain their all-UC form?
    • Is retaining the UC form really required?
      "Ltd." (mixed case with period/fullstop to denote abbreviation) seems quite as comprehensible as "LTD" -- unless your transformation is a step toward some other, later process which requires the form "LTD" (say, during the use of a program you can't modify).
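    If that analysis yields a short, fixed list of words that must stay uppercase, one way to avoid a growing pile of one-off s/// substitutions is a single substitution driven by a lookup table. Below is a minimal Python sketch; the contents of KEEP_UPPER are placeholders, not anything taken from the OP's data. In Perl the same idea would be a hash lookup inside an s///e.

```python
import re

# Hypothetical whitelist: words in *your* data that must stay all-uppercase.
KEEP_UPPER = frozenset({"LTD", "IBM", "USA"})

WORD = re.compile(r"[A-Za-z']+")

def capitalize_words(text: str, keep_upper: frozenset = KEEP_UPPER) -> str:
    def fix(m: re.Match) -> str:
        w = m.group(0)
        if w.upper() in keep_upper:
            return w.upper()  # known exception: force all caps
        return w[:1].upper() + w[1:].lower()  # ordinary word: Capitalize
    return WORD.sub(fix, text)

print(capitalize_words("acme widgets ltd. of the usa"))  # Acme Widgets LTD. Of The USA
print(capitalize_words("acme ltd.", keep_upper=frozenset()))  # Acme Ltd.
```

    Passing an empty whitelist gives the mixed-case "Ltd." form from the second bullet, so the choice between "LTD" and "Ltd." stays a data decision rather than another regex.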

    As for the rest of your questions... go for it! I too tend to learn in exactly that mode.

    However, as a favor to yourself AND to those who seek to help, please look hard at the preview of your posts for typos and/or language errors (for example, "uncial" and "on" where "in" appears to be intended, above) that may obscure your meaning.