mikeraz has asked for the wisdom of the Perl Monks concerning the following question:

On a Linux mailing list near me a user asked:

I have a list of company names all in upper case; one per line. Can I use a combination of sed, tr, and other tools to automate converting them all to mixed case (i.e., initial letter uppercase, all others lower case)? There must be a shell script already written to do this, but not in the reference books I have.

I immediately blasted out a knee jerk reply:

perl -ne 'print ucfirst lc;' FILENAME will do it.

And then proceeded to read the answers others had provided.

Some suggestions caused me to wonder where the virtue of Laziness is hiding:

perl -pe 'tr/A-Z/a-z/; s/(\S+)/\u$1/g;' <filename>

Others reminded me to think before posting

My sed-fu was deficient I guess (or my sed was) so I fell back to perl:

If you want word-case

perl -pe 's/ \b (\w) ([^\s]+) \b /\1\L\2/gx'   # ONE COMPANY -> One Company (rather than One company)

Oh, yes, multi word company names. How ... almost all the blathering time.

So I went back to redo my original and came up with:

perl -i -ne 's/(\b\w)/{uc $1}/eg; print;' <FILENAME>

Which I'm content with. But...can you come up with something better?

There is one nit I have with that solution. RUN4LIFE translates to Run4life rather than Run4Life. My reading of `man perlre` doesn't turn up a \ code for alpha only. The systems I have available to me at the moment don't support [:class:] for trying that out.

Be Appropriate && Follow Your Curiosity

Replies are listed 'Best First'.
Re: Case Munging
by johngg (Canon) on Apr 16, 2009 at 21:52 UTC

    You could split on anything that wasn't an uppercase letter ([^A-Z]), capturing what you split on, do your ucfirst lc in a map then concatenate the results with join.

    $ cat companies
    WOOLWORTHS
    CAP GEMINI
    AVIVA
    AMERADA HESS
    RUN4LIFE
    $ perl -pi.bak -e '$_ = join q{}, map { ucfirst lc } split m{([^A-Z]+)};' companies
    $ cat companies
    Woolworths
    Cap Gemini
    Aviva
    Amerada Hess
    Run4Life
    $

    I hope this is useful.

    Cheers,

    JohnGG

Re: Case Munging
by almut (Canon) on Apr 16, 2009 at 21:22 UTC
    My reading of `man perlre` doesn't turn up a \ code for alpha only. The systems I have available to me at the moment don't support [:class:]

    I think you should be able to use the character class [^\W\d_] as a substitute (which should support locales and Unicode, as opposed to home-brewed ranges like [A-Za-z]).

Re: Case Munging
by morgon (Priest) on Apr 16, 2009 at 21:52 UTC
    What about this:
    perl -i.old -pe 's/(?<![a-z])(\w)/uc $1/eg' <FILENAME>
    After this runs you have all "translated" names in <FILENAME>. The old version of the file is kept as "<FILENAME>.old".
    And it translates run4life to Run4Life as requested.

    And in case the regex is unclear: The first part is a negative lookahead (see perldoc perlre).

      It just occurs to me that you said the file holds the names in upper case (my previous post assumed lower case).

      So use this instead:

      perl -i.old -pe 's/(?<=[A-Z])(\w)/lc $1/eg' <FILENAME>
      And the regex contains a (this time positive) lookbehind-assertion (and not lookahead as I wrote above).

      Sorry for the confusion.

        FWIW, the following may give a slight improvement in speed, simplicity or generality (I will not bother to put it into file-processing form):
        >perl -wMstrict -le "for (@ARGV) { s{ (?<= [[:upper:]]) ([[:upper:]]+) }{\L$1}xmsg; print; }" "XYZZY" "RUN4LIFE" "RUN42LIFE CO" "GENERAL WIDGET CO., INTL" "MI-GO BRAIN CYLINDERS, IPTY."
        Xyzzy
        Run4Life
        Run42Life Co
        General Widget Co., Intl
        Mi-Go Brain Cylinders, Ipty.
        Supposed advantages:
        • The ([[:upper:]]+) quantified capture grabs as much as possible of whatever is to be lower-cased (rather than capturing a character at a time);
        • The use of the \L interpolation modifier eliminates the need for evaluation in the regex (i.e., no //e regex modifier);
        • The use of [[:upper:]] and \L brings locale fully into play (I think).
        Good clarification for the archives. I'd mentally prefaced `lc <INPUT>` since that was a necessary step to get to the end state.
        Be Appropriate && Follow Your Curiosity
      I like that, just the kind of thing I was fumbling towards.
      Be Appropriate && Follow Your Curiosity
Re: Case Munging
by Bloodnok (Vicar) on Apr 17, 2009 at 14:17 UTC
    Hmmm,

    An exercise in sed(1) for the interested reader, I'm game ...

    sed 's,.*,\L&,; s,.,\u&,; s,[ 0-9][^ ],\U&,g' file
         ^          ^         ^
         |          |         |
         |          |         |
         |          |         Capitalise any non-space char after a space or a numeric
         |          Uppercase the first char
         Lower case the whole name

    echo " FIRST SECONF THRID RUN4LIFE" | sed 's,.*,\L&,; s,.,\u&,; s,[ 0-9][^ ],\U&,g'
    First Seconf Thrid Run4Life
    A user level that continues to overstate my experience :-))