KStowe has asked for the wisdom of the Perl Monks concerning the following question:

I need some way to capitalize letters at the beginning of a paragraph and after a hard return. This is what I have so far to capitalize letters after a period and a space:
while (<IN>) { s/(\.[\W]+)([\w])/\1\u\2/ig; print OUT; }
Anybody have any ideas? Thanks, I'm new to this site...and man am I glad I found it.

Re: Capitalizing letters?
by dimmesdale (Friar) on Jun 21, 2001 at 18:50 UTC
    Let's look over that regex first.
    s/(\.[\W]+)([\w])/\1\u\2/ig;
    The [\W]+ seems awkward. If you mean for this to be spaces, why not say spaces, i.e., \s? Also, the brackets that signify a character class are unnecessary here. When constructing a regex, your main goal is to match *only* what you want. What you want in this case is what comes after a period, but here you match the period as well. Why not use a zero-width look-behind assertion? (It also seems that you should check for more punctuation than just a period.) You don't need the character class brackets on the \w either. Next, the \1 and \2 are a mistake: they appear in the replacement, not in the regex proper, so you should say $1 and $2. Finally, the /i is useless, since \w's character class already includes both upper- and lower-case letters.
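
    Putting those points together, here is a minimal sketch rather than a definitive fix. The helper name cap_after_punct is made up for illustration, and [a-z] assumes plain ASCII text. It keeps the capture groups instead of a look-behind, because a look-behind in Perl cannot contain an unbounded \s+; on Perl 5.10 or later, \K is another way to avoid re-inserting the punctuation.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): capitalize the first
# letter that follows sentence-ending punctuation and whitespace.
sub cap_after_punct {
    my ($text) = @_;
    # $1 holds the punctuation plus whitespace, $2 the lower-case letter.
    # \u upper-cases only the next character of the replacement.
    $text =~ s/([.!?]\s+)([a-z])/$1\u$2/g;
    return $text;
}

print cap_after_punct("one. two!  three? four"), "\n";
```

    Inside the original loop this would be applied to $_ line by line, so punctuation at the very end of one line followed by a letter on the next is only caught if the whole text is read at once.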

    To capitalize at the beginning of a paragraph, you need to define what a paragraph is. Is it a tab indentation? A certain number of spaces? Or is it just preceded by a blank line or two? Once you have decided that, you can construct a regex to capitalize what you want. To match a tab you can use \t, or \s to match all types of whitespace. To match a paragraph that starts after a blank line or two, why not just put your read in Perl's 'paragraph' mode? Then you can capitalize the first letter that appears in each record.
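
    A sketch of the paragraph-mode idea, assuming paragraphs are separated by one or more blank lines. The helper name cap_paragraph and the in-memory sample text are made up for illustration; with a real file you would read from your own filehandle instead.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case the first letter of a paragraph,
# skipping any leading non-letters (spaces, tabs, quotes, digits).
sub cap_paragraph {
    my ($para) = @_;
    $para =~ s/^([^a-zA-Z]*)([a-z])/$1\u$2/;
    return $para;
}

# Sample input held in a string so the sketch is self-contained.
my $input = "first paragraph\nstill first.\n\nsecond paragraph\n";
open my $in, '<', \$input or die "Cannot open in-memory file: $!";
{
    local $/ = "";    # paragraph mode: a record ends at one or more blank lines
    while (my $para = <$in>) {
        print cap_paragraph($para);
    }
}
close $in;
```

    Setting $/ to the empty string is what switches the read into paragraph mode; the local keeps that change confined to the block.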

    As for a hard return. . . what exactly do you mean by a hard return?

    Update: Well, for a carriage return, \r will do; for a newline, \n will do. I would offer a regex here, but now that I look at the other post, there are some fine ones already shown there.
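
    For completeness, one possible sketch, assuming "hard return" means any of \n, \r\n, or a bare \r, and that the whole text is held in one string. The helper name cap_after_return is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: capitalize the first letter at the start of the
# text and after every carriage return or newline.
sub cap_after_return {
    my ($text) = @_;
    # $1: start of string or a line-ending character
    # $2: optional spaces or tabs before the first letter
    # $3: the lower-case letter to upper-case
    $text =~ s/(^|[\r\n])([ \t]*)([a-z])/$1$2\u$3/g;
    return $text;
}

print cap_after_return("one\ntwo\n  three\n");
```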

      I think he means ^M, i.e. a carriage return
      (unless I'm just being thick).

      timbo