KStowe has asked for the wisdom of the Perl Monks concerning the following question:

I need some way to capitalize letters at the beginning of a paragraph and after a hard return. This is what I have to capitalize letters after a period and a space:
while (<IN>) { s/(\.[\W]+)([\w])/\1\u\2/ig; print OUT; }
Anybody have any ideas? Thanks, I'm new to this site...and man am I glad I found it.

Re: Capitalizing letters? (boo)
by boo_radley (Parson) on Jun 21, 2001 at 18:35 UTC
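    This will capitalize the first letter after one hard return (the sample string has to contain a newline for the substitution to have anything to match).
    $_ = "this is a paragraph.\nthis is a paragraph here too."; s/(\n\w)/uc($1)/eg; print;
    This will uppercase the entire first word after one hard return.
    $_ = "this is a paragraph.\nthis is a paragraph here too."; s/(\n\w+)/uc($1)/eg; print;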

    Also, I can't find any reference to a \u escape sequence. Can someone help me understand what that should do, if anything?

      The \u escape sequence is documented in perlop:

      The following escape sequences are available in constructs that interpolate but not in transliterations.

          \l    lowercase next char
          \u    uppercase next char
          \L    lowercase till \E
          \U    uppercase till \E
          \E    end case modification
          \Q    quote non-word characters till \E
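
      A minimal sketch of those escapes inside double-quoted strings may make the difference between the single-character and the "till \E" forms clearer. The variable names and sample values are made up for illustration.

```perl
use strict;
use warnings;

my $name = "perl monks";            # hypothetical sample value

my $ucfirst = "\u$name";            # \u uppercases only the next character
my $upper   = "\U$name\E rule";     # \U uppercases everything up to \E
my $lower   = "\LSHOUT\E Stop";     # \L lowercases everything up to \E
my $lcfirst = "\lHELLO";            # \l lowercases only the next character
my $quoted  = "\Qa.b\E";            # \Q backslash-escapes non-word characters

print "$_\n" for $ucfirst, $upper, $lower, $lcfirst, $quoted;
# Prints:
#   Perl monks
#   PERL MONKS rule
#   shout Stop
#   hELLO
#   a\.b
```

      The same escapes work in the replacement part of s///, which is why \u$2 capitalizes just the first captured letter.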
Re: Capitalizing letters?
by Hofmator (Curate) on Jun 21, 2001 at 18:52 UTC

    First, your regex can be simplified to

    s/(\.\W+)(\w)/$1\u$2/g;

    You don't need the square brackets around \w and \W: each is already a character class and takes a quantifier (+, *, ?, {}) directly, just like a single literal character. The /i modifier for case-insensitive matching is also unnecessary, because the pattern contains no literal letters whose case could matter. In the replacement, $1 and $2 are the preferred spelling of \1 and \2.
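
    A quick way to check that the simplification changes nothing is to run both patterns over the same text. This is only a sketch: the sample sentence is invented, and both substitutions are written with $1 and $2 in the replacement.

```perl
use strict;
use warnings;

# Hypothetical input with one and two spaces after the periods.
my $sample = "one sentence. another one.  and a third.";

# Pattern from the question: bracketed classes plus the /i modifier.
(my $original = $sample) =~ s/(\.[\W]+)([\w])/$1\u$2/ig;

# Simplified pattern: no brackets, no /i.
(my $simplified = $sample) =~ s/(\.\W+)(\w)/$1\u$2/g;

print "$simplified\n";   # one sentence. Another one.  And a third.
print $original eq $simplified ? "same result\n" : "different result\n";
```

    Note that the first word of the text stays lowercase, because no period precedes it.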

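    To match newlines ('hard returns'), it is easiest to slurp in the whole file and then match and replace the \n's explicitly. The (?<=) construct is a positive look behind assertion (see perlre). A look behind cannot contain an unbounded quantifier such as \s*, so the optional whitespace is captured after it and put back in the replacement:

    open IN, "< infile" or die "couldn't open infile: $!";
    undef $/;    # no record separator: <IN> returns the whole file
    $_ = <IN>;
    s/(?<=\n)(\s*)(\w)/$1\u$2/g;
    print OUT;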

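    With a paragraph boundary defined as two newlines right after each other (allowing for whitespace on the 'empty' line), this can be included in the regex as follows. The variable-length whitespace is captured rather than placed inside the look behind, which has to be fixed width:

    s/(?<=\n)(\s*\n\s*)(\w)/$1\u$2/g;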
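
    For anyone who wants to try the slurp-and-substitute idea without creating files, here is a self-contained sketch. It reads from an in-memory filehandle, and it captures the whitespace run and puts it back instead of using a look behind, since Perl has traditionally required look behinds to be fixed width. The input text and variable names are assumptions made up for the example.

```perl
use strict;
use warnings;

# Hypothetical input, read through an in-memory filehandle so no file is needed.
my $source = "first paragraph\nstill the first.\n\n  second paragraph\n \nthird one\n";

open my $in, '<', \$source or die "couldn't open in-memory file: $!";
my $text = do { local $/; <$in> };   # $/ undefined: read everything at once
close $in;

# Every line: capitalize the first word character after a newline.
(my $per_line = $text) =~ s/(\n\s*)(\w)/$1\u$2/g;

# Paragraphs only: two newlines, with optional whitespace in between.
(my $per_para = $text) =~ s/(\n\s*\n\s*)(\w)/$1\u$2/g;

# The very first letter has no newline before it, so handle it separately.
s/\A(\s*)(\w)/$1\u$2/ for $per_line, $per_para;

print "--- per line ---\n$per_line";
print "--- per paragraph ---\n$per_para";
```

    In the per-paragraph result, "still the first." keeps its lowercase s because only one newline precedes it.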

    Update: after reading dimmesdale's answer (in the other thread with the same name), I used look behind assertions to simplify my regexes.

    -- Hofmator