Changing case inside substitution

skillet-thief has asked for the wisdom of the Perl Monks concerning the following question:

Friendly Monks,

I am having trouble with something I thought would be easy with Perl regexes. Although I have been learning a lot about Perl recently, this is my first ever request for wisdom.

I have large text files (LaTeX, actually) with commands that look like this: \cite[page 10]{dickens}. Sometimes, however, it might be \cite[page 10]{Dickens}. So I wanted a regex to change Dickens to dickens or Steinbeck to steinbeck, that is, to give me a lowercase first letter.

However, none of the simple methods (simple enough for me to know about) seem to work. The first part of the regex is easy:

s{(
    \\cite\[[^]]\]\{
    ([A-Z])
)    
}

But I'm not sure what to put in the second part. I've tried doing

{$1\l$2}gxe;

but that doesn't seem to work. So then I tried doing some things with (?{ }) -- using the /e operator, including stuff like this:

sub lower{
    my $letter = shift;
    return lc($letter);
}
s{...snip...}{ $1 (?{ lower( $2 ) } ) }gxe;

But by this time I started thinking that I was getting way too creative and that there must be a simpler solution.

So beyond figuring out a solution to my problem, I would also be curious to know what is wrong with my understanding of the (?{ }) construction in substitutions.

Many thanks

Monkily yours,

s-t

Edited by castaway, added code tags around latex examples.
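Regarding the attempts above: \l is a string escape, so it only takes effect when the replacement is an interpolated string, i.e. without /e. With /e the replacement is compiled as Perl code, and $1\l$2 is not valid Perl code. (?{ }) is a code block that the regex engine runs while matching, so it belongs in the pattern, not in the replacement; with /e you can simply call the sub. Both forms also need [^]]+ rather than [^]] so that a multi-character "page 10" can match. A minimal sketch of both working forms, with an illustrative sample string and variable names; the lower sub is the one from the question:

```perl
use strict;
use warnings;

my $text = '\cite[page 10]{Dickens} and \cite[page 3]{Steinbeck}';

# Same helper as in the question.
sub lower {
    my $letter = shift;
    return lc($letter);
}

# Without /e: the replacement is an interpolated string, so \l
# lowercases the first character of $2.
(my $plain = $text) =~ s/(\\cite\[[^]]+\]\{)([A-Z])/$1\l$2/g;

# With /e: the replacement is ordinary Perl code, so call the sub
# directly; no (?{ }) is needed.
(my $evald = $text) =~ s/(\\cite\[[^]]+\]\{)([A-Z])/$1 . lower($2)/ge;

print "$plain\n$evald\n";
```

Both variables end up holding the same lowercased text; which form to use is mostly taste, though /e is the one to reach for when the transformation needs more than a case escape.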


Re: Changing case inside substitution
by PodMaster (Abbot) on Sep 27, 2003 at 14:13 UTC

quantify: your [^]] character class needs a + quantifier, otherwise it matches only a single character.

my $input = q'

\cite[page 6]{Dickens}
\cite[page 9]{Dickens}

';

$input =~
s<
    \\cite
    \[
      (
        [^\]]+
       ) # $1
    \]
    {
        (    # $2
         [a-zA-Z]+
         )
    }
>"
\\cite[$1]{\l$2}
"gx;

print $input;
__END__
\cite[page 6]{dickens}


\cite[page 9]{dickens}




use YAPE::Regex::Explain;
die YAPE::Regex::Explain->new(qr<
    \\cite
    \[
      (
        [^\]]+
       ) # $1
    \]
    {
        (    # $2
         [a-zA-Z]+
         )
    }
>x)->explain;
__END__
The regular expression:

(?x-ims:
    \\cite
    \[
      (
        [^\]]+
       )  # $1
    \]
    {
        (     # $2
         [a-zA-Z]+
         )
    }
)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?x-ims:                 group, but do not capture (disregarding
                         whitespace and comments) (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n):
----------------------------------------------------------------------
  \\                       '\'
----------------------------------------------------------------------
  cite                     'cite'
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^\]]+                   any character except: '\]' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \]                       ']'
----------------------------------------------------------------------
  {                        '{'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [a-zA-Z]+                any character of: 'a' to 'z', 'A' to 'Z'
                             (1 or more times (matching the most
                             amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  }                        '}'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Get YAPE::Regex::Explain from CPAN.

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.


Re: Re: Changing case inside substitution

by skillet-thief (Friar) on Sep 27, 2003 at 14:33 UTC

Get YAPE::Regex::Explain from cpan.

very

It works now, obviously. ;-) It looks like I was just missing some experience with the details of regex syntax.


Re: Changing case inside substitution
by gjb (Vicar) on Sep 27, 2003 at 14:12 UTC

Using the /e regex modifier, you can execute code in the replacement part: the replacement is evaluated as Perl code and its result is substituted. Consider:

  my $str = 'Abc Abc';
  $str =~ s/(\w+)/lc($1)/ge;

The result would be 'abc abc'.
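Applied to the \cite keys from the question, the same /e technique might look like the sketch below. It swaps lc for Perl's built-in lcfirst, which lowercases only the first character of its argument, so a key with an inner capital keeps it. The sample keys (including LeCarre) are made up for illustration:

```perl
use strict;
use warnings;

my $tex = '\cite[page 10]{Dickens} \cite[page 6]{LeCarre}';

# /e evaluates the replacement as Perl code; lcfirst lowercases only the
# first character of the captured key.
(my $out = $tex) =~ s/(\\cite\[[^]]+\]\{)([^}]+)/$1 . lcfirst($2)/ge;

print "$out\n";    # \cite[page 10]{dickens} \cite[page 6]{leCarre}
```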

Hope this helps, -gjb-


Re: Changing case inside substitution
by Not_a_Number (Prior) on Sep 27, 2003 at 14:40 UTC

Unlike the other solutions posted, this also deals with names such as O'Leary or Le Carré, since [^}]+ matches everything up to the closing brace:

s/(cite\[page \d+\]\{)([^}]+)/$1\L$2/g;
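One difference worth noting: \L lowercases everything up to the end of the replacement (or a \E), so the whole key is lowercased, while \l lowercases only the next character. A small sketch comparing the two on the names mentioned above, assuming the [page N] form from the question (a space before the number); the hash names are illustrative:

```perl
use strict;
use warnings;
use utf8;                               # the source contains "é"
binmode STDOUT, ':encoding(UTF-8)';

my (%lower_all, %lower_first);

for my $name ("O'Leary", 'Le Carré') {
    my $cite = "\\cite[page 10]{$name}";

    # \L: lowercase the whole captured key
    (my $all = $cite) =~ s/(cite\[page \d+\]\{)([^}]+)/$1\L$2/;

    # \l: lowercase only the first character of the key
    (my $first = $cite) =~ s/(cite\[page \d+\]\{)([^}]+)/$1\l$2/;

    $lower_all{$name}   = $all;
    $lower_first{$name} = $first;
    print "$all\n$first\n";
}
```

If only the first letter should change, as in the original question, \l is the closer match; \L suits cases where keys are meant to be entirely lowercase.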

hth

dave


Re: Changing case inside substitution
by tachyon (Chancellor) on Sep 27, 2003 at 14:13 UTC

You are missing the + after the character class that is meant to catch the [page N] part, i.e. [^]]+. Without it, [^]] matches only one character, so the regex fails to match.

s/(\\cite\[[^]]+\]\{)([A-Z])/ $1 . lc($2) /ge;
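To see the effect of the missing +, the sketch below runs the opening part of the pattern from the question with and without it against a single \cite command; the variable names are illustrative:

```perl
use strict;
use warnings;

my $cite = '\cite[page 10]{Dickens}';

# [^]] matches exactly one non-']' character, so "page 10" cannot fit.
my $without_plus = $cite =~ /\\cite\[[^]]\]\{([A-Z])/ ? 1 : 0;

# [^]]+ matches one or more of them, which covers "page 10".
my $with_plus = $cite =~ /\\cite\[[^]]+\]\{([A-Z])/ ? 1 : 0;

print "without +: $without_plus, with +: $with_plus\n";
```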

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print


Re: Changing case inside substitution
by skillet-thief (Friar) on Sep 27, 2003 at 15:41 UTC
