skillet-thief has asked for the wisdom of the Perl Monks concerning the following question:

Friendly Monks,

I am having trouble with something I thought would be easy with Perl regexes. I have been learning a lot about Perl recently, but this is my first request ever for wisdom.

I have large text files (LaTeX, actually) with commands that look like this: \cite[page 10]{dickens}. Sometimes, however, it might be \cite[page 10]{Dickens}. So I wanted a regex that changes Dickens to dickens or Steinbeck to steinbeck, that is, one that gives me a lower-case first letter.

However, none of the simple methods (simple enough for me to know about) seem to work. The first part of the regex is easy:

s{( \\cite\[[^]]\]\{ ([A-Z]) ) }

But I'm not sure what to put in the second part. I've tried doing

{$1\l$2}gxe;

but that doesn't seem to work. So then I tried doing some things with (?{ }) -- using the /e operator, including stuff like this:

sub lower { my $letter = shift; return lc($letter); }
s{...snip...}{ $1 (?{ lower( $2 ) } ) }gxe;

But by this time I started thinking that I was getting way too creative and that there must be a simpler solution.

So beyond figuring out a solution to my problem, I would also be curious to know what is wrong with my understanding of the (?{ }) construction in substitutions.
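Here is a minimal sketch of the difference asked about above. It reuses the question's `lower` helper, and the sample string is only illustrative. With /e, the replacement side is already ordinary Perl, so the call is simply joined to $1 with the `.` operator. (?{ }) is different: it is a pattern-side construct that runs code while the regex engine is matching, and it contributes nothing to the replacement text. $^N (the most recently closed capture group) appears here only to show what the code block sees.

```perl
use strict;
use warnings;

# Helper copied from the question.
sub lower { my $letter = shift; return lc $letter; }

my $text = '\cite[page 10]{Dickens}';

# With /e the replacement is plain Perl code: call the function and
# concatenate its result with the first capture group.
(my $fixed = $text) =~ s/ (\\cite\[[^]]+\]\{) ([A-Z]) / $1 . lower($2) /xe;
print "$fixed\n";

# (?{ ... }) lives in the *pattern*: it runs during matching, and its
# value is not inserted into any string. Here it just records what the
# last capture group held at that point.
my @seen;
$text =~ m/ \{ ([A-Z]) (?{ push @seen, $^N }) /x;
print "saw: @seen\n";
```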

Many thanks

Monkily yours,

s-t

Edited by castaway, added code tags around latex examples.

Re: Changing case inside substitution
by PodMaster (Abbot) on Sep 27, 2003 at 14:13 UTC
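    You need to quantify the character class. On its own, [^]] matches exactly one character, so it can never match "page 10". Adding + fixes that: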
    my $input = q'\cite[page 6]{Dickens} \cite[page 9]{Dickens}';
    $input =~ s<
        \\cite
        \[
            ( [^\]]+ )   # $1
        \]
        {
            (            # $2
                [a-zA-Z]+
            )
        }
    >"\\cite[$1]{\l$2}"gx;
    print $input;
    __END__
    \cite[page 6]{dickens} \cite[page 9]{dickens}

    use YAPE::Regex::Explain;
    die YAPE::Regex::Explain->new(qr<
        \\cite
        \[
            ( [^\]]+ )   # $1
        \]
        {
            (            # $2
                [a-zA-Z]+
            )
        }
    >x)->explain;
    __END__
    The regular expression:

    (?x-ims:
        \\cite
        \[
            ( [^\]]+ )   # $1
        \]
        {
            (            # $2
                [a-zA-Z]+
            )
        }
    )

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?x-ims:                 group, but do not capture (disregarding
                             whitespace and comments) (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n):
    ----------------------------------------------------------------------
      \\                       '\'
    ----------------------------------------------------------------------
      cite                     'cite'
    ----------------------------------------------------------------------
      \[                       '['
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [^\]]+                   any character except: '\]' (1 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      \]                       ']'
    ----------------------------------------------------------------------
      {                        '{'
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        [a-zA-Z]+                any character of: 'a' to 'z', 'A' to
                                 'Z' (1 or more times (matching the
                                 most amount possible))
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
      }                        '}'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    Get YAPE::Regex::Explain from CPAN.
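The following small sketch shows the case-changing escapes that PodMaster's replacement relies on. They work in any double-quoted context, including the replacement part of s/// when /e is not used. The names are only sample data, and the substitution is the same idea written with / delimiters.

```perl
use strict;
use warnings;

my $name = 'DICKENS';
my $first_lower = "\l$name";    # \l lowercases only the next character
my $all_lower   = "\L$name";    # \L lowercases up to \E or the end of the string
my $title       = "\u\L$name";  # common idiom: lowercase everything, then uppercase the first character

# No /e here: the replacement is an interpolated string, so \l applies to $2.
my $input = '\cite[page 6]{Dickens} \cite[page 9]{Steinbeck}';
(my $output = $input) =~ s/(\\cite\[[^]]+\]\{)([A-Za-z]+)\}/$1\l$2}/g;

print "$first_lower $all_lower $title\n";
print "$output\n";
```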

      Get YAPE::Regex::Explain from cpan.

      I will; that looks very useful. Thanks for the detailed explanation.

      It works now, obviously ;-) It looks like I was just missing some experience with the details of regex syntax.

      s-t
Re: Changing case inside substitution
by gjb (Vicar) on Sep 27, 2003 at 14:12 UTC
    With the /e modifier, the replacement part is evaluated as Perl code instead of being treated as a string. Consider:
    my $str = 'Abc Abc';
    $str =~ s/(\w+)/lc($1)/ge;
    The result would be 'abc abc'.
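A quick way to see what /e changes is to run the same substitution with and without it. This is a sketch that uses only gjb's example string:

```perl
use strict;
use warnings;

my $str = 'Abc Abc';

# Without /e the replacement is an interpolated string, so the text
# "lc(" and ")" is copied literally around each captured word.
(my $plain = $str) =~ s/(\w+)/lc($1)/g;
print "$plain\n";

# With /e the replacement is run as Perl code, so lc() is actually called.
(my $evaluated = $str) =~ s/(\w+)/lc($1)/ge;
print "$evaluated\n";
```

This is also why the question's `{$1\l$2}gxe` fails. Under /e, `$1\l$2` has to parse as Perl code, and \l is only a string escape.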

    Hope this helps, -gjb-

Re: Changing case inside substitution
by Not_a_Number (Prior) on Sep 27, 2003 at 14:40 UTC

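    Unlike the other solutions posted, this one also handles names such as O'Leary or Le Carré, because it captures everything up to the closing brace. Note that \L lowercases the whole name, not just the first letter: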

    s/(cite\[page\s*\d+\]\{)([^}]+)/$1\L$2/g;
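Here is a sketch contrasting \L with \l on the kind of names mentioned in this reply. The pattern writes the page part as page\s*\d+ so that it matches "page 10". "Le Carre" is spelled without the accent only to keep the example ASCII.

```perl
use strict;
use warnings;

my $text = q{\cite[page 10]{O'Leary} \cite[page 12]{Le Carre}};

# \L lowercases the entire captured key.
(my $whole = $text) =~ s/(cite\[page\s*\d+\]\{)([^}]+)/$1\L$2/g;

# \l lowercases only the first character of the key.
(my $first = $text) =~ s/(cite\[page\s*\d+\]\{)([^}]+)/$1\l$2/g;

print "$whole\n$first\n";
```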

    hth

    dave

Re: Changing case inside substitution
by tachyon (Chancellor) on Sep 27, 2003 at 14:13 UTC

    You are missing the + after the character class that is meant to match the "page 10" part, i.e. [^]]+. Without it the class matches only a single character, so the pattern fails to match.

    s/(\\cite\[[^]]+\]\{)([A-Z])/ $1 . lc($2) /ge;
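The sketch below applies tachyon's substitution to a multi-line chunk of LaTeX. It is wrapped in a helper whose name (lowercase_cite_keys) is made up for this example. Keys that already start with a lower-case letter are left alone, because ([A-Z]) does not match them.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the LaTeX source with the first
# letter of each \cite[...]{Key} lowercased.
sub lowercase_cite_keys {
    my ($latex) = @_;
    $latex =~ s/(\\cite\[[^]]+\]\{)([A-Z])/ $1 . lc($2) /ge;
    return $latex;
}

# Single-quoted heredoc, so the backslashes are kept literally.
my $source = <<'TEX';
As \cite[page 10]{Dickens} notes, and \cite[page 3]{steinbeck} agrees,
see also \cite[page 7]{Steinbeck}.
TEX

print lowercase_cite_keys($source);
```

For the large files in the question, the same substitution could be run in place from the shell, e.g. `perl -pi.bak -e 's/(\\cite\[[^]]+\]\{)([A-Z])/ $1 . lc($2) /ge' chapter.tex`. The file name is hypothetical, and -i.bak keeps a backup copy. Because -p works line by line, this assumes no \cite command is split across lines.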

    cheers

    tachyon

Re: Changing case inside substitution
by skillet-thief (Friar) on Sep 27, 2003 at 15:41 UTC
    Thanks, everybody. I've already learned a ton.
    s-t