Quick regexp question

rkg has asked for the wisdom of the Perl Monks concerning the following question:

Re: Quick regexp question by liz (Monsignor) on Oct 05, 2003 at 13:55 UTC
`$a =~ s{\b([A-Z]{4,})\b}{ucfirst(lc($1))}eg;  # stray slash removed` You realize, of course, that there are many other (probably better) ways to do this. In particular, I wonder why you would need the `\b`, as `[A-Z]{4,}` is greedy and will take whole words on a match. And then there is, of course, the `\L` escape sequence. I would probably have done it like this: `$a =~ s#([A-Z])([A-Z]{3,})#$1\L$2\E#g;` Liz
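For comparison, here is a minimal sketch that runs both substitutions from this reply on the same string. The sample text and the `$text` variable are illustrative assumptions, not from the thread; the thread uses `$a`, which is best avoided in real code because `sort` relies on the package variables `$a` and `$b`.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the thread's $a.
my $text = 'THE QUICK BROWN FOX IS OK';

# Approach 1: /e evaluates the replacement as Perl code for each match.
(my $with_e = $text) =~ s{\b([A-Z]{4,})\b}{ucfirst(lc($1))}eg;

# Approach 2: keep the first capital, lowercase the rest with \L...\E.
(my $with_L = $text) =~ s#([A-Z])([A-Z]{3,})#$1\L$2\E#g;

print "$with_e\n";   # THE Quick Brown FOX IS OK
print "$with_L\n";   # THE Quick Brown FOX IS OK
```

On space-separated words like these the two agree; words of three capitals or fewer ("THE", "FOX") are left alone by both.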
Re: Re: Quick regexp question by bart (Canon) on Oct 05, 2003 at 14:01 UTC
If you're going to better him on his code, you might just as well do it the proper way. Eh... the shorter way. :) `$a =~ s#([A-Z]{4,})#\u\L$1#g;`
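A small sketch of how the chained `\u\L` escapes behave. It does the same job as `ucfirst(lc($1))` but needs no `/e` modifier. The sample string and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = 'PLEASE READ THE MANUAL';

# \L lowercases everything that follows (up to \E or the end of the
# replacement); \u then titlecases just the next character.
(my $fixed = $text) =~ s#([A-Z]{4,})#\u\L$1#g;

print "$fixed\n";   # Please Read THE Manual
```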
Re: Re: Re: Quick regexp question by liz (Monsignor) on Oct 05, 2003 at 16:00 UTC
To be honest, I didn't know you could chain \u\L that way. Another "nice idiom learned at the Monastery" for me today! Thanks, bart! Liz
Re: Quick regexp question by Abigail-II (Bishop) on Oct 05, 2003 at 22:01 UTC
Without the `\b`, you'd change `fooBARBAZ` into `fooBarbaz`. With the `\b`, you'd leave it unmodified. Abigail
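A quick sketch reproducing Abigail's example. The `$word` variable is an illustrative assumption; the substitution uses the `\u\L` form from earlier in the thread.

```perl
use strict;
use warnings;

my $word = 'fooBARBAZ';

# Without \b, the run of capitals inside the word still matches.
(my $no_b = $word) =~ s#([A-Z]{4,})#\u\L$1#g;

# With \b on both sides there is no boundary before "BARBAZ"
# ("o" and "B" are both word characters), so nothing is replaced.
(my $with_b = $word) =~ s#\b([A-Z]{4,})\b#\u\L$1#g;

print "$no_b\n";     # fooBarbaz
print "$with_b\n";   # fooBARBAZ
```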
Re: Quick regexp question by arno (Scribe) on Oct 05, 2003 at 13:56 UTC
Hi, your code is almost perfect; you just put a stray '/' at the end of your pattern. `$a =~ s{\b([A-Z]{4,})\b}{ucfirst(lc($1))}eg;` will work. Arnaud
Re: Quick regexp question by rkg (Hermit) on Oct 05, 2003 at 13:55 UTC
D'oh! A vestigial slash was hidden in the pattern. `s{\b([A-Z]{4,})\b}{ucfirst(lc($1))}eg;` is what I want. Thanks, got it.
Re: Quick regexp question by Cody Pendant (Prior) on Oct 05, 2003 at 23:31 UTC
Just a thought: your code is going to turn acronyms like "NASA" into "Nasa". Is that OK? `($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print`
Re: Re: Quick regexp question by rkg (Hermit) on Oct 06, 2003 at 02:32 UTC
Yep, it's imperfect, but it's the best I can do given the large volume of text I need to process, and it meets the level of accuracy the spec requires. <humor> Maybe Perl 6 will offer new flavors of `\b` word-boundary assertions smart enough to recognize the start of well-known acronyms...</humor>
Re: Re: Re: Quick regexp question by Cody Pendant (Prior) on Oct 06, 2003 at 02:58 UTC
You could always define a hash of any acronyms that are likely to appear (download a list from somewhere) and guard your substitution with `unless defined $dontchange{$word}`.
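A minimal sketch of the exception-list idea, assuming a hypothetical `%dontchange` hash with a few hard-coded entries (a real list might be loaded from a file). Here the lookup happens inside the `/e` replacement and uses the captured `$1` in place of `$word`. This is one way to wire it up, not the only one.

```perl
use strict;
use warnings;

# Hypothetical exception list of acronyms to leave untouched.
my %dontchange = map { $_ => 1 } qw(NASA NATO UNESCO);

my $text = 'NASA SAID THE LAUNCH WENT WELL';

# For each run of 4+ capitals: keep it if it is a known acronym,
# otherwise convert it to "Title" case.
$text =~ s{\b([A-Z]{4,})\b}{
    defined $dontchange{$1} ? $1 : ucfirst(lc($1))
}eg;

print "$text\n";   # NASA Said THE Launch Went Well
```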