Capitalization and Regex

pekkhum has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Capitalization and Regex by Abigail-II (Bishop) on Oct 26, 2003 at 22:50 UTC
`$_[0]=~s/(\.|\-|\(|\)|\[|\]|\{|\}|\?|\/|\\|\^)/\\$1/g;` Yikes! So many backslashes, so many |, when you don't need them: `$_ [0] =~ s!([][.(){}?/\\^-])!\\$1!g;` Abigail
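For comparison, here is a minimal sketch of the character-class approach next to the built-in `quotemeta`. The sub name `escape_listed` and the sample file name are made up for illustration; the class covers the same twelve characters as the one above, only written with escaped brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: backslash-escape only the punctuation listed in the post.
sub escape_listed {
    my ($text) = @_;
    (my $escaped = $text) =~ s!([\]\[(){}.?/\\^-])!\\$1!g;
    return $escaped;
}

my $name    = 'index[1].cgi';        # made-up sample name
my $by_hand = escape_listed($name);
my $builtin = quotemeta($name);

print "by hand:   $by_hand\n";
print "quotemeta: $builtin\n";

# Both escaped forms should match the original name literally and nothing looser.
for my $pattern ($by_hand, $builtin) {
    print "literal match ok\n" if     $name        =~ /\A$pattern\z/;
    print "no loose match\n"   unless 'index1Xcgi' =~ /\A$pattern\z/;
}
```

`quotemeta` escapes every ASCII character that is not a letter, digit, or underscore, so it covers more than the hand-written list, but on this input the two produce the same string.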
Re: Re: Capitalization and Regex by Anonymous Monk on Oct 27, 2003 at 00:26 UTC
::Prints the thread:: So many things to look up! O_O This has been most profitable.
Re: Re: Capitalization and Regex by pekkhum (Sexton) on Oct 27, 2003 at 01:17 UTC
Okay, based on what was here I tried to find a better way to assemble my data. Here is an example of how it would look:
`#!perl -w`
`@res='index.cgi';`
`@ext=qw(css tmp class edf);`
`@regex='';`
`foreach(@ext){ $reg=$reg."|^.*\\.$_\$"; }`
`foreach(@res){ $reg=$reg."|^\Q$_\E\$"; }`
`if($regex){ $reg="|$regex$reg"; }`
`$thing=~/^\.+$$reg/i;`
I've cut out the source of the actual data and inserted sample info for simplicity. How is my improvement?
Re: Re: Re: Capitalization and Regex by graff (Chancellor) on Oct 27, 2003 at 04:55 UTC
Um, it's still not clear what you're actually trying to accomplish. There are a handful of odd properties in this snippet of code, including some things that look like mistakes or misconceptions. As many others here would tell you, things would actually go better for you with "use strict;".
First, I would assume that your declarations at the top were intended to look like this:
`@res = qw/index.cgi/; # could be more than one file...`
`@ext = qw/css tmp class edf/;`
`$regex = ''; # note: "$", not "@"`
Next, since you don't initialize "$reg" to anything prior to the first "for" loop, it comes out starting with "|", which is "grammatical", but probably won't do what you want. Try this: `perl -e '$_ = "hi"; print "ok\n" if ( /|x/ );'`
It will always print "ok", no matter what string is assigned to $_, because a regex that starts with "|" has an empty first alternative and so will always match anything (even an empty string). (Update: I realize that $reg gets appended to some other literal when it's finally used for matching something; having "|" at the start will still not do what you probably expect/want it to do.)
BTW, the more legible idiom for concatenating values onto a string inside a loop is:
`$string = "initial value";`
`for ( @addons ) {`
`    $string .= " $_"; # or whatever`
`}`
Next, it's not clear why you have the "if ($regex)" condition, since this variable is apparently empty (false) when you reach this point.
Finally, the last line is a complete mystery. What does "$thing" contain, and what are you really hoping to match? It looks like you are now trying to use $reg as a reference to a scalar (because of the two dollar-signs); again, this is "grammatical", but $reg is not a reference to a scalar, so you end up with an empty string at that point in the regex.
Take a few steps back from the minute details, and try to approach it from "the big picture". What is the situation that you are starting with, and what do you want it to be when your code does its job properly?
If you're just trying to build a regex that will match a given set of file name extensions (e.g. qw/css tmp class edf/), all you need to do is:
`my $ext_regex = '\.(' . join('|', qw/css tmp class edf/) . ')$';`
`for ( @filenames ) {`
`    print "this one matches: $_\n" if ( /$ext_regex/i );`
`}`
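The extension-matching idea can be sketched as a small self-contained script. This is only one way to write it: the use of `qr//`, `quotemeta`, and `\z` is my own choice rather than anything from the posts above, and the sample file names are invented.

```perl
use strict;
use warnings;

my @ext = qw(css tmp class edf);

# Join the (quoted) extensions into one alternation, anchored at the end of the name.
my $alternatives = join '|', map { quotemeta($_) } @ext;
my $ext_regex    = qr/\.(?:$alternatives)\z/i;

# Hypothetical sample names, not taken from the thread.
for my $file (qw(style.CSS index.cgi notes.tmp classy)) {
    printf "%-10s %s\n", $file, ($file =~ $ext_regex ? 'matches' : 'no match');
}

# The pitfall with a leading "|": the empty first alternative matches the
# empty string, so this condition is true for any input.
my $leading_bar_matches = ('hi' =~ /|x/) ? 1 : 0;
print "leading | matched\n" if $leading_bar_matches;
```

Compiling the pattern once with `qr//` keeps the `/i` flag attached to the regex itself, so callers cannot forget it.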
Re: Re: Re: Re: Capitalization and Regex by Anonymous Monk on Oct 27, 2003 at 08:50 UTC
Re: Capitalization and Regex by etcshadow (Priest) on Oct 26, 2003 at 22:26 UTC
Indirect answer to your question: don't do this. You don't need to make a sub to translate a name into something that you can put in a regexp. Just put it in the regexp with \Q and \E (see quotemeta in the perlfunc docs). Also, as several folks stated, use the /i modifier on your regexp.
So... do not try to do: `my $munged_name = Regch($name); $thing =~ /$munged_name/;`
Rather... try this: `$thing =~ /\Q$name\E/i;`
How much easier is that? Also, you learned an important thing about perl. :-D
-----------------------------------
However, if you want a direct and exact answer to your question, you could do this: `sub Regch { return "(?i:\Q$_[0]\E)"; }`
Read the docs on perlre to learn more about why.
------------ :Wq Not an editor command: Wq
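A minimal sketch of both suggestions, assuming made-up values for `$name` and `$thing` (the thread never shows the real data). The sub is spelled `regch` here only to mark it as an illustration.

```perl
use strict;
use warnings;

# Hypothetical inputs for illustration.
my $name  = 'index.cgi';
my $thing = 'GET /INDEX.CGI';

# \Q...\E quotes regex metacharacters in the interpolated value; /i ignores case.
my $quoted_hit = ($thing =~ /\Q$name\E/i) ? 1 : 0;
print "quoted, case-insensitive match\n" if $quoted_hit;

# Without \Q the "." in $name is a wildcard and matches any character.
print "unquoted dot matched 'X'\n"   if     'indexXcgi' =~ /$name/;
print "quoted dot stayed literal\n"  unless 'indexXcgi' =~ /\Q$name\E/;

# The "direct answer" version: return a fragment that carries its own
# case-insensitivity, so the caller does not need /i.
sub regch { return "(?i:\Q$_[0]\E)" }

my $fragment = regch($name);
print "$fragment\n";
print "fragment matched mixed case\n" if 'Index.Cgi' =~ /\A$fragment\z/;
```

The fragment form is handy when only one piece of a larger pattern should ignore case; otherwise the plain `/\Q$name\E/i` is simpler.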
Re: Capitalization and Regex by davido (Cardinal) on Oct 26, 2003 at 22:06 UTC
I'm a little fuzzy on what you're asking, but it sounds like you want to match case-insensitively, or perhaps convert mixed case to a single case. I think you might want to look at the `/i` regular expression modifier, explained in perlretut and perlre, or the uc and lc functions, explained in perlfunc. You can change the capitalization in portions of strings with the \l and \u metacharacters (for one char at a time) or the \L, \U, and \E metacharacters (for altering the case of chunks of characters). These are also described in detail in perlretut. Update: Thanks Anonymous Monk for finding a more accurate way to express my thought. I've corrected the verbiage. Dave "If I had my life to live over again, I'd be a plumber." -- Albert Einstein
Re: Re: Capitalization and Regex by Anonymous Monk on Oct 26, 2003 at 22:21 UTC
"If you need only part of the RE to work case-insensitively, look at the \l and \u metacharacters (for one char at a time) or the \L, \U, and \E metacharacters (for altering the case of chunks of characters). These are also described in detail in perlretut." I don't think those help in making portions of REs case-insensitive; they change portions of strings to upper or lower case. For case-insensitive portions of an RE, use `/some(?i:InSensiTIve)portion/`
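To make the distinction concrete, here is a small sketch, assuming invented sample strings. The first half shows the case-changing escapes acting on string contents; the second half shows `(?i:...)` changing how one part of a pattern matches.

```perl
use strict;
use warnings;

my $word = 'perl monks';   # made-up sample text

# \u and \l affect the next character; \U and \L run until \E.
my $first_up = "\u$word";          # Perl monks
my $all_up   = "\U$word\E!";       # PERL MONKS!
my $lowered  = "\LMiXeD\E case";   # mixed case
print "$first_up\n$all_up\n$lowered\n";

# The same conversions are available as functions.
print ucfirst(lc 'hELLO'), "\n";   # Hello

# (?i:...) makes only the enclosed part of the pattern case-insensitive.
my $partial = qr/some(?i:InSensiTIve)portion/;
for my $candidate (qw(someINSENSITIVEportion SOMEinsensitivePORTION)) {
    printf "%-24s %s\n", $candidate,
        ($candidate =~ $partial ? 'matches' : 'no match');
}
```

The second candidate fails because `some` and `portion` sit outside the `(?i:...)` group and are still matched case-sensitively.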
Re: Re: Capitalization and Regex by Anonymous Monk on Oct 26, 2003 at 22:22 UTC
Thank you, everyone, for the help. It's back to the Man pages with me. ^_^
Re: Capitalization and Regex by The Mad Hatter (Priest) on Oct 26, 2003 at 22:10 UTC
Use the /i flag just as you use /g. Example: `sub Regch{ $_[0]=~s/(\.|\-|\(|\)|\[|\]|\{|\}|\?|\/|\\|\^)/\\$1/gi; $_[0]=~s/([a-z])/[$1]/gi; return $_[0]; }` Update: Removed character class redundancy. That will teach me to at least look at the regex before posting... ; )
Re: Re: Capitalization and Regex by davido (Cardinal) on Oct 26, 2003 at 22:16 UTC
`$_[0]=~s/([a-zA-Z])/[$1]/gi;` Of course, with the `/i` modifier the `A-Z` in `[a-zA-Z]` becomes redundant, so the class can be written simply as `[a-z]`, assuming we're dealing with ASCII. Dave "If I had my life to live over again, I'd be a plumber." -- Albert Einstein
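A quick way to check the redundancy claim is to count matches with and without `/i`. This is only a sketch on an invented ASCII sample string.

```perl
use strict;
use warnings;

my $text = 'Perl Monks';   # made-up ASCII sample

# The "= () =" idiom counts matches by forcing list context.
my $with_i    = () = $text =~ /[a-z]/gi;     # letters of either case
my $without_i = () = $text =~ /[a-z]/g;      # lowercase letters only
my $explicit  = () = $text =~ /[a-zA-Z]/g;   # same set as [a-z] with /i

print "with /i: $with_i, without /i: $without_i, explicit A-Z: $explicit\n";
```

On ASCII input the first and third counts agree, which is the sense in which `A-Z` adds nothing once `/i` is present.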
