Regex help

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Regex help by CountZero (Bishop) on Apr 01, 2012 at 18:54 UTC
`/\\\\(?:[a-zA-Z0-9]+\\)+[a-zA-Z0-9]+/`

The regular expression `(?-imsx:\\\\(?:[a-zA-Z0-9]+\\)+[a-zA-Z0-9]+)` matches as follows:

- `(?-imsx:` group, but do not capture (case-sensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally):
  - `\\`: '\'
  - `\\`: '\'
  - `(?:` group, but do not capture (1 or more times (matching the most amount possible)):
    - `[a-zA-Z0-9]+`: any character of 'a' to 'z', 'A' to 'Z', '0' to '9' (1 or more times (matching the most amount possible))
    - `\\`: '\'
  - `)+` end of grouping
  - `[a-zA-Z0-9]+`: any character of 'a' to 'z', 'A' to 'Z', '0' to '9' (1 or more times (matching the most amount possible))
- `)` end of grouping

(With thanks to YAPE::Regex::Explain)

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics
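To see what CountZero's pattern accepts in practice, here is a small sketch that runs it against a few made-up strings. The sample paths are hypothetical. Note that the pattern only allows ASCII letters and digits in each name, so real share or file names containing spaces, hyphens or dots will not match in full.

```perl
use strict;
use warnings;

# CountZero's pattern: two literal backslashes, then one or more
# "name\" segments, then a final alphanumeric name.
my $unc = qr/\\\\(?:[a-zA-Z0-9]+\\)+[a-zA-Z0-9]+/;

# Hypothetical sample inputs. In a double-quoted Perl string "\\" is a
# single backslash, so "\\\\server\\share" is the text \\server\share.
my @samples = (
    "\\\\server\\share",           # \\server\share        -> matches
    "\\\\server\\share\\file1",    # \\server\share\file1  -> matches
    "\\\\server",                  # \\server (no share)   -> no match
    "C:\\temp\\file",              # C:\temp\file          -> no match
    "\\" x 10,                     # ten backslashes       -> no match
);

for my $s (@samples) {
    if ($s =~ /($unc)/) {
        print "match:    $1\n";
    }
    else {
        print "no match: $s\n";
    }
}
```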
Re: Regex help by NetWallah (Canon) on Apr 01, 2012 at 18:42 UTC
This selects text that starts with a '\' followed by one or more 'word' characters and/or backslashes. Note the use of single quotes around the shell arguments, so that the backslashes are not consumed as escapes.

`$ perl -e 'my $x=shift; print qq|$x\n|; print $x=~m/(\\[\\\w]+)/,qq|\n|' 'some \\text\with\backslashes and stuff'`

Output:

`some \\text\with\backslashes and stuff`
`\\text\with\backslashes`

Note that the Microsoft UNC naming convention allows spaces and many other characters; in fact, the long UNC form requires the `\\?\` prefix, so the regex above will need editing. For your use case, the best bet is to look for a terminating delimiter.

All great truths begin as blasphemies. ― George Bernard Shaw, writer, Nobel laureate (1856-1950)
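NetWallah's closing advice, anchoring on a terminating delimiter, might look like the sketch below. It assumes, purely for illustration, that every path in the input is wrapped in double quotes, so the closing quote ends the path and spaces inside it are allowed. The input text and file names are made up. For contrast, it also runs the simpler one-liner pattern, which stops at the first space.

```perl
use strict;
use warnings;

# Hypothetical input: each UNC path is wrapped in double quotes.
# In single quotes '\\' is one backslash, so '\\\\fileserver' is \\fileserver.
my $text = 'copy "\\\\fileserver\\Shared Docs\\report 2012.txt" to "\\\\backup\\archive"';

# Delimiter approach (assumption: the quote character terminates a path):
# two backslashes, then everything up to the closing quote.
my @paths = $text =~ /"(\\\\[^"]+)"/g;
print "$_\n" for @paths;

# NetWallah's simpler pattern: a backslash, then word chars or backslashes.
# It cannot see past the space in "Shared Docs".
my ($simple) = $text =~ /(\\[\\\w]+)/;
print "simple pattern: $simple\n";
```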
Re^2: Regex help by CountZero (Bishop) on Apr 01, 2012 at 18:57 UTC
Your regex will also match a string like `\\\\\\\\\\`, which is probably not what the OP intended.
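CountZero's objection is easy to check directly. The sketch below runs both patterns from this thread against a string made only of backslashes; the variable names are illustrative.

```perl
use strict;
use warnings;

my $netwallah = qr/(\\[\\\w]+)/;                           # backslash, then word chars or backslashes
my $countzero = qr/(\\\\(?:[a-zA-Z0-9]+\\)+[a-zA-Z0-9]+)/;  # \\name\...\name

my $only_slashes = "\\" x 10;    # a string of ten backslash characters

for my $pair (['NetWallah', $netwallah], ['CountZero', $countzero]) {
    my ($who, $re) = @$pair;
    my $result = $only_slashes =~ $re ? "matches '$1'" : 'does not match';
    print "$who: $result\n";
}
```

The character class `[\\\w]` puts backslashes and word characters on an equal footing, so a run of backslashes alone satisfies it. CountZero's pattern requires at least one alphanumeric name between separators.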
Re^3: Regex help by NetWallah (Canon) on Apr 01, 2012 at 19:07 UTC
If I agree that matching more than two '\' is not intended, would you agree that it is still a valid UNC path? My intent was to provide the simplest possible regex. For a production-level job, I would certainly use something like Path::Dispatcher::Rule::Regex.
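If the stricter reading is wanted, one hypothetical way to reject a run of three or more leading backslashes is a negative lookbehind. This is only a sketch, not the Path::Dispatcher approach mentioned in the reply. It assumes each name consists of `\w` characters and that at least a server and a share are present.

```perl
use strict;
use warnings;

# Hypothetical stricter pattern: the negative lookbehind (?<!\\) refuses
# to start the match right after another backslash, so "\\\server" is
# rejected; then backslash-separated \w+ names (server\share at minimum).
my $strict_unc = qr/(?<!\\)\\\\\w+(?:\\\w+)+/;

for my $s ("\\\\server\\share", "\\\\\\server\\share", "\\" x 10) {
    printf "%-16s %s\n", $s, ($s =~ $strict_unc ? 'ok' : 'rejected');
}
```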
Re^4: Regex help by CountZero (Bishop) on Apr 01, 2012 at 19:11 UTC
Re^3: Regex help by JavaFan (Canon) on Apr 01, 2012 at 23:29 UTC
Maybe it's not intended, but it certainly fits the specification given by the OP. 95% of the effort of finding a regexp is stating the actual definition of what you want. If people spent half the time it takes to write down a PerlMonks question on actually formulating what they want, we'd see far fewer regexp questions here, and we'd need to do far less guessing about what the OP wants. If what is wanted isn't stated accurately, all anyone can do is play the lottery and guess.
